I just finished my semester of the distributed programming course in my master’s degree program. This semester I really appreciated the time I spent working on Hadoop.
I would summarize my experience as a basic introduction to Hadoop. There is definitely much more to explore and learn, but for a beginner I think I accomplished my mission.
For my last project of the course I developed a word polarity tool that takes Facebook comments as input. Our professor, MSc. Edgar Casasola, provided the class with the comments. There were about one to two million comments, which were tokenized into words in order to find positive, negative, or neutral words.
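To make the idea more concrete, here is a minimal sketch of the counting logic in plain Java. It is not the project’s actual code and does not use the Hadoop API: it simulates the map step (tokenize each comment and label every word positive, negative, or neutral) and the reduce step (sum the labels) in memory. The word lists, the class name `PolarityCount`, and the choice to count words per polarity are all hypothetical assumptions for illustration. In a real Hadoop job, the classification would live in a `Mapper` that emits `(polarity, 1)` pairs, and the summing would live in a `Reducer`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PolarityCount {
    // Hypothetical toy lexicons; the real word lists used in the project are not given.
    static final Set<String> POSITIVE = Set.of("good", "great", "love", "happy");
    static final Set<String> NEGATIVE = Set.of("bad", "terrible", "hate", "sad");

    // "Map" logic: label a single token as positive, negative or neutral.
    static String classify(String word) {
        if (POSITIVE.contains(word)) return "positive";
        if (NEGATIVE.contains(word)) return "negative";
        return "neutral";
    }

    // Simulates map (one label per token) and reduce (sum per label) in memory.
    static Map<String, Integer> countPolarity(List<String> comments) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (String comment : comments) {
            // Tokenize: lowercase, then split on any run of non-letter/non-digit characters.
            for (String token : comment.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) continue; // a leading delimiter yields an empty token
                counts.merge(classify(token), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> comments = List.of("I love this page!", "Bad service, really bad.");
        // prints {negative=2, neutral=5, positive=1}
        System.out.println(countPolarity(comments));
    }
}
```

Splitting the work this way is what makes it a good fit for Hadoop: each comment can be labeled independently on any node, and only the small per-label totals need to be combined at the end.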
To start, I installed the latest version of Ubuntu on a partition of my laptop. After following a nice tutorial on how to set up Hadoop*, I developed my solution in Eclipse. After some coding I was running my first Hadoop code (YEAAAAAAH! That was a nice feeling).
After some tuning I was ready to test my solution with the whole dataset. Unfortunately, this wasn’t possible on my laptop because it’s not a powerful workstation. So it was time for me to try Amazon Web Services.
This was a challenging experience, but after following the tutorials provided by Amazon it turned into an easy task. Once I had my environment configured, I was running my Hadoop solution on Amazon Web Services (another nice feeling).
In the end, the data wasn’t very good and required more cleaning; testing a project on a 5% sample is not the same as jumping to 100% of the data. After this the course was over, and now it is up to me to continue this project. I did well in the course and really loved working with Hadoop.
I hope this is the first of many more projects on this powerful platform.