Choice Doesn't Matter...: Hadoop n Map-Reduce...

Hi. I am writing after a long break! This time I am going to discuss Hadoop. Why Hadoop? Because what the market needs right now is "Big Data".

I will not say much about the theory, such as why Hadoop or what Hadoop is. I will just assume that you all know its pros and cons. We will quickly get started by installing and working with Hadoop.

Installing Hadoop with the Cloudera Demo VM on Windows

First, download the Cloudera Hadoop VM (I used version 0.3.7, but I recently found that this version is no longer available) here.

Then you will need a virtual machine player to run it. So, download VMware Player at this location.

Since you downloaded Hadoop as a Cloudera VM, you needn't configure any environment variables or JAVA_HOME. You would need to do that if you set Hadoop up using Cygwin or anything else (that will be covered in the next part).

Now your setup is ready, so just start VMware Player and point it at the extracted Cloudera Hadoop VM file that you just downloaded.

Version 0.3.7 looks like this: [screenshot of the Cloudera VM not available]

Assuming that you have internet access on the Windows machine, you don't need to configure anything for it to work on the VM as well. So, download Eclipse on the VM to run your first program.

If you have already downloaded Eclipse on your Windows box, you can get it onto the VM as follows: in the VM, go to Places --> Connect to Server --> Service Type --> Windows Share --> enter the IP address --> Done.

Then create a Java application project in Eclipse and include the Hadoop JARs. They should be under the folder /usr/lib. Include the Hadoop 0.20 files and its lib files.

Now run any MapReduce program. For simplicity, you can run this word count program.
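If you want to see what a word count job does before running the real thing, here is a minimal sketch in plain Java. It only imitates the map, shuffle, and reduce steps in memory and does not use the Hadoop API at all; the class name WordCountSketch and its method names are made up for this illustration and are not taken from the linked program.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, no Hadoop classes involved.
public class WordCountSketch {

    // "Map" step: emit one (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" step: group all emitted values by their key (the word).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the values collected for a single word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello map reduce");
        // prints {hadoop=1, hello=2, map=1, reduce=1}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the framework does the shuffle for you and runs the map and reduce steps on the cluster; you only write the mapper and the reducer.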

If your program takes files as arguments, pass them through Eclipse: go to Run --> Run Configurations --> Java Application --> Arguments and enter the file names (the files are in HDFS):

hdfs://localhost/user/Cloudera/…/filename1.txt
hdfs://localhost/user/Cloudera/…/filename2.txt
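Whatever you type in the Arguments box reaches your program as the args array of main. The sketch below shows one way to sanity-check those arguments before starting a job, using only java.net.URI from the standard library. The class name HdfsArgsSketch, the helper isHdfsUri, and the assumption that the program takes exactly two arguments are all mine for illustration; your own program may expect something different.

```java
import java.net.URI;

// Hypothetical illustration only: checks program arguments, starts no Hadoop job.
public class HdfsArgsSketch {

    // Returns true when the argument is an hdfs:// URI with a host and a path.
    static boolean isHdfsUri(String arg) {
        URI uri;
        try {
            uri = URI.create(arg);
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        return "hdfs".equals(uri.getScheme())
                && uri.getHost() != null
                && uri.getPath() != null
                && !uri.getPath().isEmpty();
    }

    public static void main(String[] args) {
        // Assumption: two file arguments, as in the Eclipse example above.
        if (args.length != 2) {
            System.err.println("Expected two HDFS file names as arguments");
            return;
        }
        for (String arg : args) {
            System.out.println(arg + " -> " + (isHdfsUri(arg) ? "looks fine" : "not an hdfs:// URI"));
        }
    }
}
```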

To run it from the terminal, you first need to build a JAR file containing the main class. To do that, right-click the main class --> Export --> as JAR --> Browse and give it a name.

Then, in the terminal, change your working directory to your Eclipse workspace and then to the location of the JAR. Type the following command to run it:

hadoop jar {jar file name} {main class name} {parameters}

I hope you are feeling proud after running your first MapReduce program on Cloudera's Hadoop VM.

Please write if you have any doubts. I will cover installing Hadoop and running it on Windows using Cygwin in my next post. This is important because the Cloudera VM is single-node, i.e., you can't have more than one node. To have more than one node, you need to install Hadoop yourself. Also, since it is single-node, whatever replication factor you set in the configuration files has no effect: the data is stored only once.

9 comments:

Anonymous, September 26, 2011 at 6:43 AM
hey!!! just saw it. i might b the first viewer for it. it is really cool. keep writing..
-SAM

Anonymous, September 26, 2011 at 11:30 PM
Cool.... let me try this....

Shiva Manja, September 26, 2011 at 11:36 PM
superb man. v r trying. will tell u as soon as v hav done

Somi, September 26, 2011 at 11:41 PM
Hey Saurabh ... Really gud wrk... Thanks fr d info... Cheers.......!!!!

Bharath, September 27, 2011 at 12:03 AM
Nice Post dude!! Will try it out

Siddiq, September 27, 2011 at 12:55 AM
good work. but i need more info..

Hangover - A State!, September 27, 2011 at 6:05 AM
ya Siddiq, how can i help u?? r u struck anywhere?

Anonymous, September 28, 2011 at 5:12 AM
Gd.. do write regularly.. n update us...
-PC

Hangover - A State!, September 28, 2011 at 5:23 AM
ya surely. soon i will write about its QL and also installing Hadoop completely with more than one node..

Saturday, September 24, 2011
