Saturday, September 24, 2011

Hadoop and Map-Reduce

Hi. I am writing after a long break! This time I am going to discuss Hadoop. Why Hadoop? Because what the market needs right now is "BIG DATA".

I will not say much about the theory, such as why Hadoop and what Hadoop is; I will just assume that you all know the pros and cons of Hadoop. We will start right away by installing and working with Hadoop.


Installing Hadoop with the Cloudera Demo VM on Windows

  •  First, download Cloudera's Hadoop VM (I used version 0.3.7, but I recently found that this version is no longer available) here.
  •  Next, you need a virtual machine player to run it, so download VMware Player at this location.
  •  Since you are using the Cloudera VM, you don't need to configure any environment variables such as JAVA_HOME. You would need to do that if you installed Hadoop with Cygwin or some other way (that will be covered in the next part).
  •  Your setup is now ready, so just start VMware Player and open in it the extracted Cloudera Hadoop VM file that you just downloaded.
Version 0.3.7 looks like this: [screenshot]


  •  Assuming your Windows machine has an internet connection, you don't need to configure anything for the VM to be online as well. So, download Eclipse in the VM to run your first program.

If you have already downloaded Eclipse on your Windows box, you can get it into the VM as follows: in the VM, go to Places --> Connect To Server --> Service Type --> Windows Share --> enter the IP address --> Done.

  •  Then create a Java application project in Eclipse and add the Hadoop jars to it. They should be in the folder /usr/lib. Include the Hadoop 0.20 jars and the jars in its lib folder.

If your program takes files as arguments, set them in Eclipse under Run --> Run Configuration --> Java Application --> Arguments, and give the file names as follows (the files are in HDFS): hdfs://localhost/user/Cloudera/…/filename1.txt hdfs://localhost/user/Cloudera/…/filename2.txt
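Since a mistyped argument is an easy way to break a first run, here is a small sketch of how the main class could check its arguments before doing any work. It uses only the standard java.net.URI class, not the Hadoop API. The class name HdfsArgs, the method isHdfsPath and the example path in the comment are all made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Hadoop): sanity-checks program arguments
// that are expected to be hdfs:// URIs.
class HdfsArgs {

    // True when the argument parses as a URI with the "hdfs" scheme
    // and a non-empty path, e.g. hdfs://localhost/user/example/input1.txt
    // (a made-up path).
    static boolean isHdfsPath(String arg) {
        try {
            URI uri = new URI(arg);
            String path = uri.getPath();
            return "hdfs".equalsIgnoreCase(uri.getScheme())
                    && path != null
                    && !path.isEmpty();
        } catch (URISyntaxException e) {
            // Not even a valid URI, e.g. a Windows path with backslashes.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("Usage: HdfsArgs <hdfs path> [<hdfs path> ...]");
            return;
        }
        for (String arg : args) {
            String verdict = isHdfsPath(arg) ? "looks like an HDFS path" : "is NOT an HDFS path";
            System.out.println(arg + " " + verdict);
        }
    }
}
```

This only checks the shape of the argument. It does not contact HDFS, so it cannot tell you whether the file actually exists.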

  •  To run it from the terminal, you first need to make a jar file of the main class. To do so, right-click on the main class --> Export --> as JAR --> browse and give it a name.
  •  Then, in the terminal, change your working directory to your Eclipse workspace and from there to the location of the jar. Type the following command to run it:
            hadoop jar {jar file name} {main class name} {parameters}
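If you ever want to launch the job from another Java program instead of typing the command, the same command line can be assembled as a list of arguments. This is only a sketch: the class name HadoopJarCommand, the jar name, the main class name and the paths are invented for the example, and it assumes the hadoop command is on the PATH, as it is inside the VM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the argument list for
//   hadoop jar {jar file name} {main class name} {parameters}
class HadoopJarCommand {

    static List<String> build(String jarFile, String mainClass, String... params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jarFile);
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(params));
        return cmd;
    }

    public static void main(String[] args) {
        // All names below are made up for illustration.
        List<String> cmd = build("myjob.jar", "MyJob",
                "hdfs://localhost/user/example/input1.txt",
                "hdfs://localhost/user/example/input2.txt");

        // Print the command exactly as you would type it in the terminal.
        System.out.println(String.join(" ", cmd));

        // To really start it, pass the list to ProcessBuilder:
        //   new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping each argument as a separate list element means you do not have to worry about quoting if a path contains spaces.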

 


I hope you are feeling proud after running your first Map-Reduce program on Cloudera's Hadoop VM.


Please write if you have any doubts. I will cover installing Hadoop and running it on Windows using Cygwin in my next post. This is important because the Cloudera VM is single-node, i.e., you can't have more than one node. To have more than one node, you need to install Hadoop yourself. Also, since there is only one node, whatever replication factor you set in the configuration files is ignored in practice: each block is stored only once.
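The replication point can be put as simple arithmetic. HDFS keeps at most one copy of a block per DataNode, so the number of copies you actually get is the smaller of the configured factor (the dfs.replication property) and the number of live DataNodes. The class and method names below are made up, and this is a simplified model, not Hadoop code.

```java
// Simplified model of HDFS replication, for illustration only.
class ReplicationSketch {

    // Copies of a block that can actually be stored: at most one per DataNode.
    static int effectiveReplicas(int replicationFactor, int liveDataNodes) {
        if (replicationFactor < 1 || liveDataNodes < 0) {
            throw new IllegalArgumentException("invalid replication factor or node count");
        }
        return Math.min(replicationFactor, liveDataNodes);
    }

    public static void main(String[] args) {
        // Single-node VM with replication factor 3: only one copy is possible.
        System.out.println(effectiveReplicas(3, 1)); // prints 1
        // A hypothetical three-node cluster with the same setting.
        System.out.println(effectiveReplicas(3, 3)); // prints 3
    }
}
```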

9 comments:

  1. hey!!! just saw it. i might b the first viewer for it. it is really cool. keep writing..
    -SAM

  2. Cool.... let me try this....

  3. superb man. v r trying. will tell u as soon as v hav done

  4. Hey Saurabh ... Really gud wrk... Thanks fr d info... Cheers.......!!!!

  5. good work. but i need more info..

  6. ya Siddiq, how can i help u?? r u struck anywhere?

  7. Gd.. do write regularly.. n update us...
    -PC

  8. ya surely. soon i will write about its QL and also installing Hadoop completely with more than one node..
