Zero to Hadoop in 15 minutes

11 Sep

in analytics, big data, Hadoop

In this post, I will show you how to set up a single-node Hadoop installation on your machine and copy a file to the Hadoop file system.
In the next tutorial we will run an actual Java program on Hadoop.

You can download Hadoop from one of its mirror sites. If you are using Hadoop 1.0.3, you can go directly to http://apache.mesi.com.ar/hadoop/common/hadoop-1.0.3/

1. Download hadoop-1.0.3.tar.gz (for Linux or OS X)

2. Untar the file:

tar xvfz hadoop-1.0.3.tar.gz

 

3. Edit hadoop/conf/hadoop-env.sh and set JAVA_HOME. For me the setting was:

export JAVA_HOME=/Library/Java/Home
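
If you are not sure where Java lives on your Mac, the java_home utility can locate it for you (this is a hedged alternative, assuming your OS X version ships /usr/libexec/java_home):

export JAVA_HOME=$(/usr/libexec/java_home)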

4. Under hadoop/conf, edit the following files and make these changes:

  • mapred-site.xml: MapReduce properties go into this file

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

  • hdfs-site.xml: HDFS properties go into this file; a replication factor of 1 is all we need on a single node

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

  • core-site.xml: core properties, such as the Hadoop temp directory and the default file system, go into this file

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>[tmp-dir-for-hadoop]</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
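
Replace [tmp-dir-for-hadoop] with a directory Hadoop can write to. As a hypothetical example, you could create one under your home directory (~/hadoop-tmp is my own pick here, not a Hadoop default):

mkdir -p ~/hadoop-tmp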

 

5. Create an SSH key that Hadoop uses to SSH into the local host:

ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
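
Before moving on, you may want to confirm that passwordless SSH into the local host actually works (this assumes an SSH server is running; on OS X, enable Remote Login under the Sharing preferences):

ssh localhost
exit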

6. Format the Hadoop file system. Run this from your Hadoop install root. The format will not affect your local file system; it just creates a sandbox for Hadoop.

./bin/hadoop namenode -format

7. Now you can run Hadoop with the following script:

./bin/start-all.sh
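
To verify that everything came up, jps (shipped with the JDK) lists the running Java processes; on a healthy single-node setup you should see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:

jps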

If you get an error message: “Unable to load realm mapping info from SCDynamicStore”, add the following setting for HADOOP_OPTS to hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

8. You can copy some data from your local file system to the Hadoop file system:
./bin/hadoop dfs -put local_machine_path hdfs_path

And see the listing:
./bin/hadoop dfs -ls

E.g., create a directory called hadoop_input and copy a file into it:

./bin/hadoop dfs -mkdir hadoop_input
./bin/hadoop dfs -put local_machine_path/myinput.txt hadoop_input/myinput.txt
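
To check that the file actually made it in, you can print it back or copy it out again (copy_of_myinput.txt is just an arbitrary local name):

./bin/hadoop dfs -cat hadoop_input/myinput.txt
./bin/hadoop dfs -get hadoop_input/myinput.txt copy_of_myinput.txt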

You can also browse the web interfaces for the NameNode and the JobTracker; by default they are available at http://localhost:50070/ (NameNode) and http://localhost:50030/ (JobTracker).
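
When you are done experimenting, the companion stop script shuts all the daemons down again:

./bin/stop-all.sh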

Good luck and keep exploring!!

Copyright 2012 10jumps LLC.