Hadoop Single node cluster quick setup

Installing Java
a. sudo apt-get update
b. sudo apt-get install openjdk-8-jre
c. sudo apt-get install openjdk-8-jdk

Installing ssh (Secure Shell)
a. sudo apt-get -y install openssh-server
b. ssh-keygen -t rsa
c. cd ~/.ssh
d. cp id_rsa.pub authorized_keys
e. sudo service ssh restart
f. ssh <hostname> (the first login asks you to confirm the host fingerprint; after that it should log in without a password prompt)
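The key-generation and key-copy steps above can be sketched as a small script. The snippet below uses a throwaway directory so it can run anywhere without touching your real ~/.ssh; on the actual node, work in ~/.ssh instead.

```shell
# Demo of steps b-e with a throwaway key pair (use ~/.ssh on the real node).
DEMO_SSH=/tmp/demo-ssh
rm -rf "$DEMO_SSH"
mkdir -p "$DEMO_SSH" && chmod 700 "$DEMO_SSH"
ssh-keygen -t rsa -N '' -q -f "$DEMO_SSH/id_rsa"   # non-interactive keygen
cp "$DEMO_SSH/id_rsa.pub" "$DEMO_SSH/authorized_keys"
chmod 600 "$DEMO_SSH/authorized_keys"              # sshd rejects looser perms
ls "$DEMO_SSH"
```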

Installing Hadoop
a. Make a setup directory e.g. /home/hdfs/setups.
b. Download and copy all tar files inside setups. (Link provided below)
c. Untar the hadoop tar using tar -xf hadoop-3.3.0.tar.gz
d. Update the configuration files.
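To illustrate the untar step in a self-contained way, the snippet below builds a throwaway archive in /tmp and extracts it; on the real node you would run the same tar -xf line against the downloaded hadoop-3.3.0.tar.gz inside /home/hdfs/setups.

```shell
# Self-contained demo of the extraction step (the dummy archive stands in
# for the real hadoop-3.3.0.tar.gz you downloaded into your setups dir).
SETUPS=/tmp/demo-setups                 # stand-in for /home/hdfs/setups
rm -rf "$SETUPS" && mkdir -p "$SETUPS/hadoop-3.3.0"
echo "placeholder" > "$SETUPS/hadoop-3.3.0/README.txt"
tar -czf "$SETUPS/hadoop-3.3.0.tar.gz" -C "$SETUPS" hadoop-3.3.0
rm -rf "$SETUPS/hadoop-3.3.0"
# the actual step from the guide:
tar -xf "$SETUPS/hadoop-3.3.0.tar.gz" -C "$SETUPS"
ls "$SETUPS/hadoop-3.3.0"
```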

hadoop-env.sh
a. Uncomment and update the JAVA_HOME variable with the path of the currently
installed Java.
b. Update the HADOOP_LOG_DIR variable with the location of the log directory,
e.g. /home/hdfs/tmp/logs.
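The two hadoop-env.sh edits boil down to the lines below. The snippet writes them to a stand-in file so it is runnable as-is; on a real install, edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh. The JAVA_HOME shown is the usual Ubuntu OpenJDK 8 path, so adjust it to your system.

```shell
# Demo: the two hadoop-env.sh settings, written to a stand-in file
# (edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh on a real install).
ENV_SH=/tmp/demo-hadoop-env.sh
cat > "$ENV_SH" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_LOG_DIR=/home/hdfs/tmp/logs
EOF
grep JAVA_HOME "$ENV_SH"
```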

core-site.xml
a. Update the core-site.xml as follows (fs.defaultFS is the current name of
this setting; the older fs.default.name is deprecated):
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdfs/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
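The config edits can also be scripted with a heredoc. The snippet below writes to a throwaway demo directory so it runs anywhere; on a real node, point the variable at $HADOOP_HOME/etc/hadoop, which is where Hadoop 3.x keeps its config files. It uses fs.defaultFS, the current name of the deprecated fs.default.name property.

```shell
# Demo: write core-site.xml with a heredoc (writes to /tmp so it is
# self-contained; use $HADOOP_HOME/etc/hadoop as CONF_DIR on a real node).
CONF_DIR=/tmp/demo-hadoop-conf
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdfs/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep '<name>' "$CONF_DIR/core-site.xml"
```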

hdfs-site.xml
a. Update the hdfs-site.xml as follows (dfs.namenode.name.dir and
dfs.datanode.data.dir are the current names of the deprecated dfs.name.dir
and dfs.data.dir):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

mapred-site.xml
a. Update the mapred-site.xml as follows
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml
a. Update the yarn-site.xml as follows
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Add the Java home path and Hadoop home path in /home/hdfs/.bashrc at the end of
the file. Note that HADOOP_HOME must point at the extracted hadoop-3.3.0
directory inside setups, not at setups itself.

export HADOOP_HOME=/home/hdfs/setups/hadoop-3.3.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
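A quick way to sanity-check the exports is to source them and confirm PATH picked up the Hadoop bin directory. The snippet below writes the core exports to a demo file so it is self-contained; on the real node you would source ~/.bashrc instead. It assumes the tarball was extracted under setups, so HADOOP_HOME points at the hadoop-3.3.0 directory.

```shell
# Sketch: write the key exports to a demo file, source it, and check PATH
# (source ~/.bashrc on the real node after adding the exports).
ENVFILE=/tmp/demo-bashrc-hadoop
cat > "$ENVFILE" <<'EOF'
export HADOOP_HOME=/home/hdfs/setups/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
EOF
. "$ENVFILE"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *) echo "PATH missing HADOOP_HOME/bin" ;;
esac
```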

Format the namenode
a. Go to the $HADOOP_HOME directory by “cd $HADOOP_HOME”.
b. Then run “bin/hdfs namenode -format”. (The older “bin/hadoop namenode
-format” still works but is deprecated.)

Start all services
a. Be in the $HADOOP_HOME directory.
b. Enter “sbin/start-all.sh”. Note that the start scripts live in sbin, not
bin; you can also start HDFS and YARN separately with “sbin/start-dfs.sh”
and “sbin/start-yarn.sh”.
c. This will start all the Hadoop services on your machine.
d. Run “jps” to check which Hadoop services are running.
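On a healthy single-node cluster, jps should show the five standard daemons. A small loop like the one below checks for each by name; the daemon names are Hadoop's standard ones, so drop ResourceManager and NodeManager from the list if you only started HDFS.

```shell
# Check the expected single-node daemons by name in the jps output
# (adjust the list if you started only HDFS or only YARN).
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if jps 2>/dev/null | grep -q "$d"; then
    echo "$d: up"
  else
    echo "$d: MISSING"
  fi
done
```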

Check Portals
a. Open a browser.
b. Enter localhost:8088 to open the YARN ResourceManager (application) portal.
c. Enter localhost:9870 to open the namenode portal. (Port 9000 is the HDFS
RPC port set in core-site.xml, not a web page; in Hadoop 3.x the namenode
web UI listens on 9870.)

Here we go... the Hadoop cluster is ready on your machine...

I will be back with the Multinode Hadoop cluster and YARN Hadoop cluster setup very soon...