Wednesday, 28 November 2012

Getting started with Hadoop Configuration # 1

I was planning to continue my series on Business Performance & OSSM, but I could not for various reasons (one being a very hectic schedule at work); I am extremely sorry about that.

Recently a friend and I were discussing Hadoop configuration on Ubuntu in a VM environment on the Windows platform and sharing our individual experiences, which prompted me to write this blog.

Before getting into the details of the configuration, I would like to cover some basic terminology.

Apache Hadoop
Apache Hadoop is an open-source framework, written in the Java programming language, designed for clusters built out of commodity hardware. It supports data-intensive distributed applications through its two main components: MapReduce and HDFS (Hadoop Distributed File System). Hadoop is a top-level Apache project supported by a community of developers. For more details, refer to the following web sites:
  • apache.org
  • wikipedia - Hadoop, MapReduce, HDFS, etc.
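
To give a feel for the MapReduce model mentioned above, here is a minimal sketch of a word count in plain Python. It does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are my own for illustration and are not part of any Hadoop API. It only mimics the three stages a MapReduce job goes through on a single machine.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the input lines read from HDFS; the sketch only shows the shape of the computation.
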
Virtual Machine

A Virtual Machine, as many of you know, is a software simulation of a computer that executes programs like a physical machine. There are two categories of virtualization: hardware virtualization and application virtualization.

Hardware virtualization enables multiple OS environments to co-exist on the same computer while each remains completely isolated from the others; for example, Windows and Linux running on the same computer. There are many VM tools that can be used; in this blog, I am using VMware Player.

Application virtualization enables a normal application to run inside the host OS as a single process, mainly to provide a platform-independent programming environment. For example, programs written in the Java programming language run on a JVM (Java Virtual Machine).

VMware Player is freeware from VMware Inc.

Ubuntu
Ubuntu is an operating system based on the Debian Linux distribution and distributed as free and open source software. Ubuntu as a platform comes in multiple editions such as Desktop, Server, Cloud, Ubuntu for Android, Ubuntu TV, etc. One of the key things about Ubuntu is that the Ubuntu team commits to scheduled releases on a predictable six-month basis, and every fourth release (i.e., one every two years) is a long-term support (LTS) release.
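
The release cadence can be expressed as a small rule of thumb. The sketch below assumes the YY.MM version numbering Ubuntu uses and the pattern that LTS releases are the April releases of even years; is_lts is a hypothetical helper of mine, and the rule does not hold for the very early 6.06 release.

```python
def is_lts(version):
    """Guess whether an Ubuntu version string such as "12.04" is an LTS release.

    Assumption: LTS releases are the April (".04") releases of even years.
    Point releases such as "12.04.1" are handled by ignoring the third part.
    """
    year, month = (int(part) for part in version.split(".")[:2])
    return month == 4 and year % 2 == 0


if __name__ == "__main__":
    for version in ["12.04", "12.04.1", "12.10", "13.04"]:
        print(version, "LTS" if is_lts(version) else "regular release")
```
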

This blog is primarily about configuring Hadoop in the following environment:
  • VMware Player 4.0.4
  • Ubuntu 12.04 Desktop version
  • Hadoop 0.23 version
Note: the above configuration is recommended for self-learning purposes only. For any development/production setup, you might need to contact VMware for the appropriate products to set up a virtualized environment, and similarly Ubuntu for the respective products (though Ubuntu is free). Also, although Hadoop is open source, there are Hadoop-based distributions, such as Cloudera, that should be considered for a development/production environment.

The following are the high-level steps and/or prerequisites for the configuration:
  1. Download VMware Player 4.0.4 from www.vmware.com, under Desktop & End-User Computing, choosing the build that matches your Windows machine configuration (i.e., 32 bit or 64 bit)
  2. Download the Ubuntu 12.04 Desktop version (ubuntu-12.04.1-desktop-i386.iso) from www.ubuntu.com/download
  3. Install VMware Player
  4. On the Windows machine, identify a drive / partition with plenty of free space (say around 60+ GB); create an empty folder there, for example G:\UbuntuVM
(Note: newer versions of the above may have been released since; for example, VMware Player 5.0 is available right now. You may download whichever version is convenient and available.)
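
If you would like to check the free space from step 4 before creating the virtual machine, a small Python sketch like the one below can help. It assumes Python 3 is installed on the Windows machine; free_space_gb and has_enough_space are hypothetical helper names of mine, and G:\UbuntuVM is simply the example folder from step 4.

```python
import shutil
from pathlib import Path

REQUIRED_GB = 60  # free space suggested in step 4


def free_space_gb(path):
    """Return the free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Report whether the drive holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    vm_folder = Path(r"G:\UbuntuVM")  # example folder from step 4; adjust as needed
    # Fall back to the current directory if the example folder does not exist.
    target = vm_folder if vm_folder.exists() else Path(".")
    print(f"Checking {target.resolve()}")
    print(f"Free space: {free_space_gb(target):.1f} GB")
    print("Enough space for the VM" if has_enough_space(target) else "Not enough space")
```
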

Configuration of Ubuntu on VM environment

Open VMware Player (Windows Start -> Programs -> VMware -> VMware Player); the welcome screen opens.




Click on "Create a New Virtual Machine"


If you have Ubuntu on a CD, use the first option; otherwise select the "Installer disc image file (iso):" option and browse to the already downloaded CD image, i.e., the iso file (ubuntu-12.04.1-desktop-i386.iso)


Click Next


Specify the requested details and click Next; please note that the user name and password entered here will be the login for your virtual OS.


Specify the Virtual Machine name and the Location (as per our example, G:\UbuntuVM, i.e., the empty folder created on the drive with 60+ GB of free space). Click Next


Click Next

Click Finish.
Note: If you want to customize hardware options such as Memory, Processors, etc., you can specify them at this point.

It might take some time, after which your Ubuntu setup is ready.

Further updates shall continue very soon....
