Big Data on a Shoestring

Authors: Nicholas Bessmer

 

Select one MICRO instance, as shown in the example above.
I do not want to swamp you with technical details, but you will see screens indicating your progress as follows:

 

 

You will need to save a special file called a KEY PAIR on your local computer. Save this! Amazon will prompt you to download it, and you will need it later to connect to your instance.

 

Finally:

 

 

 

Start your instance (your virtual server).
You can connect to your new instance by downloading these tools, or you can right-click on your instance name and select “CONNECT”:

 

 

In about 20 minutes you can have a LINUX server running in The Cloud!
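
If you would rather connect from a terminal than through the CONNECT menu, the key pair you saved can be used with ssh. The lines below are a minimal sketch, not Amazon's official procedure: the key file name and the host name are made-up placeholders, and the ec2-user login is assumed from the /home/ec2-user paths used later in this text. The sketch only prints the command, so it is safe to paste as-is.

```shell
# Hypothetical values: substitute your own key file and instance address.
KEY_FILE="${KEY_FILE:-$HOME/my-key-pair.pem}"
EC2_HOST="${EC2_HOST:-ec2-203-0-113-10.compute-1.amazonaws.com}"

# Print the ssh command for a given key file ($1) and host ($2).
# ec2-user is assumed from the /home/ec2-user paths used later in this text.
ssh_command() {
    printf 'ssh -i %s ec2-user@%s\n' "$1" "$2"
}

# ssh rejects private keys that other users can read, so lock the file down.
if [ -f "$KEY_FILE" ]; then
    chmod 400 "$KEY_FILE"
fi

# Show the command rather than running it; paste the output to connect.
ssh_command "$KEY_FILE" "$EC2_HOST"
```

Once the printed command has your real key file and address in it, running it should open the same kind of terminal session that CONNECT gives you.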

Getting Our Tools Running on Our New Big Data Server

 

Remember that a lot of computer technology is a series of repetitive, often cookbook-like steps. It is the 80/20 rule: 20% of the work is challenging and imaginative, and the other 80% is very predictable. But predictable can be good, especially when you are trying to avoid techno-speak, for example as a small business owner.

 

Let’s get our tools installed! In the command terminal that opens when you CONNECT to your new EC2 Linux installation, use the SFTP plugin:

 

 

It lets you see the files on your local computer and on your remote EC2 instance. Now download the following to your computer:

 

http://www.sai.msu.su/apache/hadoop/core/stable/

 

for the latest stable version of HADOOP and download CASSANDRA:

 

http://cassandra.apache.org/download/

 

“PIG” is a platform with a query language (Pig Latin) designed for Big Data. We will use it to query our Big Data dataset.

 

http://www.sai.msu.su/apache/pig

 

Now copy these over to your new EC2 Linux Server:
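
If you prefer the command line to the SFTP plugin, scp can do the same copy. This is only a sketch under assumptions: the key file and host name are made-up placeholders, the archive names are the ones used in the tar commands in this section, and the downloads are assumed to be in your current directory. It prints the commands instead of running them.

```shell
# Hypothetical values: substitute your own key file and instance address.
KEY_FILE="${KEY_FILE:-$HOME/my-key-pair.pem}"
EC2_HOST="${EC2_HOST:-ec2-203-0-113-10.compute-1.amazonaws.com}"

# Print one scp command per archive for a given key file ($1) and host ($2).
# Each command copies the archive into the home directory of ec2-user.
print_copy_commands() {
    for archive in hadoop-0.20.2.tar.gz \
                   apache-cassandra-1.2.1-bin.tar.gz \
                   pig-0.10.1.tar.gz; do
        printf 'scp -i %s %s ec2-user@%s:~/\n' "$1" "$archive" "$2"
    done
}

# Show the commands rather than running them.
print_copy_commands "$KEY_FILE" "$EC2_HOST"
```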

 

 

Once the files have been copied, copy and paste the following commands:

 

tar -xvf hadoop-0.20.2.tar.gz

tar -xvf apache-cassandra-1.2.1-bin.tar.gz

tar -xvf pig-0.10.1.tar.gz

Please also be sure to run these commands:

» cd pig-0.10.1 (cd changes the directory)

» tar -xvf tutorial.tar (you can also use the gunzip utility)

 

This extracts the files, which are compressed much like a ZIP file.
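
To see what the tar flags do without touching the real downloads, here is a small practice run in a throwaway directory. The archive name demo.tar.gz and its contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
WORK_DIR="$(mktemp -d)"
cd "$WORK_DIR" || exit 1

# Build a small compressed archive (demo.tar.gz is a made-up name).
mkdir demo
echo "hello" > demo/readme.txt
tar -czf demo.tar.gz demo
rm -r demo

# -t lists the contents without extracting anything.
tar -tzf demo.tar.gz

# -x extracts, -v prints each file name, -f names the archive to read.
tar -xvf demo.tar.gz
cat demo/readme.txt
```

Listing an archive with -t before extracting it is a cheap way to check which directory it will create.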

 

It is possible to choose MS Windows Server as your preferred EC2 server. We installed Linux here because it is cheaper to run than Windows Server, so editing files means using the vi editor, which is a bit harder. Look up vi on the Internet: it is like a very powerful Windows Notepad, but it is driven from the command line.

 

Getting The Linux Environment Set Up – Basic Steps

 

Type the following:

 

» cd (changes to your home directory)

» vi .bash_profile (vi is the editor, and you will be modifying a simple text configuration file). Please see this helpful vi tutorial from the University of California, San Diego:

http://acms.ucsd.edu/info/vi_tutorial.html

» Copy and paste the following into your file:

# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin:/home/ec2-user/hadoop-0.20.2/bin:/home/ec2-user/pig-0.10.1/bin
export PATH
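
The PATH line is what lets you type a tool's name without its full directory. The sketch below shows the same idea on a throwaway directory; the demo-tool script is a made-up stand-in for the real Hadoop and Pig programs.

```shell
# Stand-in for a tool directory such as hadoop-0.20.2/bin (made up for the demo).
TOOL_DIR="$(mktemp -d)"
printf '#!/bin/sh\necho "demo tool ran"\n' > "$TOOL_DIR/demo-tool"
chmod +x "$TOOL_DIR/demo-tool"

# Append the directory to PATH, the same way the PATH= line in .bash_profile does.
PATH=$PATH:$TOOL_DIR
export PATH

# command -v prints the full path of a command if the shell can find it.
command -v demo-tool

# The tool now runs by name, with no directory prefix.
demo-tool
```

On the real server the profile is only read at login, so run `. ~/.bash_profile` (or log out and back in) after saving it. After that, `command -v hadoop` should print a path inside hadoop-0.20.2/bin.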

Editing Our Hadoop Configuration Files

 

Next, following on from downloading and extracting the Hadoop TAR file above, we need to edit the following files and run these commands:

 

Edit /conf/core-site.xml (the conf directory sits inside your Hadoop folder). Each setting below is a property with a name and a value. I have used localhost in the value of fs.default.name [1].

fs.default.name = hdfs://localhost:9000

Edit /conf/mapred-site.xml.

mapred.job.tracker = localhost:9001

Edit /conf/hdfs-site.xml. Since this test cluster has a single node, the replication factor should be set to 1.

dfs.replication = 1
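
Inside the files, each setting is written as an XML property element wrapped in a configuration element. As a sketch of what the three finished files could look like, the commands below write them into a scratch directory instead of the real conf directory. The write_site_file helper is made up for this example, and it assumes each file holds only the one property discussed above.

```shell
# Write into a scratch directory; copy the files into your Hadoop conf
# directory only after you have looked them over.
CONF_DIR="$(mktemp -d)"

# Hypothetical helper for this sketch: write a config file ($1) that holds
# a single property with name $2 and value $3.
write_site_file() {
    cat > "$CONF_DIR/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

write_site_file core-site.xml   fs.default.name    hdfs://localhost:9000
write_site_file mapred-site.xml mapred.job.tracker localhost:9001
write_site_file hdfs-site.xml   dfs.replication    1

# Show one of the results.
cat "$CONF_DIR/core-site.xml"
```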

 

Format the name node (once per install).

$ bin/hadoop namenode -format

 

It should print out something like the following message:

 

12/07/15 15:54:20 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = Shamim-2.local/192.168.0.103

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 0.20.2

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

12/07/15 15:54:21 INFO namenode.FSNamesystem: fsOwner=samim,staff,com.apple.sharepoint.group.1,everyone,_appstore,localaccounts,_appserverusr,admin,_appserveradm,_lpadmin,_lpoperator,_developer,com.apple.access_screensharing

12/07/15 15:54:21 INFO namenode.FSNamesystem: supergroup=supergroup

12/07/15 15:54:21 INFO namenode.FSNamesystem: isPermissionEnabled=true

12/07/15 15:54:21 INFO common.Storage: Image file of size 95 saved in 0 seconds.

12/07/15 15:54:21 INFO common.Storage: Storage directory /tmp/hadoop-samim/dfs/name has been successfully formatted.

12/07/15 15:54:21 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at Shamim-2.local/192.168.0.103

************************************************************/
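
Because formatting is a once-per-install step, it can help to check whether the storage directory has already been formatted before running the command again. The sketch below makes two assumptions: the directory follows the /tmp/hadoop-USERNAME/dfs/name pattern seen in the output above, and a formatted directory contains a current/VERSION file. The is_formatted helper is made up for this example.

```shell
# Report whether a NameNode storage directory ($1) already holds formatted
# metadata. Assumption: a formatted directory contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Assumed location, following the pattern in the sample output above.
NAME_DIR="/tmp/hadoop-$(whoami)/dfs/name"

if is_formatted "$NAME_DIR"; then
    echo "$NAME_DIR is already formatted; formatting again would erase it."
else
    echo "$NAME_DIR is not formatted yet; run: bin/hadoop namenode -format"
fi
```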

 

Start all Hadoop components:

$ bin/hadoop-daemon.sh start namenode
$ bin/hadoop-daemon.sh start jobtracker
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
$ bin/hadoop-daemon.sh start secondarynamenode

 

starting namenode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-namenode-Shamim-2.local.out

starting jobtracker, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-jobtracker-Shamim-2.local.out

starting datanode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-datanode-Shamim-2.local.out

You can check all the log files to make sure that everything went well.
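
Reading every log by hand gets tedious, so one option is to search them for words that usually signal trouble. This is a rough sketch: the scan_logs helper and the three search words are my own choice, and the demo runs on made-up log lines in a scratch directory, not on real Hadoop output.

```shell
# Print lines that look like problems from every .log and .out file in a
# directory ($1). Always succeeds, even when nothing matches.
scan_logs() {
    grep -E 'ERROR|FATAL|Exception' "$1"/*.log "$1"/*.out 2>/dev/null || true
}

# Demo on a scratch directory with made-up log lines (not real Hadoop output).
DEMO_LOGS="$(mktemp -d)"
echo "INFO demo: daemon started" > "$DEMO_LOGS/demo-namenode.log"
echo "ERROR demo: could not bind port" > "$DEMO_LOGS/demo-datanode.log"

PROBLEMS="$(scan_logs "$DEMO_LOGS")"
if [ -n "$PROBLEMS" ]; then
    echo "Problems found:"
    echo "$PROBLEMS"
else
    echo "No errors found."
fi
```

On the server you would point scan_logs at the logs directory named in the "logging to" lines above.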

 

Use the hadoop command-line tool to test the file system:

$ hadoop dfs -ls /
$ hadoop dfs -mkdir /test_dir
$ echo "A few words to test" > /tmp/myfile
$ hadoop dfs -copyFromLocal /tmp/myfile /test_dir
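
To confirm that a copy really landed in the file system, you can read it back and compare it with the local file. The sketch below assumes /test_dir already exists from the commands above. The file name roundtrip_check is made up, and the HDFS part is skipped when the hadoop command is not on your PATH.

```shell
# Create a local file with known contents.
LOCAL_FILE="$(mktemp)"
echo "A few words to test" > "$LOCAL_FILE"

if command -v hadoop >/dev/null 2>&1; then
    # Copy the file in, then read it back and compare byte for byte.
    # roundtrip_check is a made-up name; the copy fails if it already exists.
    hadoop dfs -copyFromLocal "$LOCAL_FILE" /test_dir/roundtrip_check
    if hadoop dfs -cat /test_dir/roundtrip_check | cmp -s - "$LOCAL_FILE"; then
        echo "HDFS round trip OK"
    else
        echo "HDFS copy differs from the local file"
    fi
else
    echo "hadoop is not on PATH; skipping the HDFS check"
fi
```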
