Big Data on a Shoestring
Author: Nicholas Bessmer
Select one instance of the MICRO type, as shown in the example above.
I do not want to drown you in technical details, but you will see screens indicating your progress as follows:
You will need to store a special file called a KEY PAIR on your local computer. Save this file! Amazon will prompt you to remember it, and the private key can only be downloaded once.
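On Linux or Mac you connect to the instance over SSH using that key pair. Here is a minimal sketch; the key file name `my-ec2-key.pem` is a placeholder for whatever you named your key pair, and SSH insists that the key file not be readable by anyone but you:

```shell
# Placeholder file standing in for the key pair you downloaded from Amazon.
touch my-ec2-key.pem

# SSH refuses a private key that other users can read, so lock it down:
chmod 400 my-ec2-key.pem
stat -c %a my-ec2-key.pem    # shows 400 (owner read-only)

# Then connect (the address is a placeholder for your instance's public DNS):
# ssh -i my-ec2-key.pem ec2-user@<your-instance-public-dns>
```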
Finally, start your instance – your virtual server.
You can connect to your new instance by downloading the connection tools: right-click on your instance name and select “CONNECT”.
In about 20 minutes you can have a LINUX server running in The Cloud!
Remember that a lot of computer technology is a series of repetitive, often cookbook-like steps. It is the 80/20 rule: 20% of the work is challenging and imaginative – the other 80% is very predictable. But predictable can be good, especially when you are trying to avoid techno-speak as, for example, a small business owner.
Let’s get our tools installed! In the command terminal that opens when you CONNECT to your new EC2 LINUX installation, use the SFTP plugin: it lets you see both your local computer’s files and your remote EC2 instance. Now download the following to your computer:
http://www.sai.msu.su/apache/hadoop/core/stable/
for the latest stable version of HADOOP, and download CASSANDRA:
http://cassandra.apache.org/download/
“PIG” is a query language designed for Big Data. We will use it to query our Big Data dataset.
http://www.sai.msu.su/apache/pig
Now copy these over to your new EC2 Linux Server:
Once the files have been copied, copy and paste the following commands:
tar -xvf hadoop-0.20.2.tar.gz
tar -xvf apache-cassandra-1.2.1-bin.tar.gz
tar -xvf pig-0.10.1.tar.gz
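The `tar` flags used above are worth knowing: `-x` extracts, `-v` lists each file as it is processed, and `-f` names the archive file. A small self-contained round trip (with made-up file names) shows what they do:

```shell
# Build a tiny archive, delete the original, and extract it again.
mkdir -p demo_dir
echo "hello" > demo_dir/greeting.txt
tar -cvf demo.tar demo_dir    # -c creates an archive
rm -rf demo_dir
tar -xvf demo.tar             # -x extracts it; -v prints each file name
cat demo_dir/greeting.txt     # prints: hello
```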
Please also be sure to run the following commands in the Pig directory:
» cd pig-0.10.1 (cd changes the current directory)
» tar -xvf tutorial.tar (you can also use the gunzip utility)
This extracts the files, which are compressed much like a ZIP file.
It is possible to choose MS Windows Server as your preferred EC2 server. We installed Linux here (it is cheaper to run than Windows Server)… so editing files is done with the VI editing tool, which is a bit harder to use. Look up VI on the Internet – it is like a very powerful Windows Notepad, but it is command-line driven.
Type the following:
» cd (changes to your home directory)
» vi .bash_profile (vi is the editor; you will be modifying a simple text configuration file) – please see this helpful tutorial from the University of California, San Diego:
http://acms.ucsd.edu/info/vi_tutorial.html
» copy and paste the following into your file:
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin:/home/ec2-user/hadoop-0.20.2/bin:/home/ec2-user/pig-0.10.1/bin
export PATH
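The PATH line is what lets you type `hadoop` or `pig` from any directory. Here is a small demonstration of the mechanism, using a made-up `demo-bin` directory in place of the Hadoop and Pig bin directories; after editing the real file, run `source ~/.bash_profile` (or log in again) for it to take effect:

```shell
# Create a directory containing a tiny executable script.
mkdir -p "$HOME/demo-bin"
printf '#!/bin/sh\necho it-works\n' > "$HOME/demo-bin/mytool"
chmod +x "$HOME/demo-bin/mytool"

# Append the directory to PATH, just as .bash_profile does.
PATH=$PATH:$HOME/demo-bin
export PATH

# Now the shell can find the script without a full path.
mytool    # prints: it-works
```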
Following up on step #1 above (downloading the Hadoop TAR file), we next need to edit Hadoop's configuration files to set the following properties:
fs.default.name = hdfs://localhost:9000
mapred.job.tracker = localhost:9001
dfs.replication = 1
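In Hadoop 0.20.x these properties live in XML files under the `conf/` directory of your Hadoop installation: `fs.default.name` in `core-site.xml`, `mapred.job.tracker` in `mapred-site.xml`, and `dfs.replication` in `hdfs-site.xml`. A sketch of the three fragments:

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```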
$ bin/hadoop namenode -format
It should print out something like the following message:
12/07/15 15:54:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Shamim-2.local/192.168.0.103
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/07/15 15:54:21 INFO namenode.FSNamesystem: fsOwner=samim,staff,com.apple.sharepoint.group.1,everyone,_appstore,localaccounts,_appserverusr,admin,_appserveradm,_lpadmin,_lpoperator,_developer,com.apple.access_screensharing
12/07/15 15:54:21 INFO namenode.FSNamesystem: supergroup=supergroup
12/07/15 15:54:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/07/15 15:54:21 INFO common.Storage: Image file of size 95 saved in 0 seconds.
12/07/15 15:54:21 INFO common.Storage: Storage directory /tmp/hadoop-samim/dfs/name has been successfully formatted.
12/07/15 15:54:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Shamim-2.local/192.168.0.103
************************************************************/
Now start the Hadoop daemons:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start jobtracker
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
hadoop-daemon.sh start secondarynamenode
starting namenode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-namenode-Shamim-2.local.out
starting jobtracker, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-jobtracker-Shamim-2.local.out
starting datanode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-datanode-Shamim-2.local.out
You can check all the log files to make sure that everything went well.
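One quick way to check all the log files is to grep the log directory for error-level lines. A sketch using a made-up log file; in practice, point the command at the `logs/` directory shown in the "logging to" lines above:

```shell
# Stand-in log file; real ones are named hadoop-<user>-<daemon>-<host>.log
mkdir -p logs
printf 'INFO starting up\nERROR disk full\n' > logs/hadoop-demo.log

# Scan every log for error or fatal lines (case-insensitive).
grep -i 'error\|fatal' logs/*.log    # prints: ERROR disk full
```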
Now test HDFS by creating a directory and copying a local file into it:
hadoop dfs -mkdir /test_dir
echo "A few words to test" > /tmp/myfile
hadoop dfs -copyFromLocal /tmp/myfile /test_dir