Today I'll walk through the basic configuration of Hadoop. Hadoop releases move quickly these days, so the details will differ somewhat between versions; consult the official documentation for your release. There are a few prerequisites for installing Hadoop: Sun Java 6 (newer versions should also work; I have not tried OpenJDK), a dedicated hadoop system user, and SSH configuration (SSH here means OpenSSH, used to run remote operations across the nodes in a multi-node setup).

1. Installing the Sun JDK on Linux; the concrete steps are below

Steps:

  • Download the sun-jdk-6 self-extracting .bin file
  • Make sure the file is executable

chmod +x jdk-6u32-linux-x64.bin

  • Run the bin file

./jdk-6u32-linux-x64.bin

  • Move the extracted directory to its target location

sudo mv jdk1.6.0_32 /usr/lib/jvm/

  • Register the new Java installation with the system

sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.6.0_32/bin/javac 1
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.6.0_32/bin/java 1
sudo update-alternatives --install /usr/bin/javaws javaws /usr/lib/jvm/jdk1.6.0_32/bin/javaws 1

  • When several Java versions coexist on the system, pick the system default

sudo update-alternatives --config javac
sudo update-alternatives --config java
sudo update-alternatives --config javaws

  • Verify the Java version

java -version

 

2. Adding a dedicated hadoop system user

We will use a dedicated hduser account to run Hadoop.

 

$ sudo addgroup hadoop                   # add the group
$ sudo adduser --ingroup hadoop hduser   # add a user inside the group
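Whether the two commands took effect can be checked with `id`; a small sketch (hduser is the account created above, and the fallback message is only for machines that don't have it):

```shell
# Report whether the dedicated account exists; `id` exits non-zero
# for an unknown user, so branch on its exit status.
if id hduser >/dev/null 2>&1; then
    status="hduser exists"
else
    status="hduser missing - rerun the adduser step above"
fi
echo "$status"
```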

 

3. Configuring SSH

SSH itself was introduced earlier, so here we go straight to the configuration. Note: to avoid typing a password on every remote login, we generate the key with an empty passphrase.

 

user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$

 

Next we need to let SSH use the newly generated key, which takes the following step:

 

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
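One pitfall worth guarding against here: with its default StrictModes setting, sshd ignores keys whose directory or authorized_keys file is group/world accessible. A small hardening sketch (standard OpenSSH behavior, not specific to this tutorial):

```shell
# Tighten permissions so sshd will actually accept the key:
# ~/.ssh must be 700 and authorized_keys 600.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```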

 

Finally, test that you can connect to the local machine:

 

hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$

 

If you see output like the above, the SSH setup succeeded.

4. Installing Hadoop. Download a Hadoop release from the official Apache website; the release used in the commands below is 1.0.3.

After downloading:

 

$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop

 

Update the $HOME/.bashrc file by appending the following at the end:

 

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
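After editing .bashrc, open a new shell (or source the file) so the variables take effect. The sketch below simply re-exports the same variables inline and prints one as a sanity check; the paths are the tutorial's, so adjust them to your install:

```shell
# Re-create the environment the .bashrc snippet sets up and confirm the
# shell sees it (normally you would just run: source $HOME/.bashrc).
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export PATH=$PATH:$HADOOP_HOME/bin
echo "HADOOP_HOME=$HADOOP_HOME"
```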

 

Now configure Hadoop's own files.

First, the /usr/local/hadoop/conf/hadoop-env.sh file.

Change ${JAVA_HOME} to your JDK installation path; that is, change

 

# The java implementation to use.  Required.
# export JAVA_HOME=${JAVA_HOME}

to

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

 

Next, edit the conf/core-site.xml file:

 

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
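Note that Hadoop does not create the hadoop.tmp.dir directory for you; if /app/hadoop/tmp is missing or owned by the wrong user, the namenode format further below can fail. A sketch of the setup (the /tmp fallback path is only so the snippet runs unprivileged; on a real node use /app/hadoop/tmp with sudo, and chown it to hduser:hadoop):

```shell
# Create the directory named by hadoop.tmp.dir and restrict access.
# On the real node:  sudo mkdir -p /app/hadoop/tmp
#                    sudo chown hduser:hadoop /app/hadoop/tmp
#                    sudo chmod 750 /app/hadoop/tmp
HADOOP_TMP="${HADOOP_TMP:-/tmp/app/hadoop/tmp}"   # unprivileged stand-in path
mkdir -p "$HADOOP_TMP"
chmod 750 "$HADOOP_TMP"
```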

 

Then the conf/mapred-site.xml file:

 

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>

 

Finally, conf/hdfs-site.xml:

 

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

 

Before starting Hadoop we need to format the HDFS filesystem; just run the following command:

 

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

 

 

10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$

 

Start the single-node cluster:

 

hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$

 

You can use the following command to see which ports Hadoop is listening on:

 

hduser@ubuntu:~$ sudo netstat -plten | grep java
tcp   0  0 0.0.0.0:50070   0.0.0.0:*  LISTEN  1001  9236  2471/java
tcp   0  0 0.0.0.0:50010   0.0.0.0:*  LISTEN  1001  9998  2628/java
tcp   0  0 0.0.0.0:48159   0.0.0.0:*  LISTEN  1001  8496  2628/java
tcp   0  0 0.0.0.0:53121   0.0.0.0:*  LISTEN  1001  9228  2857/java
tcp   0  0 127.0.0.1:54310 0.0.0.0:*  LISTEN  1001  8143  2471/java
tcp   0  0 127.0.0.1:54311 0.0.0.0:*  LISTEN  1001  9230  2857/java
tcp   0  0 0.0.0.0:59305   0.0.0.0:*  LISTEN  1001  8141  2471/java
tcp   0  0 0.0.0.0:50060   0.0.0.0:*  LISTEN  1001  9857  3005/java
tcp   0  0 0.0.0.0:49900   0.0.0.0:*  LISTEN  1001  9037  2785/java
tcp   0  0 0.0.0.0:50030   0.0.0.0:*  LISTEN  1001  9773  2857/java
hduser@ubuntu:~$

 

Stop the single-node cluster:

 

hduser@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hduser@ubuntu:/usr/local/hadoop$