1. Connect to the server

Use a terminal client such as SecureCRT or Xshell to connect to the server remotely.

2. Network config

First, configure DNS for the server and check whether it can access the internet.

Enter the following command:

ping www.baidu.com

If you receive a reply from www.baidu.com, the server can access the internet. If you get “unknown host,” the DNS configuration is incorrect and needs to be fixed.
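When ping reports “unknown host,” a quick first check is whether any nameserver is configured at all. The following is a minimal sketch, assuming a POSIX shell; the helper name `has_nameserver` is hypothetical and not part of any standard tool.

```shell
#!/bin/sh
# has_nameserver FILE: succeed if FILE contains at least one active
# "nameserver <address>" line. FILE defaults to /etc/resolv.conf.
# (Hypothetical helper name, used here for illustration only.)
has_nameserver() {
    file="${1:-/etc/resolv.conf}"
    # Commented-out entries such as "# nameserver ..." do not count.
    [ -r "$file" ] && grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$file"
}

# Usage:
#   has_nameserver || echo "no nameserver configured in /etc/resolv.conf"
```

This only inspects the file; it does not prove the listed DNS server is reachable, so the ping test is still worth running afterwards.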

There are two ways to configure DNS:

Temporary solution: edit the resolv.conf file and add “nameserver 8.8.8.8”.
vi /etc/resolv.conf
nameserver 8.8.8.8
Permanent solution: edit the ifcfg-eth0 file and add “DNS1=8.8.8.8”.
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DNS1=8.8.8.8

After configuration, run the command “service network restart”.

The second approach is recommended, as the first one will be lost when the server restarts.
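The permanent approach can also be applied without opening an editor. This is a sketch under stated assumptions: the function name `set_dns1` is hypothetical, `sed -i` is used in its GNU form (as shipped on CentOS), and the file path is passed in so the function can be tried on a copy first.

```shell
#!/bin/sh
# set_dns1 FILE ADDR: make sure the DNS1 entry in FILE is set to ADDR.
# (Hypothetical helper; on the server FILE would be
#  /etc/sysconfig/network-scripts/ifcfg-eth0 and root is required.)
set_dns1() {
    file="$1"
    addr="$2"
    if grep -q '^DNS1=' "$file"; then
        # An entry already exists: replace its value in place (GNU sed -i).
        sed -i "s/^DNS1=.*/DNS1=$addr/" "$file"
    else
        # No DNS1 entry yet: append one.
        printf 'DNS1=%s\n' "$addr" >> "$file"
    fi
}

# Usage (as root):
#   set_dns1 /etc/sysconfig/network-scripts/ifcfg-eth0 8.8.8.8
#   service network restart
```

Running it twice leaves a single DNS1 line, which avoids the duplicate entries that repeated manual edits tend to produce.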

Disable the firewall

Enter the command:

service iptables stop
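Stopping the service only lasts until the next reboot. The sketch below assumes CentOS 6 with SysV init, where `chkconfig` controls which services start at boot; the `chkconfig` step is an addition that is not in the original instructions, and the `run` and `disable_firewall` names are hypothetical. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# disable_firewall: stop iptables now and keep it off after a reboot.
# The chkconfig line is an assumption added for illustration (CentOS 6).
disable_firewall() {
    run service iptables stop
    run chkconfig iptables off
}

# Usage (as root):
#   DRY_RUN=1 disable_firewall   # preview the commands
#   disable_firewall             # apply them
```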

3. File transfer

During the installation process, you may need to upload local files to the server. Here is one approach.

Enter the command:

sudo yum install lrzsz

After installing the lrzsz package, enter:

rz

Then select the file you want to upload.

Yum (Yellowdog Updater, Modified) is an RPM-based package manager. It downloads and installs RPM packages from configured repositories and resolves dependencies automatically, installing all required packages in one step instead of making you download and install them one by one.

4. Install CDH

HDFS: A distributed file storage system, the open-source Java implementation of GFS.

MapReduce: A parallel computing programming model for large-scale datasets.

HBase: NoSQL column-oriented database

Hive: Data warehouse

ZooKeeper: Distributed coordination service (commonly used for distributed locks)

Pig: High-level scripting platform for analyzing large datasets

Home link: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html

Setup:

1. Java

CentOS 6.7 comes with JDK 1.7, so this step can be skipped. Any JDK-related steps during installation can be ignored.

2. Cloudera Manager Server

Cloudera Manager makes it easy to manage Hadoop deployments of any scale in production. Quickly deploy, configure, and monitor your cluster through an intuitive UI – complete with rolling upgrades, backup and disaster recovery, and customizable alerting.

Download:
http://www.cloudera.com/content/cloudera/en/downloads/cloudera_manager/cm-5-4-7.html
Download version 5.4.7, the latest at the time of writing.
The download is a file of about 503 KB named cloudera-manager-installer.bin. Upload this file (using “rz”) to a folder of your choice on the server.

Disable SELinux:

vi /etc/selinux/config
SELINUX=disabled
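The same change can be made non-interactively. This is a minimal sketch, assuming GNU `sed`; the function name `disable_selinux_config` is hypothetical. The setting in the config file takes effect after a reboot; `setenforce 0` can be used in the meantime to switch the running system to permissive mode.

```shell
#!/bin/sh
# disable_selinux_config FILE: set SELINUX=disabled in FILE.
# FILE defaults to /etc/selinux/config. (Hypothetical helper name.)
disable_selinux_config() {
    file="${1:-/etc/selinux/config}"
    # Only the SELINUX= line is rewritten; SELINUXTYPE= is left untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}

# Usage (as root):
#   disable_selinux_config
#   setenforce 0    # permissive until the next reboot
```

The key is case-sensitive, so the pattern matches the uppercase `SELINUX=` only.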

Hostname
You need to modify the hostname of the machine. To check the current hostname, simply type hostname.
Enter:
vi /etc/hosts
Edit the file and add the line “192.168._._ master.com master”.
Note: when adding datanode nodes, do not name them “master” – use names like “datanode1” instead.
Run the “reboot” command to restart the server.
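When the same entry has to be added on several nodes, a small helper keeps /etc/hosts free of duplicates. The sketch below is illustrative only: `add_host_entry` is a hypothetical name, and `MASTER_IP` stands in for the real address, which is elided in this guide.

```shell
#!/bin/sh
# add_host_entry FILE IP FQDN ALIAS: append "IP FQDN ALIAS" to FILE
# unless FQDN is already listed there. (Hypothetical helper name.)
add_host_entry() {
    file="$1"; ip="$2"; fqdn="$3"; alias_name="$4"
    # -F treats the name literally (dots are not wildcards),
    # -w requires the match to be delimited by non-word characters.
    if grep -qwF "$fqdn" "$file"; then
        return 0
    fi
    printf '%s\t%s\t%s\n' "$ip" "$fqdn" "$alias_name" >> "$file"
}

# Usage (as root; MASTER_IP is a placeholder for the server's address):
#   add_host_entry /etc/hosts "$MASTER_IP" master.com master
```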

3. CDH

Using Cloudera Manager

Navigate to the directory containing cloudera-manager-installer.bin and run:

chmod u+x cloudera-manager-installer.bin
./cloudera-manager-installer.bin

Keep clicking next. The selected items will be highlighted in bold.

This may take a while, so please be patient…

Choose the free Express edition.
Display the packages to be installed
Specify CDH cluster hosts
Install CDH
Enter username and password
Install the specified Parcel
Select services

4. Storm

A distributed real-time computation system for processing high-speed, large-volume data streams. It adds reliable real-time data processing capabilities to Hadoop.

5. Spark

Spark uses in-memory computing. Designed originally for iterative batch workloads, it lets data be loaded into memory and queried repeatedly. It also unifies several computing paradigms, including data warehousing, stream processing, and graph computing. Spark can run on top of HDFS and integrates well with Hadoop.

6. Problems

Refreshing metadata packages

Another app is currently holding the yum lock; waiting for it to exit...
rm -f /var/run/yum.pid
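Deleting the lock file while yum is genuinely running can corrupt a transaction in progress, so it may be safer to check first. This is a sketch under assumptions: the function name `clear_stale_yum_lock` is hypothetical, and it assumes the lock file contains the PID of the process holding it.

```shell
#!/bin/sh
# clear_stale_yum_lock FILE: remove FILE only if the PID stored in it
# is no longer running. Returns 1 and leaves FILE alone otherwise.
# FILE defaults to /var/run/yum.pid. Run as root so that kill -0 can
# see processes owned by other users. (Hypothetical helper name.)
clear_stale_yum_lock() {
    file="${1:-/var/run/yum.pid}"
    [ -f "$file" ] || return 0          # no lock file, nothing to do
    pid=$(tr -cd '0-9' < "$file")       # keep only the digits
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "yum (PID $pid) is still running; not removing $file" >&2
        return 1
    fi
    rm -f "$file"
}

# Usage (as root):
#   clear_stale_yum_lock
```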

Uninstall Cloudera Manager

$ sudo /usr/share/cmf/uninstall-cloudera-manager.sh

Installation failed: Unable to receive heartbeat from Agent

cat /etc/hosts
ip    domain name    hostname
192.168.*.1    master.com     master
cat   /etc/sysconfig/network
HOSTNAME=master.com
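The two files above need to agree with each other. A small check along these lines can confirm that before retrying the installation. It is a sketch only: `hostname_in_hosts` is a hypothetical name, and it assumes the HOSTNAME value is written without quotes.

```shell
#!/bin/sh
# hostname_in_hosts NETWORK_FILE HOSTS_FILE: succeed if the HOSTNAME=
# value from NETWORK_FILE appears as a name on a non-comment line of
# HOSTS_FILE. Defaults: /etc/sysconfig/network and /etc/hosts.
# (Hypothetical helper name.)
hostname_in_hosts() {
    network_file="${1:-/etc/sysconfig/network}"
    hosts_file="${2:-/etc/hosts}"
    name=$(sed -n 's/^HOSTNAME=//p' "$network_file" | tail -n 1)
    [ -n "$name" ] || return 1
    # Compare every field after the IP address with the configured name.
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}

# Usage:
#   hostname_in_hosts || echo "HOSTNAME is missing from /etc/hosts"
```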

Uninstall Cloudera Manager

sudo /usr/share/cmf/uninstall-cloudera-manager.sh
sudo service cloudera-scm-server stop
sudo service cloudera-scm-server-db stop
sudo yum remove cloudera-manager-server
sudo yum remove cloudera-manager-server-db-2
sudo service cloudera-scm-agent hard_stop
sudo yum remove 'cloudera-manager-*'
sudo yum clean all
sudo rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*
sudo rm -Rf /var/cache/apt/archives/cloudera*
sudo rm /tmp/.scm_prepare_node.lock
sudo rm -Rf /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper
sudo rm -Rf /dfs /mapred /yarn