Changing the ssh port number of a Hadoop cluster

Symptoms

The machines in my Hadoop cluster cannot connect to or communicate with each other, even though my master/slave configuration is correct.

Solutions

You must change your ssh port number. Assume that the port number to be used in the Hadoop cluster is 20002. There are two things to do: first, change the port number used for ssh communication on each machine in the cluster; second, tell the Hadoop cluster about the new port number.

* Disclaimer: The following instructions were written for Hadoop 1.2.1 and Ubuntu 12.04. With other versions and systems, the commands may differ slightly.

step 1: changing ssh daemon configuration

Open /etc/ssh/sshd_config with your favorite text editor. In this example, I will use vi:

sudo vi /etc/ssh/sshd_config

Update the following part:

From:

# What ports, IPs and protocols we listen for
Port 22

To:

# What ports, IPs and protocols we listen for
Port 20002

After updating, restart the ssh daemon (on Ubuntu the service is named ssh, not sshd):

sudo service ssh restart
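
Before relying on the new port, it may help to confirm what the edited file actually declares. The following is a minimal sketch and not part of the original procedure: the function name sshd_ports is hypothetical, and it only inspects the Port lines of a file rather than the running daemon.

```shell
# sshd_ports FILE
# Print every port declared by an uncommented "Port" line in an
# sshd_config-style file (the keyword is matched case-insensitively).
# Lines such as "#Port 22" or "# Port 22" are ignored.
sshd_ports() {
    awk 'tolower($1) == "port" { print $2 }' "$1"
}
```

For example, `sshd_ports /etc/ssh/sshd_config` should print 20002 after the edit above. In addition, `sudo sshd -t` checks the file for syntax errors, and `ssh -p 20002 localhost` confirms that the daemon accepts connections on the new port.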

step 2: changing Hadoop configuration

Update your conf/hadoop-env.sh:

From:

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

To:

# Extra ssh options.  Empty by default.
export HADOOP_SSH_OPTS="-p 20002 -o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

After updating, restart the cluster:

stop-all.sh
start-all.sh
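
The two steps only work together if the port given to Hadoop matches the port the ssh daemon listens on. As a rough sanity check, the sketch below pulls the -p value out of a string of ssh options. The helper name ssh_opts_port is an assumption made for illustration and is not part of Hadoop; it also does not handle the alternative "-o Port=..." spelling.

```shell
# ssh_opts_port "OPTIONS"
# Print the port that follows a "-p" flag in a string of ssh options
# (either "-p 20002" or "-p20002"), or nothing if there is no "-p" flag.
ssh_opts_port() {
    # Intentionally unquoted: split the option string into words.
    set -- $1
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -p)  [ "$#" -ge 2 ] && printf '%s\n' "$2"; return 0 ;;
            -p*) printf '%s\n' "${1#-p}"; return 0 ;;
        esac
        shift
    done
}
```

After sourcing conf/hadoop-env.sh, `ssh_opts_port "$HADOOP_SSH_OPTS"` should print 20002, the same value that was set in /etc/ssh/sshd_config.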

Description

Hadoop's start and stop scripts use the ssh protocol to reach the other nodes in the cluster, i.e., the master node logs on to the slave machines and runs commands on them via ssh. Port 22 is the default ssh port, and it is often blocked because malicious attackers can target this channel to try to acquire administrator rights. This is why many organisations (e.g., universities and research institutes) prohibit this port number [1].

Changing the port number is the only way to work around this.
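
To make the mechanism concrete, the following sketch prints, without running anything, the kind of ssh command the master would issue for each slave. It assumes that the slave hostnames are listed one per line in conf/slaves and that the contents of HADOOP_SSH_OPTS are passed straight to ssh, which appears to be how the Hadoop 1.x helper scripts behave. The function name print_slave_ssh and the hostname slave1 are hypothetical.

```shell
# print_slave_ssh SLAVES_FILE COMMAND...
# Dry run: print one ssh command line per host listed in SLAVES_FILE,
# skipping blank lines and "#" comments. Nothing is executed remotely.
print_slave_ssh() {
    slaves_file=$1
    shift
    # Strip comments, drop empty lines, then build one command per host.
    sed 's/#.*$//; /^[[:space:]]*$/d' "$slaves_file" |
    while read -r host; do
        echo "ssh $HADOOP_SSH_OPTS $host $*"
    done
}
```

With HADOOP_SSH_OPTS="-p 20002", running `print_slave_ssh conf/slaves uptime` prints lines such as "ssh -p 20002 slave1 uptime". Without the -p option, ssh would try port 22 and fail on a network where that port is blocked.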

  1. For details, refer here.