
Shell Scripting for HPC Clusters, Part 1

Submitted on October 10, 2009 – 12:59 am

This is the first installment of a multipart guide for beginner Unix sysadmins supporting HPC clusters.

“For” and “While” Loop Constructs

The main challenge of supporting a Linux cluster is ensuring a homogeneous environment. Aside from small differences – primarily in network configuration – cluster nodes must be identical to achieve optimal performance and to simplify troubleshooting. Scripting is an important tool for administering any Unix system and it is particularly valuable for managing clusters.

“While” Loops

In a “while” loop, we set a variable to the number of the first cluster node and increment it by one on every iteration of the loop. This method works well when you need to access a consecutive range of nodes numbered without leading zeros (i.e. “node1”, not “node01”).

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	ssh node$i "hostname ; date"&
	(( i = i + 1))
done

In the above example, the variable $i is set to 1, so the script connects to node1 (node$i) and runs the hostname and date commands. The variable $i is then incremented by 1 and the script connects to node2, repeating these steps for as long as $i is less than or equal to (-le) 128, the total number of nodes in our cluster.

The following method can be used when node names use leading zeros or when there are gaps in the sequence.

cat nodelist.txt
node1
node2
node3
…
node128
#!/bin/ksh
cat nodelist.txt | while read nodename
do
	ssh $nodename "hostname ; date "&
done

“For” Loops

This method is best for accessing a small number of nodes, as it requires you to type every node number. This would not be the best way to access all 128 nodes in our test cluster.

#!/bin/ksh
for i in 1 2 3
do
	ssh node$i "hostname ; date "&
done

The following method is equivalent to the second “while” loop example above, as it also uses a text file containing node names.

cat nodelist.txt
node1 node2 node3 … node128
#!/bin/ksh
for nodename in `cat nodelist.txt`
do
	ssh $nodename "hostname ; date "&
done

It is recommended that you use the full path for ssh, rsh, scp, rcp, and similar commands. The commands to be executed on the remote host must always be enclosed in double quotes, and multiple commands should be separated by semicolons. The ampersand should follow the remote commands, outside the double quotes. Its purpose is to background the commands for each node, so the script does not hang on a single node that is down or otherwise inaccessible.

To make it easier to control which nodes are being accessed by the loop, it is recommended to use a while loop that reads the names of the nodes from a text file. You can easily comment out any nodes you don’t want to access.

cat nodelist.txt
node1
#node2
node3
#node4
…
node128
#!/bin/ksh
cat nodelist.txt | grep -v "^#" | while read nodename
do
	ssh $nodename "hostname ; date "&
done

If your node names use leading zeros (i.e. node001), you can still use the incremental while loop; however, it gets a bit complicated. The following loop accesses nodes node001 through node128.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	if [ $i -lt 10 ]
	then
		ssh node00$i "hostname ; date"&
	elif [ $i -lt 100 ]
	then
		ssh node0$i "hostname ; date"&
	elif [ $i -lt 1000 ]
	then
		ssh node$i "hostname ; date"&
	fi
	(( i = i + 1))
done
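A more compact sketch (my addition, not from the original article) uses printf to zero-pad the node number, which removes the need for the if/elif ladder. It assumes the same node001 through node128 naming:

```shell
#!/bin/ksh
# printf "%03d" pads the counter to three digits (1 -> 001, 42 -> 042),
# so a single ssh line covers all nodes without an if/elif ladder.
i=1
while [ $i -le 128 ]
do
	nodename=$(printf "node%03d" $i)
	ssh $nodename "hostname ; date" &
	(( i = i + 1 ))
done
```

The same printf trick can generate the nodelist.txt file shown below in one pass.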

In a situation like this, it is probably easier to generate the list of nodes once and save it as a text file to be used as input for the loop.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	if [ $i -lt 10 ]
	then
		echo "node00$i" >> nodelist.txt
	elif [ $i -lt 100 ]
	then
		echo "node0$i" >> nodelist.txt
	elif [ $i -lt 1000 ]
	then
		echo "node$i" >> nodelist.txt
	fi
	(( i = i + 1))
done

Practical Loop Examples

When executing complex commands on remote servers, it is a good idea to put all commands into a script and then to put this script into a directory exported via NFS to all the nodes. You can also RCP/SCP or FTP/SFTP the script to each node before running it. This way you can write simple loops that will call on the script and execute it locally on each node.

Loop Example 1

We need to connect to nodes 1 through 128 to add the new file server IP and hostname to the /etc/hosts file. We also need to add a new NFS mount to each node to be mounted at boot time.

First, create a simple script add_nfs_mount.ksh to add the file server name and IP to /etc/hosts, create a mountpoint, add the NFS mount to /etc/fstab, and to mount the new filesystem. Place this script into the shared directory /export/scripts, which is exported via NFS to all nodes.

#!/bin/ksh

fileserver=nfsserver1
serverip=192.168.45.10

echo "$serverip $fileserver" >> /etc/hosts

mkdir /nfs_share1

echo "$fileserver:/share1 /nfs_share1 nfs intr,bg 0 0" >> /etc/fstab

mount /nfs_share1

Since this script is in a directory accessible from all cluster nodes, all you need to do now is write a simple loop that executes it on each node. Don’t forget to make the script executable: chmod +x /export/scripts/add_nfs_mount.ksh

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	ssh node$i "/export/scripts/add_nfs_mount.ksh"&
	(( i = i + 1 ))
done

Loop Example 2

There may be situations when you cannot mount an NFS share on all the nodes. An alternative is to use SCP or RCP to copy the script to each node and then execute it locally. Let’s take a look at how this is done.

In this example we need to configure cluster nodes 1 through 128 to use the US Eastern time zone and NTP. Let’s create the script /scripts/set_timezone.ksh.

#!/bin/ksh
mv /etc/localtime /etc/localtime_orig
ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime
grep -v TIMEZONE /etc/sysconfig/clock > /tmp/clock

cat <<EOF >> /tmp/clock
TIMEZONE="US/Eastern"
DEFAULT_TIMEZONE="US/Eastern"
EOF

mv /tmp/clock /etc/sysconfig/clock
/sbin/hwclock --systohc

cat <<EOF > /etc/ntp.conf
server 192.168.12.12
driftfile /var/lib/ntp/drift/ntp.drift
EOF

/sbin/chkconfig ntp on

Now we need to create a loop to scp this script to nodes 1 through 128 and to execute it locally on each node.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	scp /scripts/set_timezone.ksh node${i}:/tmp/
	ssh node$i "chmod +x /tmp/set_timezone.ksh ; /tmp/set_timezone.ksh"&
	(( i = i + 1 ))
done
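A slightly more defensive variant of this loop (a sketch of my own, not part of the original) runs the remote script only when the copy succeeds, and reports nodes where it fails:

```shell
#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	# scp returns non-zero if the copy fails (node down, no route, etc.);
	# skip the ssh step for such nodes and note them on stderr.
	if scp /scripts/set_timezone.ksh node${i}:/tmp/
	then
		ssh node$i "chmod +x /tmp/set_timezone.ksh ; /tmp/set_timezone.ksh" &
	else
		echo "copy to node$i failed" >&2
	fi
	(( i = i + 1 ))
done
```

This keeps a single unreachable node from silently producing a half-configured cluster.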

Loop Example 3

Another way of putting a script on the cluster nodes is to use FTP/SFTP. In the following example we need to install an RPM package on each cluster node. The first step is to FTP the /tmp/package.rpm file to all the nodes.

#!/bin/ksh

ftp_user="mike"
ftp_pass="p@ssw0rd"

ftpput() {
	{
		echo "open $1"
		echo "user $ftp_user $ftp_pass"
		echo "bin"
		echo "lcd /tmp"
		echo "cd /tmp"
		echo "put package.rpm"
		echo "quit"
	} | ftp -nvi -T 3
}

i=1
while [ $i -le 128 ]
do
	ftpput node$i
	(( i = i + 1 ))
done

The final step is easy. All we need to do is to SSH to each node and install the RPM.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	ssh node$i "rpm -i /tmp/package.rpm"
	(( i = i + 1 ))
done
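If you background the remote commands with an ampersand, as the earlier loops do, the script itself can exit before all the installs have finished. One option (a sketch, not from the original) is to end the loop with wait, which blocks until every backgrounded job has returned:

```shell
#!/bin/ksh
# Same install loop, backgrounded, with a final "wait" so the script
# does not exit until every ssh job has completed.
i=1
while [ $i -le 128 ]
do
	ssh node$i "rpm -i /tmp/package.rpm" &
	(( i = i + 1 ))
done
wait
```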

The second part of this guide – Searching, Replacing, Comparing – will be published next week. Stay tuned.


One Comment »

  • davemc74656 says:

    For a tightly coupled computational fluid dynamics (CFD) code, what parts of cluster should receive priority?

    Hint: consider these four parts of a cluster: node speed (i.e. processor speed in a node), memory, network fabric, and storage.
