Shell Scripting for HPC Clusters, Part 1
This is the first installment of a multipart guide for beginner Unix sysadmins supporting HPC clusters.
“For” and “While” Loop Constructs
The main challenge of supporting a Linux cluster is ensuring a homogeneous environment. Aside from small differences – primarily in network configuration – cluster nodes must be identical to achieve optimal performance and to simplify troubleshooting. Scripting is an important tool for administering any Unix system and it is particularly valuable for managing clusters.
“While” Loops
In a “while” loop, we set a variable to the number of the first cluster node and increment it by one on every iteration. This method works well when you need to access a consecutive range of nodes whose names are numbered without leading zeros (e.g. “node1” rather than “node01”).
#!/bin/ksh
i=1
while [ $i -le 128 ]
do
    # Run the remote commands in the background, then move to the next node
    ssh node$i "hostname ; date" &
    (( i = i + 1 ))
done
In the above example, the variable $i is set to 1 and the script connects to node1 (node$i) and runs the hostname and date commands. The variable $i is then incremented by 1, the script connects to node2, and the steps repeat for as long as $i is less than or equal to (-le) 128, the total number of nodes in our cluster.
The following method can be used when node names use leading zeros or when there are gaps in the sequence.
cat nodelist.txt
node1
node2
node3
…
node128

#!/bin/ksh
cat nodelist.txt | while read nodename
do
    ssh $nodename "hostname ; date" &
done
“For” Loops
This method is best for accessing a small number of nodes, as it requires you to type every node number. This would not be the best way to access all 128 nodes in our test cluster.
#!/bin/ksh
for i in 1 2 3
do
    ssh node$i "hostname ; date" &
done
The following method is equivalent to the second “while” loop example above, as it also uses a text file containing node names.
cat nodelist.txt
node1
node2
node3
…
node128

#!/bin/ksh
for nodename in `cat nodelist.txt`
do
    ssh $nodename "hostname ; date" &
done
Use the full path for ssh, rsh, scp, rcp, and similar commands. Enclose the commands to be executed on the remote host in double quotes, and separate multiple commands with semicolons. The ampersand must follow the remote commands and sit outside the double quotes. It runs the command for each node in the background, so the script does not hang on a single node that is down or otherwise inaccessible.
To make it easier to control which nodes the loop accesses, use a while loop that reads the node names from a text file. You can then comment out any nodes you don’t want to access.
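The recommendations above can be combined into one small helper. The following is only a sketch in portable POSIX shell (it should behave the same under ksh): the function name run_on_nodes, the SSH_CMD variable, the /usr/bin/ssh path, and the 5-second timeout are all assumptions made for illustration, and the -o options assume an OpenSSH client. It backgrounds one ssh per node and then uses wait so the script does not exit before the remote commands finish.

```shell
# Hypothetical helper: run one command on every node named on stdin.
# SSH_CMD holds the full path to ssh (the path is an assumption; adjust it).
SSH_CMD=${SSH_CMD:-/usr/bin/ssh}

run_on_nodes() {
    remote_cmd=$1
    while read -r node; do
        # -n            keep ssh from reading the node list on stdin
        # BatchMode     fail instead of prompting for a password
        # ConnectTimeout give up on unreachable nodes after 5 seconds
        "$SSH_CMD" -n -o BatchMode=yes -o ConnectTimeout=5 \
            "$node" "$remote_cmd" &
    done
    wait    # return only after every background ssh has finished
}

# Example (not executed here):
#   run_on_nodes "hostname ; date" < nodelist.txt
```

Because the jobs run in parallel, the output from different nodes can arrive in any order.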
cat nodelist.txt
node1
#node2
node3
#node4
…
node128

#!/bin/ksh
cat nodelist.txt | grep -v "#" | while read nodename
do
    ssh $nodename "hostname ; date" &
done
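Note that grep -v "#" drops every line that contains a # anywhere, and it lets blank lines through. If the node list may contain trailing comments or empty lines, a slightly stricter filter could look like the sketch below. The function name active_nodes is hypothetical, and the code is plain POSIX shell and sed.

```shell
# Hypothetical filter: print only the active node names from a node list.
# Reads the files given as arguments, or stdin when none are given.
active_nodes() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$@"
    # 1st expression: strip whole-line and trailing comments
    # 2nd and 3rd:    trim leading and trailing whitespace
    # 4th:            drop lines that are now empty
}

# Example (not executed here):
#   active_nodes nodelist.txt | while read nodename; do ...; done
```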
If your node names use leading zeros (e.g. node001), you can still use the incremental while loop, although it gets a bit complicated. The following loop will access nodes node001 through node128.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
    # Pad the node number to three digits
    if [ $i -lt 10 ]
    then
        ssh node00$i "hostname ; date" &
    elif [ $i -lt 100 ]
    then
        ssh node0$i "hostname ; date" &
    elif [ $i -lt 1000 ]
    then
        ssh node$i "hostname ; date" &
    fi
    (( i = i + 1 ))
done
In a situation like this it will probably be easier to just generate a list of nodes and save it as a text file to be used as input for the loop.
#!/bin/ksh
i=1
while [ $i -le 128 ]
do
    if [ $i -lt 10 ]
    then
        echo "node00$i" >> nodelist.txt
    elif [ $i -lt 100 ]
    then
        echo "node0$i" >> nodelist.txt
    elif [ $i -lt 1000 ]
    then
        echo "node$i" >> nodelist.txt
    fi
    (( i = i + 1 ))
done
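As an alternative to the if/elif chain, printf can do the zero padding. The sketch below assumes a printf that supports the %0Nd format, which is standard in POSIX shells including ksh. The function name gen_nodes and its three arguments are hypothetical, chosen for this illustration only.

```shell
# Hypothetical helper: print zero-padded node names, one per line.
#   $1 = first node number, $2 = last node number, $3 = number of digits
gen_nodes() {
    i=$1
    while [ "$i" -le "$2" ]; do
        # With $3 = 3 the format becomes "node%03d", e.g. node007
        printf "node%0${3}d\n" "$i"
        i=$((i + 1))
    done
}

# Example (not executed here): create the list for node001 through node128
#   gen_nodes 1 128 3 > nodelist.txt
```

Redirecting with > rather than >> replaces the file, so running the command twice does not leave duplicate entries in nodelist.txt.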
Practical Loop Examples
When executing complex commands on remote servers, it is a good idea to put all commands into a script and then to put this script into a directory exported via NFS to all the nodes. You can also RCP/SCP or FTP/SFTP the script to each node before running it. This way you can write simple loops that will call on the script and execute it locally on each node.
Loop Example 1
We need to connect to nodes 1 through 128 to add the new file server IP and hostname to the /etc/hosts file. We also need to add a new NFS mount to each node to be mounted at boot time.
First, create a simple script, add_nfs_mount.ksh, that adds the file server name and IP to /etc/hosts, creates a mountpoint, adds the NFS mount to /etc/fstab, and mounts the new filesystem. Place this script in the shared directory /export/scripts, which is exported via NFS to all nodes.

#!/bin/ksh
fileserver=nfsserver1
serverip=192.168.45.10
# Register the file server in /etc/hosts
echo "$serverip $fileserver" >> /etc/hosts
# Create the mountpoint and add the mount to /etc/fstab so it is mounted at boot
mkdir /nfs_share1
echo "${fileserver}:/share1 /nfs_share1 nfs intr,bg 0 0" >> /etc/fstab
mount /nfs_share1
Since this script is in a directory accessible from all cluster nodes, all you need to do now is write a simple loop that executes it on each node. Don’t forget to make the script executable first: chmod +x /export/scripts/add_nfs_mount.ksh

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
    ssh node$i "/export/scripts/add_nfs_mount.ksh" &
    (( i = i + 1 ))
done
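One thing to watch with add_nfs_mount.ksh: because it appends with >>, running the loop a second time would add duplicate lines to /etc/hosts and /etc/fstab. A possible guard is sketched below in POSIX shell. The helper name append_once is hypothetical and is not part of the original script.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
#   $1 = the exact line, $2 = the file
append_once() {
    line=$1
    file=$2
    # grep -F: fixed string, -x: match the whole line, -q: no output.
    # If the line is missing (or the file does not exist yet), append it.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (not executed here), inside add_nfs_mount.ksh:
#   append_once "192.168.45.10 nfsserver1" /etc/hosts
#   append_once "nfsserver1:/share1 /nfs_share1 nfs intr,bg 0 0" /etc/fstab
```

In the same spirit, mkdir -p /nfs_share1 would not fail if the mountpoint already exists.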
Loop Example 2
There may be situations when you cannot mount an NFS share on all the nodes. An alternative is to use SCP or RCP to copy the script to the nodes and then execute it locally on each node. Let’s take a look at how this is done.
In this example we need to configure cluster nodes 1 through 128 to use the US Eastern time zone and NTP. Let’s create the script /scripts/set_timezone.ksh:
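#!/bin/ksh
# Point /etc/localtime at the US/Eastern zone file, keeping a backup
mv /etc/localtime /etc/localtime_orig
ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime
# Rebuild /etc/sysconfig/clock with the new time zone settings
grep -v TIMEZONE /etc/sysconfig/clock > /tmp/clock
cat << EOF >> /tmp/clock
TIMEZONE="US/Eastern"
DEFAULT_TIMEZONE="US/Eastern"
EOF
mv /tmp/clock /etc/sysconfig/clock
/sbin/hwclock --systohc
# Write the NTP configuration and enable the service at boot
cat << EOF > /etc/ntp.conf
server 192.168.12.12
driftfile /var/lib/ntp/drift/ntp.drift
EOF
/sbin/chkconfig ntp on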
Now we need to create a loop to scp this script to nodes 1 through 128 and to execute it locally on each node.
#!/bin/ksh
i=1
while [ $i -le 128 ]
do
    # Copy the script to the node, then run it there in the background
    scp /scripts/set_timezone.ksh node${i}:/tmp/
    ssh node$i "chmod +x /tmp/set_timezone.ksh ; /tmp/set_timezone.ksh" &
    (( i = i + 1 ))
done
Loop Example 3
Another way of putting a script on the cluster nodes is to use FTP/SFTP. In the following example we need to install an RPM package on each cluster node. The first step is to FTP the /tmp/package.rpm file to all the nodes.
#!/bin/ksh
ftp_user="mike"
ftp_pass="p@ssw0rd"

# Feed the FTP commands for the current node ($i) to the ftp client
ftpput() {
    {
        echo "open node$i"
        echo "user $ftp_user $ftp_pass"
        echo "bin"
        echo "lcd /tmp"
        echo "cd /tmp"
        echo "put package.rpm"
        echo "quit"
    } | ftp -nvi -T 3
}

i=1
while [ $i -le 128 ]
do
    ftpput
    (( i = i + 1 ))
done
The final step is easy. All we need to do is to SSH to each node and install the RPM.
#!/bin/ksh
i=1
while [ $i -le 128 ]
do
    ssh node$i "rpm -i /tmp/package.rpm"
    (( i = i + 1 ))
done
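Since this last loop runs the installs one node at a time, it is easy to record which nodes failed. The sketch below relies on ssh returning the exit status of the remote command (or 255 if the connection itself fails). The function name failed_nodes and the RUNNER variable are hypothetical, and remote output is discarded for brevity.

```shell
# Hypothetical helper: run a command on each node and print the names of
# the nodes where it failed.
#   $1 = remote command, remaining arguments = node names
RUNNER=${RUNNER:-ssh}

failed_nodes() {
    remote_cmd=$1
    shift
    for node in "$@"; do
        # -n keeps ssh from reading stdin; a non-zero exit status means
        # either the connection or the remote command failed
        "$RUNNER" -n "$node" "$remote_cmd" > /dev/null 2>&1 ||
            printf '%s\n' "$node"
    done
}

# Example (not executed here):
#   failed_nodes "rpm -i /tmp/package.rpm" node1 node2 node3 > failed.txt
```

The resulting list can be fed back into one of the file-driven loops shown earlier to retry only the nodes that failed.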
The second part of this guide – Searching, Replacing, Comparing – will be published next week. Stay tuned.
One Comment »
For a tightly coupled computational fluid dynamics (CFD) code, which parts of the cluster should receive priority?
Hint: You should consider these four parts of a cluster: node speed (i.e. processor speed in a node), memory, network fabric, and storage, and which parts should receive priority.