Shell Scripting for HPC Clusters, Part 2

Submitted by Igor on October 23, 2009 – 12:30 pm

This is the second installment of a multipart guide for beginner Unix sysadmins supporting HPC clusters. You can view the first part of the guide here.

Searching, Replacing, Comparing

Try to work with a large cluster and you will soon discover that much of your time is spent on searching for text strings in thousands of files across hundreds of nodes. Cluster nodes should be mostly identical, but there are always small configuration differences. And keeping track of these small differences is a big job. Taking some time to write a good search script will save you a lot of time and effort in the long run.

Searching for Text Strings

A text string search script needs three search parameters: path, filename, and text string. It is a good idea to limit the search to ASCII text files: searching binary files may take a long time and will stress your CPU. In the example below, the script prompts you to specify the path, the filename string, and the text string. The script then searches recursively for ASCII text files that match these criteria.

Everything the script does is separated into functions. The first function – enter_dir_func – prompts you to enter the name of the directory to be searched. It then verifies that the specified directory actually exists and is readable. The next function – enter_find_string_func – prompts you to enter the search string, and so on. Breaking down the script into a series of functions greatly simplifies the task of writing complex scripts.

#!/bin/ksh

#---------------------------------------------
# FUNCTIONS
#---------------------------------------------

enter_dir_func() {
        SDIR=
        CDIR=`pwd`
        echo ""
        echo -n "Enter directory to search [$CDIR]: "
        read SDIR
        if [ ! -n "$SDIR" ]
        then
                SDIR="$CDIR"
        else
                if [ ! -d "$SDIR" ]
                then
                        echo "Directory $SDIR does not exist!"
                        exit 1
                elif [ ! -r "$SDIR" ]
                then
                        echo "Directory $SDIR is not readable!"
                        exit 1
                fi
        fi
}

#---------------------------------------------

enter_find_string_func() {
        FINDSTR=
        echo -n "Enter search string: "
        read FINDSTR
        if [ ! -n "$FINDSTR" ]
        then
                echo "Search string cannot be null!"
                exit 1
        fi
}

#---------------------------------------------

enter_filename_string_func() {
        FILESTR=
        echo -n "Enter filename string: "
        read FILESTR
}

#---------------------------------------------

case_sensitive() {
        echo -n "Case sensitive search? [n]:  "
        read CASESENCE

        case "$CASESENCE" in
                y*|Y*) CASESENCE=1 ;;
                n*|N*) CASESENCE=0 ;;
                *)     CASESENCE=0 ;;
        esac
}

#---------------------------------------------

display_hits() {
        echo -n "Display hits? [n]:  "
        read DISPLAYHITS

        case "$DISPLAYHITS" in
                y*|Y*) DISPLAYHITS=1 ;;
                n*|N*) DISPLAYHITS=0 ;;
                *)     DISPLAYHITS=0 ;;
        esac
}

#---------------------------------------------

log_to_file() {
        echo -n "Log to file? [n]: "
        read LOGTOFILE

        case "$LOGTOFILE" in
                y*|Y*) LOGTOFILE=1 ;;
                n*|N*) LOGTOFILE=0 ;;
                *)     LOGTOFILE=0 ;;
        esac
}

#---------------------------------------------

display_hit() {
        if [ $DISPLAYHITS -eq 1 ]
        then
                if [ $LOGTOFILE -eq 1 ]
                then
                        SEARCHLOG=/tmp/searchlog_`date +'%Y-%m-%d_%H-%M'`.txt

                        if [ $HITS -eq 1 ]
                        then
                                echo "Found $HITS match in ${FILENAME}" >> $SEARCHLOG
                        else
                                echo "Found $HITS matches in ${FILENAME}" >> $SEARCHLOG
                        fi

                        if [ $CASESENCE -eq 1 ]
                        then
                                grep "$FINDSTR" "$FILENAME"
                        else
                                grep -i "$FINDSTR" "$FILENAME"
                        fi | while read HITSTR
                        do
                                echo -n "       ${HITSTR}" | cut -d " " -f1-10
                                echo -n "       ${HITSTR}" | cut -d " " -f1-10 >> $SEARCHLOG
                        done
                else
                        if [ $CASESENCE -eq 1 ]
                        then
                                grep "$FINDSTR" "$FILENAME"
                        else
                                grep -i "$FINDSTR" "$FILENAME"
                        fi | while read HITSTR
                        do
                                echo -n "       ${HITSTR}" | cut -d " " -f1-10
                        done
                fi
        else
                if [ $LOGTOFILE -eq 1 ]
                then
                        SEARCHLOG=/tmp/searchlog_`date +'%Y-%m-%d_%H-%M'`.txt

                        if [ $HITS -eq 1 ]
                        then
                                echo "Found $HITS match in ${FILENAME}" >> $SEARCHLOG
                        else
                                echo "Found $HITS matches in ${FILENAME}" >> $SEARCHLOG
                        fi

                        if [ $CASESENCE -eq 1 ]
                        then
                                grep "$FINDSTR" "$FILENAME"
                        else
                                grep -i "$FINDSTR" "$FILENAME"
                        fi | while read HITSTR
                        do
                                echo -n "       ${HITSTR}" | cut -d " " -f1-10 >> $SEARCHLOG
                        done
                fi
        fi
}

#---------------------------------------------

search_func() {
        echo "Performing search..."
        echo "-----------------------------------------------"
        find "$SDIR" -type f -name "*${FILESTR}*" | while read FILENAME
        do
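                # Treat the file as text if its type description mentions "text"
                # (covers results such as "ASCII text, with very long lines")
                if file -b "$FILENAME" | grep -q "text"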
                then
                        HITS=0
                        if [ $CASESENCE -eq 1 ]
                        then
                                HITS=`grep -c "$FINDSTR" "$FILENAME"`
                        else
                                HITS=`grep -i -c "$FINDSTR" "$FILENAME"`
                        fi

                        if [ $HITS -gt 0 ]
                        then
                                if [ $HITS -eq 1 ]
                                then
                                        echo ""
                                        echo "Found $HITS match in ${FILENAME}"
                                        display_hit
                                else
                                        echo ""
                                        echo "Found $HITS matches in ${FILENAME}"
                                        display_hit
                                fi
                        fi
                fi
        done

        if [ $LOGTOFILE -eq 1 ]
        then
                echo ""
                echo "---------------------------------------------"
                echo "See search log file $SEARCHLOG"
        fi
}

#---------------------------------------------
# RUNTIME
#---------------------------------------------

enter_dir_func
enter_filename_string_func
enter_find_string_func
case_sensitive
display_hits
log_to_file
search_func

More features can be added to this script, allowing it, for example, to use wildcards, to control the depth of search, to launch multiple search threads, to look only for files owned by certain users and so on. For most purposes, however, keeping it simple is rarely a bad idea. Here is a sample run of our search script:

icebox:/var/adm/bin # ./find_text.ksh

Enter directory to search [/var/adm/bin]: /etc
Enter filename string:
Enter search string: icebox
Case sensitive search? [n]:
Display hits? [n]:  y
Log to file? [n]:
Performing search...
-----------------------------------------------

Found 1 match in /etc/samba/smb.conf
        workgroup = ICEBOX

Found 1 match in /etc/HOSTNAME
        icebox.jedi

Found 3 matches in /etc/hosts.YaST2save
        127.0.0.2       icebox.jedi icebox
        192.168.123.98  iceboxg iceboxg
        192.168.2.2     iceboxg iceboxg

Found 1 match in /etc/ushare.conf
        USHARE_NAME=ICEBOX

Found 1 match in /etc/postfix/main.cf
        myhostname = icebox.jedi

Found 3 matches in /etc/hosts
        127.0.0.2       icebox.jedi icebox
        192.168.123.98  iceboxg iceboxg
        192.168.2.2     iceboxg iceboxg

In the example above, we searched /etc and all subfolders for files containing the “icebox” string, which is the hostname of this particular system. A search like this may be needed when you want to change the hostname. This makes it easy to find any configuration files that need to be updated.
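For a quick one-off check, the same kind of search can be approximated in a few lines. This is only a minimal sketch, not a replacement for the script above: the function name quick_search is hypothetical, and it assumes a grep that supports the -I option for skipping binary files (GNU grep does).

```shell
#!/bin/bash
# quick_search DIR FILENAME_PART STRING
# Hypothetical helper: lists files under DIR whose names contain
# FILENAME_PART and whose contents contain STRING (case-insensitive).
quick_search() {
    # -I skips binary files, -i ignores case, -l prints only file names,
    # -F treats the search string literally, -e protects strings starting with "-"
    find "$1" -type f -name "*${2}*" -exec grep -IilF -e "$3" {} +
}

# Example: quick_search /etc "" icebox
```

An empty filename string matches every file, just as it does in the interactive script.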

Replacing Text Strings

Whenever you use a script to modify files – especially configuration files – you must create backup copies of the originals. A small mistake in your script can wreak havoc on your system in a matter of seconds. In the example below, we will search for all *.conf files in /etc and change all occurrences of IP address 128.225.1.10 to 128.225.1.12. A script like this may be useful for updating your server’s configuration in case of an IP change.

#!/bin/ksh

search_string="128.225.1.10"
replace_string="128.225.1.12"
temp_file=/tmp/search-replace.tmp

find /etc -type f -name "*.conf" | while read file
do
   if [ `fgrep -c "$search_string" "$file"` -gt 0 ]
   then
      file_permissions=$(stat -c "%a" "$file")
      file_owner=$(ls -als "$file" | awk '{print $4}')
      file_group=$(ls -als "$file" | awk '{print $5}')

      cp -p "$file" "${file}_backup_`date +'%Y-%m-%d'`"

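      # Escape the dots so sed matches them literally, as fgrep did above
      search_pattern=$(echo "$search_string" | sed 's/\./\\./g')
      sed "s/${search_pattern}/${replace_string}/g" "$file" > $temp_file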
      mv $temp_file "$file"
      chown ${file_owner}:${file_group} "$file"
      chmod $file_permissions "$file"
   fi
done

As you can see, we took care to preserve ownership and permissions of the files we replaced. We also created backups of these files. One potential problem to keep in mind is the field separator used by the “sed” command. In this example we are using “/”. However, if either the search or the replacement string contains a slash, you would need to add escape characters or – a preferred solution – change the field separator from slash to something else, like “^” or “@”. Also remember that sed treats the search string as a regular expression, so an unescaped dot matches any character, not just a literal dot.
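The separator change can be sketched as follows. This is an illustration under stated assumptions rather than part of the original script: the function name replace_literal is hypothetical, it uses “@” as the separator, and it escapes the characters that are special in a basic regular expression so that the search string is matched literally.

```shell
#!/bin/bash
# replace_literal SEARCH REPLACEMENT
# Hypothetical filter: reads stdin and writes stdout, replacing every
# literal occurrence of SEARCH with REPLACEMENT. Uses "@" as the sed separator.
replace_literal() {
    # Escape characters that are special in a basic regular expression,
    # plus the "@" separator, so the search string is matched literally
    search=$(printf '%s' "$1" | sed 's/[]\$*.^@[]/\\&/g')
    # In the replacement only "\", "&" and the separator are special
    replace=$(printf '%s' "$2" | sed 's/[\&@]/\\&/g')
    sed "s@${search}@${replace}@g"
}

# Example: slashes need no escaping and the dot matches only a dot
# echo "path=/opt/app-1.0/bin" | replace_literal "/opt/app-1.0" "/opt/app-1.1"
```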

Replacing Text Strings in Databases

While we are on the subject of replacing text strings, let’s take a look at a string replacement technique for MySQL databases. MySQL and PostgreSQL databases are frequently used by cluster management software. While this is rarely recommended by the software vendors, sometimes you need to get into the database directly and edit things by hand. A day will eventually come when you need to find and replace a string of text in your database. You don’t know which row, or which column, or which table. Heck, you may not even know which database. Your options are: spend the rest of the summer hunting down the elusive table cells, or use the weapon of mass replacement described below. Naturally and as usual, you absolutely must back up your database (or databases) before attempting any far-reaching scripted mumbo jumbo.

#!/bin/bash
echo -n "Enter username: " ; read db_user
echo -n "Enter $db_user password: " ; stty -echo ; read db_passwd ; stty echo ; echo ""
echo -n "Enter database name: " ; read db_name
echo -n "Enter search string: " ; read search_string
echo -n "Enter replacement string: " ; read replacement_string
 
MYSQL="/usr/bin/mysql --skip-column-names -u${db_user} -p${db_passwd}"
 
echo "SHOW TABLES;" | $MYSQL $db_name | while read db_table
do
	echo "SHOW COLUMNS FROM $db_table;" | $MYSQL $db_name| 
	awk -F'\t' '{print $1}' | while read tbl_column
	do
		echo "update $db_table set ${tbl_column} = replace(${tbl_column}, '${search_string}', '${replacement_string}');" |
		$MYSQL $db_name
	done
done

The script will prompt you for the username, password, database name, search string, and replacement string. It will then go through every column of every table in search of your text string. And it will replace it with the new string you specified, potentially saving you hours of work and dozens of typos.
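Before letting a script like this loose on a live database, it may help to print the statements it would run and review them first. The sketch below is an assumption-laden illustration: the function name make_update_sql is hypothetical, it only doubles single quotes inside the strings, and it does not handle backslashes.

```shell
#!/bin/bash
# make_update_sql TABLE COLUMN SEARCH REPLACEMENT
# Hypothetical helper: prints the UPDATE statement instead of running it,
# so the statements can be reviewed (or saved) before touching the database.
make_update_sql() {
    # Double any single quotes so the strings stay valid SQL literals
    s=$(printf '%s' "$3" | sed "s/'/''/g")
    r=$(printf '%s' "$4" | sed "s/'/''/g")
    # Backticks around identifiers guard against reserved words used as names
    printf 'UPDATE `%s` SET `%s` = REPLACE(`%s`, '\''%s'\'', '\''%s'\'');\n' \
        "$1" "$2" "$2" "$s" "$r"
}

# Example: make_update_sql nodes hostname icebox node01
```

The output of such a function could be piped to the mysql client once you are satisfied with what it prints.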

Numeric File Permissions

Suppose you are working with a Web server and your task is to make sure that no files or directories have permissions “777”. It would be easy to just recursively change permissions for all files to something like 644, but this may cause unexpected problems. You only need to change those files and directories that have “777” permissions and leave everything else as it is. The script below will search for files and directories that have “777” permissions and change files to 644 and directories to 755.

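#!/bin/bash
 
# Each file name is passed to sh as "$1", so names containing
# spaces or quotes are handled safely
find . -type f -exec sh -c '
	if [ "$(stat -c "%a" "$1")" -eq 777 ]
	then
		chmod 644 "$1"
	fi
' sh {} \;
 
find . -type d -exec sh -c '
	if [ "$(stat -c "%a" "$1")" -eq 777 ]
	then
		chmod 755 "$1"
	fi
' sh {} \;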
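The same result can probably be reached more directly by letting find do the permission test itself. This is a minimal sketch, assuming a find that supports the -perm test and the “+” form of -exec; the function name fix_777 is hypothetical.

```shell
#!/bin/bash
# fix_777 DIR
# Hypothetical helper: resets mode-777 files to 644 and mode-777
# directories to 755 under DIR, leaving everything else untouched.
fix_777() {
    # -perm 777 matches entries whose permission bits are exactly 777
    find "$1" -type f -perm 777 -exec chmod 644 {} +
    find "$1" -type d -perm 777 -exec chmod 755 {} +
}

# Example: fix_777 /srv/www
```

This avoids starting a separate shell and stat process for every file, which adds up on a large directory tree.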

Random Number Generator

Sometimes, in the course of writing a shell script, a need arises for some random input. Using the built-in $RANDOM shell variable will give you a random number from 0 to 32767. Let’s take a look at a few examples making use of this shell variable in a number of handy ways. The basic usage is as follows:

echo $RANDOM

What if you need a random number no greater than 10? No problem:

echo "`expr $RANDOM % 11`"

How about a random number from 1 to 10 (no zeros)? Here you go:

echo "`expr $RANDOM % 10`+1"|bc -l

Same thing for a random number from 1 to 1000:

echo "`expr $RANDOM % 1000`+1"|bc -l

Now, what if you need a random number from 1 to 100,000? As you know, the $RANDOM variable only goes to 32767, but there is a simple workaround: just stack two of them side by side (and, no, we will not be diving into a discussion of how random is “random” in this case):

echo "`expr ${RANDOM}${RANDOM} % 100000`+1"|bc -l
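The expr and bc calls above can also be written with the shell’s built-in arithmetic, which avoids starting external processes. The sketch below is one possible variant, not the method used above: instead of gluing digits together it combines two $RANDOM values into a single 30-bit number. The function name random_between is hypothetical.

```shell
#!/bin/bash
# random_between MIN MAX
# Hypothetical helper: prints a pseudo-random integer from MIN to MAX inclusive.
# Two $RANDOM values are combined into one 30-bit number (0 to 1073741823),
# so ranges wider than 32768 are covered.
random_between() {
    echo $(( $1 + (RANDOM * 32768 + RANDOM) % ($2 - $1 + 1) ))
}

# Examples:
# random_between 1 10
# random_between 1 100000
```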

So what useful tasks can you perform using the random number generator? You can randomize various lists. For example, you have a long list of URLs and you want to download ten random links. Let’s say our list of URLs is /tmp/url_list.txt:

cat /tmp/url_list.txt

   https://www.krazyworks.com/?p=1
   https://www.krazyworks.com/?p=2
   https://www.krazyworks.com/?p=3
   ...
   https://www.krazyworks.com/?p=1000

Now we randomize the list and grab ten random URLs:

# Count the lines once, outside the loop
urls_total=$(wc -l < /tmp/url_list.txt)

while read url
   do
      # Prefix each URL with a random sort key, separated by "^"
      random_number=$(( RANDOM % urls_total + 1 ))
      echo "${random_number}^$url"
   done < /tmp/url_list.txt | sort -n | sed 's/^[0-9]*\^//' | head -10

   https://www.krazyworks.com/?p=343
   https://www.krazyworks.com/?p=790
   https://www.krazyworks.com/?p=910
   https://www.krazyworks.com/?p=327
   https://www.krazyworks.com/?p=639
   https://www.krazyworks.com/?p=959
   https://www.krazyworks.com/?p=971
   https://www.krazyworks.com/?p=75
   https://www.krazyworks.com/?p=283
   https://www.krazyworks.com/?p=496

Korn Shell Arrays

Supporting a cluster sometimes means you need to collect and analyze a large amount of diversified data. Depending on the amount of data, you can dump it into a text file or put it into a database. However, there is a faster alternative that does not generate a lot of disk I/O and does not require a database: the array function of the Korn shell. First, we will look at some basic examples of array operations.

Create simple arrays

set -A termnames gl35a t2000 s531 vt99 
# array elements are separated by blanks, TABs, or NEWLINEs

set -A arrayname $(< filename) 
# where "filename" is the file containing array values

typeset -A StateTax
# associative arrays require ksh93; quote subscripts that contain spaces
StateTax["New Jersey"]=0.06
print ${StateTax["New Jersey"]}

Reading arrays

print ${#termnames[*]} 
#shows the number of elements in the array; “*” can be replaced by “@”

print ${termnames[*]} 
#shows all values

print ${termnames[0]} 
#shows the first value

for i in 0 2 3 #show values 0, 2, and 3
do
   print ${termnames[$i]}
done

print ${termnames[3]} is equivalent to print ${termnames[2+1]}

Sample scripts

Read a file into an array, one line at a time:

i=0
cat /var/log/messages | while read LINE
do
   msgarray[$i]="${LINE}"
   (( i = i + 1 ))
done

Print values from the array:

i=0 #array element count begins with "0"
while [ $i -lt ${#termnames[*]} ]
do
   print ${termnames[$i]}
   (( i = i + 1 ))
done

Put the output of a command into an array:

set -A dt `date`

> print ${#dt[*]}
6
> print ${dt[*]}
Wed Jan 30 00:55:04 EST 2008
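One caveat about the file-reading sample above: in the Korn shell the last stage of a pipeline runs in the current shell, so the array survives the loop. In bash the loop runs in a subshell and the array comes out empty. A sketch that should behave the same way in both shells reads the file through a redirect instead of a pipe. The function name read_lines is hypothetical.

```shell
#!/bin/bash
# read_lines FILE
# Hypothetical helper: fills the global array msgarray with the lines of FILE.
# The redirect keeps the loop in the current shell, so the array
# is still populated after "done".
read_lines() {
    i=0
    unset msgarray
    # IFS= and -r keep leading blanks and backslashes intact
    while IFS= read -r LINE
    do
        msgarray[$i]="$LINE"
        (( i = i + 1 ))
    done < "$1"
}

# Example:
# read_lines /var/log/messages
# echo "Read ${#msgarray[*]} lines"
```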

Practical applications of arrays

Let’s say there is a process on your Unix/Linux system that sometimes tends to consume all CPU resources and become unresponsive. At the same time, you do not want to terminate the process at the first sign of trouble, because momentary high CPU utilization may be legitimate. The solution is to continuously calculate the running average of CPU utilization. A Korn shell array is a good tool for storing intermediate values and calculating the average. Below is a sample script that will terminate the monitored process (process.bin) if its average CPU utilization exceeds the limit during the specified period of time.

#!/bin/ksh
 
configure() {
        COMMAND="process.bin"              # Name of the process being monitored
        CPULIMIT=99                             # CPU threshold
        LOOP=10                                   # Monitor process every so many seconds
        INTERVAL_DURATION=1                # Take CPU utilization readings every so many seconds
        INTERVAL_COUNT="0 1 2 3 4"         # Calculate average based on this many readings
}
 
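pid() {
        # Highest matching PID; empty if the process is not running
        PID=$(ps -ef | grep $COMMAND | grep -v grep | awk '{print $2}' | sort -n | uniq | tail -1)
}
 
cpu() {
        CPU=0
        CPUAVG=0
        [ -z "$PID" ] && return         # Nothing to measure if the process is not running
        # top reports %CPU with a decimal part; int() keeps the comparison below integer-only
        CPU=$(top -b -n 1 -p $PID | grep $COMMAND | awk '{print int($9)}')
 
        if [ "${CPU:-0}" -ge $CPULIMIT ]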
        then
                for i in $INTERVAL_COUNT        # If CPU load exceeds $CPULIMIT, determine average CPU load
                do
                        array[$i]=$(top -b -n 1 -p $PID | grep $COMMAND | awk '{print $9}')
                        sleep $INTERVAL_DURATION
                done
 
                CPUAVG=$(echo "scale = 0 ; (${array[0]}+${array[1]}+${array[2]}+${array[3]}+${array[4]})/5" | bc -l)
        fi
}
 
terminate() {
        if [ $CPUAVG -ge $CPULIMIT ]
        then
                kill -9 $PID
                echo "Killed $PID at `date`"
        fi
}
 
# -------------------------------
# RUNTIME
# -------------------------------
configure               # Configure script parameters
 
i=1
while [ $i -eq 1 ]      # Run script in a loop every $LOOP seconds
do
        pid             # Acquire unique PID of the $COMMAND
        cpu             # Determine current CPU load for the $COMMAND
        terminate       # Kill $COMMAND if $CPULIMIT is exceeded
        sleep $LOOP
done
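The average in the script above is hard-wired to five array elements. A more general calculation could look like the sketch below, which accepts any number of readings. It is an illustration only: the function name array_average is hypothetical, and it drops the decimal part of each reading rather than calling bc.

```shell
#!/bin/bash
# array_average VALUE...
# Hypothetical helper: prints the integer average of its arguments,
# however many there are. Fractions are dropped from each value first.
array_average() {
    sum=0
    count=0
    for value in "$@"
    do
        sum=$(( sum + ${value%.*} ))    # "98.5" becomes 98
        count=$(( count + 1 ))
    done
    [ "$count" -gt 0 ] && echo $(( sum / count ))
}

# Usage with an array of readings:
# readings=(100 98.5 99 100 97)
# array_average "${readings[@]}"
```

With a function like this, changing the number of readings only requires changing the list of array indexes, not the formula.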

Passing MySQL Commands from Shell Script

Running MySQL commands from a shell script is a relatively simple task that has a lot of people baffled. Some say it’s too complicated and suggest using PHP or Perl, others claim doing so is a security risk (a favorite excuse of the ignorant), and some resort to using a shell script to write SQL commands to a text file that MySQL would use as input. Below is a much simpler and more direct way of generating and running complex SQL queries from a shell script without temporary files and without storing passwords in the script.

Let’s start with the basic idea:

echo "SELECT * FROM table_name" | mysql -u<username> -p<password> db_name

Using this method, you can pass any shell variables to MySQL. Aha, some will say, you have to put your password in the shell script and that is definitely not secure. You don’t have to: you can have your script prompt you for a password:

#!/bin/bash
echo -n "Enter username: " ; read db_user
echo -n "Enter $db_user password: " ; stty -echo ; read db_passwd ; stty echo ; echo ""
echo "SHOW DATABASES ;" | mysql --skip-column-names -u$db_user -p$db_passwd
echo -n "Enter database name: " ; read db_name
echo "SHOW TABLES ;" | mysql --skip-column-names -u$db_user -p$db_passwd $db_name
echo -n "Enter table name: " ; read table_name
echo "SELECT * FROM $table_name ;" | mysql -t -u$db_user -p$db_passwd $db_name

The script above will prompt you for the username and password (the password will not be visible as you type it). It will then show you the list of available databases and prompt you to select one. The script will then show you all the tables in that database and ask you to specify the name of the table you want to use. Finally, the script will select everything from that table. Everything is very simple, secure, and straightforward.

Duplicating MySQL Databases

As we already mentioned, a number of cluster administration tools use databases – usually MySQL or PostgreSQL – to store various configuration details. Before you make any major changes to your cluster’s configuration, you need to back up the database. You should always do a backup before upgrading cluster management software, adding or removing nodes, or installing new software.

The best way to back up a database is to create a fully functional copy. This way you know for a fact that you have a working copy and not just some SQL backup file that may or may not work. There are two ways to copy a MySQL database. One method is to use the mysqldump | mysql construct. This does not always work and you may encounter some SQL errors. The more direct approach is to temporarily shut down the database and duplicate the entire database directory. The first script below uses the mysqldump method.

#!/bin/ksh

echo -n "Enter database username: " ; read DBUSER
echo -n "Enter $DBUSER password: " ; stty -echo ; read DBPASS ; stty echo ; echo ""
echo -n "Enter old database name: " ; read DBNAME
echo -n "Enter new database name: " ; read DBNEW

MYSQL="/usr/bin/mysql -u${DBUSER} -p${DBPASS}"
MYSQLDUMP="/usr/bin/mysqldump -u${DBUSER} -p${DBPASS}"
DBHOME="/var/lib/mysql"

echo "Copying database $DBNAME to $DBNEW"
echo "CREATE DATABASE $DBNEW;" | $MYSQL
echo "GRANT ALL PRIVILEGES ON ${DBNEW}.* to ${DBUSER}@'%' IDENTIFIED BY '$DBPASS' WITH GRANT OPTION ;" | $MYSQL

$MYSQLDUMP $DBNAME | $MYSQL $DBNEW

And here is a script that will do a “cold copy” of your database. This approach suits MyISAM tables, whose data lives entirely in the database directory:

#!/bin/ksh

echo -n "Enter database username: " ; read DBUSER
echo -n "Enter $DBUSER password: " ; stty -echo ; read DBPASS ; stty echo ; echo ""
echo -n "Enter old database name: " ; read DBNAME
echo -n "Enter new database name: " ; read DBNEW

MYSQL="/usr/bin/mysql -u${DBUSER} -p${DBPASS}"
MYSQLDUMP="/usr/bin/mysqldump -u${DBUSER} -p${DBPASS}"
DBHOME="/var/lib/mysql"
RSYNC="/usr/local/bin/rsync -avu"

echo "Copying database $DBNAME to $DBNEW"
echo "CREATE DATABASE $DBNEW;" | $MYSQL
echo "GRANT ALL PRIVILEGES ON ${DBNEW}.* to ${DBUSER}@'%' IDENTIFIED BY '$DBPASS' WITH GRANT OPTION ;" | $MYSQL

/etc/init.d/mysql stop

$RSYNC "${DBHOME}/${DBNAME}/" "${DBHOME}/${DBNEW}/"

/etc/init.d/mysql start

The third part of this guide – System Automation: OS Installation & Configuration – will be published shortly. Stay tuned.