Shell Scripting for HPC Clusters, Part 2
This is the second installment of a multipart guide for beginner Unix sysadmins supporting HPC clusters. You can view the first part of the guide here.
Searching, Replacing, Comparing
Try to work with a large cluster and you will soon discover that much of your time is spent on searching for text strings in thousands of files across hundreds of nodes. Cluster nodes should be mostly identical, but there are always small configuration differences. And keeping track of these small differences is a big job. Taking some time to write a good search script will save you a lot of time and effort in the long run.
Searching for Text Strings
A text string search script needs three search parameters: path, filename, and text string. It is a good idea to limit the search to ASCII text files: searching binary files may take a long time and will stress your CPU. In the example below, the script prompts you to specify the path, the filename string, and the text string. The script then searches recursively for ASCII text files that match these criteria.
Everything the script does is separated into functions. The first function (enter_dir_func) prompts you to enter the name of the directory to be searched. It then verifies that the specified directory actually exists and is readable. The next function (enter_find_string_func) prompts you to enter the search string, and so on. Breaking the script down into a series of functions greatly simplifies the task of writing complex scripts.
#!/bin/ksh
#---------------------------------------------
# FUNCTIONS
#---------------------------------------------
enter_dir_func() {
    SDIR=
    CDIR=`pwd`
    echo ""
    echo -n "Enter directory to search [$CDIR]: "
    read SDIR
    if [ ! -n "$SDIR" ]
    then
        SDIR="$CDIR"
    else
        if [ ! -d "$SDIR" ]
        then
            echo "Directory $SDIR does not exist!"
            exit 1
        elif [ ! -r "$SDIR" ]
        then
            echo "Directory $SDIR is not readable!"
            exit 1
        fi
    fi
}
#---------------------------------------------
enter_find_string_func() {
    FINDSTR=
    echo -n "Enter search string: "
    read FINDSTR
    if [ ! -n "$FINDSTR" ]
    then
        echo "Search string cannot be null!"
        exit 1
    fi
}
#---------------------------------------------
enter_filename_string_func() {
    FILESTR=
    echo -n "Enter filename string: "
    read FILESTR
}
#---------------------------------------------
case_sensitive() {
    echo -n "Case sensitive search? [n]: "
    read CASESENCE
    case "$CASESENCE" in
        y*|Y*) CASESENCE=1 ;;
        n*|N*) CASESENCE=0 ;;
        *)     CASESENCE=0 ;;
    esac
}
#---------------------------------------------
display_hits() {
    echo -n "Display hits? [n]: "
    read DISPLAYHITS
    case "$DISPLAYHITS" in
        y*|Y*) DISPLAYHITS=1 ;;
        n*|N*) DISPLAYHITS=0 ;;
        *)     DISPLAYHITS=0 ;;
    esac
}
#---------------------------------------------
log_to_file() {
    echo -n "Log to file? [n]: "
    read LOGTOFILE
    case "$LOGTOFILE" in
        y*|Y*) LOGTOFILE=1 ;;
        n*|N*) LOGTOFILE=0 ;;
        *)     LOGTOFILE=0 ;;
    esac
}
#---------------------------------------------
display_hit() {
    if [ $DISPLAYHITS -eq 1 ]
    then
        if [ $LOGTOFILE -eq 1 ]
        then
            SEARCHLOG=/tmp/searchlog_`date +'%Y-%m-%d_%H-%M'`.txt
            if [ $HITS -eq 1 ]
            then
                echo "Found $HITS match in ${FILENAME}" >> $SEARCHLOG
            else
                echo "Found $HITS matches in ${FILENAME}" >> $SEARCHLOG
            fi
            if [ $CASESENCE -eq 1 ]
            then
                grep "$FINDSTR" "$FILENAME"
            else
                grep -i "$FINDSTR" "$FILENAME"
            fi | while read HITSTR
            do
                echo -n " ${HITSTR}" | cut -d " " -f1-10
                echo -n " ${HITSTR}" | cut -d " " -f1-10 >> $SEARCHLOG
            done
        else
            if [ $CASESENCE -eq 1 ]
            then
                grep "$FINDSTR" "$FILENAME"
            else
                grep -i "$FINDSTR" "$FILENAME"
            fi | while read HITSTR
            do
                echo -n " ${HITSTR}" | cut -d " " -f1-10
            done
        fi
    else
        if [ $LOGTOFILE -eq 1 ]
        then
            SEARCHLOG=/tmp/searchlog_`date +'%Y-%m-%d_%H-%M'`.txt
            if [ $HITS -eq 1 ]
            then
                echo "Found $HITS match in ${FILENAME}" >> $SEARCHLOG
            else
                echo "Found $HITS matches in ${FILENAME}" >> $SEARCHLOG
            fi
            if [ $CASESENCE -eq 1 ]
            then
                grep "$FINDSTR" "$FILENAME"
            else
                grep -i "$FINDSTR" "$FILENAME"
            fi | while read HITSTR
            do
                echo -n " ${HITSTR}" | cut -d " " -f1-10 >> $SEARCHLOG
            done
        fi
    fi
}
#---------------------------------------------
search_func() {
    echo "Performing search..."
    echo "-----------------------------------------------"
    find "$SDIR" -type f -name "*${FILESTR}*" | while read FILENAME
    do
        if [ `file "$FILENAME" | awk '{print $NF}'` == "text" ]
        then
            HITS=0
            if [ $CASESENCE -eq 1 ]
            then
                HITS=`grep -c "$FINDSTR" "$FILENAME"`
            else
                HITS=`grep -i -c "$FINDSTR" "$FILENAME"`
            fi
            if [ $HITS -gt 0 ]
            then
                if [ $HITS -eq 1 ]
                then
                    echo ""
                    echo "Found $HITS match in ${FILENAME}"
                    display_hit
                else
                    echo ""
                    echo "Found $HITS matches in ${FILENAME}"
                    display_hit
                fi
            fi
        fi
    done
    if [ $LOGTOFILE -eq 1 ]
    then
        echo ""
        echo "---------------------------------------------"
        echo "See search log file $SEARCHLOG"
    fi
}
#---------------------------------------------
# RUNTIME
#---------------------------------------------
enter_dir_func
enter_filename_string_func
enter_find_string_func
case_sensitive
display_hits
log_to_file
search_func
More features can be added to this script, allowing it, for example, to use wildcards, to control the depth of search, to launch multiple search threads, to look only for files owned by certain users and so on. For most purposes, however, keeping it simple is rarely a bad idea. Here is a sample run of our search script:
icebox:/var/adm/bin # ./find_text.ksh

Enter directory to search [/var/adm/bin]: /etc
Enter filename string:
Enter search string: icebox
Case sensitive search? [n]:
Display hits? [n]: y
Log to file? [n]:
Performing search...
-----------------------------------------------

Found 1 match in /etc/samba/smb.conf
 workgroup = ICEBOX

Found 1 match in /etc/HOSTNAME
 icebox.jedi

Found 3 matches in /etc/hosts.YaST2save
 127.0.0.2 icebox.jedi icebox
 192.168.123.98 iceboxg iceboxg
 192.168.2.2 iceboxg iceboxg

Found 1 match in /etc/ushare.conf
 USHARE_NAME=ICEBOX

Found 1 match in /etc/postfix/main.cf
 myhostname = icebox.jedi

Found 3 matches in /etc/hosts
 127.0.0.2 icebox.jedi icebox
 192.168.123.98 iceboxg iceboxg
 192.168.2.2 iceboxg iceboxg
In the example above, we searched /etc and all subfolders for files containing the “icebox” string, which is the hostname of this particular system. A search like this may be needed when you want to change the hostname. This makes it easy to find any configuration files that need to be updated.
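For quick one-off searches, much of the same logic can be condensed into a single find and grep pipeline. The sketch below is not part of the script above: the function name search_text and its argument order are made up for illustration, and it assumes GNU or BSD versions of find and grep, since -maxdepth and -I (skip binary files) are not part of POSIX.

```shell
# search_text DIR NAME_PATTERN STRING [MAXDEPTH]
# Hypothetical helper: lists text files under DIR whose names match
# NAME_PATTERN and whose contents include STRING (case-insensitive,
# fixed-string match). MAXDEPTH defaults to 100 levels.
search_text() {
    find "$1" -maxdepth "${4:-100}" -type f -name "$2" \
        -exec grep -IliF -- "$3" {} +
}
```

A call such as search_text /etc '*.conf' icebox 2 would limit the search to two directory levels below /etc. Because grep -l prints only file names, the output is easy to feed into another loop.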
Replacing Text Strings
Whenever you use a script to modify files, especially configuration files, you must create backup copies of the originals. A small mistake in your script can wreak havoc on your system in a matter of seconds. In the example below, we will search for all *.conf files in /etc and change all occurrences of IP address 128.225.1.10 to 128.225.1.12. A script like this may be useful for updating your server’s configuration after an IP address change.
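#!/bin/ksh
search_string="128.225.1.10"
replace_string="128.225.1.12"
temp_file=/tmp/search-replace.tmp

# Escape the dots so that sed matches them literally, not as "any character"
search_regex=$(echo "$search_string" | sed 's/\./\\./g')

find /etc -type f -name "*.conf" | while read file
do
    # fgrep treats the search string as fixed text, not as a regular expression
    if [ `fgrep -c "$search_string" "$file"` -gt 0 ]
    then
        # Remember permissions and ownership of the original file
        file_permissions=$(stat -c "%a" "$file")
        file_owner=$(ls -als "$file" | awk '{print $4}')
        file_group=$(ls -als "$file" | awk '{print $5}')
        # Keep a dated backup before touching the file
        cp -p "$file" "${file}_backup_`date +'%Y-%m-%d'`"
        sed "s/${search_regex}/${replace_string}/g" "$file" > $temp_file
        mv $temp_file "$file"
        # Restore ownership and permissions on the replaced file
        chown ${file_owner}:${file_group} "$file"
        chmod $file_permissions "$file"
    fi
done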
As you can see, we took care to preserve ownership and permissions of the files we replaced. We also created backups of these files. One potential problem to keep in mind is the delimiter used by the “sed” substitute command. In this example we are using “/”. However, if either the search or the replacement string contains a slash, you would need to add escape characters or (the preferred solution) change the delimiter from slash to something else, like “^” or “@”.
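The delimiter change and the escaping can be wrapped into a small filter. This is only a sketch under a few assumptions: the function name replace_in_stream is invented here, “@” is picked as the delimiter, and both strings are meant to be taken literally rather than as regular expressions.

```shell
# replace_in_stream SEARCH REPLACEMENT
# Hypothetical helper: reads stdin, writes stdout, and replaces every
# literal occurrence of SEARCH with REPLACEMENT.
replace_in_stream() {
    # Escape regex metacharacters and the "@" delimiter in the search string
    search=$(printf '%s' "$1" | sed 's|[][\\.*^$@]|\\&|g')
    # In the replacement only backslash, "&" and the delimiter are special
    replace=$(printf '%s' "$2" | sed 's|[\\&@]|\\&|g')
    sed "s@${search}@${replace}@g"
}
```

With this in place, replace_in_stream '/usr/local' '/opt' < some.conf works without any manual escaping of the slashes, and an IP address such as 128.225.1.10 no longer matches lines where the dots are some other character.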
Replacing Text Strings in Databases
While we are on the subject of replacing text strings, let’s take a look at a string replacement technique for MySQL databases. MySQL and PostgreSQL databases are frequently used by cluster management software. While software vendors rarely recommend it, sometimes you need to get into the database directly and edit things by hand. A day will eventually come when you need to find and replace a string of text in your database. You don’t know which row, or which column, or which table. Heck, you may not even know which database. Your options are: spend the rest of the summer hunting down the elusive table cells, or use the weapon of mass replacement described below. As usual, you absolutely must back up your database (or databases) before attempting any far-reaching scripted mumbo jumbo.
#!/bin/bash
echo -n "Enter username: " ; read db_user
echo -n "Enter $db_user password: " ; stty -echo ; read db_passwd ; stty echo ; echo ""
echo -n "Enter database name: " ; read db_name
echo -n "Enter search string: " ; read search_string
echo -n "Enter replacement string: " ; read replacement_string

MYSQL="/usr/bin/mysql --skip-column-names -u${db_user} -p${db_passwd}"

# Walk through every table in the database
echo "SHOW TABLES;" | $MYSQL $db_name | while read db_table
do
    # SHOW COLUMNS output is tab-separated; the first field is the column name
    echo "SHOW COLUMNS FROM $db_table;" | $MYSQL $db_name | awk -F'\t' '{print $1}' | while read tbl_column
    do
        echo "update $db_table set ${tbl_column} = replace(${tbl_column}, '${search_string}', '${replacement_string}');" | $MYSQL $db_name
    done
done
The script will prompt you for the username, password, database name, search string, and replacement string. It will then go through every column of every table in search of your text string and replace it with the new string you specified, potentially saving you hours of work and dozens of typos.
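Since a script like this rewrites every column it can reach, it may be worth printing the generated statements for review before piping them to mysql. The helper names sql_quote and build_update below are invented for this sketch. It assumes MySQL's default string syntax, in which a single quote inside a literal is written as two quotes and a backslash acts as an escape character.

```shell
# Hypothetical helpers, not part of the script above.
sql_quote() {
    # Double backslashes and single quotes so the value is safe inside '...'
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

build_update() {
    # $1 = table, $2 = column, $3 = search string, $4 = replacement string
    printf "UPDATE %s SET %s = REPLACE(%s, '%s', '%s');\n" \
        "$1" "$2" "$2" "$(sql_quote "$3")" "$(sql_quote "$4")"
}
```

Calling build_update inside the inner loop in place of the echo would let you collect all statements in a file, read them over, and only then feed the file to mysql.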
Numeric File Permissions
Suppose you are working with a Web server and your task is to make sure that no files or directories have permissions “777”. It would be easy to just recursively change permissions for all files to something like 644, but this may cause unexpected problems. You only need to change those files and directories that have “777” permission and leave everything else as it is. The script below will search for files and directories that have “777” permissions and change files to 644 and directories to 755.
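#!/bin/bash
# The file name is passed to sh as "$1" rather than pasted into the
# command text, and each -exec is terminated with an escaped semicolon.
find . -type f -exec sh -c '
    if [ `stat -c "%a" "$1"` -eq 777 ]
    then
        chmod 644 "$1"
    fi
' sh {} \;

find . -type d -exec sh -c '
    if [ `stat -c "%a" "$1"` -eq 777 ]
    then
        chmod 755 "$1"
    fi
' sh {} \;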
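The same result can probably be had without calling stat at all, because find can test the mode itself: -perm followed by a plain octal number matches only files whose permission bits are exactly that value. The function name fix_777 is an assumption made for this sketch.

```shell
# fix_777 DIR
# Hypothetical helper: under DIR, change files with mode 777 to 644
# and directories with mode 777 to 755; everything else is left alone.
fix_777() {
    find "$1" -type f -perm 777 -exec chmod 644 {} +
    find "$1" -type d -perm 777 -exec chmod 755 {} +
}
```

The “+” terminator passes many names to a single chmod call, which matters on a Web root with tens of thousands of files.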
Random Number Generator
Sometimes, in the course of writing a shell script, a need arises for some random input. Using the built-in $RANDOM shell variable will give you a random number from 0 to 32767. Let’s take a look at a few examples making use of this shell variable in a number of handy ways. These examples are presented using “while” loops to better illustrate the variable’s functionality. The basic usage is as follows:
echo $RANDOM
What if you need a random number no greater than 10? No problem:
echo "`expr $RANDOM % 11`"
How about a random number from 1 to 10 (no zeros)? Here you go:
echo "`expr $RANDOM % 10`+1"|bc -l
Same thing for a random number from 1 to 1000:
echo "`expr $RANDOM % 1000`+1"|bc -l
Now, what if you need a random number from 1 to 100,000? As you know, the $RANDOM variable only goes to 32767, but there is a simple workaround: just stack two of them side by side (and, no, we will not be diving into a discussion of how random is “random” in this case):
echo "`expr ${RANDOM}${RANDOM} % 100000`+1"|bc -l
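In shells that support $(( )) arithmetic (bash and ksh93 among them), the same ranges can be produced without expr and bc. The sketch below swaps the side-by-side trick for a different technique: it multiplies one $RANDOM value by 32768 and adds a second one, which gives a 30-bit number. The function names rand30 and rand_between are made up for illustration, and the modulo step still carries a slight bias toward smaller values.

```shell
# rand30: combine two 15-bit $RANDOM values into one 30-bit value
rand30() {
    echo $(( RANDOM * 32768 + RANDOM ))
}

# rand_between MIN MAX: random integer in the inclusive range MIN..MAX
rand_between() {
    echo $(( $1 + $(rand30) % ($2 - $1 + 1) ))
}
```

For example, rand_between 1 10 covers the “no zeros” case above, and rand_between 1 100000 covers the large range.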
So what useful tasks can you perform using the random number generator? You can randomize various lists. For example, suppose you have a long list of URLs and you want to download ten random links. Let’s say our list of URLs is /tmp/url_list.txt:
cat /tmp/url_list.txt

https://www.krazyworks.com/?p=1
https://www.krazyworks.com/?p=2
https://www.krazyworks.com/?p=3
...
https://www.krazyworks.com/?p=1000
Now we randomize the list and grab ten random URLs:
cat /tmp/url_list.txt | while read url
do
    urls_total=$(wc -l /tmp/url_list.txt | awk '{print $1}')
    random_number=$(echo "`expr $RANDOM % $urls_total`+1"|bc -l)
    echo "${random_number}^$url"
done | sort -n | sed 's/[0-9]*^//' | head -10

https://www.krazyworks.com/?p=343
https://www.krazyworks.com/?p=790
https://www.krazyworks.com/?p=910
https://www.krazyworks.com/?p=327
https://www.krazyworks.com/?p=639
https://www.krazyworks.com/?p=959
https://www.krazyworks.com/?p=971
https://www.krazyworks.com/?p=75
https://www.krazyworks.com/?p=283
https://www.krazyworks.com/?p=496
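The loop above tags each line with a random key, sorts by the key, and strips the key again. The same idea can be sketched with a single awk pass, which avoids starting expr and bc once per line. The function name random_lines is an assumption, the tab character is used as the separator in place of “^”, and awk's rand() stands in for $RANDOM.

```shell
# random_lines FILE COUNT
# Hypothetical helper: print COUNT randomly chosen lines from FILE.
random_lines() {
    # Prefix every line with a random key, sort on it, then drop the key
    LC_ALL=C awk -v seed="$(( ${RANDOM:-0} * 32768 + $$ ))" '
        BEGIN { srand(seed) }
        { printf "%.9f\t%s\n", rand(), $0 }
    ' "$1" | LC_ALL=C sort -n | cut -f2- | head -n "$2"
}
```

Running random_lines /tmp/url_list.txt 10 should return ten different URLs on each call, since every input line appears exactly once in the shuffled stream.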
Korn Shell Arrays
Supporting a cluster sometimes means you need to collect and analyze a large amount of diverse data. Depending on the amount of data, you can dump it into a text file or put it into a database. However, there is a faster alternative that does not generate a lot of disk I/O and does not require a database: the array feature of the Korn shell. First, we will look at some basic examples of array operations.
Create simple arrays
set -A termnames gl35a t2000 s531 vt99   # array elements are separated by blanks, TABs, or NEWLINEs
set -A arrayname $(< filename)           # where "filename" is the file containing array values

typeset -A StateTax                      # associative array (requires ksh93)
StateTax["New Jersey"]=0.06              # quote subscripts that contain spaces
print ${StateTax["New Jersey"]}
Reading arrays
print ${#termnames[*]}    # shows the number of elements in the array; "*" can be replaced by "@"
print ${termnames[*]}     # shows all values
print ${termnames[0]}     # shows the first value

for i in 0 3 4            # show values 0, 3, and 4
do
    print ${termnames[$i]}
done

print ${termnames[3]}     # is equivalent to the next line
print ${termnames[2+1]}
Sample scripts
Read a file into an array, one line at a time:
i=0
cat /var/log/messages | while read LINE
do
    msgarray[$i]="${LINE}"
    (( i = i + 1 ))
done
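One caveat if you ever run this loop under bash: ksh runs the last stage of a pipeline in the current shell, so msgarray survives the loop, while bash runs it in a subshell and the array is empty afterwards. A variant that redirects the file into the loop should behave the same in both shells. The function name load_lines is invented for this sketch.

```shell
# load_lines FILE
# Hypothetical helper: fill the global array msgarray with the lines of FILE.
load_lines() {
    unset msgarray
    i=0
    # IFS= and -r keep leading blanks and backslashes intact
    while IFS= read -r LINE
    do
        msgarray[$i]="$LINE"
        i=$(( i + 1 ))
    done < "$1"
}
```

After load_lines /var/log/messages, the expression ${#msgarray[*]} gives the number of lines read.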
Print values from the array:
i=0    # array element count begins with "0"
while [ $i -lt ${#termnames[*]} ]
do
    print ${termnames[$i]}
    (( i = i + 1 ))
done
Put the output of a command into an array:
set -A dt `date`
> print ${#dt[*]}
6
> print ${dt[*]}
Wed Jan 30 00:55:04 EST 2008
Practical applications of arrays
Let’s say there is a process on your Unix/Linux system that sometimes tends to consume all CPU resources and become unresponsive. At the same time, you do not want to terminate the process at the first sign of trouble, because momentary high CPU utilization may be legitimate. The solution is to continuously calculate the running average of CPU utilization. A Korn shell array is a good tool for storing intermediate values and calculating the average. Below is a sample script that will terminate the monitored process (process.bin) if its average CPU utilization exceeds the limit during the specified period of time.
#!/bin/ksh
configure() {
    COMMAND="process.bin"       # Name of the process being monitored
    CPULIMIT=99                 # CPU threshold
    LOOP=10                     # Monitor process every so many seconds
    INTERVAL_DURATION=1         # Take CPU utilization readings every so many seconds
    INTERVAL_COUNT="0 1 2 3 4"  # Calculate average based on this many readings
}

pid() {
    PID=99999999
    PID=$(ps -ef | grep $COMMAND | grep -v grep | awk '{print $2}' | sort | uniq | tail -1)
}

cpu() {
    CPU=0
    CPUAVG=0
    [ -n "$PID" ] || return     # Nothing to measure if the process is not running
    # top reports fractional values; int() truncates them for the integer tests below
    CPU=$(top -b -n 1 -p $PID | grep $COMMAND | awk '{print int($9)}')
    if [ $CPU -ge $CPULIMIT ]
    then
        for i in $INTERVAL_COUNT    # If CPU load exceeds $CPULIMIT, determine average CPU load
        do
            array[$i]=$(top -b -n 1 -p $PID | grep $COMMAND | awk '{print int($9)}')
            sleep $INTERVAL_DURATION
        done
        CPUAVG=$(echo "scale = 0 ; (${array[0]}+${array[1]}+${array[2]}+${array[3]}+${array[4]})/5" | bc -l)
    fi
}

terminate() {
    if [ $CPUAVG -ge $CPULIMIT ]
    then
        kill -9 $PID
        echo "Killed $PID at `date`"
    fi
}

# -------------------------------
# RUNTIME
# -------------------------------
configure           # Configure script parameters
i=1
while [ $i -eq 1 ]  # Run script in a loop every $LOOP seconds
do
    pid             # Acquire unique PID of the $COMMAND
    cpu             # Determine current CPU load for the $COMMAND
    terminate       # Kill $COMMAND if $CPULIMIT is exceeded
    sleep $LOOP
done
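The averaging step can also be written so that the array works as a sliding window: each new reading overwrites the oldest one, and the average always covers the most recent readings. This is a sketch with made-up names (WINDOW_SIZE, add_reading, window_average), using integer arithmetic only and array syntax that should work in both ksh93 and bash.

```shell
WINDOW_SIZE=5     # number of readings to average (assumed value)
count=0           # total number of readings stored so far

# add_reading VALUE: store VALUE, overwriting the oldest slot once full
add_reading() {
    window[$(( count % WINDOW_SIZE ))]=$1
    count=$(( count + 1 ))
}

# window_average: print the integer average of the stored readings
window_average() {
    [ "${#window[@]}" -gt 0 ] || { echo 0; return; }
    sum=0
    for v in "${window[@]}"; do
        sum=$(( sum + v ))
    done
    echo $(( sum / ${#window[@]} ))
}
```

In a monitoring loop you would call add_reading with each new CPU value and compare the output of window_average against the limit, instead of taking five fresh readings every time the threshold is crossed.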
Passing MySQL Commands from Shell Script
Running MySQL commands from a shell script is a relatively simple task that has a lot of people baffled. Some say it’s too complicated and suggest using PHP or Perl, others claim doing so is a security risk (a favorite excuse of the ignorant), and some resort to using a shell script to write SQL commands to a text file that MySQL would use as input. Below is a much simpler and more direct way of generating and running complex SQL queries directly from a shell script, without temporary files and without storing credentials in the script.
Let’s start with the basic idea:
echo "SELECT * FROM table_name" | mysql -u db_user -p db_name
Using this method, you can pass any shell variables to MySQL. Aha, some will say, you have to put your password in the shell script and that is definitely not secure. You don’t have to: you can have your script prompt you for a password:
#!/bin/bash
echo -n "Enter username: " ; read db_user
echo -n "Enter $db_user password: " ; stty -echo ; read db_passwd ; stty echo ; echo ""

echo "SHOW DATABASES ;" | mysql --skip-column-names -u$db_user -p$db_passwd
echo -n "Enter database name: " ; read db_name

echo "SHOW TABLES ;" | mysql --skip-column-names -u$db_user -p$db_passwd $db_name
echo -n "Enter table name: " ; read table_name

echo "SELECT * FROM $table_name ;" | mysql -t -u$db_user -p$db_passwd $db_name
The script above will prompt you for the username and password (the password will not be visible as you type it). It will then show you the list of available databases and prompt you to select one. The script will then show you all the tables in that database and ask you to specify the name of the table you want to use. Finally, the script will select everything from that table. Everything is very simple, secure, and straightforward.
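Because the database and table names typed at the prompts end up inside the SQL text, it may be prudent to check them first. The sketch below uses a deliberately conservative rule (letters, digits, and underscores only), which is an assumption of this example rather than a MySQL requirement. The function names are invented for illustration.

```shell
# is_safe_identifier NAME
# Hypothetical check: succeed only if NAME is non-empty and consists of
# letters, digits, and underscores.
is_safe_identifier() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# select_all_sql TABLE: print a SELECT statement for a validated table name
select_all_sql() {
    is_safe_identifier "$1" || { echo "Invalid table name: $1" >&2; return 1; }
    printf 'SELECT * FROM %s;\n' "$1"
}
```

The output of select_all_sql "$table_name" can be piped to mysql in the same way as the echo commands above. If the name fails the check, nothing is printed to standard output and mysql receives no statement.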
Duplicating MySQL Databases
As we already mentioned, a number of cluster administration tools use databases, usually MySQL or PostgreSQL, to store various configuration details. Before you make any major changes to your cluster’s configuration, you need to back up the database. You should always do a backup before upgrading cluster management software, adding or removing nodes, or installing new software.
The best way to back up a database is to create a fully functional copy. This way you know for a fact that you have a working copy and not just some SQL backup file that may or may not work. There are two ways to copy a MySQL database. One method is to use the mysqldump | mysql construct. This does not always work and you may encounter some SQL errors. The more direct approach is to temporarily shut down the database and duplicate the entire database directory. The first script below uses the mysqldump method.
#!/bin/ksh
echo -n "Enter database username: " ; read DBUSER
echo -n "Enter $DBUSER password: " ; stty -echo ; read DBPASS ; stty echo ; echo ""
echo -n "Enter old database name: " ; read DBNAME
echo -n "Enter new database name: " ; read DBNEW

MYSQL="/usr/bin/mysql -u${DBUSER} -p${DBPASS}"
MYSQLDUMP="/usr/bin/mysqldump -u${DBUSER} -p${DBPASS}"
DBHOME="/var/lib/mysql"

echo "Copying database $DBNAME to $DBNEW"
echo "CREATE DATABASE $DBNEW;" | $MYSQL
echo "GRANT ALL PRIVILEGES ON ${DBNEW}.* to ${DBUSER}@'%' IDENTIFIED BY '$DBPASS' WITH GRANT OPTION ;" | $MYSQL
$MYSQLDUMP $DBNAME | $MYSQL $DBNEW
And here is a script that will do a “cold copy” of your database:
#!/bin/ksh
echo -n "Enter database username: " ; read DBUSER
echo -n "Enter $DBUSER password: " ; stty -echo ; read DBPASS ; stty echo ; echo ""
echo -n "Enter old database name: " ; read DBNAME
echo -n "Enter new database name: " ; read DBNEW

MYSQL="/usr/bin/mysql -u${DBUSER} -p${DBPASS}"
MYSQLDUMP="/usr/bin/mysqldump -u${DBUSER} -p${DBPASS}"
DBHOME="/var/lib/mysql"
RSYNC="/usr/local/bin/rsync -avu"

echo "Copying database $DBNAME to $DBNEW"
echo "CREATE DATABASE $DBNEW;" | $MYSQL
echo "GRANT ALL PRIVILEGES ON ${DBNEW}.* to ${DBUSER}@'%' IDENTIFIED BY '$DBPASS' WITH GRANT OPTION ;" | $MYSQL

/etc/init.d/mysql stop
$RSYNC "${DBHOME}/${DBNAME}/" "${DBHOME}/${DBNEW}/"
/etc/init.d/mysql start
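Whichever method you use, a quick sanity check is to compare the table lists of the original and the copy. The sketch below assumes you have already saved the output of SHOW TABLES for each database into a text file, for example with the --skip-column-names option used earlier. The function name missing_in_copy and the file names in the usage note are hypothetical.

```shell
# missing_in_copy SOURCE_LIST COPY_LIST
# Hypothetical helper: print names that appear in SOURCE_LIST but not
# in COPY_LIST (one table name per line in each file).
missing_in_copy() {
    src_sorted=$(mktemp) || return 1
    dst_sorted=$(mktemp) || return 1
    # comm requires both inputs to be sorted
    sort "$1" > "$src_sorted"
    sort "$2" > "$dst_sorted"
    # -23 suppresses lines unique to the copy and lines common to both
    comm -23 "$src_sorted" "$dst_sorted"
    rm -f "$src_sorted" "$dst_sorted"
}
```

For example, missing_in_copy /tmp/tables_old.txt /tmp/tables_new.txt prints nothing when every table made it into the new database.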
The third part of this guide – System Automation: OS Installation & Configuration – will be published shortly. Stay tuned.