Watch the Log
In the past few days my Postfix server has been having occasional problems talking to the mail gateway. The problem would come and go. The Postfix server would time out trying to connect to the gateway and keep retrying. In the end, the emails would be delivered, but not without some delays. Troubleshooting this sort of fleeting issue sometimes feels like trying to touch a mirage.
I needed something to watch the maillog and, upon detecting the tell-tale error message, launch tcpdump for a few minutes to see what exactly was going on with the network at the time of the problem, as opposed to the time when I decide to wake up. Here’s a fairly simple script that will monitor the maillog for a specific error message and run tcpdump the first time it finds a match. The script will then exit, but this can be easily modified to run continuously. Just mind the disk space needed for tcpdump output.
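One way to act on the disk space caveat is a small guard that checks free space before a capture starts. This is only a sketch: the function names `free_kb` and `have_space` are hypothetical, and the 512000 KB threshold is an arbitrary assumption to adjust for your own traffic volume.

```shell
#!/bin/bash

# Hypothetical helper, not part of the monitoring script itself.
# Print the free space, in kilobytes, on the filesystem holding the given directory.
free_kb() {
    # -P forces the one-line-per-filesystem POSIX format; column 4 is "Available"
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeed only if the directory has at least the requested number of free kilobytes.
have_space() {
    local dir="$1" min_kb="$2"
    [ "$(free_kb "${dir}")" -ge "${min_kb}" ]
}

# Example: require roughly 500 MB free in /tmp (assumed threshold) before capturing.
if have_space /tmp 512000
then
    echo "Enough space for a capture"
else
    echo "Low disk space, skipping capture"
fi
```

A call to `have_space` could sit just before the tcpdump line, so that a nearly full /tmp skips the capture instead of filling the disk.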
Make the necessary adjustments, save the monitoring script as /var/adm/bin/maillog_tcpdump.sh (or whatever) and run it like so:
nohup /var/adm/bin/maillog_tcpdump.sh &
Here’s the script. Don’t forget to change email addresses for notification and grep patterns.
#!/bin/bash

configure() {
    notify_email="email1@domain.com,email2@domain.com"
    logdir="/var/log"
    logfilename="maillog"
    logfile="${logdir}/${logfilename}"
    if [ ! -r "${logfile}" ]
    then
        echo "ERROR: Log file ${logfile} not found or not readable. Exiting..."
        exit 1
    fi
    outfile="/tmp/$(hostname -s).$(date +'%s').pdump"
    if [ -f "${outfile}" ]
    then
        /bin/rm -f "${outfile}"
    fi
}

# Count log lines that contain the error keyword and mention either gateway.
# The same filter is used for the baseline and for every later check,
# so old errors already in the log do not trigger a capture.
count_errors() {
    grep -Fw 'suspended' "${logfile}" | grep -Ec "mxgateway01|mxgateway02"
}

logmon() {
    OLDCOUNT=$(count_errors)
    while :
    do
        COUNT=$(count_errors)
        DIFF=$((COUNT-OLDCOUNT))
        if [ "${DIFF}" -gt 0 ]
        then
            echo "Running tcpdump"
            nohup tcpdump -w "${outfile}" -s 0 "(host mxgateway01 or host mxgateway02) and port 25" &
            tcpdump_pid=$!
            echo "Sleeping 10 minutes"
            sleep 600
            echo "Killing tcpdump"
            # Stop only the capture started above, not every tcpdump on the host
            kill "${tcpdump_pid}"
            echo "Resetting count"
            COUNT=$(count_errors)
            echo "Sending notification"
            echo "Check ${outfile} on $(hostname -s)" | mailx -s "Tcpdump complete on $(hostname -s)" "${notify_email}"
            exit # remove this line to keep the script running indefinitely (later captures overwrite ${outfile})
        fi
        OLDCOUNT=$COUNT
        sleep 15
    done
}

# RUNTIME
configure
logmon
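Before leaving the script running, it may be worth checking that the adjusted grep patterns count what you expect. Here is a minimal sketch that runs the same grep pipeline against a throwaway file. The helper name `count_matches` is hypothetical, and the sample lines are made up and simplified for illustration; they are not real Postfix output.

```shell
#!/bin/bash

# Hypothetical helper: count lines that contain the word "suspended"
# and mention one of the two gateways, as in the script's grep pipeline.
count_matches() {
    grep -Fw 'suspended' "$1" | grep -Ec 'mxgateway01|mxgateway02'
}

# Made-up sample lines (assumption: simplified, not actual maillog entries).
sample=$(mktemp)
cat > "${sample}" <<'EOF'
delivery temporarily suspended: connect to mxgateway01:25: Connection timed out
delivery temporarily suspended: connect to othergateway:25: Connection timed out
connect to mxgateway02:25: Connection timed out
delivery temporarily suspended: connect to mxgateway02:25: Connection timed out
EOF

# Only the first and last lines satisfy both patterns.
matches=$(count_matches "${sample}")
echo "Matching lines: ${matches}"
rm -f "${sample}"
```

Pointing `count_matches` at a copy of your real maillog shows how many existing lines the patterns already match, which is the baseline the monitoring loop compares against.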