Plotting Log Data
While I am on the subject of selecting time ranges from logs, a practical application presented itself. A lonely server in a far-away land with a habit of running out of memory could only be monitored via stunnel
and reverse SSH. The server crashed again today and, after bringing it back to life, I looked at whatever meager data the SSH monitor collected to see if there was anything useful.
The SSH monitoring script is simple: every fifteen minutes a connection is made to the server via reverse SSH tunnel and the uptime
command is executed. If connection cannot be established in a couple of minutes, an alert is dispatched. So the log file looks something like this:
12:00:08 up 2 days, 3:38, 1 user, load average: 2.81, 1.80, 2.80 12:15:09 up 2 days, 3:53, 1 user, load average: 1.15, 2.15, 2.78 12:30:10 up 2 days, 4:08, 1 user, load average: 6.30, 2.44, 2.42 12:45:07 up 2 days, 4:23, 1 user, load average: 2.38, 6.80, 5.01 13:00:06 up 2 days, 4:38, 1 user, load average: 2.57, 1.70, 2.84
I thought I saw a pattern in the system load numbers, but then it could’ve been just lack of sleep. Basically, I needed to plot the 15-minute average load and I had no fancy tools like Splunk. There are two problems: the log has no day – only hours and minutes – and the timestamp is in a timezone different from mine. So the solution was to select just those lines covering just the right number of 15-minute intervals since last midnight in the log’s local timezone.
(( d = $(date -d "1970-01-01 $(date +'%H:%MSGT')" +%s) / ( 15 * 60 ) )) tail -${d} /var/log/ssh_monitor.log | gnuplot -e "set terminal dumb 84 28; unset key; set style data labels; set xdata time; set xlabel 'Time'; set ylabel 'Load'; set autoscale; set timefmt '%H:%M'; plot '-' using 5:12:ytic(12) with lines"
And the result:
Load 6.77 ++*++-++-++-++-++-+++-++-++-++-++-++-+++-++-++-++-++-+++-++-++-++-++-++-++ + * + + + + + + + + + + | * | 6.09 ++ * * ++ 5.90 ++ * * * ++ | * * ** | 5.46 ++ * * * ** ++ | * * * * * * | | * ** * * * * | | * * * * * * * | | * * * * * * * | | * * * * * * * | | * * * * * * * | | * * * * * * * | | * * * * * * * | | * * * * * * * | 3.26 ++ * * * * * *** * * ++ 3.02 ++ *** * * * * ** * * * ++ 2.96 ++ ***** * *** * * * * *** ++ 2.65 ++ *** ** * * * * * * * * ++ 2.52 ++ * ** * * * * ** * ++ 2.20 ++ ** ** ** ** * ++ 1.94 ++ + + + ** + + * + + *+ + * ++ 1.83 ++-++-++-++-++-++-+++-++-++*++-++-++-+++-++-++-++-++-+++-++-++-++-++-++*++ 09:30:000:00:0010:30:001:00:001:30:0012:00:002:30:003:00:003:30:0014:00:00:30:00 Time
So I wasn’t imagining things: there was a pattern.
And now a bit of explanation. The following command tells you how many seconds have elapsed since midnight in the specified timezone (SGT
, in this case):
date -d "1970-01-01 $(date +'%H:%MSGT')" +%s
Then it’s a simple matter of dividing it into 15-minute intervals and then grabbing that many lines from the end of the log file. And the
gnuplot
bit is pretty standard.