Atop Script with Scheduling and Logging
When something is going down on a server, the first thing most sysadmins will run is the venerable top utility. This happens automatically: if you suspect the server is being sluggish, your fingers just type top without you even thinking about it. Unfortunately, top and many similar tools will only show you the current state of the system. So if a problem came and went before you even logged into the server, you’re out of luck.
It doesn’t help that most centralized system performance monitoring tools (OpenView, SolarWinds, Observium, Big Brother, etc.), while collecting tons of historical performance data, do not monitor the systems on a per-process basis. And this can be very important when troubleshooting application issues. On the historical performance charts you can see that disk I/O was high and system load went through the roof, but the data about the misbehaving process is long gone.
The atop utility has one killer feature: the ability to write everything it sees to a compressed log file. You can later replay this log file, skip to the time index of interest and see exactly what you would have seen if you had been sitting at the console at that exact moment. Below is a script I wrote to make this logging process a little easier to schedule and run when you want and for as long as you want.
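As a rough illustration of that record-and-replay workflow, here is a minimal sketch that wraps the two atop invocations involved. The function names and the file path in the comments are made up for this example. The -w and -r options are atop's write and read switches for raw log files. The begin-time format accepted by -b differs between atop versions, so treat the hh:mm form here as an assumption and check your man page.

```shell
#!/bin/bash
# Sketch: record raw atop samples to a file, then replay them later.
# atop_record and atop_replay are hypothetical helper names.

atop_record() {
    # usage: atop_record <rawfile> <interval_seconds> <samples>
    atop -w "$1" "$2" "$3"
}

atop_replay() {
    # usage: atop_replay <rawfile> [hh:mm]
    # With a second argument, ask atop to start the replay at that time
    # (assumed hh:mm format; verify against your atop version).
    if [ -n "$2" ]; then
        atop -r "$1" -b "$2"
    else
        atop -r "$1"
    fi
}

# Example calls (not executed here):
#   atop_record /var/tmp/atop_demo.raw 5 12     # one minute at 5-second intervals
#   atop_replay /var/tmp/atop_demo.raw 14:30    # open the recording near 14:30
```

Keeping the two steps in separate functions mirrors how they are normally used: recording runs unattended in the background, while replay is interactive and happens whenever you get around to investigating.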
- Never kill -9 an atop process. From within the utility use q to exit. From console, use kill -15 or pkill atop. By default, pkill sends signal 15 (SIGTERM).
- While atop creates a compressed log file, it can still get pretty big, so be mindful of available disk space. The rule of thumb is: every hour of atop logging will consume about 50MB of filesystem space at a one-second sampling interval.
- The script below requires the atd service to be installed and active. On RHEL/CentOS 5/6: yum -y install at ; /sbin/chkconfig atd on; /sbin/service atd restart. Some versions of CentOS/RHEL had a buggy atd, so, even if you have it installed, it never hurts to update: yum -y update at ; /sbin/service atd restart
- You should run the script as root, so make sure /etc/at.allow contains the root username and /etc/at.deny doesn’t.
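To put the disk-space rule of thumb into numbers before scheduling a long recording, something like the following sketch may help. It assumes that log size scales linearly with the number of samples, which is my simplification and not something atop guarantees: actual size depends on how busy the system is. The function names and the /tmp path are placeholders.

```shell
#!/bin/bash
# Sketch: estimate atop log size from the ~50 MB/hour at 1-second rule of thumb.
# ASSUMPTION: size scales linearly with the number of samples taken.

estimate_atop_mb() {
    local duration_minutes=$1 interval_seconds=$2
    # (minutes / 60) hours * 50 MB, divided by the interval, in integer math
    echo $(( duration_minutes * 50 / (60 * interval_seconds) ))
}

available_mb() {
    # Free space in MB on the filesystem that holds the given directory
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

# Placeholder path: point this at the filesystem that will hold the logs
target_dir=/tmp

needed=$(estimate_atop_mb 480 15)
free=$(available_mb "${target_dir}")
echo "Estimated: ${needed} MB, available: ${free} MB"

if [ "${needed}" -ge "${free}" ]; then
    echo "Not enough space in ${target_dir} for this recording" >&2
fi
```

Treat the result as a ballpark figure and leave generous headroom, especially on a filesystem shared with other logs.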
The syntax is fairly simple:
atoplog -t "7:30am tomorrow" -d 480 -i 15 -w /var/log/atop_log
This will start atop at 7:30 tomorrow morning and keep it going for eight hours, writing a sample every 15 seconds to the /var/log/atop_log directory.
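Behind the scenes, the duration in minutes and the interval in seconds are converted into the number of samples that atop is asked to take. A small sketch of that arithmetic, using a stand-alone helper name (samples_for) that is only for illustration:

```shell
#!/bin/bash
# Sketch: how -d (minutes) and -i (seconds) become a sample count for atop.
# samples_for is a hypothetical helper, shown only to illustrate the math.

samples_for() {
    local duration_minutes=$1 interval_seconds=$2
    echo $(( duration_minutes * 60 / interval_seconds ))
}

# Eight hours at 15-second intervals
samples_for 480 15    # prints 1920
```

Because this is integer division, a duration that is not a whole multiple of the interval is rounded down, so the recording may end slightly early.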
And here’s the script. Syntax and examples are included. You can download it here: atop_log. Uncompress and save it to, say, /var/adm/bin and create a convenient link: ln -s /var/adm/bin/atop_log.sh /usr/bin/atoplog
#!/bin/bash
#
#                                      |
#                                  ___/"\___
#                          __________/ o \__________
#                            (I) (G) \___/ (O) (R)
#                                   Igor Os
#                           igor@comradegeneral.com
#                             www.krazyworks.com
#                                 2016-08-03
# ----------------------------------------------------------------------------
# Record atop output in the background for future analysis
# ----------------------------------------------------------------------------

usage() {
cat << EOF
Syntax:
---------------------
atoplog -d <duration_minutes>
        [-t "<time when to run>"   Default: now]
        [-i <interval_seconds>     Default: 5]
        [-w <target_directory>     Default: /var/log/atop]

Example:
---------------------
atoplog -t "2:30pm today" -d 30 -i 2 -w /var/tmp/atop
EOF
exit 1
}

# Verify the required binaries and warn about recordings already in progress
atop_check() {
    if [ ! -x /usr/bin/atop ]
    then
        echo "Can't find /usr/bin/atop. Exiting..."
        exit 1
    fi
    if [ ! -x /usr/bin/timeout ]
    then
        echo "Can't find /usr/bin/timeout. Exiting..."
        exit 1
    fi
    # Matches command lines such as "atop 5 360 -w /var/log/atop/atop_....log"
    if [ "$(ps -ef | grep -Ec "[a]top[[:space:]][1-9].*log")" -ne 0 ]
    then
        echo "Just FYI, there's another atop already running:"
        ps -ef | grep -E "[a]top[[:space:]][1-9].*log"
    fi
}

# Parse command-line options (leading ":" enables silent error handling)
while getopts ":d:t:i:w:" OPTION; do
    case "${OPTION}" in
        d) duration_minutes="${OPTARG}" ;;
        t) when_to_run="${OPTARG}" ;;
        i) interval_seconds="${OPTARG}" ;;
        w) logdir="${OPTARG}" ;;
        \? ) echo "Unknown option: -$OPTARG" >&2; usage;;
        :  ) echo "Missing option argument for -$OPTARG" >&2; usage;;
        *  ) echo "Unimplemented option: -$OPTARG" >&2; usage;;
    esac
done

# Apply defaults and work out the output file and the number of samples
configure() {
    if [ -z "${duration_minutes}" ] ; then usage ; fi
    if [ -z "${when_to_run}" ] ; then when_to_run="now" ; fi
    datetime="$(date -d "${when_to_run}" +'%Y-%m-%d_%H%M%S')"
    if [ -z "${interval_seconds}" ] ; then interval_seconds=5 ; fi
    if [ -z "${logdir}" ] ; then logdir="/var/log/atop" ; fi
    if [ ! -d "${logdir}" ] ; then mkdir -p "${logdir}" ; fi
    outfile="${logdir}/atop_${datetime}.log"
    if [ -f "${outfile}" ] ; then /bin/rm -f "${outfile}" ; fi
    (( duration_seconds = duration_minutes * 60 ))
    (( duration_samples = duration_seconds / interval_seconds ))
}

# Queue the atop command with at(1)
atop_do() {
    # ${when_to_run} is deliberately unquoted: at expects "7:30am tomorrow" as separate words
    at ${when_to_run} <<<"atop ${interval_seconds} ${duration_samples} -w ${outfile}"
    echo "Running atop at $(atq 2>/dev/null | tail -1 | awk '{print $2,$3}') for ${duration_minutes} minutes at ${interval_seconds}-second intervals with output saved to ${outfile}"
}

atop_help() {
cat << EOF

You can read this file like so:

atop -r ${outfile}

------------------------------------------------------------------------------------
|                                                                                  |
| You can access this file at any time: no need to wait for recording to finish.   |
|                                                                                  |
| Here are some of the useful filtering options:                                   |
|                                                                                  |
| t - Skip forward in time to next snapshot                                        |
| T - Skip back in time to previous snapshot                                       |
| P - Filter by process name regex                                                 |
| U - Filter by username regex                                                     |
| b - [hh:mm] - jump to specified timestamp                                        |
| r - skip back to start of file with current filter applied                       |
|                                                                                  |
| For more help, press "?" in atop                                                 |
|                                                                                  |
------------------------------------------------------------------------------------
EOF
}

# RUNTIME
atop_check
configure
atop_do
atop_help
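If a recording has to be cut short, the earlier advice about signals applies: ask atop to exit with SIGTERM and give it a moment to close its log file. The helper below is a hypothetical sketch, not part of the script above. Note that it matches every process named exactly atop, so it would also stop an atop service if one is running on the same box.

```shell
#!/bin/bash
# Sketch: stop running atop processes gracefully (SIGTERM, never SIGKILL).
# stop_atop is a hypothetical helper name used only for this example.

stop_atop() {
    local wait_seconds=${1:-10}

    if ! pgrep -x atop >/dev/null; then
        echo "No atop process is running"
        return 0
    fi

    # Ask every process named exactly "atop" to terminate
    pkill -TERM -x atop

    # Give atop time to flush and close its log file
    while [ "${wait_seconds}" -gt 0 ] && pgrep -x atop >/dev/null; do
        sleep 1
        wait_seconds=$(( wait_seconds - 1 ))
    done

    if pgrep -x atop >/dev/null; then
        echo "atop is still running" >&2
        return 1
    fi
    echo "atop stopped"
}

# Example call (not executed here):
#   stop_atop 15    # wait up to 15 seconds for atop to exit
```

A recording that has not started yet is easier to deal with: list the queue with atq and remove the job with atrm followed by its job number.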