A couple of years ago I wrote an article about selecting specific time ranges from log files. I proposed two options: either convert all timestamps to epoch format (a CPU-intensive process) or rely on regex (limited to specific date/time periods). Here’s a follow-up with a couple more methods that are perhaps a bit more practical.
First, let's talk about generating timestamps for time ranges. I always prefer examples to long-winded explanations, so here are a few.
Generate a daily range from '2018-10-02' to '2019-02-28'
TZ="$(date +"%Z")"; s="2018-10-02"; e="2019-02-28"; i=0
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo ${s} && (( i = i + 1 ))
s="$(date -d "${s} + 1 day" +'%Y-%m-%d')"; se=$(date -d "${s}" +'%s')
echo ${s}
done
# --------------------------------------------------------------
2018-10-02
2018-10-03
2018-10-04
...
2019-02-26
2019-02-27
2019-02-28
Generate an hourly range from 'Oct 29 23:00:00' to 'Nov 1 23:00:00'
TZ="$(date +"%Z")"; s="Oct 29 23:00:00"; e="Nov 1 22:59:59"; i=0
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
sd="$(date -d "${s}")"; ed="$(date -d "${e}")"
while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo ${s} && (( i = i + 1 ))
s="$(date -d "${sd} + 1 hour" +'%b %e %H:%M:%S')"; sd="$(date -d "${s}")"; se=$(date -d "${s}" +'%s')
echo ${s}
done
# --------------------------------------------------------------
Oct 29 23:00:00
Oct 30 00:00:00
Oct 30 01:00:00
...
Nov 1 21:00:00
Nov 1 22:00:00
Nov 1 23:00:00
Generate a range from 'Apr 1 13:23:12' to 'Apr 1 13:41:02' at a 1-second interval
TZ="$(date +"%Z")"; s="Apr 1 13:23:12"; e="Apr 1 13:41:02"; i=0
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
sd="$(date -d "${s}")"; ed="$(date -d "${e}")"
while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo ${s} && (( i = i + 1 ))
s="$(date -d "${sd} + 1 second" +'%b %e %H:%M:%S')"; sd="$(date -d "${s}")"; se=$(date -d "${s}" +'%s')
echo ${s}
done
# --------------------------------------------------------------
Apr 1 13:23:12
Apr 1 13:23:13
Apr 1 13:23:14
...
Apr 1 13:41:00
Apr 1 13:41:01
Apr 1 13:41:02
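The three loops above differ only in step size and output format, so they could be folded into one helper. The sketch below is one possible way to do that, assuming GNU date; ts_range and its argument order are hypothetical names chosen for illustration. It does the arithmetic on epoch seconds, so a daily step is always exactly 86400 seconds and may drift by an hour across a daylight-saving change.

```shell
#!/usr/bin/env bash
# ts_range: hypothetical helper (not from the original scripts); requires GNU date.
# Usage: ts_range START END STEP_SECONDS [DATE_FORMAT]
# Prints every timestamp from START to END inclusive, STEP_SECONDS apart.
ts_range() {
  local start="$1" end="$2" step="$3" fmt="${4:-%b %e %H:%M:%S}"
  local t end_epoch
  # A non-positive step would never reach END.
  [ "${step}" -gt 0 ] || return 1
  # Parse both endpoints once; bail out if date cannot understand them.
  t=$(date -d "${start}" +%s) || return 1
  end_epoch=$(date -d "${end}" +%s) || return 1
  while [ "${t}" -le "${end_epoch}" ]; do
    # LC_ALL=C keeps month names in English, as syslog writes them.
    LC_ALL=C date -d "@${t}" +"${fmt}"
    t=$(( t + step ))
  done
}

# Hourly range, same shape as the second example above
ts_range "2018-10-29 23:00:00" "2018-11-01 23:00:00" 3600
```

Because the endpoints are parsed only once, the loop never has to feed date its own output back in, which is the part of the original loops that depends most on locale and time zone names.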
With that out of the way, let's take a look at how to use these timestamps to select specific time ranges from log files.
Grep lines from syslog for the 'Oct 2 14:24' to 'Oct 2 14:54' period
# e is the start of the last minute to match; the loop emits it as the final pattern
TZ="$(date +"%Z")"; s="Oct 2 14:24:00"; e="Oct 2 14:54:00"; i=0; l="/var/log/messages"
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
sd="$(date -d "${s}")"; ed="$(date -d "${e}")"
# Trim the first timestamp to minute precision so it matches every second of 14:24
s="$(date -d "@${se}" +'%b %e %H:%M')"
grep -E \
"$(while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo -en "^${s}:*|" && (( i = i + 1 ))
s="$(date -d "${sd} + 1 minute" +'%b %e %H:%M')"; sd="$(date -d "${s}")"; se=$(date -d "${s}" +'%s')
echo -en "^${s}:*|"
done | sed 's/|$//g')" "${l}"
Grep lines from syslog for the 'Oct 2 14:24:13' to 'Oct 2 14:32:13' period
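Note: the whole pattern is passed to grep as a single command-line argument, and the kernel limits how large arguments can be. Running getconf ARG_MAX shows the total number of bytes allowed for arguments and environment combined; on Linux a single argument is typically capped further, at 128 KiB. See the follow-up examples showing how to get around this limitation.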
TZ="$(date +"%Z")"; s="Oct 2 14:24:13"; e="Oct 2 14:32:13"; i=0; l="/var/log/messages"
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
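sd="$(date -d "${s}")"; ed="$(date -d "${e}")"
# Reformat the first timestamp the same way the loop does (%e pads the day with a space)
s="$(date -d "@${se}" +'%b %e %H:%M:%S')"
grep -E \
"$(while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo -en "^${s}|" && (( i = i + 1 ))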
s="$(date -d "${sd} + 1 second" +'%b %e %H:%M:%S')"; sd="$(date -d "${s}")"; se=$(date -d "${s}" +'%s')
echo -en "^${s}|"
done | sed 's/|$//g')" "${l}"
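To get a feel for when the inline pattern becomes too large, its length can be compared against a limit before grep is called. The following is a rough sketch: pattern_fits is a hypothetical helper, and the 131072-byte default is an assumption based on the usual Linux per-argument cap (32 pages of 4 KiB), so adjust it for other systems.

```shell
#!/usr/bin/env bash
# pattern_fits: hypothetical helper, not part of the original scripts.
# Usage: pattern_fits PATTERN [LIMIT_BYTES]
# Succeeds when PATTERN is shorter than LIMIT_BYTES.
pattern_fits() {
  # 131072 is an assumed default (typical Linux per-argument cap).
  local pattern="$1" limit="${2:-131072}"
  # ${#pattern} counts characters; for plain ASCII timestamps that equals bytes.
  [ "${#pattern}" -lt "${limit}" ]
}

# Two one-second alternatives, written the way the loop above produces them
pattern='^Oct  2 14:24:13|^Oct  2 14:24:14'
if pattern_fits "${pattern}"; then
  echo "pattern fits on the command line"
else
  echo "pattern too long; write it to a file and use grep -f"
fi
```

Each one-second alternative such as "^Oct  2 14:24:13|" takes 17 characters, so under the assumed limit a one-second range of roughly two hours is already too long to pass inline.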
The example below is similar to the one-second script above. However, to avoid the grep: Argument list too long error, the list of patterns is written to a temporary file and passed to grep with the -f option.
TZ="$(date +"%Z")"; s="Oct 2 14:24:13"; e="Oct 2 14:32:13"; i=0
l="/var/log/messages"; t="$(mktemp)"
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
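sd="$(date -d "${s}")"; ed="$(date -d "${e}")"
# Reformat the first timestamp the same way the loop does (%e pads the day with a space)
s="$(date -d "@${se}" +'%b %e %H:%M:%S')"
while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo -en "^${s}|" && (( i = i + 1 ))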
s="$(date -d "${sd} + 1 second" +'%b %e %H:%M:%S')"; sd="$(date -d "${s}")"; se=$(date -d "${s}" +'%s')
echo -en "^${s}|"
done | sed 's/|$//g' > "${t}"
grep -E -f "${t}" "${l}"
/bin/rm -f "${t}"
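The same idea can be wrapped in a function that writes one pattern per line, which is the layout grep -f expects, and that removes the temporary file even when grep finds nothing. This is only a sketch assuming GNU date; grep_seconds is a hypothetical name and the argument order is arbitrary.

```shell
#!/usr/bin/env bash
# grep_seconds: hypothetical helper, not part of the original scripts.
# Usage: grep_seconds START END LOGFILE
# Prints LOGFILE lines whose timestamp falls between START and END inclusive.
grep_seconds() {
  local start="$1" end="$2" log="$3"
  local t end_epoch patterns rc
  t=$(date -d "${start}" +%s) || return 2
  end_epoch=$(date -d "${end}" +%s) || return 2
  patterns="$(mktemp)" || return 2
  while [ "${t}" -le "${end_epoch}" ]; do
    # One anchored pattern per line; %e pads single-digit days with a space,
    # matching the traditional syslog layout ("Oct  2 14:24:13").
    LC_ALL=C date -d "@${t}" +'^%b %e %H:%M:%S'
    t=$(( t + 1 ))
  done > "${patterns}"
  # Remember grep's exit status so the cleanup below does not hide it.
  rc=0
  grep -f "${patterns}" "${log}" || rc=$?
  rm -f "${patterns}"
  return "${rc}"
}

# Example call (path taken from the scripts above):
# grep_seconds "Oct 2 14:24:13" "Oct 2 14:32:13" /var/log/messages
```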
Another way to cut down on the size of the argument list is by using xargs. This method also allows you to utilize multiple processor cores. The downside here is that the results from the parallel grep processes started by xargs may not be in chronological order. In the following example we use the sort command to put all log lines back in order.
TZ="$(date +"%Z")"; s="Oct 2 14:24:13"; e="Oct 2 14:32:13"; i=0
l="/var/log/messages"; p=$(nproc) # number of available CPU cores
se=$(date -d "${s}" +'%s'); ee=$(date -d "${e}" +'%s')
sd="$(date -d "${s}")"; ed="$(date -d "${e}")"
# Reformat the first timestamp the same way the loop does (%e pads the day with a space)
s="$(date -d "@${se}" +'%b %e %H:%M:%S')"
while [ ${se} -lt ${ee} ]; do
[[ ${i} -lt 1 ]] && echo -e "^${s}|" && (( i = i + 1 ))
s="$(date -d "${sd} + 1 second" +'%b %e %H:%M:%S')"; sd="$(date -d "${s}")"; se=$(date -d "${s}" +'%s')
echo -e "^${s}|"
# -I% runs one grep per pattern line; GNU xargs does not allow combining it with -n
done | sed 's/|$//g' | xargs -P${p} -I% grep "%" "${l}" | sort -k1,1M -k2,2n
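For reference, here is a small sketch of what those sort keys do to syslog-style lines: -k1,1M orders by month name and -k2,2n by day number, and lines that tie on both keys fall back to a comparison of the whole line, which puts the times within a day in order. sort_syslog is a hypothetical wrapper name, and the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# sort_syslog: hypothetical wrapper name; reads log lines on stdin.
sort_syslog() {
  # LC_ALL=C makes -M recognise English month abbreviations and keeps the
  # tie-breaking whole-line comparison bytewise.
  LC_ALL=C sort -k1,1M -k2,2n
}

# Made-up sample lines, deliberately out of order
printf '%s\n' \
  'Oct  2 14:24:15 host c' \
  'Oct  2 14:24:13 host b' \
  'Sep 30 23:59:59 host a' | sort_syslog
```

Since syslog timestamps carry no year, a sort like this cannot order lines correctly across a year boundary.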