
Coronavirus Stats in Bash

Submitted on October 7, 2020 – 1:40 pm

My morbid fascination with the coronavirus situation produced this quick Bash script. It parses the Johns Hopkins University coronavirus data and prints a summary for the current date for the countries you specify.

The plan is to add some statistical analysis to spot potential anomalies in the reported data. For now, just a simple summary for the current day.

The script is below. You can also download it from my GitHub repo here. Here’s an example of how to run it:

./covid19_stats_mk2.sh -c US -c Italy -c Spain -c China -c "United Kingdom"

COUNTRY         DATE        CONFIRMED  DEATHS  RECOVERED  ACTIVE  MORTALITY  RECOVERY
US              03-19-2020  13680      200     108        13372   1.4%       .7%
Italy           03-19-2020  41035      3405    4440       33190   8.2%       10.8%
Spain           03-19-2020  17963      830     1107       16026   4.6%       6.1%
China           03-19-2020  81156      3249    70535      7372    4.0%       86.9%
United Kingdom  03-19-2020  2716       138     67         2511    5.0%       2.4%
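
Repeating -c for each country, as in the command above, works because getopts can append every option argument to a Bash array. Here is a minimal sketch of that idea, wrapped in a hypothetical parse_countries function (the name is my own, not part of the script) so it can be tried on its own:

```shell
#!/bin/bash

# Hypothetical helper: collect every -c argument into the global array "countries".
parse_countries() {
    local opt OPTIND=1    # reset getopts state so the function can be called repeatedly
    countries=()
    while getopts ":c:" opt
    do
        case ${opt} in
            c  ) countries+=("${OPTARG}") ;;
            \? ) echo "Unknown option: -${OPTARG}" >&2; return 1 ;;
            :  ) echo "Missing option argument for -${OPTARG}" >&2; return 1 ;;
        esac
    done
}

parse_countries -c US -c Italy -c "United Kingdom"
echo "${#countries[@]} countries requested"
printf '%s\n' "${countries[@]}"
```

Quoting "${OPTARG}" and "${countries[@]}" is what keeps a multi-word name such as "United Kingdom" in one piece.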

The script follows the update below.

An update

It would seem the Johns Hopkins University Center for Systems Science and Engineering has trouble maintaining a consistent format for its COVID-19 data files. For unknown reasons, the columns are arranged differently in data files from different dates. There were other arbitrary changes as well, such as renaming the ‘Country_Region’ column to ‘Country/Region’. Well, I hope that made someone very happy.

In any case, I made a couple of changes to my script to compensate for someone’s lack of experience handling data. From a Bash scripting standpoint, you may find the *_field variables interesting: they are set dynamically to the number of the correct data column, based on an exact or approximate match of the header name. So, as long as JHU CSSE doesn’t rename “Deaths” to “Casualties” or “Confirmed” to “Verified”, we should be fine…
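
To see the header-driven lookup in isolation, here is a minimal sketch. The helper name column_of and the two sample header lines are illustrative assumptions modeled on the two column spellings mentioned above; they are not copies of any particular JHU file.

```shell
#!/bin/bash

# Hypothetical helper: print the number of the first comma-separated column
# in HEADER whose name matches the regular expression PATTERN.
# Usage: column_of PATTERN HEADER
column_of() {
    printf '%s\n' "$2" |
        awk -F, -v pat="$1" '{ for (i = 1; i <= NF; i++) if ($i ~ pat) { print i; exit } }'
}

# Two made-up header layouts with the columns in different positions.
header_a='Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered'
header_b='FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active'

for header in "$header_a" "$header_b"
do
    country_field=$(column_of 'Country.Region' "$header")
    deaths_field=$(column_of 'Deaths' "$header")
    echo "country column: ${country_field}, deaths column: ${deaths_field}"
done
```

The dot in 'Country.Region' matches any single character, so one pattern covers both the underscore and the slash spelling.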

#!/bin/bash

while getopts ":c:" opt
do
    case ${opt} in
        c  ) countries+=("${OPTARG}") ;;
        \? ) echo "Unknown option: -$OPTARG" >&2; exit 1;;
        :  ) echo "Missing option argument for -$OPTARG" >&2; exit 1;;
        *  ) echo "Unimplemented option: -$OPTARG" >&2; exit 1;;
    esac
done
shift $((OPTIND -1))

url="https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
url_raw="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports"

if [ "${#countries[@]}" -eq 0 ]
then
    echo "You need to specify at least one country with -c. Exiting..." >&2
    # Exit statuses are limited to the range 0-255.
    exit 1
fi

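curl_get() {
    # -f makes curl fail on HTTP errors instead of saving the error page;
    # the grep filter is a second guard against a "404: Not Found" body.
    curl -sf "${url_raw}/${e}.csv" 2>/dev/null | grep -vE "404: Not Found" > "${tmpfile}"
}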

tmpfile="$(mktemp)"
e="$(date +'%m-%d-%Y')"
curl_get
if [ ! -s "${tmpfile}" ]
then
    e="$(date -d'-1 days' +'%m-%d-%Y')"
    curl_get
fi
if [ ! -s "${tmpfile}" ]
then
    echo "Unable to download CSV file. Exiting..." >&2
    /bin/rm -f "${tmpfile}" 2>/dev/null
    # Exit statuses are limited to the range 0-255.
    exit 2
fi

# Print the number of the first header column whose name matches regex $1.
find_field() {
    awk -F, -v pat="$1" 'NR==1{for(i=1;i<=NF;i++)if($i~pat){print i;exit}}' "${tmpfile}"
}

# Sum column number $2 over every row whose country column equals $1.
sum_field() {
    awk -F, -v c="$1" -v cf="${country_field}" -v vf="$2" '$cf==c{s+=$vf}END{print s+0}' "${tmpfile}"
}

# The header is the same for every country, so locate the columns once.
country_field=$(find_field 'Country.Region')
confirmed_field=$(find_field 'Confirmed')
deaths_field=$(find_field 'Deaths')
recovered_field=$(find_field 'Recovered')

for c in "${countries[@]}"
do
    c="${c# }"    # drop a single leading space, if any
    confirmed=$(sum_field "${c}" "${confirmed_field}")
    deaths=$(sum_field "${c}" "${deaths_field}")
    recovered=$(sum_field "${c}" "${recovered_field}")
    # Skip countries with no matching rows to avoid dividing by zero in bc.
    if [ "${confirmed}" = "0" ]
    then
        echo "No data found for ${c}. Skipping..." >&2
        continue
    fi
    death_pct="$(echo "scale=1;(${deaths}*100)/${confirmed}"|bc -l)"
    recovery_pct="$(echo "scale=1;(${recovered}*100)/${confirmed}"|bc -l)"
    active_cases="$(echo "scale=0;${confirmed}-(${deaths}+${recovered})"|bc -l)"
    echo "${c},${e},${confirmed},${deaths},${recovered},${active_cases},${death_pct}%,${recovery_pct}%"
done | (echo "COUNTRY,DATE,CONFIRMED,DEATHS,RECOVERED,ACTIVE,MORTALITY,RECOVERY" && cat) | column -s',' -t
/bin/rm -f "${tmpfile}" 2>/dev/null
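
The ACTIVE, MORTALITY and RECOVERY columns are all derived from the three summed counts. As a rough cross-check, the same arithmetic can be sketched with Bash integer math instead of bc. The pct helper below is hypothetical and not part of the script; the numbers are the US figures from the sample report. Like bc with scale=1, it truncates rather than rounds, but it prints a leading zero (0.7 rather than .7).

```shell
#!/bin/bash

# Hypothetical helper: print PART as a percentage of TOTAL,
# truncated (not rounded) to one decimal place.
# Usage: pct PART TOTAL
pct() {
    if [ "$2" -le 0 ]
    then
        echo "pct: total must be greater than zero" >&2
        return 1
    fi
    local tenths=$(( $1 * 1000 / $2 ))    # percentage expressed in tenths of a percent
    printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

confirmed=13680
deaths=200
recovered=108

active=$(( confirmed - (deaths + recovered) ))
echo "ACTIVE=${active} MORTALITY=$(pct "${deaths}" "${confirmed}")% RECOVERY=$(pct "${recovered}" "${confirmed}")%"
```

For the US row this gives 13372 active cases, 1.4% mortality and 0.7% recovery, which agrees with the sample report apart from the leading zero.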

 
