Multithreaded Encryption and Compression
One problem with encryption is that it is a slow and resource-intensive process. While most encryption software lacks multithreading support, it is possible to use GNU Parallel to take full advantage of modern multi-core CPUs and greatly speed up encryption and decryption. Below are a few examples showing how to use parallel with gpg. I have also included some examples of using tar and pigz (multithreaded gzip).
I suggest you create a directory with a bunch of small files and use it as your sandbox to test encryption/decryption before you move on to the real data. Here is a script that will generate a dummy folder structure with a few thousand small, randomly generated files. You can read more about this script here.
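The script itself is not reproduced in this post. As a stand-in, a minimal sketch along the following lines would build a throwaway sandbox. The function name `make_sandbox`, the `dir_N/file_N.dat` layout, and the 1 KiB file size are assumptions made for illustration, not the original script.

```shell
# Hypothetical helper (not the original script): build a small tree of
# random files for testing.
# Usage: make_sandbox ROOT [NUM_DIRS] [FILES_PER_DIR]
make_sandbox() {
    sb_root="$1"
    sb_ndirs="${2:-5}"
    sb_nfiles="${3:-20}"
    sb_d=1
    while [ "$sb_d" -le "$sb_ndirs" ]; do
        mkdir -p "${sb_root}/dir_${sb_d}"
        sb_f=1
        while [ "$sb_f" -le "$sb_nfiles" ]; do
            # Each file gets 1 KiB of random bytes
            head -c 1024 /dev/urandom > "${sb_root}/dir_${sb_d}/file_${sb_f}.dat"
            sb_f=$((sb_f + 1))
        done
        sb_d=$((sb_d + 1))
    done
}
```

Calling `make_sandbox test 50 100` would create 5,000 small files under `test`, which matches the `dir=test` setting used further down.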
Note: for the sake of simplicity I am using gpg password encryption. For better security, in real life you should be using gpg with an encryption key.
Install tools:
yum -y install pigz gpg parallel
To install parallel from source:
cd /tmp ; wget http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2 ; bzip2 -d parallel-latest.tar.bz2 ; tar xvf parallel-latest.tar ; cd parallel-[0-9]* ; ./configure ; make ; make install
On Solaris 10/11:
# Install OpenCSW
pkgadd -d http://get.opencsw.org/now
/opt/csw/bin/pkgutil -U
/opt/csw/bin/pkgutil -a vim
/opt/csw/bin/pkgutil -y -i vim
# Install tools
/opt/csw/bin/pkgutil -i pigz
/opt/csw/bin/pkgutil -i parallel
/opt/csw/bin/pkgutil -i gnupg
Set dir and number of cores:
dir=test ; cores=`grep -c '^processor' /proc/cpuinfo`
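Reading `/proc/cpuinfo` only works on Linux. A slightly more portable sketch, assuming either `nproc` (GNU coreutils) or a `getconf` that understands `_NPROCESSORS_ONLN` is present, tries a few methods in turn and falls back to a single job:

```shell
# Try several ways of counting CPUs; fall back to 1 if none of them works.
if command -v nproc >/dev/null 2>&1; then
    cores=$(nproc)
elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    cores=$(getconf _NPROCESSORS_ONLN)
elif [ -r /proc/cpuinfo ]; then
    cores=$(grep -c '^processor' /proc/cpuinfo)
else
    cores=1
fi
echo "Using ${cores} parallel jobs"
```

Which branch is taken depends on the platform, so treat the result as a starting value for `-j` and `-p` rather than an exact core count.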
On Solaris 10/11:
cores=`/usr/bin/kstat -m cpu_info | egrep "chip_id|core_id|module: cpu_info" | grep 'module: cpu_info' | awk '{ print $4 }' | sort -u | wc -l | tr -d ' '`
Encrypt all files in a directory:
find "${dir}" -type f -not -iname "*.gpg" | sort | parallel --gnu -j "${cores}" --workdir "$PWD" 'echo "Encrypting {}" ; gpg --batch --symmetric --passphrase MySecretPassword1 -z 2 "{}" >/dev/null 2>&1'
Delete original non-encrypted files:
find "${dir}" -type f -not -iname "*.gpg" -exec /bin/rm -f "{}" \;
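The command above removes every non-`.gpg` file, including any that gpg failed to encrypt. A more cautious sketch deletes a plaintext file only when a non-empty `.gpg` file sits next to it. The function name `remove_encrypted_originals` is hypothetical, and the check only confirms that the encrypted file exists, not that it decrypts correctly. File names containing newlines are not handled.

```shell
# Hypothetical helper: delete a plaintext file only if "<file>.gpg"
# exists and is non-empty; otherwise keep it and report on stderr.
# Usage: remove_encrypted_originals DIRECTORY
remove_encrypted_originals() {
    find "$1" -type f ! -iname "*.gpg" | while IFS= read -r f; do
        if [ -s "${f}.gpg" ]; then
            rm -f -- "${f}"
        else
            echo "Keeping ${f}: no encrypted copy found" >&2
        fi
    done
}
```

Running `remove_encrypted_originals "${dir}"` after the encryption step leaves behind exactly the files that still need attention.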
Decrypt all encrypted files in a directory:
find "${dir}" -type f -iname "*.gpg" | sort | parallel --gnu -j "${cores}" 'echo "Decrypting {}" ; gpg --batch --decrypt --passphrase MySecretPassword1 --output "{.}" "{}"'
Delete the original encrypted files:
find "${dir}" -type f -iname "*.gpg" -exec /bin/rm -f "{}" \;
Create compressed tarball with pigz:
tar cf - "${dir}" | pigz -9 -p "${cores}" > "${dir}.tar.gz"
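Before deleting anything, it is worth checking that the tarball actually contains what you expect. The sketch below compares the number of regular files in the archive with the number on disk. The function name `verify_tarball` is hypothetical, it falls back to `gzip` when `pigz` is not installed, and it assumes the directory holds only regular files and subdirectories (no symlinks or special files).

```shell
# Hypothetical helper: succeed only if the archive lists as many
# regular files as the source directory contains.
# Usage: verify_tarball SOURCE_DIR ARCHIVE.tar.gz
verify_tarball() {
    vt_src="$1"
    vt_archive="$2"
    if command -v pigz >/dev/null 2>&1; then
        vt_listing=$(pigz -dc "$vt_archive" | tar tf -)
    else
        vt_listing=$(gzip -dc "$vt_archive" | tar tf -)
    fi
    # Directory entries end in "/", so count only lines that do not
    vt_in_archive=$(printf '%s\n' "$vt_listing" | grep -c '[^/]$')
    vt_on_disk=$(find "$vt_src" -type f | wc -l | tr -d ' ')
    [ "$vt_in_archive" -eq "$vt_on_disk" ]
}
```

For example, `verify_tarball "${dir}" "${dir}.tar.gz" && echo "archive looks complete"`. A matching count is only a quick sanity check; it does not compare file contents.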
Create compressed encrypted tarball with pigz and gpg:
tar cf - "${dir}" | pigz -9 -p "${cores}" | gpg --batch --symmetric --passphrase MySecretPassword1 > "${dir}.tar.gz.pgp"
Decrypt compressed tarball:
gpg --batch --decrypt --passphrase MySecretPassword1 -o "${dir}.tar.gz" "${dir}.tar.gz.pgp"
5 Comments »
GNU Parallel defaults to -j number_of_cores, so you do not need that. Also you do not need " around {}.
GNU parallel defaults to 9 threads if -j is not specified. See here: http://www.admin-magazine.com/HPC/Articles/GNU-Parallel-Multicore-at-the-Command-Line-with-GNU-Parallel
An alternative is to specify "-j +0", allowing parallel to automatically determine the number of cores. However, this doesn't always work on Linux and never works on SPARC Solaris.
It is a good practice to enclose variables in double quotes, thus saving yourself a lot of trouble when the variable’s value happens to contain a space or a special character.
Being the author of GNU Parallel I see myself as a more authoritative source than admin-magazine. GNU Parallel _used_ to default to 9 processes. It was changed in 20110122 after a user vote.
While it is good practice to enclose VARIABLES in double quotes, that is not the case with {} (which is not a variable but a replacement string).
GNU Parallel deals with the spacing, so putting " around {} is always unneeded and can cause harm in some situations. Here for example: parallel echo 'a "{}"' ::: 'a b'
Point taken on the double-quotes. Thanks.
I noticed that on a SPARC T4-2 parallel launches 16 threads with or without --use-cpus-instead-of-cores. I would have expected 128, the number of vcores.
Core detection is not tested very well on SPARC: I do not have access to a T4-2, so I rely on users to provide improvements. Feel free to do so.