Document Conversion with Unoconv
The other day I ran into the “Flexible Import/Export” article by Bruce Byfield in the March 2018 issue of Linux Pro Magazine and thought it could use some more detail. So here’s some more detail.
The unoconv utility is a part of LibreOffice. All examples below were run on RHEL 7.3. The first step is to get the latest version of LibreOffice. This is not strictly necessary, but it may save you some time and aggravation.
Remove any existing installations of LibreOffice and install the latest stable release from one of the project’s mirrors:
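# Quote the package globs so the shell does not expand them against files in the current directory
yum -y remove 'openoffice*' 'libreoffice*'
cd && v="6.0.4" && wget http://ftp.utexas.edu/libreoffice/libreoffice/stable/${v}/rpm/x86_64/LibreOffice_${v}_Linux_x86-64_rpm.tar.gz
tar xfz LibreOffice_${v}_Linux_x86-64_rpm.tar.gz
# The extracted directory name includes the full build number, hence the wildcard
cd LibreOffice_${v}*_Linux_x86-64_rpm/RPMS
yum -y install *rpm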
Next, download a more up-to-date version of unoconv and replace the one that came with your LibreOffice installation. Again, this is not strictly necessary, but it is a good idea.
cd && git clone https://github.com/dagwieers/unoconv.git
/bin/cp -pf unoconv/unoconv /usr/bin
Add a systemd unit file for the unoconv listener. If your system uses SELinux, also enable the appropriate SELinux boolean.
cat << EOF > /etc/systemd/system/unoconv.service
[Unit]
Description=Unoconv listener for document conversions
Documentation=https://github.com/dagwieers/unoconv
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=simple
Environment="UNO_PATH=/usr/lib64/libreoffice/program"
ExecStart=/usr/bin/unoconv --listener

[Install]
WantedBy=multi-user.target
EOF

# Make systemd pick up the new unit file, then enable and start it
systemctl daemon-reload
systemctl enable unoconv.service
systemctl start unoconv.service

# Enable the boolean unless SELinux is disabled in its config file
f=/etc/sysconfig/selinux
if [ -f "${f}" ] && [ "$(grep -oP "(?<=^SELINUX=)[a-z]{1,}(?=$)" "${f}")" != "disabled" ]; then
  setsebool -P httpd_execmem on
fi
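The SELinux check above relies on GNU grep’s -P (PCRE) mode, and not every grep build supports it. Below is a minimal sketch of the same check that uses only sed. The name selinux_mode is hypothetical. The helper reads the mode configured in the file, not the mode the system is currently running in; getenforce reports the running mode.

```shell
# Hypothetical helper: print the SELINUX= mode (enforcing, permissive or disabled)
# configured in an SELinux config file. Defaults to /etc/sysconfig/selinux, the file
# checked in the step above. Prints nothing if the file or the setting is missing.
selinux_mode() {
  local f="${1:-/etc/sysconfig/selinux}"
  [ -f "${f}" ] || return 0
  # Match uncommented SELINUX= lines only (SELINUXTYPE= does not match),
  # strip optional quotes and trailing whitespace, and keep the last occurrence
  sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*"\{0,1\}\([a-z]*\)"\{0,1\}[[:space:]]*$/\1/p' "${f}" | tail -n 1
}
```

With the helper defined, the original condition becomes `m="$(selinux_mode)"; [ -n "${m}" ] && [ "${m}" != "disabled" ] && setsebool -P httpd_execmem on`.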
Now with the installation out of the way, here come the examples.
# All examples write to ./output, so create it first
mkdir -p ./output

# Convert DOCX to PDF
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"

# Convert DOCX to password-protected PDF
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" -e EncryptFile=true -e DocumentOpenPassword=admin123 "${i}.docx"

# Convert pages 2-3 of DOCX to PDF that cannot be printed unless permissions are unlocked using a password
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" -e EncryptFile=true -e Printing=0 -e RestrictPermissions=true -e PermissionPassword=admin123 -e PageRange=2-3 "${i}.docx"

# Convert multiple Word documents in the current directory to PDF
find . -maxdepth 1 -mindepth 1 -type f -regextype posix-extended -regex '^.*\.(docx|doc)$' | while IFS= read -r i; do
  b="$(basename "${i%.*}")"   # "./Report.docx" -> "Report"
  unoconv -f pdf -o "./output/${b}.pdf" "${i}" 2>/dev/null
done

# Convert DOCX to multiple JPG (one per page)
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"
# Page count = largest /Count value in the PDF (\1 keeps the captured number)
j=$(strings < "./output/${i}.pdf" | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
k=1
while [ ${k} -le ${j} ]; do
  unoconv -f pdf -o "./output/${i}_page_${k}.pdf" -e PageRange=${k}-${k} -e UseLosslessCompression=true "./output/${i}.pdf"
  unoconv -f jpg -o "./output/${i}_page_${k}.jpg" -e Quality=94 "./output/${i}_page_${k}.pdf"
  /bin/rm "./output/${i}_page_${k}.pdf"
  (( k = k + 1 ))
done

# Convert DOCX to multiple JPG of specified resolution and dimensions
# Requires the 'convert' utility:
# yum -y install ImageMagick
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"
j=$(strings < "./output/${i}.pdf" | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
k=1
while [ ${k} -le ${j} ]; do
  unoconv -f pdf -o "./output/${i}_page_${k}.pdf" -e PageRange=${k}-${k} -e UseLosslessCompression=true "./output/${i}.pdf"
  convert -density 400 "./output/${i}_page_${k}.pdf" -resize 2000x1500 "./output/${i}_page_${k}.jpg"
  /bin/rm "./output/${i}_page_${k}.pdf"
  (( k = k + 1 ))
done

# Convert XLSX to CSV
# Limitation: only the first sheet is converted
i="Spreadsheet Name"; unoconv -f csv -d spreadsheet -o "./output/${i}.csv" "${i}.xlsx"
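The least obvious step in the JPG examples is the page-count pipeline. It scans the PDF for /Count entries and keeps the largest one. For an uncompressed page tree, that value is the total stored in the root /Pages node.

The sketch below wraps this idea in a function. The name pdf_page_count is hypothetical. It uses grep -a instead of strings, so binutils is not required. It has the same limitations as the original:

* It assumes the page tree is not stored inside a compressed object stream.
* Expanded bookmarks also carry a positive /Count, so it can overestimate the page count.

If poppler-utils is installed, the "Pages:" line from pdfinfo is a more reliable source.

```shell
# Hypothetical helper: print the page count of a PDF by taking the largest
# "/Count N" value found in the raw file (same idea as the strings/sed pipeline).
# Negative /Count values (collapsed bookmarks) are skipped because the pattern
# requires a digit right after the optional spaces.
pdf_page_count() {
  LC_ALL=C grep -ao '/Count *[0-9]\{1,\}' "$1" \
    | sed 's|/Count *||' \
    | sort -rn \
    | head -n 1
}
```

In either loop, `j=$(pdf_page_count "./output/${i}.pdf")` could then replace the strings/sed line.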
You can find additional options for the unoconv utility’s PDF import/export functionality here. There was some talk about adding a command-line option to unoconv that would let the user specify the sheet name or number when converting a multi-sheet spreadsheet.
I don’t know whether anything came of this, and I was not able to find a version of unoconv with this capability. So as not to leave the question unanswered, here’s how you can use the xlsx2csv tool to work with multi-sheet spreadsheets.
# Convert XLSX to CSV using xlsx2csv
# https://github.com/dilshod/xlsx2csv

# Install xlsx2csv (in a subshell, so the current directory does not change)
(cd && git clone https://github.com/dilshod/xlsx2csv.git && cd xlsx2csv && /bin/cp -p xlsx2csv.py /usr/bin/xlsx2csv)

# Convert sheets 1-10, remove empty or non-existent sheets
i="Spreadsheet Name"
for j in $(seq 1 10); do
  xlsx2csv -s ${j} "${i}.xlsx" "./output/${i}_sheet_${j}.csv" 2>/dev/null
  if [ $? -ne 0 ] || [ ! -s "./output/${i}_sheet_${j}.csv" ]; then
    /bin/rm -f "./output/${i}_sheet_${j}.csv"
  fi
done

# Convert all sheets. This will create a subfolder with CSV files named after every sheet
xlsx2csv -a "${i}.xlsx" "./output/${i}"
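The sheet loop above can also be packaged as a reusable function that takes the workbook name and the highest sheet number to try. This is a sketch:

* The name export_sheets is hypothetical.
* The 10-sheet default simply mirrors the loop above.
* Like the original loop, it assumes xlsx2csv exits with a non-zero status when asked for a sheet number that does not exist.

```shell
# Hypothetical helper: export sheets 1..MAX of "NAME.xlsx" to ./output/NAME_sheet_N.csv
# with xlsx2csv, deleting any output that failed or came out empty.
# Usage: export_sheets "Spreadsheet Name" [MAX]   (MAX defaults to 10)
export_sheets() {
  local name="$1" max="${2:-10}" j out
  mkdir -p ./output
  for (( j = 1; j <= max; j++ )); do
    out="./output/${name}_sheet_${j}.csv"
    # -s N selects the Nth sheet; treat a failed run or an empty file as "no such sheet"
    if ! xlsx2csv -s "${j}" "${name}.xlsx" "${out}" 2>/dev/null || [ ! -s "${out}" ]; then
      rm -f "${out}"
    fi
  done
}
```

Quoting "${name}" throughout lets workbook names with spaces, such as "Spreadsheet Name", pass through intact. xlsx2csv is also published on PyPI, so `pip install xlsx2csv` may be a simpler way to install it than cloning the repository.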