pdftotext − Portable Document Format (PDF) to text converter (version 3.00) |
pdftotext [options] [PDF-file [text-file]] |
Pdftotext converts Portable Document Format (PDF) files to plain text. Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout. |
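The three ways of naming the output described above can be sketched in Python. This is only an illustration: it assumes pdftotext is installed and on the PATH, and the helper names pdftotext_argv and pdf_to_string and the file name are hypothetical.

```python
import subprocess


def pdftotext_argv(pdf_file, text_file=None):
    """Build the argument list for a basic pdftotext call.

    text_file=None lets pdftotext choose the name (file.pdf -> file.txt);
    text_file="-" sends the text to stdout.
    """
    argv = ["pdftotext", pdf_file]
    if text_file is not None:
        argv.append(text_file)
    return argv


def pdf_to_string(pdf_file):
    """Run pdftotext and return the extracted text.

    Assumes the output encoding was left at its documented default, UTF-8.
    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        pdftotext_argv(pdf_file, "-"),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

Calling pdf_to_string("report.pdf") would then return the text of a hypothetical report.pdf without creating report.txt on disk.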
−f number |
Specifies the first page to convert. |
−l number |
Specifies the last page to convert. |
−r number |
Specifies the resolution, in DPI. The default is 72 DPI. |
−x number |
Specifies the x-coordinate of the top-left corner of the crop area. |
−y number |
Specifies the y-coordinate of the top-left corner of the crop area. |
−W number |
Specifies the width of the crop area in pixels. The default is 0. |
−H number |
Specifies the height of the crop area in pixels. The default is 0. |
−layout |
Maintain (as best as possible) the original physical layout of the text. The default is to "undo" physical layout (columns, hyphenation, etc.) and output the text in reading order. |
−raw |
Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. Use of raw mode is no longer recommended. |
−htmlmeta |
Generate a simple HTML file, including the meta information. This simply wraps the text in <pre> and </pre> and prepends the meta headers. |
−enc encoding-name |
Sets the encoding to use for text output. This defaults to "UTF-8". |
−listenc |
Lists the available encodings. |
−eol unix | dos | mac |
Sets the end-of-line convention to use for text output. |
−nopgbrk |
Don’t insert page breaks (form feed characters) between pages. |
−opw password |
Specify the owner password for the PDF file. Providing this will bypass all security restrictions. |
−upw password |
Specify the user password for the PDF file. |
−q |
Don’t print any messages or errors. |
−v |
Print copyright and version information. |
−h |
Print usage information. (−help and −−help are equivalent.) |
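As a rough sketch of how several of the options above combine, the Python helper below assembles a command line with a page range, a crop area, layout mode, an end-of-line convention, and page breaks disabled. The function name extract_argv, its parameters, and the file names are hypothetical; only the flags themselves come from the option list. Options are placed before the file names, following the synopsis.

```python
def extract_argv(pdf_file, text_file, first=None, last=None,
                 layout=False, crop=None, eol=None, page_breaks=True):
    """Build a pdftotext argument list from a few of the documented options.

    crop is an (x, y, width, height) tuple mapped to -x, -y, -W and -H.
    """
    argv = ["pdftotext"]
    if first is not None:
        argv += ["-f", str(first)]      # first page to convert
    if last is not None:
        argv += ["-l", str(last)]       # last page to convert
    if crop is not None:
        x, y, width, height = crop
        argv += ["-x", str(x), "-y", str(y),
                 "-W", str(width), "-H", str(height)]
    if layout:
        argv.append("-layout")          # keep the physical layout
    if eol is not None:
        if eol not in ("unix", "dos", "mac"):
            raise ValueError("eol must be 'unix', 'dos' or 'mac'")
        argv += ["-eol", eol]
    if not page_breaks:
        argv.append("-nopgbrk")         # no form feeds between pages
    argv += [pdf_file, text_file]
    return argv
```

The resulting list can be passed directly to subprocess.run, for example subprocess.run(extract_argv("in.pdf", "out.txt", first=2, last=5, layout=True), check=True).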
Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from these files. |
The Xpdf tools use the following exit codes: |
0 |
No error. |
1 |
Error opening a PDF file. |
2 |
Error opening an output file. |
3 |
Error related to PDF permissions. |
99 |
Other error. |
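A calling script might translate these exit codes into messages along the following lines. This is a minimal sketch in Python: the names EXIT_CODES, describe_exit, and convert are hypothetical, and the fallback message for codes outside the table is an assumption, not something the tools define.

```python
import subprocess

# Exit codes as listed above for the Xpdf tools.
EXIT_CODES = {
    0: "No error.",
    1: "Error opening a PDF file.",
    2: "Error opening an output file.",
    3: "Error related to PDF permissions.",
    99: "Other error.",
}


def describe_exit(code):
    """Return the documented meaning of an exit code.

    Codes not in the table get a generic fallback (an assumption).
    """
    return EXIT_CODES.get(code, f"Unknown exit code {code}.")


def convert(pdf_file, text_file):
    """Run pdftotext quietly and return (exit code, description)."""
    completed = subprocess.run(["pdftotext", "-q", pdf_file, text_file])
    return completed.returncode, describe_exit(completed.returncode)
```

Running with -q suppresses pdftotext's own messages, so the caller reports errors from the exit code alone.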
The pdftotext software and documentation are copyright 1996-2004 Glyph & Cog, LLC. |
pdftops(1), pdfinfo(1), pdffonts(1), pdftoppm(1), pdfimages(1), |