Tech and travel

Processing PDFs with Python, the easy way

2007-03-14

Sometimes you are handed PDF files from which you have to extract data. A handy way of doing this is the pdftotext utility. Here I’ll show you how to call it from Python.

The easiest way to do this is to put the pdftotext program in the same directory as your script, or somewhere on your PATH. You can then call it on your file. This is the Python code:

import subprocess

# Convert summary.pdf to text; outside Windows the program is usually just "pdftotext"
subprocess.call(["pdftotext.exe", "summary.pdf"])

This creates a file called summary.txt next to summary.pdf, which can then be read and processed with standard Python file handling.
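As a rough sketch of that processing step, the code below wraps the conversion in a function and then reads the resulting text file into a list of lines. The function names pdf_to_text and non_empty_lines are made up for illustration, and the UTF-8 encoding is an assumption: depending on which pdftotext build you have, the output encoding may differ, so adjust it if the text looks wrong.

```python
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path, program="pdftotext"):
    """Run pdftotext on pdf_path and return the path of the text file it writes.

    'program' is an assumption: use "pdftotext.exe" or a full path if needed.
    """
    pdf_path = Path(pdf_path)
    # check=True raises CalledProcessError if pdftotext exits with an error
    subprocess.run([program, str(pdf_path)], check=True)
    # With no output name given, pdftotext swaps the .pdf extension for .txt
    return pdf_path.with_suffix(".txt")


def non_empty_lines(txt_path, encoding="utf-8"):
    """Return the stripped, non-blank lines of a text file.

    The encoding is an assumption; undecodable bytes are replaced, not fatal.
    """
    with open(txt_path, encoding=encoding, errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]


# Example use (needs pdftotext installed and a summary.pdf to convert):
#     for line in non_empty_lines(pdf_to_text("summary.pdf")):
#         print(line)
```

Stripping each line also removes the page-break characters that pdftotext puts between pages, so what is left is plain lines you can search or split as needed.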

Copyright (c) 2024 Michel Hollands