Tabula - Free Tool to Extract Tables from PDF Files for Windows, Linux, and macOS
Tabula is a free, lightweight, self-hosted tool for reading and extracting table data from PDF files.
Because it is written in Java, it runs on Windows, Linux, and macOS.
How to use Tabula?
- Upload a PDF file containing a data table.
- Browse to the page you want, then select the table by clicking and dragging to draw a box around the table.
- Click "Preview & Export Extracted Data". Tabula will try to extract the data and display a preview. Inspect the data to make sure it looks correct. If data is missing, you can go back to adjust your selection.
- Click the "Export" button.
- Now you can work with your data as a text file or a spreadsheet rather than a PDF. (You can open the downloaded file in Microsoft Excel or the free LibreOffice Calc.)
Install using Docker
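You can also run Tabula in a Docker container. The following command starts it in the background and publishes container port 5000 on host port 5000: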
docker run \
--name tabula \
-p 5000:5000 \
-d \
turicas/tabula:1.2.1
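Since the container starts detached (`-d`), it can help to confirm that it is actually running. The sketch below wraps a few standard Docker CLI commands in small shell functions. The function names are hypothetical helpers, not part of Tabula or the image, and they assume the container name `tabula` used above.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the container was started with --name tabula.

# List the container (running or stopped) whose name matches "tabula".
tabula_status() {
    docker ps -a --filter "name=tabula"
}

# Show the last 50 log lines, useful if the web page does not load.
tabula_logs() {
    docker logs --tail 50 tabula
}

# Stop and remove the container so the name "tabula" can be reused.
tabula_remove() {
    docker stop tabula && docker rm tabula
}
```

After sourcing the file, run `tabula_status`. If the container is listed as up, the web interface should be reachable at http://localhost:5000, given the port mapping above.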
Customize with Docker
To change the port or the memory available to Java, pass environment variables to the container. In the example below, PORT sets the port Tabula listens on, and JAVA_XMS and JAVA_XMX appear to set the initial and maximum JVM heap size:
docker run \
--name tabula \
-p 5001:5001 \
-e PORT=5001 \
-e JAVA_XMS="256M" \
-e JAVA_XMX="1024M" \
-d \
turicas/tabula:1.2.1
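Because the port has to appear in both the `-p` mapping and the `PORT` variable, it is easy to change one and forget the other. A minimal sketch of one way to keep them in sync is shown below. The function `tabula_run_cmd` is a hypothetical helper, and its defaults simply mirror the example values above rather than any built-in defaults of the image.

```shell
#!/bin/sh
# Hypothetical helper: prints a docker run command for the given settings.
#   $1 = port (used for both the -p mapping and PORT), default 5001
#   $2 = initial heap size (JAVA_XMS), default 256M
#   $3 = maximum heap size (JAVA_XMX), default 1024M
# The defaults are taken from the example above, not from the image itself.
tabula_run_cmd() {
    printf 'docker run --name tabula -p %s:%s -e PORT=%s -e JAVA_XMS=%s -e JAVA_XMX=%s -d turicas/tabula:1.2.1\n' \
        "${1:-5001}" "${1:-5001}" "${1:-5001}" "${2:-256M}" "${3:-1024M}"
}

# Print the command for review instead of running it directly.
tabula_run_cmd 5001 256M 1024M
```

The function only prints the command, so it can be inspected first. To run it, pipe the output to `sh`, for example `tabula_run_cmd 8081 | sh`. If a container named `tabula` already exists, remove it beforehand, since Docker refuses to reuse a container name.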
License
Tabula is an open-source project released under the MIT License.