Tabula OCR - Free Tool to Extract Tables from PDF Files for Windows and macOS

Tabula is a free, self-hosted, lightweight tool for reading and extracting table data from PDF files. It works with text-based PDFs only; it does not perform OCR, so scanned documents are not supported.

Because it runs on Java, it works on Windows, Linux, and macOS.

How to use Tabula?

  1. Upload a PDF file containing a data table.
  2. Browse to the page you want, then select the table by clicking and dragging to draw a box around the table.
  3. Click "Preview & Export Extracted Data". Tabula will try to extract the data and display a preview. Inspect the data to make sure it looks correct. If data is missing, you can go back to adjust your selection.
  4. Click the "Export" button.
  5. Now you can work with your data as a text file or a spreadsheet rather than a PDF! (You can open the downloaded file in Microsoft Excel or the free LibreOffice Calc.)
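
If you export to CSV, a quick command-line check can complement the visual preview in step 3. The sketch below is a hypothetical helper, `check_csv`, which is not part of Tabula: it counts the rows and reports any row whose field count differs from the first row, which often points to merged or split cells. It assumes a simple CSV without quoted fields that contain commas or line breaks.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tabula): sanity-check a CSV exported from Tabula.
# Assumes plain comma-separated fields; quoted fields containing commas or
# newlines are NOT handled and will be miscounted.
check_csv() {
  awk -F',' '
    NR == 1 { expected = NF }              # field count of the first (header) row
    NF != expected {                       # flag rows with a different field count
      printf "row %d: %d fields (expected %d)\n", NR, NF, expected
      bad++
    }
    END {
      printf "%d rows, %d inconsistent\n", NR, bad + 0
      exit (bad > 0)                       # exit status 1 if any row is inconsistent
    }
  ' "$1"
}
```

For example, `check_csv tabula-export.csv` (a hypothetical file name) prints one line per suspicious row followed by a summary, and exits with status 1 if any row looks wrong, so you know to go back and adjust your selection.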

Install using Docker

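You can also run Tabula in a Docker container. The following command starts it in the background and publishes port 5000, so the web interface should then be reachable at http://localhost:5000.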

docker run \
	--name tabula \
	-p 5000:5000 \
	-d \
	turicas/tabula:1.2.1

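Customize with Docker

To use a different port or adjust the memory available to Java, pass environment variables to the container. In this example, PORT=5001 makes Tabula listen on port 5001 (published with -p 5001:5001), and JAVA_XMS and JAVA_XMX presumably set the initial (-Xms) and maximum (-Xmx) Java heap sizes.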

docker run \
	--name tabula \
	-p 5001:5001 \
	-e PORT=5001 \
	-e JAVA_XMS="256M" \
	-e JAVA_XMX="1024M" \
	-d \
	turicas/tabula:1.2.1
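
The JVM refuses to start when the initial heap size is larger than the maximum, so it can help to validate the two values before launching the container. The following sketch is a hypothetical wrapper, `run_tabula`, which is not part of Tabula or the turicas/tabula image. It assumes the image passes `JAVA_XMS` and `JAVA_XMX` to Java as `-Xms` and `-Xmx`, and that both values use an `M` or `G` suffix as in the example above. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Tabula or the turicas/tabula image).
# Assumes JAVA_XMS/JAVA_XMX are passed to the JVM as -Xms/-Xmx and use an M or G suffix.

# Convert a size such as 256M or 2G to megabytes; fail on any other format.
to_mb() {
  case "$1" in
    *[Mm]) n="${1%?}"; mult=1 ;;
    *[Gg]) n="${1%?}"; mult=1024 ;;
    *) return 1 ;;
  esac
  case "$n" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric prefixes
  esac
  echo $((n * mult))
}

# Usage: run_tabula [PORT] [JAVA_XMS] [JAVA_XMX]
run_tabula() {
  port="${1:-5001}"; xms="${2:-256M}"; xmx="${3:-1024M}"
  xms_mb=$(to_mb "$xms") || { echo "invalid JAVA_XMS: $xms" >&2; return 1; }
  xmx_mb=$(to_mb "$xmx") || { echo "invalid JAVA_XMX: $xmx" >&2; return 1; }
  if [ "$xms_mb" -gt "$xmx_mb" ]; then
    echo "JAVA_XMS ($xms) must not exceed JAVA_XMX ($xmx)" >&2
    return 1
  fi
  # Rebuild the function's positional parameters as the docker command.
  set -- docker run --name tabula -p "$port:$port" -e "PORT=$port" \
    -e "JAVA_XMS=$xms" -e "JAVA_XMX=$xmx" -d turicas/tabula:1.2.1
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"         # print the command instead of running it
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_tabula 8080 512M 2G` prints the equivalent docker run command for port 8080, while `run_tabula 5001 2G 512M` stops with an error before Docker is called.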

License

Tabula is an open-source project released under the MIT License.

Resources & Downloads

GitHub: tabulapdf/tabula - Tabula is a tool for liberating data tables trapped inside PDF files
Tabula: Extract Tables from PDFs - Tabula is a free tool for extracting data from PDF files into CSV and Excel files.