OCRmyPDF: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched (Free software)
OCRmyPDF is a free, open-source command-line tool that adds an OCR text layer to scanned PDF files so that their text can be searched and copied. It is already used to OCR and search millions of large PDF files.
Features
Its features include:
- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy/paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses the Tesseract OCR engine to recognize more than 100 languages
- Keeps your private data private
- Scales properly to handle files with thousands of pages
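Several of the features above (language selection, optional deskewing and cleaning, and spreading work across CPU cores) map to command-line options. The following is a minimal sketch, not an official recipe, of how a Python script might assemble and run such a command. The helper names `build_ocrmypdf_command` and `run_ocr` are hypothetical and introduced only for illustration. The sketch assumes the `ocrmypdf` executable is installed and on the PATH, and uses the `--language`, `--deskew`, `--clean`, and `--jobs` options.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an ocrmypdf argument list (hypothetical helper for illustration)."""
    # Multiple Tesseract language codes are joined with "+", e.g. "eng+deu".
    cmd = ["ocrmypdf", "--language", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten crooked pages before OCR
    if clean:
        cmd.append("--clean")  # clean pages before OCR
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # cap the number of parallel workers
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf: str, output_pdf: str, **options) -> int:
    """Run ocrmypdf and return its exit code (hypothetical helper)."""
    # Assumption: the ocrmypdf executable is installed and on the PATH.
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf executable not found on PATH")
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    return subprocess.run(cmd).returncode
```

For example, `run_ocr("scan.pdf", "scan_ocr.pdf", languages=("eng", "deu"), deskew=True)` would invoke `ocrmypdf --language eng+deu --deskew scan.pdf scan_ocr.pdf`; the file names here are placeholders. Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.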
Platforms
macOS, Windows, and Linux
License
MPL-2.0 license
Tags
PDF, OCR, OCR PDF, PDF OCR, CLI, tools, office, Optical character recognition, tool, productivity