I came across an amazing Python code snippet that converts PDF e-books into audiobooks with minimal code.

The code snippet uses two Python packages:

  1. PyPDF2: a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. PyPDF2 can retrieve text and metadata from PDFs as well.
  2. pyttsx3: a text-to-speech conversion library for Python. Unlike many alternatives, it works offline, and it is compatible with both Python 2 and 3.

The code is pretty straightforward, and it demonstrates how simple and cool Python is.

First, install the required packages:

pip install PyPDF2
pip install pyttsx3
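
Before writing the full script, it can help to check that PyPDF2 can actually pull text out of your file, since scanned PDFs often contain only images and yield nothing to read aloud. The following is a minimal sketch, not part of the original snippet: `preview` and `describe_pdf` are hypothetical helper names, and it assumes a recent PyPDF2 release that provides `PdfReader`.

```python
def preview(text, limit=80):
    """Collapse whitespace and truncate text to at most `limit` characters."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."


def describe_pdf(path):
    """Return the page count, title, and a preview of the first page's text."""
    import PyPDF2  # imported here so preview() works without PyPDF2 installed

    reader = PyPDF2.PdfReader(path)
    info = reader.metadata  # may be None if the PDF carries no metadata
    has_pages = len(reader.pages) > 0
    first_page = reader.pages[0].extract_text() if has_pages else ""
    return {
        "pages": len(reader.pages),
        "title": info.title if info else None,
        "preview": preview(first_page or ""),
    }
```

Calling `describe_pdf('clcoding.pdf')` should return a small dictionary; if the preview comes back empty, the PDF probably has no extractable text layer.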

Now create your Python script file, and add:

import PyPDF2
import pyttsx3

# Open the PDF by specifying its path on your computer
# (PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0)
pdf_reader = PyPDF2.PdfReader('clcoding.pdf')
# Get a handle to the text-to-speech engine
speaker = pyttsx3.init()
# Read the pages aloud one by one, collecting the text along the way
pages = []
for page in pdf_reader.pages:
    text = page.extract_text()
    pages.append(text)
    speaker.say(text)  # clcoding.com
    speaker.runAndWait()
# Stop the speaker after completion
speaker.stop()
# Save the whole audiobook at the specified path; the raw string keeps
# '\a' from being read as an escape sequence. The actual audio format
# depends on the platform's speech driver.
speaker.save_to_file('\n'.join(pages), r'E:\audio.mp3')
speaker.runAndWait()
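
Text extracted from a PDF usually keeps the hard line breaks of the page layout, which can make the speech sound choppy. As a rough sketch (the helper names `clean_page_text` and `split_sentences` are my own, not part of either library), the text could be tidied before it is handed to the speaker. Note the assumption that a hyphen at the end of a line always marks a split word, so a genuine compound broken across lines would be joined incorrectly.

```python
import re


def clean_page_text(text):
    """Tidy raw PDF text so it reads more naturally when spoken."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split cleaned text into rough sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

In the loop above, this would mean calling `speaker.say(clean_page_text(text))` instead of `speaker.say(text)`. Splitting into sentences is optional, but queuing smaller chunks makes it easier to skip or resume.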

I also found a very similar tutorial from 2020 by Aman Kharwal that explains the approach in more detail.

Resources