Percollate: Converts any Webpage into PDF, EPUB or Markdown Files

Photo by Joseph Vaz / Unsplash

Percollate is a versatile command-line tool that converts web pages into cleanly formatted PDF, EPUB, HTML, or Markdown files. It is useful in a variety of scenarios.

Why convert web pages into portable formats?

Firstly, if you come across an informative article, research paper, or any other web content that you want to save for offline access or future reference, Percollate can turn it into a PDF or EPUB file. You can then read it on your preferred device without an internet connection.

Additionally, if you are working on a project that involves compiling information from multiple web pages, Percollate can gather the relevant pages and bundle them into a single, well-structured file, for example in Markdown or HTML.

This makes the material easier to organize, manage, and share with collaborators.

Moreover, Percollate produces cleanly formatted output that adds a professional touch to your documents. Whether you need to share research findings, create visually appealing reports, or simply present web content in a more polished form, this tool can help.

In summary, Percollate is valuable for anyone who needs to convert web pages into readable, well-structured PDF, EPUB, HTML, or Markdown files.

Whether you need offline access, content organization, or more polished presentation, this tool can meet those needs.

Install

The app is a command-line Node.js script. Make sure Node.js and npm are installed, then install percollate globally with:

npm install -g percollate

If you are using Arch Linux or an Arch-based distro such as Manjaro, you can install the AUR package with an AUR helper like yay or pacaur:

yay -S nodejs-percollate

How to use it

Run percollate --help for a list of available commands and options.

Percollate is invoked on one or more operands (usually URLs):

percollate <command> [options] url [url]...

The following commands are available:

  • percollate pdf produces a PDF file;
  • percollate epub produces an EPUB file;
  • percollate html produces an HTML file;
  • percollate md produces a Markdown file.

The operands can be URLs, paths to local files, or the - character, which stands for stdin (standard input).
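If you want to drive percollate from a Node.js script, for example to compile a reading list into one bundle, the invocation pattern above maps directly onto an argument array. The sketch below assumes percollate is installed globally and reachable on the PATH of a Unix-like system. The `buildPercollateArgs` and `runPercollate` helpers and the shape of their `opts` object are hypothetical; only options documented in this article are used.

```javascript
// Hypothetical helper: turn a command, a list of operands and a few options
// into the argument array for `percollate <command> [options] url [url]...`.
function buildPercollateArgs(command, operands, opts = {}) {
  const commands = ['pdf', 'epub', 'html', 'md'];
  if (!commands.includes(command)) {
    throw new Error(`Unknown command: ${command}`);
  }
  if (operands.length === 0) {
    throw new Error('Expected at least one operand (URL, file path, or "-")');
  }
  const args = [command];
  if (opts.output) args.push('-o', opts.output);      // --output
  if (opts.title) args.push(`--title=${opts.title}`); // --title
  if (opts.individual) args.push('--individual');     // one file per source
  return [...args, ...operands];
}

// Hypothetical runner: spawns the globally installed `percollate` binary
// (assumed to be on the PATH) and rejects on a non-zero exit code.
async function runPercollate(command, operands, opts) {
  const { spawnSync } = await import('child_process');
  const result = spawnSync('percollate', buildPercollateArgs(command, operands, opts), {
    stdio: 'inherit',
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`percollate exited with code ${result.status}`);
  }
}

// Example (not executed here):
// runPercollate('pdf', ['https://example.com/page-1', 'https://example.com/page-2'],
//   { title: 'Best Of Example', output: 'best-of-example.pdf' });
```

Because `spawnSync` passes the arguments directly rather than through a shell, a title containing spaces needs no extra quoting.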

Available options

Unless otherwise stated, these options apply to all four commands.

-o, --output

Specify the path of the resulting bundle relative to the current folder.

percollate pdf https://example.com -o my-example.pdf

-u, --url

Using the - operand you can read the HTML content from stdin, as fetched by a separate command, such as curl. In this sort of setup, percollate does not know the URL from which the content has been fetched, and relative paths on images, anchors, et cetera won't resolve correctly.

Use the --url option to supply the source's original URL.

curl https://example.com | percollate pdf - --url=https://example.com
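To see why the --url option matters, consider how a relative reference in fetched HTML becomes an absolute address. The sketch below uses the standard WHATWG `URL` constructor (a global in Node.js and browsers) to show that resolution only succeeds when a base URL is known. It illustrates the general mechanism, not percollate's internal code, and `resolveReference` is a hypothetical name.

```javascript
// Resolve an href found in the HTML against the page's original URL.
// Returns null when the href is relative and no usable base is available.
function resolveReference(href, baseUrl) {
  try {
    return new URL(href, baseUrl).href;
  } catch {
    return null;
  }
}

console.log(resolveReference('images/photo.png', 'https://example.com/blog/post.html'));
// https://example.com/blog/images/photo.png
console.log(resolveReference('/about', 'https://example.com/blog/post.html'));
// https://example.com/about
console.log(resolveReference('images/photo.png', undefined));
// null
```

Content piped through stdin is in the last situation: without --url there is no base, so relative image and link paths cannot be resolved.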

-w, --wait

By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero.

percollate epub --wait=1 url1 url2 url3
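The difference between the default parallel processing and --wait can be modeled in a few lines of JavaScript. This is a conceptual sketch, not percollate's source: `processItem` is a hypothetical placeholder for the per-URL work, and `waitSeconds` plays the role of the --wait value.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Default behaviour: start every item at once, collect results in input order.
function processParallel(urls, processItem) {
  return Promise.all(urls.map((url) => processItem(url)));
}

// --wait behaviour: one item at a time, pausing between consecutive items.
// A wait of zero still processes the items strictly one after another.
async function processSequential(urls, processItem, waitSeconds) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(waitSeconds * 1000);
    results.push(await processItem(urls[i]));
  }
  return results;
}
```

Sequential processing with a pause is gentler on the servers being fetched from, at the cost of a longer total run time.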

--individual

By default, percollate bundles all web pages into a single file. Use the --individual flag to export each source to a separate file.

percollate pdf --individual http://example.com/page1 http://example.com/page2

--template

Path to a custom HTML template. Applies to pdf, html, and md.

--style

Path to a custom CSS stylesheet, relative to the current folder.

--css

Additional CSS styles you can pass from the command-line to override styles specified by the default/custom stylesheet.

--no-amp

Don't prefer the AMP version of the web page.

--debug

Print more detailed information.

-t, --title

Provide a title for the bundle.

percollate epub http://example.com/page-1 http://example.com/page-2 --title="Best Of Example"

-a, --author

Provide an author for the bundle.

percollate pdf --author="Ella Example" http://example.com

--cover

Generate a cover. The option is implicitly enabled when the --title option is provided, or when bundling more than one web page to a single file. Disable this implicit behavior by passing the --no-cover flag.

--toc

Generate a hyperlinked table of contents. The option is implicitly enabled when bundling more than one web page to a single file. Disable this implicit behavior by passing the --no-toc flag.

Applies to pdf, html, and md.

--hyphenate

Hyphenation is enabled by default for pdf, and disabled for epub, html, and md. You can opt into hyphenation with the --hyphenate flag, or disable it with the --no-hyphenate flag.

See also the "Hyphenation and justification" recipe in the percollate README.
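The implicit defaults for --cover, --toc, and --hyphenate can be summarized as a small decision function. This is a model of the rules described in this article, not percollate's own code. `effectiveSettings` and its parameters are hypothetical, with `true`/`false` standing for an explicit flag (for example --toc or --no-toc) and `undefined` meaning the flag was not passed.

```javascript
// Model of the documented defaults: explicit flags always win,
// otherwise the implicit rules apply.
function effectiveSettings(command, {
  sourceCount = 1,    // number of operands
  title,              // value of --title, if any
  individual = false, // --individual
  cover,              // true (--cover), false (--no-cover) or undefined
  toc,                // true (--toc), false (--no-toc) or undefined
  hyphenate,          // true (--hyphenate), false (--no-hyphenate) or undefined
} = {}) {
  // "Bundling more than one web page to a single file"
  const bundlesSeveral = !individual && sourceCount > 1;
  return {
    cover: cover ?? (Boolean(title) || bundlesSeveral),
    // --toc applies to pdf, html and md only; null marks "not applicable"
    toc: ['pdf', 'html', 'md'].includes(command) ? (toc ?? bundlesSeveral) : null,
    // Hyphenation is on by default for pdf only
    hyphenate: hyphenate ?? (command === 'pdf'),
  };
}
```

For example, `effectiveSettings('epub', { sourceCount: 2 })` yields a cover, no applicable --toc setting, and no hyphenation.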

--inline

Embed images inline with the document. Images are fetched and converted to Base64-encoded data URLs.

This option is particularly useful with the html command, to produce self-contained HTML files.
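The Base64 data URLs that --inline produces follow the standard `data:` URL scheme. The sketch below shows the conversion in Node.js. `toDataUrl` and `inlineImage` are hypothetical helpers, `inlineImage` assumes Node.js 18 or later for the global `fetch`, and neither is percollate's implementation.

```javascript
// Encode raw bytes as a Base64 data URL with the given MIME type.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Fetch an image and return it as a data URL usable in <img src="...">.
// Parameters such as "; charset=..." are dropped from the Content-Type.
async function inlineImage(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  const type = (response.headers.get('content-type') || 'application/octet-stream')
    .split(';')[0]
    .trim();
  const bytes = new Uint8Array(await response.arrayBuffer());
  return toDataUrl(bytes, type);
}
```

Embedding makes the file larger, since Base64 encodes every 3 bytes as 4 characters, but the resulting HTML no longer depends on the original server for its images.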

--md.<option>=<value>

Pass options to the underlying Markdown stringifier, mdast-util-to-markdown. These are the default Markdown options:

const DEFAULT_MARKDOWN_OPTIONS = {
	fences: true,
	emphasis: '_',
	strong: '_',
	resourceLink: true,
	rule: '-'
};
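As an illustration of how `--md.<option>=<value>` overrides relate to these defaults, the sketch below collects such arguments and merges them over `DEFAULT_MARKDOWN_OPTIONS`, with user values winning. It is hypothetical: `markdownOptionsFromArgs` is not part of percollate, and converting the strings `true` and `false` into booleans is an assumption about how values are interpreted. The option names themselves (`fences`, `emphasis`, `strong`, `resourceLink`, `rule`) are mdast-util-to-markdown options.

```javascript
const DEFAULT_MARKDOWN_OPTIONS = {
  fences: true,
  emphasis: '_',
  strong: '_',
  resourceLink: true,
  rule: '-',
};

// Hypothetical: gather --md.<option>=<value> arguments and merge them over the defaults.
function markdownOptionsFromArgs(argv) {
  const overrides = {};
  for (const arg of argv) {
    const match = /^--md\.([A-Za-z]+)=(.*)$/.exec(arg);
    if (!match) continue; // not a Markdown option; ignore it
    const [, key, raw] = match;
    // Assumption of this sketch: "true"/"false" become booleans, anything else stays a string.
    overrides[key] = raw === 'true' ? true : raw === 'false' ? false : raw;
  }
  return { ...DEFAULT_MARKDOWN_OPTIONS, ...overrides };
}
```

On the command line, a request for asterisk emphasis would look like the following, with the quotes keeping the shell from expanding the *:

percollate md --md.emphasis='*' https://example.com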

License

  • MIT License

Resources

GitHub - danburzo/percollate: A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.