Docspell is a personal document organizer, sometimes also called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess 😉. It can unify your files from scanners, emails and other sources. It is targeted at home use, i.e. families and households, and also at smaller groups and companies.

You can assign tags, set correspondents and add lots of other predefined and custom metadata. If your documents are associated with such metadata, you can quickly find them later using the search feature. But adding this metadata manually is a tedious task.

Docspell can help by suggesting correspondents, guessing tags or finding dates using machine learning. It can learn metadata from existing documents and find things using NLP. This makes adding metadata to your documents a lot easier. For machine learning, it relies on the free (GPL) Stanford CoreNLP library.

Docspell also runs OCR (if needed) on your documents, provides full-text search and has great e-mail integration. Everything is accessible via a REST/HTTP API. A mobile-friendly SPA web application is the default user interface. An Android app exists for conveniently uploading files from your phone or tablet, and there is also a CLI. The feature overview lists some more points.
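Since everything is reachable over REST/HTTP, uploading a file can be scripted. The following is a minimal sketch, not the project's official client: it only builds a multipart upload request with the Python standard library. The upload path, the `source_id` value and the base URL are assumptions made for illustration; check the REST API documentation of your own installation before relying on them.

```python
import uuid
from urllib.request import Request


def build_upload_request(base_url: str, source_id: str,
                         filename: str, content: bytes) -> Request:
    """Build (but do not send) a multipart/form-data POST carrying one file."""
    boundary = uuid.uuid4().hex
    # ASSUMPTION: the upload path below is illustrative and may differ
    # between versions; `source_id` stands for a hypothetical upload source.
    url = f"{base_url.rstrip('/')}/api/v1/open/upload/item/{source_id}"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=body, headers=headers, method="POST")


# Building the request does not touch the network. To actually send it:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(resp.status)
```

Keeping request construction separate from sending makes the sketch easy to inspect and test without a running server.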

Features

  • Easy to use interface
  • Tags and categories
  • OCR
  • Edit metadata
  • Multiple document select
  • Full-text search
  • Organize docs in folders
  • Create custom fields
  • Upload multiple files by drag-and-drop
  • Organize tags in categories
  • Filter documents by sources, tags, categories, due date, metadata, and size
  • Inbox view
  • Manage and organize contacts, manuals, docs, and more

Tech

Backend

The servers are written in Scala in a pure functional style, based on libraries from the typelevel stack: Cats, FS2, Doobie, Http4s, Circe and Pureconfig.

There are more libraries and technologies used, of course. Docspell is only an orchestration of great tools and libraries. An important one is Stanford NLP, which provides the ML features. Furthermore, file processing relies on external tools like tesseract, unoconv and ocrmypdf. All dependencies can be looked up in project/Dependencies.scala.
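Because file processing depends on these external tools, it can be handy to check that they are installed before processing anything. This is a small sketch, assuming the executables on your system carry the same names as the tools mentioned above; the helper names `find_tools` and `missing_tools` are made up for this example.

```python
import shutil

# Tool names taken from the text above; the executable names are assumed
# to match them, which may not hold on every system.
EXTERNAL_TOOLS = ("tesseract", "unoconv", "ocrmypdf")


def find_tools(names=EXTERNAL_TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=EXTERNAL_TOOLS):
    """Return the names of the tools that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found'}")
```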

Frontend

The web frontend is an SPA written in Elm. The UI framework in use is Tailwind CSS.

License

  • Docspell is free software, distributed under the AGPLv3 or later.
