
Ambar: Libre Document Search Engine for Office, Text and PDF Documents

Hamza Musa

10 Dec 2022 — 1 min read


Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar offers a new way to add full-text document search to your workflow.

Easily deploy Ambar with a single docker-compose file
Perform Google-like search through your documents and contents of your images
Tag your documents
Use a simple REST API to integrate Ambar into your workflow

Search

Tutorial: Mastering Ambar Search Queries

Fuzzy Search (John~3)
Phrase Search ("John Smith")
Search By Author (author:John)
Search By File Path (filename:*.txt)
Search By Date (when: yesterday, today, lastweek, etc.)
Search By Size (size>1M)
Search By Tags (tags:ocr)
Search As You Type
Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
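
The query forms listed above can be assembled programmatically before being sent to Ambar. The following is a minimal sketch, not part of Ambar itself: the `build_query` helper and its parameter names are hypothetical, and it assumes that several terms can be combined in one query by separating them with spaces.

```python
from typing import Optional, Sequence


def build_query(
    words: Sequence[str] = (),
    phrase: Optional[str] = None,
    author: Optional[str] = None,
    filename: Optional[str] = None,
    min_size: Optional[str] = None,
    tag: Optional[str] = None,
) -> str:
    """Assemble a search string from the query forms listed in the article.

    Hypothetical helper: joining terms with a space is an assumption,
    not something the article confirms.
    """
    parts = list(words)                        # plain or fuzzy terms, e.g. "John~3"
    if phrase:
        parts.append(f'"{phrase}"')            # phrase search: "John Smith"
    if author:
        parts.append(f"author:{author}")       # search by author: author:John
    if filename:
        parts.append(f"filename:{filename}")   # search by file path: filename:*.txt
    if min_size:
        parts.append(f"size>{min_size}")       # search by size: size>1M
    if tag:
        parts.append(f"tags:{tag}")            # search by tag: tags:ocr
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(phrase="John Smith", author="John"))
    print(build_query(words=["John~3"], filename="*.txt", min_size="1M", tag="ocr"))
```

Keeping the builder separate from any HTTP call means the same string can be pasted into the search box or passed to the REST API, whichever fits your workflow.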

Crawling

Ambar 2.0 supports only local file system crawling. If you need to crawl an SMB share or an FTP location, mount it using standard Linux tools. Crawling is automatic and needs no schedule, because the crawlers monitor file system events and automatically process new, changed and removed files.
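
To illustrate what "new, changed and removed" means for a crawler, here is a minimal sketch that compares two snapshots of a directory tree. It is not Ambar's implementation: Ambar reacts to file system events, while this example swaps in simple polling by modification time, and the names `take_snapshot` and `diff_snapshots` are hypothetical.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, float]  # file path -> last-modified timestamp


def take_snapshot(root: str) -> Snapshot:
    """Walk a directory tree and record each file's modification time."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.path.getmtime(path)
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Classify paths as added, changed or removed between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed


if __name__ == "__main__":
    before = {"a.txt": 1.0, "b.txt": 1.0}
    after = {"a.txt": 2.0, "c.txt": 1.0}
    print(diff_snapshots(before, after))
```

An event-based crawler avoids rescanning the whole tree, which is why no schedule is needed; the polling version is shown only because it is easy to follow.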

Content Extraction

Ambar supports large files (>30MB)

Supported file types:

ZIP archives
Mail archives (PST)
MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
OCR over images
Email messages with attachments
Adobe PDF (with OCR)
OpenOffice documents
RTF, Plaintext
HTML / XHTML

OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld

Multithread processing

License

Ambar is released under the MIT License.

Resources

https://github.com/RD17/ambar
