Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.
Ambar offers a new way to integrate full-text document search into your workflow.
- Easily deploy Ambar with a single
- Perform Google-like search across your documents and the contents of your images
- Tag your documents
- Use a simple REST API to integrate Ambar into your workflow
- Fuzzy Search (John~3)
- Phrase Search ("John Smith")
- Search By Author (author:John)
- Search By File Path (filename:*.txt)
- Search By Date (when: yesterday, today, lastweek, etc.)
- Search By Size (size>1M)
- Search By Tags (tags:ocr)
- Search As You Type
- Supported language analyzers: English
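The search operators listed above can be combined into a single query string. The following is a minimal sketch that assumes terms are simply separated by spaces and written without a space after the colon (e.g. `when:today`). The helper `build_query` and its parameters are hypothetical and not part of Ambar's API.

```python
from typing import Optional, Sequence


def build_query(
    text: str = "",
    phrase: Optional[str] = None,
    fuzzy: Optional[tuple[str, int]] = None,
    author: Optional[str] = None,
    filename: Optional[str] = None,
    when: Optional[str] = None,
    min_size: Optional[str] = None,
    tags: Sequence[str] = (),
) -> str:
    """Hypothetical helper: join Ambar-style operators into one query string.

    Assumption: operators are space-separated, as in the examples above.
    """
    parts: list[str] = []
    if text:
        parts.append(text)
    if phrase:
        # Phrase search: wrap the exact phrase in double quotes.
        parts.append(f'"{phrase}"')
    if fuzzy:
        # Fuzzy search: term~distance, e.g. John~3.
        term, distance = fuzzy
        parts.append(f"{term}~{distance}")
    if author:
        parts.append(f"author:{author}")
    if filename:
        # Wildcards such as *.txt are passed through unchanged.
        parts.append(f"filename:{filename}")
    if when:
        parts.append(f"when:{when}")
    if min_size:
        parts.append(f"size>{min_size}")
    for tag in tags:
        parts.append(f"tags:{tag}")
    return " ".join(parts)


print(build_query(phrase="John Smith", filename="*.txt", min_size="1M", tags=["ocr"]))
# prints: "John Smith" filename:*.txt size>1M tags:ocr
```

Building the string in one place keeps the operator syntax out of calling code, so a change in how a filter is written only has to be made once.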
Ambar 2.0 only supports crawling the local file system. If you need to crawl an SMB share or an FTP location, mount it using standard Linux tools. Crawling is automatic and needs no schedule: the crawlers monitor file system events and automatically process new, changed and removed files.
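To illustrate what "new, changed and removed" means for a crawler, here is a minimal sketch. Note that it swaps in a different technique: Ambar's crawlers react to file system events, while this example compares two directory snapshots keyed by modification time and size. The functions `snapshot` and `diff_snapshots` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root: str) -> dict[str, tuple[float, int]]:
    """Map every file under root to its (modification time, size)."""
    state: dict[str, tuple[float, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # the file vanished between listing and stat
            state[path] = (st.st_mtime, st.st_size)
    return state


def diff_snapshots(old, new):
    """Classify paths as new, changed or removed between two snapshots."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return added, changed, removed


with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("one")
    Path(root, "b.txt").write_text("two")
    before = snapshot(root)

    Path(root, "a.txt").write_text("one, edited")  # changed: size differs
    Path(root, "b.txt").unlink()                    # removed
    Path(root, "c.txt").write_text("three")         # new

    added, changed, removed = diff_snapshots(before, snapshot(root))
    print([Path(p).name for p in added],
          [Path(p).name for p in changed],
          [Path(p).name for p in removed])
    # prints: ['c.txt'] ['a.txt'] ['b.txt']
```

Polling like this walks the whole tree on every check. That cost is one reason an event-driven approach, like the one described above, is better suited to large directory trees.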
Ambar supports large files (over 30 MB).
Supported file types:
- ZIP archives
- Mail archives (PST)
- MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
- Images (with OCR)
- Email messages with attachments
- Adobe PDF (with OCR)
- OpenOffice documents
- RTF, plain text
- HTML / XHTML
OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
Ambar processes files in multiple threads.
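A crawler that handles this many formats has to route each file to a suitable extractor, and it can process many files at once. The sketch below guesses a category from the file extension and categorizes a batch on a thread pool. The category names and extension lists are assumptions for illustration only and do not reflect Ambar's actual dispatch logic. A real pipeline might inspect file contents instead of trusting extensions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from the file types listed above to common extensions.
CATEGORIES: dict[str, set[str]] = {
    "archive": {".zip"},
    "mail_archive": {".pst"},
    "ms_office": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
                  ".vsd", ".vsdx", ".pub"},
    "image_ocr": {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"},
    "email": {".eml", ".msg"},
    "pdf": {".pdf"},
    "openoffice": {".odt", ".ods", ".odp"},
    "rtf": {".rtf"},
    "plaintext": {".txt"},
    "html": {".html", ".htm", ".xhtml"},
}

# Invert the mapping into a lookup table: extension -> category.
BY_EXTENSION = {ext: cat for cat, exts in CATEGORIES.items() for ext in exts}


def categorize(path: str) -> Optional[str]:
    """Return the category for path, or None if its extension is unknown."""
    return BY_EXTENSION.get(PurePath(path).suffix.lower())


files = ["Report.DOCX", "scan.tiff", "inbox.pst", "notes.md"]

# Categorize a batch on a small thread pool; map() keeps the input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(categorize, files))

print(results)  # ['ms_office', 'image_ocr', 'mail_archive', None]
```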
Ambar is released under the MIT License.