Full-text search is a technical term for advanced, linguistically aware querying of a database or a collection of text documents. The search engine examines all the words stored in each document as it tries to match the search criteria given by the user.
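Most engines in this list are built around an inverted index. An inverted index maps each word to the documents that contain it, so a query does not have to scan every document. The Python sketch below is a toy illustration of that idea under simple assumptions: lowercase word tokens, and AND semantics for multi-word queries. The helper names and sample documents are hypothetical and do not come from any particular library.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every word to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # intersect posting sets
    return result


docs = {
    1: "Static site generators rarely ship with search",
    2: "Full-text search engines build an inverted index",
    3: "An inverted index maps words to documents",
}
index = build_index(docs)
print(sorted(search(index, "inverted index")))  # [2, 3]
print(sorted(search(index, "search")))          # [1, 2]
```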
Many websites depend on full-text search to perform advanced search operations.
A growing trend among developers and companies is to build static websites instead of dynamic, database-driven ones. In both cases, full-text search can be added with the help of several libraries and services.
Some cloud-based services, such as Algolia.com, offer full-text search as a service. However, open-source alternatives can save time and resources while giving enterprises better control.
Why is full-text search required?
- Speed: Full-text search retrieves results quickly, even across a large number of documents in a huge text database.
- Efficiency: Accurate, precise search results across all fields.
- Configurable search criteria and logic.
- Static website support: Many static websites use a flat-file approach, such as JSON or Markdown files. Some libraries and frameworks are using
- Mobile app support
Implementing full-text search on statically generated websites is a necessity, especially since most static site generators don't include search as a core feature. Likewise, if you run a self-hosted Ghost blog, you may want to add full-text search yourself, because Ghost does not offer built-in search.
Here we list the best open-source full-text search libraries for developers. They can enrich the user experience and deliver more relevant and accurate search results.
1- Lunr.js
Lunr.js is a lightweight alternative to Apache Solr. Lunr supports 14 languages out of the box and offers fuzzy term matching.
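Fuzzy term matching is generally defined in terms of edit (Levenshtein) distance. A query term matches an indexed term if it can be turned into that term with at most a given number of single-character insertions, deletions, or substitutions. Here is a minimal Python sketch of that idea. It is not Lunr's implementation, and the vocabulary and function names are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def fuzzy_terms(query: str, vocabulary: list[str], max_edits: int = 1) -> list[str]:
    """Return the vocabulary terms within max_edits edits of the query term."""
    return [term for term in vocabulary if edit_distance(query, term) <= max_edits]


vocabulary = ["search", "starch", "static", "speech", "research"]
print(fuzzy_terms("serch", vocabulary))     # ['search']
print(fuzzy_terms("serch", vocabulary, 2))  # ['search', 'starch', 'speech']
```

Real engines avoid comparing the query against every term in the dictionary, for example by using Levenshtein automata. The linear scan here just keeps the sketch short.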
2- Apache Solr
As an enterprise-grade platform, Solr is packed with features like load-balancing queries, automated functions, centralized configuration, distributed instant indexing and scale-ready infrastructure.
Solr is used by several big players, such as DuckDuckGo, AT&T, Instagram, eBay, Comcast, Magento eCommerce, Adobe, Netflix, the Internet Archive, and more.
Developers can easily build apps on Solr because it supports open-standard interfaces such as JSON, XML, and HTTP.
3- Sphinx Search
Sphinx is a full-text search engine server written in C++ for best performance. It works seamlessly on Windows, Linux, and macOS, and it can index data stored in SQL or NoSQL databases.
Sphinx offers a rich API (SphinxAPI) that lets developers integrate it easily. It also supports searching with SphinxQL, which resembles old-school SQL.
It has been shown to index 10-15 MB of text per second on a single CPU core and 60+ MB/sec per server. On a dual-core desktop machine, it handles 500+ queries/sec.
Sphinx is scale-ready. The biggest Sphinx cluster, at Craigslist, reportedly indexes 25+ billion documents and serves 300+ million search queries per day.
Its features include SQL/NoSQL database indexing, search on non-text attributes, real-time full-text indexing, and distributed search.
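Distributed search is usually organized as scatter-gather. The query is sent to every shard, each shard returns only its local top k hits, and a coordinator merges those partial lists into the global top k. The sketch below shows just the merge logic in Python, using hypothetical shard contents with precomputed scores. It assumes scores are already comparable across shards; real engines have to arrange that, for example by sharing term statistics. It is not Sphinx's own protocol.

```python
import heapq


def search_shard(shard: dict[str, float], k: int) -> list[tuple[float, str]]:
    """Stand-in for one shard: return its local top-k (score, doc_id) hits."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard.items()))


def distributed_search(shards: list[dict[str, float]], k: int) -> list[tuple[float, str]]:
    """Scatter the query to every shard, then gather and merge the partial results."""
    partial_results = [search_shard(shard, k) for shard in shards]  # sequential here, parallel in practice
    return heapq.nlargest(k, (hit for hits in partial_results for hit in hits))


shards = [
    {"doc-a": 0.90, "doc-b": 0.20},
    {"doc-c": 0.75, "doc-d": 0.50},
    {"doc-e": 0.95},
]
print(distributed_search(shards, 3))
# [(0.95, 'doc-e'), (0.9, 'doc-a'), (0.75, 'doc-c')]
```

Asking each shard for only k hits is enough. Any document in the global top k must also be in its own shard's top k.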
4- Manticore Search
Manticore Search is a multilingual full-text search engine with support for big data sets and real-time data streaming.
It's the best project on this list for features such as geo-search, replication, search ranking algorithms, real-time indexing, and built-in JSON support.
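Geo-search boils down to filtering, and often sorting, documents by their distance from a point. A common way to compute that distance is the haversine formula. Here is a small Python sketch, assuming each document stores hypothetical `lat`/`lon` fields in degrees. It is a generic illustration, not Manticore's geo-search syntax.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(docs: list[dict], lat: float, lon: float, radius_km: float) -> list[str]:
    """Names of documents within radius_km of (lat, lon), nearest first."""
    hits = sorted((haversine_km(lat, lon, d["lat"], d["lon"]), d["name"]) for d in docs)
    return [name for distance, name in hits if distance <= radius_km]


places = [
    {"name": "Berlin", "lat": 52.5200, "lon": 13.4050},
    {"name": "London", "lat": 51.5074, "lon": -0.1278},
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
]
print(within_radius(places, 48.8566, 2.3522, 500))  # ['Paris', 'London']
```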
Manticore Search provides indexing support for MySQL, PostgreSQL, and flat files such as CSV and TSV, as well as Markdown files.
It also has built-in morphology support for many languages.
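Morphology support means that different word forms (for example "indexing", "indexed", "indexes") are reduced to a common form at both index and query time, so any one form matches the others. Real engines rely on proper stemmers or lemmatizers such as the Porter/Snowball family. The Python sketch below is a deliberately naive suffix stripper with a hypothetical, English-only rule list. It only shows where stemming fits in the pipeline.

```python
import re

# Hypothetical English suffix rules, checked in order (longer suffixes first).
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "ies":
                return word[: -len(suffix)] + "y"  # "queries" -> "query"
            return word[: -len(suffix)]
    return word


def stem_terms(text: str) -> set[str]:
    """Tokenize on word characters and reduce every token to its stem."""
    return {naive_stem(token) for token in re.findall(r"\w+", text)}


# Different inflections of "index" collapse to one term, so they match each other.
print(stem_terms("indexing indexed indexes index"))  # {'index'}
print(stem_terms("queries") & stem_terms("query"))   # {'query'}
```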
5- Apache Lucene Core
Apache Lucene is a full-featured text search engine library. It's highly scalable with real-time text indexing and low hardware requirements.
Its features include ranked search (best results first), dozens of query types, fielded search, multiple indexing strategies, multiple ranking models, and configurable storage engines.
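Lucene's default ranking model since Lucene 6 is BM25. BM25 scores a document by summing, over the query terms, an IDF weight times a saturated, length-normalized term frequency. Here is a minimal Python sketch of BM25 with the commonly used defaults k1 = 1.2 and b = 0.75, and a Lucene-style IDF of log(1 + (N - n + 0.5) / (n + 0.5)). It scores pre-tokenized toy documents and ignores everything else a real Lucene index does, such as norm encoding and boosts.

```python
import math
from collections import Counter


def bm25_scores(docs: list[list[str]], query: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    doc_freq = Counter(term for doc in docs for term in set(doc))  # docs containing each term
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)  # penalize long docs
            score += idf * tf * (k1 + 1) / (tf + length_norm)    # saturating tf
        scores.append(score)
    return scores


docs = [
    "lucene is a search library".split(),
    "lucene core and lucene ports".split(),
    "sphinx is a search server".split(),
]
scores = bm25_scores(docs, ["lucene"])
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 0, 2]
```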
Apache Lucene is written in Java, so it runs on all major platforms. It has also been implemented in other languages (C++, .NET, PHP5, Perl, Lisp, Python, Delphi, Objective-C, and Ruby).
6- Ambar Cloud
Ambar Cloud is an open-source document search engine with automated crawling, OCR, tagging and real-time indexing.
It supports all common text document formats. It also performs automated OCR on images and PDF files.
It does not require deployment and offers offline search functionality.
8- Apache Nutch
Apache Nutch is a highly extensible and scalable open-source crawler, text-indexer and full-text search engine.
9- Typesense
Typesense is a free, open-source search engine with user- and developer-friendly functionality. It supports full-text search, autosuggest, result ranking, and a wide range of filters and facets, and it is also typo-tolerant.
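Search-as-you-type suggestions like Typesense's can be approximated with a prefix lookup over a sorted term list. A binary search finds the first term that is not smaller than the prefix, and all matching terms follow it contiguously. This Python sketch uses `bisect` from the standard library and a made-up vocabulary. It is not how Typesense implements the feature internally, and a typo-tolerant version would also need edit-distance matching.

```python
import bisect


def prefix_suggestions(sorted_terms: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` terms starting with `prefix` from an alphabetically sorted list."""
    suggestions = []
    i = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    while i < len(sorted_terms) and len(suggestions) < limit and sorted_terms[i].startswith(prefix):
        suggestions.append(sorted_terms[i])
        i += 1
    return suggestions


terms = sorted(["search", "seafood", "server", "sea", "sphinx", "solr", "searchable"])
print(prefix_suggestions(terms, "sear"))         # ['search', 'searchable']
print(prefix_suggestions(terms, "se", limit=3))  # ['sea', 'seafood', 'search']
```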
10- Groonga
Groonga is yet another open-source full-text search engine and column store for enterprise use. It's fast and supports aggregated queries and an inverted index.
It's the second solution on this list that supports geo-location search out of the box.
Groonga is written in pure C, and it has libraries for many other popular languages such as Ruby, Python, and .NET.
11- Bleve
Bleve is a full-text search engine written in Go. It's simple, fast, and lightweight. It supports text analysis out of the box, along with many languages such as French, Dutch, Turkish, Italian, Persian, Arabic, Russian, and more.
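Text analysis is the step that turns raw text into index terms. In engines such as Bleve, an analyzer is typically a tokenizer followed by a chain of token filters, such as lowercasing and stop-word removal. The Python sketch below mimics that pipeline shape with hypothetical filters and a tiny illustrative stop-word list. It is not Bleve's Go API.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # tiny illustrative list


def tokenize(text: str) -> list[str]:
    """Split text into runs of (Unicode-aware) word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: list[str]) -> list[str]:
    return [token.lower() for token in tokens]


def accent_folding_filter(tokens: list[str]) -> list[str]:
    """Drop combining accents, e.g. 'café' -> 'cafe'."""
    folded = []
    for token in tokens:
        decomposed = unicodedata.normalize("NFKD", token)
        folded.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return folded


def stop_word_filter(tokens: list[str]) -> list[str]:
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str, filters=(lowercase_filter, accent_folding_filter, stop_word_filter)) -> list[str]:
    """Run the tokenizer, then apply each token filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Café of Istanbul is OPEN"))  # ['cafe', 'istanbul', 'open']
```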
12- HubbleDotNet
HubbleDotNet is a .NET-based full-text search engine that acts as a full-text search library for .NET projects. It's fast and comes with SQL support.
Please note that HubbleDotNet hasn't received any updates for years.
13- TNTSearch
TNTSearch is a full-text search engine written completely in PHP. It features fuzzy search, geo-search, text classification, Boolean search, result highlighting, and dynamic indexing.
TNTSearch supports many languages, such as English, German, French, Dutch, Russian, and Italian. It also provides full support for RTL languages like Arabic, Hebrew, and Persian.
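Result highlighting, as offered by TNTSearch, wraps the matched query terms in markup so users can see why a result matched. Below is a minimal Python sketch that uses a case-insensitive, whole-word regular expression. The `<mark>` tags and the function name are assumptions for illustration, not TNTSearch's PHP API.

```python
import re


def highlight(text: str, terms: list[str], pre: str = "<mark>", post: str = "</mark>") -> str:
    """Wrap every whole-word, case-insensitive occurrence of a query term."""
    if not terms:
        return text
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda match: f"{pre}{match.group(0)}{post}", text)


print(highlight("Fuzzy search and Boolean SEARCH in TNTSearch", ["search", "fuzzy"]))
# <mark>Fuzzy</mark> <mark>search</mark> and Boolean <mark>SEARCH</mark> in TNTSearch
```

A production highlighter would also HTML-escape the text and match on analyzed (for example stemmed) terms, not on raw query strings.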
14- Flex Search (FlexSearch)
15- MG4J
MG4J is a cross-platform full-text search engine for text documents. It has an advanced, customizable indexing tool with support for multi-index interval semantics. It supports virtual fields, distributed search, multi-threading, and clustering.
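Interval and proximity queries rely on a positional index. A positional index stores where each term occurs in each document, not just which documents contain it. The Python sketch below builds such an index and answers exact phrase queries, the simplest kind of positional query. MG4J's minimal-interval semantics are far richer, and the names and documents here are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs: dict[int, list[str]]) -> dict[str, dict[int, set[int]]]:
    """Map term -> doc_id -> set of token positions where the term occurs."""
    index: dict[str, dict[int, set[int]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, tokens in docs.items():
        for position, token in enumerate(tokens):
            index[token][doc_id].add(position)
    return index


def phrase_search(index: dict[str, dict[int, set[int]]], phrase: list[str]) -> list[int]:
    """Return IDs of documents containing the phrase terms consecutively, in order."""
    if not phrase or any(term not in index for term in phrase):
        return []
    # Only documents containing every term can contain the phrase.
    candidates = set(index[phrase[0]])
    for term in phrase[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[phrase[0]][doc_id]:
            # Each phrase term must sit exactly `offset` positions after the start.
            if all(start + offset in index[term][doc_id] for offset, term in enumerate(phrase)):
                hits.append(doc_id)
                break
    return hits


docs = {
    1: "open source full text search engine".split(),
    2: "search the full text of every document".split(),
    3: "text search is full of tradeoffs".split(),
}
index = build_positional_index(docs)
print(phrase_search(index, ["full", "text"]))    # [1, 2]
print(phrase_search(index, ["text", "search"]))  # [1, 3]
```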
16- Bayard
Bayard is a full-text search engine and indexing server built with Rust on top of Tantivy, a full-text search engine library also written in Rust. It features index replication and clustering, and it comes with a command-line interface. It's under active development by a team of developers.
17- Elasticsearch
Elasticsearch is a popular open-source, enterprise-grade full-text search engine. It has a REST API and supports real-time indexing and scaling.
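Elasticsearch's REST API accepts search requests as JSON. The Python sketch below only builds a `match` query request for the `_search` endpoint using the standard library; it does not send anything. The node URL `http://localhost:9200` (assumed to be a local development node without authentication), the index name `articles`, and the field `title` are hypothetical. Passing the request to `urllib.request.urlopen` would send it to a running cluster.

```python
import json
import urllib.request


def build_match_request(base_url: str, index: str, field: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_search request with a match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_match_request("http://localhost:9200", "articles", "title", "full-text search")
print(request.get_method(), request.full_url)  # POST http://localhost:9200/articles/_search
print(request.data.decode("utf-8"))  # {"query": {"match": {"title": "full-text search"}}}
```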
18- ndx
ndx is a full-text search engine library for Node.js. It supports multiple fields, real-time indexing, an inverted index, text queries, and a serializable index.
This library has a small memory footprint and is optimized for mobile applications and web apps.
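A serializable index can be built once, saved, and loaded later without re-indexing, for example when a static site is generated. ndx is a JavaScript library, so the Python sketch below only illustrates the general idea, using JSON and a hypothetical term-to-document-IDs index. It does not reproduce ndx's format.

```python
import json


def dump_index(index: dict[str, set[int]]) -> str:
    """Serialize an inverted index to JSON (sets become sorted lists)."""
    return json.dumps({term: sorted(doc_ids) for term, doc_ids in index.items()}, sort_keys=True)


def load_index(payload: str) -> dict[str, set[int]]:
    """Rebuild the in-memory index from its JSON form."""
    return {term: set(doc_ids) for term, doc_ids in json.loads(payload).items()}


index = {"search": {2, 1}, "inverted": {3, 2}}
payload = dump_index(index)
print(payload)  # {"inverted": [2, 3], "search": [1, 2]}
```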
19- Srchx
Srchx is a standalone full-text search engine built on Bleve. It supports multiple storage backends: Scorch, BoltDB, LevelDB, and BadgerDB. It leverages all CPU cores and comes with a REST API.
20- LIFTI
LIFTI is a .NET-based full-text search indexer for .NET applications.
A full-text search engine and indexing server built with Rust.
That brings our list to an end. We have listed the best active full-text search projects with good support. If you know of any that we didn't include on this list, please let us know in a comment or a message.