txtai - Open-source embedded database for Semantic search, LLM orchestration
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
It offers versatile tools for processing text, audio, and image data.
txtai is built with Python 3.9+, Hugging Face Transformers, Sentence Transformers and FastAPI. txtai is open-source under an Apache 2.0 license.
Features
- Embeddings & Semantic Search: Supports semantic search with embeddings for retrieving and ranking results based on similarity.
- Pipelines: Provides pipelines for processing text (e.g., summarization, translation, question answering), audio (e.g., transcription, text-to-speech), and images (e.g., captioning, object detection).
- Language Model Integration: Easily integrates large language models (LLMs) like LLaMA and transformers for tasks like text generation, classification, and labeling.
- Retrieval-Augmented Generation (RAG): Implements RAG workflows, allowing models to retrieve information from external sources before generating responses.
- Workflows: Allows for complex workflows, connecting multiple processing tasks, e.g., transcribing, translating, and indexing documents.
- API and Distributed Setup: Offers a REST API and can be deployed across distributed environments, enabling scalable and adaptable solutions.
- 🔎 Vector search with SQL, object storage, topic modeling, graph analysis and multimodal indexing
- 📄 Create embeddings for text, documents, audio, images and video
- 💡 Pipelines powered by language models that run LLM prompts, question-answering, labeling, transcription, translation, summarization and more
- ↪️️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.
- ⚙️ Build with Python or YAML. API bindings available for JavaScript, Java, Rust and Go.
- ☁️ Run local or scale out with container orchestration
License
Apache-2.0 License