VectorAdmin - a Free Vector Database Manager for AI Developers
What is a Vector Database?
A vector database is a type of database specifically designed to store, manage, and query data represented as vectors. Vectors are numerical representations of data objects, often derived from machine learning or deep learning models, which capture the semantic meaning of text, images, audio, or other data types in high-dimensional space.
These databases are optimized for operations like similarity search, where the goal is to find items in a dataset that are most similar to a given query vector. This makes vector databases crucial for applications that rely on understanding and leveraging the relationships between data objects.
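To make similarity search concrete, here is a minimal sketch in plain Python. It ranks stored vectors by cosine similarity to a query vector using a brute-force scan. The names `cosine_similarity`, `top_k`, and `toy_index`, as well as the three-dimensional toy vectors, are illustrative assumptions; production vector databases typically rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "index": real embeddings have hundreds or thousands of dimensions.
toy_index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

With these toy vectors, the query points along the first axis, so "cat" and "kitten" rank ahead of "car".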
Vector databases are critical for AI developers because they provide efficient solutions for managing and querying large-scale vector embeddings, which are fundamental to many AI and machine learning applications.
With a vector database, AI developers can manage high-dimensional data at scale and build systems that return results in real time, in industries ranging from e-commerce to healthcare.
What is VectorAdmin?
VectorAdmin is a free and open-source full-stack application that gives you total control over otherwise unwieldy vector data, whether you embed it via an API or with tools like LangChain, which don't show you what you just saved into your database.
VectorAdmin is a fully capable multi-user product that you can run locally via Docker or host remotely, and it can manage multiple vector databases at once.
VectorAdmin is more than a single tool: it is a suite of tools that makes interacting with and understanding vectorized text easy, without compromising on the controls you would expect from a traditional database management system.
Features
- Multi-user instance support and oversight.
- Atomically view, update, and delete individual text chunks of embeddings.
- Copy entire documents or even whole namespaces and embeddings without paying to re-embed.
- Upload & embed new documents directly into the vector database.
- Migrate an entire existing vector database to another type or instance (still in progress).
- Manage multiple concurrent vector databases at once.
- Set permissions on data and control who can access it.
- 100% Cloud deployment ready.
- Automated regression tests that run as namespaces or collections are updated with new documents, to ensure response quality (still in progress).
- Full API, JavaScript and Python standalone clients, and LangChain integration (still in progress).
- Extremely efficient cost-saving measures for managing very large documents. You'll never pay to embed a massive document or transcript more than once.
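One common way to avoid paying for the same embedding twice is to cache results under a hash of the chunk text. The sketch below illustrates that general idea only; it is not VectorAdmin's actual mechanism, which this article does not describe. `EmbeddingCache`, `fake_embed`, and `paid_calls` are hypothetical names, and `fake_embed` stands in for a paid embedding API call.

```python
import hashlib


class EmbeddingCache:
    """Cache embeddings by content hash so identical text is embedded only once."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # the (possibly paid) embedding function
        self._store = {}           # maps SHA-256 hex digest -> embedding
        self.paid_calls = 0        # how many times embed_fn was actually invoked

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed_fn(text)
            self.paid_calls += 1
        return self._store[key]


def fake_embed(text):
    # Hypothetical stand-in for a real embedding API: returns a tiny 2-D "vector".
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
first = cache.get("hello vector world")
second = cache.get("hello vector world")  # served from the cache, no second call
print(cache.paid_calls)
```

Copying a document to another namespace could then reuse the cached vectors instead of calling the embedding function again, assuming the chunk text is unchanged.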
Technical Overview
This monorepo consists of five main sections:
- document-processor: Flask app to digest, parse, and embed documents easily.
- frontend: A viteJS + React frontend that you can run to easily create and manage all your content.
- backend: A nodeJS + express server to handle all the interactions and do all the vectorDB management.
- workers: An InngestJS instance to handle long-running processes and background tasks for snappy performance.
- docker: Run this entire arch in a single command as a docker instance (recommended).
Requirements
- yarn and node on your machine.
- python 3.9+ for running scripts in document-processor/.
- access to an OpenAI API key if planning to update embeddings or upload new documents.
- a Pinecone.io free account or a running ChromaDB instance.
License
MIT License