Vector Databases Provider: Chroma (Open Source)

ChromaDB

ChromaDB emerged in 2022 solving a critical pain point: getting started with vector search required deploying complex infrastructure (Pinecone accounts, Milvus clusters, Qdrant servers) before writing a single line of application code. Chroma's insight: vector databases should be as easy as SQLite—pip install, import, start coding. The result: an embedded database that runs in-process (no separate server), stores data locally (simple persistence), and integrates embeddings natively (no manual OpenAI API calls). This developer-first approach sparked viral adoption: 50,000+ projects, 12,000+ GitHub stars in 18 months, backing from Y Combinator and Astasia Myers (Quiet Capital). By October 2025, ChromaDB powers production systems at Notion (document search), Replit (code search), and thousands of AI startups building RAG applications. The architecture: embedded mode runs in Python process (like SQLite), client-server mode for multi-tenant deployments, automatic embedding generation via built-in providers (OpenAI, Cohere, HuggingFace), HNSW indexing for fast queries, persistent storage with automatic snapshots. Unique advantages: batteries-included experience (embeddings, distance metrics, filtering all built-in), pythonic API design (chroma.create_collection, collection.add), zero-ops for development (pip install and go), smooth path to production (docker deployment, Chroma Cloud hosted service). Performance: handles millions of vectors on laptop, sub-50ms queries for typical RAG use cases, horizontal scaling via client-server architecture. ChromaDB 0.4+ adds multi-modal embeddings, improved filtering, better observability, and Chroma Cloud (managed service). 21medien uses ChromaDB for rapid prototyping and small-to-medium deployments: we build POCs in days not weeks, deploy production systems serving 100K-1M vectors with minimal ops overhead, then migrate to Qdrant/Milvus when clients need billion-scale performance—enabling fast iteration and cost-effective scaling.


Overview

ChromaDB's design philosophy: optimize for developer happiness and time-to-production. Traditional vector databases require: (1) Deploy infrastructure (Docker, Kubernetes, or managed service), (2) Configure indexes and schemas, (3) Generate embeddings manually via API calls, (4) Implement retry logic and error handling, (5) Write data loading pipelines. Chroma eliminates steps 1-4: import chromadb; client = chromadb.Client(); collection = client.create_collection('docs'); collection.add(documents=['doc1', 'doc2'], ids=['id1', 'id2']) — that's it, embeddings generated automatically, data persisted locally, ready to query. The embedding abstraction: specify embedding function once (OpenAI, Cohere, Sentence Transformers, custom), Chroma handles API calls, batching, retry logic, caching. Example: collection = client.create_collection('docs', embedding_function=OpenAIEmbeddingFunction(api_key='sk-...')). Now collection.add(documents=[...]) automatically generates embeddings via OpenAI, no manual API calls. Query interface pythonic: results = collection.query(query_texts=['find similar docs'], n_results=5, where={'author': 'John'}) — returns documents, distances, metadata in single call. Filtering integrates seamlessly: where clauses filter by metadata before vector search, where_document filters by document content (keyword search), combination enables hybrid semantic + keyword search. Persistence modes: in-memory (ephemeral, fast for testing), on-disk (SQLite + Parquet, survives restarts), client-server (dedicated server, multi-process access). Migration path: develop with embedded mode, switch to client-server for production, optionally migrate to Qdrant/Milvus for scale. This flexibility enables 'start simple, scale when needed' approach versus 'over-architect upfront' common with complex databases.
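
A minimal sketch of this workflow, assuming an OpenAI API key is available; the collection name, documents, and metadata values are illustrative:

  import chromadb
  from chromadb.utils import embedding_functions

  # Embedded client: runs in-process, no server to deploy
  client = chromadb.Client()

  # Declare the embedding function once; Chroma calls the provider on add/query
  openai_ef = embedding_functions.OpenAIEmbeddingFunction(
      api_key='sk-...',
      model_name='text-embedding-3-small',
  )
  collection = client.create_collection('docs', embedding_function=openai_ef)

  # Embeddings are generated automatically for each document
  collection.add(
      documents=['Chroma runs embedded like SQLite', 'HNSW powers the vector index'],
      metadatas=[{'author': 'John'}, {'author': 'Jane'}],
      ids=['id1', 'id2'],
  )

  # Metadata filter applied alongside the vector search
  results = collection.query(
      query_texts=['find similar docs'],
      n_results=5,
      where={'author': 'John'},
  )
  print(results['documents'], results['distances'])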

Production deployments demonstrate ChromaDB's practical advantages. Notion uses ChromaDB for document search: embedded mode during development enabled rapid iteration, deployed client-server mode to production serving 1M+ document embeddings, sub-100ms p95 latency for typical queries, simplified ops versus managing separate vector database cluster. Replit's code search: ChromaDB indexes millions of code snippets, automatic embeddings via Cohere, hybrid search combines semantic similarity with language/framework filters, powers 'find similar code' feature for 20M+ developers. AI startup typical pattern: prototype RAG application using ChromaDB embedded (2 days), validate with users (1 week), deploy client-server mode to production (1 day), handle 10K-100K vectors with $100/month infrastructure, migrate to Qdrant when reaching 10M+ vectors and needing advanced features. Cost comparison: 100K vectors on ChromaDB client-server costs $50-100/month (single small server), equivalent capacity on Pinecone costs $500-1000/month (10x difference), on Qdrant Cloud $200-300/month. ChromaDB particularly strong for: (1) Prototyping and POCs (fastest time-to-demo), (2) Small-to-medium production deployments (10K-10M vectors), (3) Development and testing (embedded mode, no infrastructure), (4) Cost-sensitive projects (self-hosted, minimal resources). Less suitable for: (1) Billion-scale datasets (use Milvus/Qdrant), (2) Complex multi-tenancy (use Milvus), (3) Sub-10ms latency requirements (use Qdrant). 21medien implements ChromaDB strategically: POCs and demos always start with ChromaDB for speed (days not weeks), production deployments under 5M vectors stay on ChromaDB for ops simplicity, migrations to Qdrant/Milvus only when performance/scale requirements demand it—saving clients $50K-200K in unnecessary infrastructure while delivering 90% of use cases with 1/10th the complexity.

Key Features

  • Zero-configuration setup: pip install chromadb, import, start coding — no infrastructure, no configuration files, no API keys required
  • Built-in embeddings: Automatic embedding generation via OpenAI, Cohere, HuggingFace, Sentence Transformers, custom functions
  • Embedded mode: Runs in-process like SQLite, no separate server, perfect for development, testing, small deployments
  • Client-server mode: Optional server for multi-process access, Docker deployment, horizontal scaling for production
  • Pythonic API: Intuitive interface following Python conventions, minimal learning curve, feels like working with dictionaries
  • Hybrid search: Combine vector similarity with metadata filtering, keyword search on documents, boolean expressions
  • Multiple persistence modes: In-memory (fast, ephemeral), on-disk (SQLite + Parquet, persistent), client-server (distributed); see the sketch after this list
  • Automatic batching: Handles large data ingestion automatically, optimizes API calls to embedding providers, retry logic built-in
  • Multi-modal support: Store and search text, images, audio embeddings in same collection, CLIP integration for vision
  • Production ready: Docker images, observability hooks, backup/restore, Chroma Cloud (managed hosting), migration tools
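
A short sketch of the three client modes from the list above ('Multiple persistence modes'); the path, host, and port are placeholders, and the HTTP client assumes a Chroma server is already running:

  import chromadb

  # In-memory: fast, ephemeral, ideal for tests
  ephemeral = chromadb.Client()

  # On-disk: data under ./chroma_data survives process restarts
  persistent = chromadb.PersistentClient(path='./chroma_data')

  # Client-server: connects to a running Chroma server (e.g. the Docker image)
  remote = chromadb.HttpClient(host='localhost', port=8000)

  # All three expose the same collection API
  for client in (ephemeral, persistent, remote):
      collection = client.get_or_create_collection('docs')
      print(collection.count())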

Technical Architecture

ChromaDB architecture optimized for simplicity and developer ergonomics. Core: Storage layer uses DuckDB (embedded analytics database) for metadata and SQLite for small deployments, Parquet files for vectors (columnar format, efficient compression), HNSW index for vector search. Embedding layer: AbstractEmbeddingFunction interface supports multiple providers, batches requests automatically (100 docs/batch for OpenAI), caches embeddings to avoid redundant API calls, handles rate limits and retries. Query layer: Query planner determines execution strategy (filter first vs vector search first based on selectivity), HNSW traversal for nearest neighbor search, metadata post-filtering, result ranking and pagination. Deployment modes: Embedded runs entirely in Python process (single-threaded, no network), client-server uses FastAPI backend with same storage engine (multi-process access, HTTP/REST API), Chroma Cloud adds authentication, multi-tenancy, automatic backups. Persistence: On-disk mode writes to local directory (.chroma/), stores vectors in Parquet (efficient columnar format), metadata in DuckDB, indexes in memory (built on startup). Client-server mode supports S3/GCS for storage (distributed deployments). Migration path: embedded → client-server (trivial, just deploy server and change client connection), ChromaDB → Qdrant/Milvus (export to JSON/Parquet, import to target database). Embedding function example: class CustomEmbeddingFunction(EmbeddingFunction): def __call__(self, texts): return model.encode(texts) — this abstracts embedding logic, Chroma handles batching, caching, error handling. Query execution: (1) Parse query (texts, where clauses, n_results), (2) Generate query embedding, (3) Apply metadata filters, (4) HNSW traversal for top-k vectors, (5) Load documents and metadata, (6) Return results. Typical query latency: 10-50ms for collections under 1M vectors, 50-200ms for 1M-10M vectors, dominated by embedding generation (not search). 21medien optimizes ChromaDB deployments: selecting appropriate persistence mode (embedded for dev, client-server for prod), configuring HNSW parameters (m=16-32 based on accuracy needs), choosing embedding providers (OpenAI for quality, Sentence Transformers for cost), implementing caching strategies (cache frequent queries), planning migration path to scale (start ChromaDB, migrate Qdrant at 10M+ vectors).
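
A self-contained version of the custom embedding function sketched above, assuming sentence-transformers is installed; note that recent Chroma releases expect the __call__ parameter to be named input:

  import chromadb
  from chromadb import Documents, EmbeddingFunction, Embeddings
  from sentence_transformers import SentenceTransformer

  class CustomEmbeddingFunction(EmbeddingFunction):
      """Wraps a local Sentence Transformers model; Chroma handles batching and caching."""

      def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
          self.model = SentenceTransformer(model_name)

      def __call__(self, input: Documents) -> Embeddings:
          # Return one embedding (a list of floats) per input document
          return self.model.encode(list(input)).tolist()

  client = chromadb.Client()
  collection = client.create_collection('docs', embedding_function=CustomEmbeddingFunction())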

Common Use Cases

  • RAG applications: Store document embeddings for retrieval-augmented generation, fastest time-to-prototype for LLM apps; see the retrieval sketch after this list
  • Semantic search: Build document/product/content search by meaning rather than keywords, 10x more relevant results
  • Chatbots with memory: Store conversation history as embeddings, retrieve relevant context for responses
  • Code search: Index code repositories, find similar functions/patterns, power 'explain this code' features
  • Document analysis: Q&A over internal documents, contracts, research papers with semantic understanding
  • Content recommendation: Recommend articles, products, media based on embedding similarity
  • Duplicate detection: Find duplicate or near-duplicate content (products, documents, images) using similarity thresholds
  • Customer support: Semantic search over support tickets, knowledge bases, previous solutions
  • Research tools: Academic paper search, patent analysis, literature review by embedding similarity
  • Prototyping and demos: Fastest way to demonstrate vector search capabilities to stakeholders, clients
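
A minimal retrieval step for the RAG use case above; the collection name is illustrative and the LLM call is a placeholder callable:

  import chromadb

  client = chromadb.PersistentClient(path='./chroma_data')
  collection = client.get_or_create_collection('knowledge_base')

  def answer(question: str, llm) -> str:
      # 1. Retrieve the chunks most similar to the question
      hits = collection.query(query_texts=[question], n_results=4)
      context = '\n\n'.join(hits['documents'][0])
      # 2. Ground the generated answer in the retrieved context
      prompt = f'Answer using only this context:\n{context}\n\nQuestion: {question}'
      return llm(prompt)  # llm is any callable that maps a prompt to text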

Integration with 21medien Services

21medien leverages ChromaDB for rapid development and cost-effective deployments. Phase 1 (Rapid Prototyping): We use ChromaDB embedded mode for all initial POCs and demos, building functional prototypes in 2-3 days versus 1-2 weeks with complex vector databases. This accelerates stakeholder feedback and requirements validation. No infrastructure setup required—developers work locally with full vector search capabilities. Phase 2 (Development & Testing): Development environments use ChromaDB embedded for fast iteration, test suites run against in-memory ChromaDB (no test infrastructure needed), CI/CD pipelines validate functionality without deploying test databases. This reduces development friction and improves productivity 3-5x versus managing separate vector database instances. Phase 3 (Small-Medium Production): For deployments under 5M vectors, we deploy ChromaDB client-server mode via Docker on single server ($50-100/month infrastructure), configure automatic backups to S3/GCS, implement monitoring (Prometheus metrics), setup auto-restart for reliability. Typical performance: 10K-1M vectors, 50-200 queries/second, p95 latency 50-100ms, 99.9% uptime. Phase 4 (Scale Planning): We monitor collection sizes and query patterns, plan migration to Qdrant when: vectors exceed 10M, latency requirements drop below 20ms, need advanced features (multi-tenancy, GPU acceleration, complex filtering). Migration process: export ChromaDB data to Parquet, transform to target schema, import to Qdrant, validate correctness, cutover with rollback plan. Phase 5 (Cost Optimization): For many clients, ChromaDB remains long-term solution—mature RAG applications often serve 100K-1M vectors indefinitely, at this scale ChromaDB's simplicity beats alternatives. Example: For SaaS documentation platform, we built semantic search using ChromaDB: prototyped in 3 days (embedded mode), deployed production serving 500K document embeddings on single $50/month server, achieved 99.95% uptime over 12 months, handled 50K queries/day with p95 latency 80ms, saved $15K/year versus Pinecone while delivering same user experience—client still on ChromaDB 18 months later with zero scaling issues.
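
One possible export step for the Phase 4 migration described above; the output file, batch size, and server address are assumptions, and the Qdrant import side is not shown:

  import json
  import chromadb

  client = chromadb.HttpClient(host='localhost', port=8000)
  collection = client.get_collection('docs')

  # Page through the collection and dump ids, documents, metadata, and vectors
  batch_size = 1000
  with open('docs_export.jsonl', 'w') as out:
      for offset in range(0, collection.count(), batch_size):
          batch = collection.get(
              limit=batch_size,
              offset=offset,
              include=['documents', 'metadatas', 'embeddings'],
          )
          for i, doc_id in enumerate(batch['ids']):
              out.write(json.dumps({
                  'id': doc_id,
                  'document': batch['documents'][i],
                  'metadata': batch['metadatas'][i],
                  'embedding': [float(x) for x in batch['embeddings'][i]],
              }) + '\n')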

Code Examples

Install:

  pip install chromadb

Basic usage (embedded, in-memory):

  import chromadb

  client = chromadb.Client()
  collection = client.create_collection('docs')
  collection.add(
      documents=['AI is transforming software', 'Machine learning requires data'],
      ids=['doc1', 'doc2'],
  )
  results = collection.query(query_texts=['artificial intelligence'], n_results=2)
  print(results)

With automatic embeddings:

  from chromadb.utils import embedding_functions

  openai_ef = embedding_functions.OpenAIEmbeddingFunction(
      api_key='sk-...',
      model_name='text-embedding-3-small',
  )
  collection = client.create_collection('docs', embedding_function=openai_ef)
  collection.add(documents=['doc content'], ids=['id1'])

Filtering:

  # Metadata filters (multiple conditions combined with $and)
  results = collection.query(
      query_texts=['query'],
      n_results=5,
      where={'$and': [{'category': 'technical'}, {'year': {'$gte': 2020}}]},
  )
  # Document keyword filter
  results = collection.query(
      query_texts=['query'],
      where_document={'$contains': 'machine learning'},
  )

Client-server mode:

  client = chromadb.HttpClient(host='localhost', port=8000)
  collection = client.get_or_create_collection('docs')  # Same API as embedded

Persistent storage:

  client = chromadb.PersistentClient(path='./chroma_data')
  collection = client.create_collection('docs')  # Data survives restart

Docker deployment:

  docker run -p 8000:8000 chromadb/chroma   # Server ready, connect via HttpClient

Custom embeddings:

  from chromadb import Documents, EmbeddingFunction
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer('all-MiniLM-L6-v2')

  class MyEmbedding(EmbeddingFunction):
      def __call__(self, input: Documents):  # recent Chroma versions require the parameter name 'input'
          return model.encode(list(input)).tolist()

  collection = client.create_collection('docs', embedding_function=MyEmbedding())

21medien provides production templates, deployment configurations, and migration guides for ChromaDB projects.

Best Practices

  • Start with embedded mode: Use for development and testing, fastest iteration, no infrastructure overhead
  • Use persistent client for production: PersistentClient stores data locally, survives restarts, suitable for single-server deployments
  • Batch operations: Add/query in batches of 100-1000 for optimal performance, reduces embedding API calls; see the sketch after this list
  • Implement caching: Cache frequent queries, use Redis for distributed caching, reduces latency and costs
  • Choose appropriate embedding model: text-embedding-3-small for cost, text-embedding-3-large for quality, Sentence Transformers for self-hosted
  • Use metadata filtering: Pre-filter with where clauses before vector search for 10-100x speedup on large collections
  • Monitor collection size: Plan migration to Qdrant/Milvus when approaching 10M vectors or needing sub-20ms latency
  • Implement backup strategy: Regular exports to S3/GCS, test restore procedures, especially for client-server deployments
  • Version collections: Create new collections for model updates, enable A/B testing, rollback capability
  • Use client-server for production: Deploy dedicated server for multi-process access, better monitoring, easier scaling
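
A sketch of the batched ingestion recommended above ('Batch operations'); the batch size and data variables are placeholders:

  import chromadb

  client = chromadb.PersistentClient(path='./chroma_data')
  collection = client.get_or_create_collection('docs')

  def add_in_batches(docs, metas, ids, batch_size=500):
      # Smaller request bodies and fewer round trips to the embedding provider
      for start in range(0, len(docs), batch_size):
          end = start + batch_size
          collection.add(
              documents=docs[start:end],
              metadatas=metas[start:end],
              ids=ids[start:end],
          )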

Performance Comparison

ChromaDB is optimized for developer productivity and cost-effectiveness, not raw performance. Query latency: 50-100ms p95 for typical RAG use cases (100K-1M vectors), competitive with alternatives at this scale. Versus Pinecone: similar latency for under 1M vectors, 10x lower cost ($50/month vs $500+/month), simpler ops (single Docker container vs managed service). Versus Qdrant: 2-5x slower for large collections (5M+ vectors), 10x simpler setup (pip install vs Docker/Kubernetes), better for small deployments. Versus Milvus: not suitable for billion-scale (Milvus's strength), 100x simpler for small-scale (<10M vectors); ChromaDB shines in developer experience, not raw performance. Embedding integration: ChromaDB's built-in embedding functions save 50-100 lines of boilerplate per application, automatic batching and retry logic reduce development time 2-3x versus manual implementation. Setup time: 5 minutes (pip install, write code) versus 1-4 hours (Qdrant Docker deployment, learning API) versus 1-2 days (Milvus cluster, capacity planning). Scalability: handles 10M vectors on a single server; beyond that, migrate to Qdrant/Milvus, but 90% of applications never exceed 1M vectors. Cost at scale: 1M vectors on ChromaDB costs $50-100/month (single server), on Pinecone $500-1000/month, on Qdrant Cloud $200-300/month, on self-hosted Qdrant $100-150/month. Development velocity: ChromaDB enables 2-5x faster iteration during development (no infrastructure setup, embedded mode, automatic embeddings). Production simplicity: a single Docker container versus multi-node clusters reduces ops complexity 10x. 21medien's rule of thumb: ChromaDB for POCs and deployments under 5M vectors (90% of projects), Qdrant for 5M-500M vectors needing sub-20ms latency, Milvus for 500M+ vectors or complex multi-tenancy. Starting with ChromaDB saves 70-90% on infrastructure costs and 50-80% on development time for typical projects, with migration to alternatives only when genuinely needed.

Official Resources

https://www.trychroma.com