Vector Database¶
Service ownership
Owner: ai-platform (ai-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Standalone managed vector store — pick PGVector, Milvus, or Weaviate.
What it is¶
A managed deployment of one of three vector databases, picked at create time. Cloud Digit operates the engine; you operate schemas, indexes, and queries.
Engines¶
| Engine | Strengths | When |
|---|---|---|
| PGVector | SQL semantics, transactions, filter + ANN in one query | You already use Postgres and want vectors alongside relational data |
| Milvus | Pure-vector; HNSW + IVF + DiskANN + GPU index | Very large vector estates, billions of vectors |
| Weaviate | Hybrid (vector + keyword + filter), built-in reranking | RAG over knowledge bases with text + metadata |
Topologies¶
- Single-node — dev / non-prod
- 3-node HA — production default
- Sharded — for Milvus, large-scale deployments
Use cases¶
- RAG over corporate knowledge bases (pair with Inference Endpoints)
- Semantic search
- Recommendation systems (similar items)
- De-duplication / fuzzy matching at scale
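The de-duplication case boils down to a nearest-neighbour check against a similarity threshold. A minimal pure-Python sketch of the idea (the 0.95 threshold and function names are illustrative, not part of the service):

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_duplicate(candidate, corpus, threshold=0.95):
    """Flag candidate as a near-duplicate if any corpus vector is close enough."""
    return any(cosine_sim(candidate, v) >= threshold for v in corpus)
```

In production the `corpus` scan is what the vector index replaces: the database returns the top-N nearest neighbours and you apply the threshold to them.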
Pricing¶
- Compute by flavor
- Storage on Provisioned IOPS
See Pricing.
Related¶
- Inference Endpoints — embeddings producer
- Managed PostgreSQL — base engine for PGVector option
Operate this service¶
Managed vector search (pgvector / dedicated Milvus / Qdrant) for RAG, semantic search, and embedding-based recommendation.
Engine choice¶
| Engine | Strengths | Weaknesses |
|---|---|---|
| pgvector (Postgres extension) | Use existing Postgres, transactional, hybrid search | Slower at scale (10M+ vectors) |
| Milvus | Built for billion-vector scale, GPU-able | Standalone service to manage |
| Qdrant | Lightweight, good filter performance | Less mature at extreme scale |
Default for most workloads: pgvector until you hit scale issues (>10M vectors or > 1k QPS).
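The rule of thumb above can be written as a tiny decision helper. A sketch; the thresholds are the ones stated here, not hard limits:

```python
def suggest_engine(num_vectors: int, peak_qps: float) -> str:
    """Apply the section's rule of thumb: pgvector until you exceed
    10M vectors or 1k QPS, then a dedicated engine (Milvus/Qdrant)."""
    if num_vectors > 10_000_000 or peak_qps > 1_000:
        return "dedicated"
    return "pgvector"
```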
IAM¶
Inherits from the underlying engine — pgvector uses Postgres roles, Milvus/Qdrant use their own RBAC. Cloud Digit issues credentials through the same IAM model.
Index choice¶
| Index type | Build time | Query speed | Recall |
|---|---|---|---|
| Flat (no index) | None | Slow | 100% |
| IVF_FLAT | Fast | Medium | 90-95% |
| HNSW | Slow | Fast | 95-99% |
For most apps: HNSW with M=16, ef_construction=200. Tune ef_search per query for recall/latency tradeoff.
Embeddings dimension¶
Pre-decide the embedding model — changing dimensions later requires re-indexing everything:
- OpenAI text-embedding-3-small: 1536-dim
- Cohere embed-multilingual-v3: 1024-dim
- Sentence-transformers all-MiniLM-L6-v2: 384-dim
- Custom: depends
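Because a table created for one dimension rejects vectors of another, a cheap client-side guard catches mismatches before they reach the database. A sketch using the model dimensions listed above (the helper itself is hypothetical):

```python
# Expected dimension per embedding model (from the list above).
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "embed-multilingual-v3": 1024,
    "all-MiniLM-L6-v2": 384,
}

def check_dim(model: str, vector: list) -> list:
    """Raise before insert if the vector does not match the model's dimension."""
    expected = MODEL_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector
```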
Hybrid search¶
For RAG: combine vector similarity with keyword relevance (BM25-style). In Postgres this fits in a single statement: a full-text `WHERE` clause plus `ORDER BY embedding <-> :query`.
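The weighted blend can be prototyped client-side before committing to SQL. A sketch assuming you already have, per candidate, a vector distance (lower is closer) and a normalized keyword score (higher is better); the 0.6/0.4 weights are illustrative:

```python
def hybrid_score(vec_distance: float, keyword_rank: float,
                 w_vec: float = 0.6, w_kw: float = 0.4) -> float:
    """Blend vector distance (lower = better) and keyword rank
    (higher = better) into one ascending-sort score."""
    return vec_distance * w_vec + (1.0 - keyword_rank) * w_kw

def rank_hybrid(candidates, limit=10):
    """candidates: list of (doc_id, vec_distance, keyword_rank)."""
    scored = [(hybrid_score(d, r), doc_id) for doc_id, d, r in candidates]
    return [doc_id for _, doc_id in sorted(scored)[:limit]]
```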
Cost¶
pgvector: regular Postgres pricing + extra IOPS during index build. Milvus/Qdrant: dedicated cluster pricing, separate from DB.
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
| vector.query_latency_ms p95 | < 100 ms | > 500 ms |
| vector.recall (sampled) | > 95% | < 90% |
| vector.index_size_bytes | grows with data | sudden 10× jump (bug) |
| vector.queries_per_sec | varies | |
| vector.cache_hit_ratio (Milvus) | > 80% | < 60% |
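Sampled recall is computed by comparing the approximate top-k against an exact brute-force top-k for a small sample of queries. A sketch of that calculation (function names are illustrative):

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def sampled_recall(pairs):
    """pairs: list of (approx_top_k_ids, exact_top_k_ids), one per sampled query."""
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)
```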
Bulk ingest¶
For initial corpus load (millions of vectors):
```sql
-- pgvector: disable index, bulk insert, recreate index
DROP INDEX docs_embedding_idx;
\COPY docs(id, embedding, content) FROM 'data.csv' WITH CSV HEADER;
CREATE INDEX docs_embedding_idx ON docs USING hnsw (embedding vector_cosine_ops);
```
Bulk insert is 10-100× faster without the index. Index build itself can take hours for large corpora — schedule off-hours.
Re-embedding migration¶
When changing embedding models, you must re-embed everything:
- Provision a parallel table / collection with new dimensions
- Re-embed documents in batches
- Write traffic dual-writes both
- Read traffic shifted gradually
- Decom old table
Plan for this before picking your first embedding model.
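The batch re-embed step above can be sketched as a loop over document IDs. Here `fetch_text`, `embed`, and `write_new` are hypothetical callables standing in for your document store, embedding API, and the parallel new-dimension table:

```python
def reembed_in_batches(doc_ids, fetch_text, embed, write_new, batch_size=100):
    """Re-embed documents in batches into the parallel table.
    fetch_text(ids) -> {id: text}; embed(texts) -> list of vectors;
    write_new(rows) persists (id, vector) rows."""
    for i in range(0, len(doc_ids), batch_size):
        batch = doc_ids[i:i + batch_size]
        texts = fetch_text(batch)
        vectors = embed([texts[d] for d in batch])
        write_new(list(zip(batch, vectors)))
```

Keeping the batch size small bounds both the blast radius of an API failure and the size of any retry.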
Query tuning¶
For HNSW: ef_search parameter per query controls speed/recall:
```sql
SET hnsw.ef_search = 100;  -- higher ef_search = higher recall, slower queries
SELECT * FROM docs ORDER BY embedding <-> :query LIMIT 10;
```
ef_search=40 → ~95% recall, fast. ef_search=200 → ~99% recall, slower. Tune per workload.
Hybrid search pattern¶
```sql
SELECT id, content,
       (embedding <-> :query_vec) * 0.6
       + (1 - ts_rank(tsv, plainto_tsquery(:query_text))) * 0.4 AS score
FROM docs
WHERE tsv @@ plainto_tsquery(:query_text)
ORDER BY score
LIMIT 10;
```
Weights tuned per query type. Pure vector for "find similar by meaning"; weighted hybrid for "find related text matching a query."
Backup¶
pgvector: regular Postgres backup. Vectors are just a column type. Milvus/Qdrant: native snapshot to S3 daily.
Slow vector queries¶
| Symptom | Cause |
|---|---|
| All queries slow | No index; brute-force scan |
| Cold cache slow, warm fast | First query loads the index from disk |
| Slow even on a warm cache | ef_search set too high; reduce it to trade recall for speed |
| Result count > 100 | Latency grows sharply past the top-N the index is tuned for |
| Filter on a high-cardinality field | Pre-filter blows up the candidate set |
```sql
EXPLAIN ANALYZE SELECT ... ORDER BY embedding <-> :q LIMIT 10;
-- Look for "Index Scan using docs_embedding_idx"
```
Recall low¶
Sampled recall < 90% — indexes lose accuracy under heavy modification:
- HNSW: rebuild after large delete or update
- IVF: re-cluster periodically
```sql
REINDEX INDEX docs_embedding_idx;  -- pgvector
```
Dimension mismatch¶
```
ERROR: expected 1536 dimensions, got 1024
```
A client embedded with a different model. The vector table is dimension-locked. Either re-embed the query with the correct model, or migrate the whole table to the new dimension.
Memory pressure (Milvus / Qdrant)¶
| Symptom | Fix |
|---|---|
| Slow queries, swap usage rising | Index doesn't fit in RAM — resize node |
| Frequent eviction | Increase cache size |
| OOM crashes | Hard upgrade required |
For pgvector: shared_buffers + work_mem tuning, same as regular Postgres.
Index build never finishes¶
For pgvector HNSW with millions of vectors: index build can take >24h. Workarounds:
- Build in parallel:
```sql
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 8;
```
- Use IVF_FLAT instead (faster build, slightly lower recall)
- Build on a temp larger instance, then migrate
Inconsistent results between runs¶
Vector search is approximate. Two queries with the same input may return slightly different top-10s:
- HNSW uses random graph construction
- Higher ef_search reduces variance
For exact match needs (legal "must contain this doc"): combine with keyword filter.
Re-embedding migration failures¶
| Cause | Fix |
|---|---|
| Embedding model API timing out | Batch smaller, retry |
| Cost spike | Embedding APIs charge per token; estimate |
| Dimension mismatch in new table | Verify table schema before bulk load |
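The "batch smaller, retry" fix can be wrapped in a helper that halves the batch whenever the embedding API times out. A sketch; `embed` is a hypothetical callable that raises `TimeoutError` on oversized requests:

```python
def embed_with_fallback(texts, embed, min_batch=1):
    """Embed texts, recursively halving the batch on timeout."""
    if not texts:
        return []
    try:
        return embed(texts)
    except TimeoutError:
        if len(texts) <= min_batch:
            raise  # cannot split further; surface the error
        mid = len(texts) // 2
        return (embed_with_fallback(texts[:mid], embed)
                + embed_with_fallback(texts[mid:], embed))
```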