Vector Database¶
Service ownership
Owner: ai-platform (ai-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Standalone managed vector store — pick PGVector, Milvus, or Weaviate.
What it is¶
A managed deployment of one of three vector databases, picked at create time. Cloud Digit operates the engine; you operate schemas, indexes, and queries.
Engines¶
| Engine | Strengths | When |
|---|---|---|
| PGVector | SQL semantics, transactions, filter + ANN in one query | You already use Postgres and want vectors alongside relational data |
| Milvus | Pure-vector; HNSW + IVF + DiskANN + GPU index | Very large vector estates, billions of vectors |
| Weaviate | Hybrid (vector + keyword + filter), built-in reranking | RAG over knowledge bases with text + metadata |
Topologies¶
- Single-node — dev / non-prod
- 3-node HA — production default
- Sharded — for Milvus, large-scale deployments
Use cases¶
- RAG over corporate knowledge bases (pair with Inference Endpoints)
- Semantic search
- Recommendation systems (similar items)
- De-duplication / fuzzy matching at scale
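The de-duplication case boils down to a nearest-neighbour check against a similarity threshold. A minimal pure-Python sketch of the idea (the 0.95 threshold and function names are illustrative, not part of the service):

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_duplicate(candidate, corpus, threshold=0.95):
    """Flag candidate as a near-duplicate if any corpus vector is close enough."""
    return any(cosine_sim(candidate, v) >= threshold for v in corpus)
```

In production the `corpus` scan is what the vector index replaces: the database returns the top-N nearest neighbours and you apply the threshold to them.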
Pricing¶
- Compute by flavor
- Storage on Provisioned IOPS
See Pricing.
Related¶
- Inference Endpoints — embeddings producer
- Managed PostgreSQL — base engine for PGVector option
Operate this service¶
Managed vector search (pgvector / dedicated Milvus / Qdrant) for RAG, semantic search, and embedding-based recommendation.
Engine choice¶
| Engine | Strengths | Weaknesses |
|---|---|---|
| pgvector (Postgres extension) | Use existing Postgres, transactional, hybrid search | Slower at scale (10M+ vectors) |
| Milvus | Built for billion-vector scale, GPU-able | Standalone service to manage |
| Qdrant | Lightweight, good filter performance | Less mature at extreme scale |
Default for most workloads: pgvector until you hit scale issues (>10M vectors or > 1k QPS).
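The rule of thumb above can be written as a tiny decision helper. A sketch; the thresholds are the ones stated here, not hard limits:

```python
def suggest_engine(num_vectors: int, peak_qps: float) -> str:
    """Apply the section's rule of thumb: pgvector until you exceed
    10M vectors or 1k QPS, then a dedicated engine (Milvus/Qdrant)."""
    if num_vectors > 10_000_000 or peak_qps > 1_000:
        return "dedicated"
    return "pgvector"
```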
IAM¶
Inherits from the underlying engine — pgvector uses Postgres roles, Milvus/Qdrant use their own RBAC. Cloud Digit issues credentials through the same IAM model.
Index choice¶
| Index type | Build time | Query speed | Recall |
|---|---|---|---|
| Flat (no index) | None | Slow | 100% |
| IVF_FLAT | Fast | Medium | 90-95% |
| HNSW | Slow | Fast | 95-99% |
For most apps: HNSW with M=16, ef_construction=200. Tune ef_search per query for recall/latency tradeoff.
Embeddings dimension¶
Pre-decide the embedding model — changing dimensions later requires re-indexing everything:
- OpenAI text-embedding-3-small: 1536-dim
- Cohere embed-multilingual-v3: 1024-dim
- Sentence-transformers all-MiniLM-L6-v2: 384-dim
- Custom: depends
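Because a table created for one dimension rejects vectors of another, a cheap client-side guard catches mismatches before they reach the database. A sketch using the model dimensions listed above (the helper itself is hypothetical):

```python
# Expected dimension per embedding model (from the list above).
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "embed-multilingual-v3": 1024,
    "all-MiniLM-L6-v2": 384,
}

def check_dim(model: str, vector: list) -> list:
    """Raise before insert if the vector does not match the model's dimension."""
    expected = MODEL_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector
```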
Hybrid search¶
For RAG: combine vector similarity with keyword relevance (BM25-style). In Postgres this fits in a single statement: a full-text `WHERE` clause plus `ORDER BY embedding <-> :query`.
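The weighted blend can be prototyped client-side before committing to SQL. A sketch assuming you already have, per candidate, a vector distance (lower is closer) and a normalized keyword score (higher is better); the 0.6/0.4 weights are illustrative:

```python
def hybrid_score(vec_distance: float, keyword_rank: float,
                 w_vec: float = 0.6, w_kw: float = 0.4) -> float:
    """Blend vector distance (lower = better) and keyword rank
    (higher = better) into one ascending-sort score."""
    return vec_distance * w_vec + (1.0 - keyword_rank) * w_kw

def rank_hybrid(candidates, limit=10):
    """candidates: list of (doc_id, vec_distance, keyword_rank)."""
    scored = [(hybrid_score(d, r), doc_id) for doc_id, d, r in candidates]
    return [doc_id for _, doc_id in sorted(scored)[:limit]]
```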
Cost¶
pgvector: regular Postgres pricing + extra IOPS during index build. Milvus/Qdrant: dedicated cluster pricing, separate from DB.
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
| vector.query_latency_ms p95 | < 100 ms | > 500 ms |
| vector.recall (sampled) | > 95% | < 90% |
| vector.index_size_bytes | grows with data | sudden 10× jump (bug) |
| vector.queries_per_sec | varies | |
| vector.cache_hit_ratio (Milvus) | > 80% | < 60% |
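Sampled recall is computed by comparing the approximate top-k against an exact brute-force top-k for a small sample of queries. A sketch of that calculation (function names are illustrative):

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def sampled_recall(pairs):
    """pairs: list of (approx_top_k_ids, exact_top_k_ids), one per sampled query."""
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)
```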
Bulk ingest¶
For initial corpus load (millions of vectors):
```sql
-- pgvector: disable index, bulk insert, recreate index
DROP INDEX docs_embedding_idx;
\COPY docs(id, embedding, content) FROM 'data.csv' WITH CSV HEADER;
CREATE INDEX docs_embedding_idx ON docs USING hnsw (embedding vector_cosine_ops);
```
Bulk insert is 10-100× faster without the index. Index build itself can take hours for large corpora — schedule off-hours.
Re-embedding migration¶
When changing embedding models, you must re-embed everything:
- Provision a parallel table / collection with new dimensions
- Re-embed documents in batches
- Write traffic dual-writes both
- Read traffic shifted gradually
- Decom old table
Plan for this before picking your first embedding model.
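The batch re-embed step above can be sketched as a loop over document IDs. Here `fetch_text`, `embed`, and `write_new` are hypothetical callables standing in for your document store, embedding API, and the parallel new-dimension table:

```python
def reembed_in_batches(doc_ids, fetch_text, embed, write_new, batch_size=100):
    """Re-embed documents in batches into the parallel table.
    fetch_text(ids) -> {id: text}; embed(texts) -> list of vectors;
    write_new(rows) persists (id, vector) rows."""
    for i in range(0, len(doc_ids), batch_size):
        batch = doc_ids[i:i + batch_size]
        texts = fetch_text(batch)
        vectors = embed([texts[d] for d in batch])
        write_new(list(zip(batch, vectors)))
```

Keeping the batch size small bounds both the blast radius of an API failure and the size of any retry.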
Query tuning¶
For HNSW: ef_search parameter per query controls speed/recall:
```sql
SET hnsw.ef_search = 100;  -- higher ef_search = higher recall, slower queries
SELECT * FROM docs ORDER BY embedding <-> :query LIMIT 10;
```
ef_search=40 → ~95% recall, fast. ef_search=200 → ~99% recall, slower. Tune per workload.
Hybrid search pattern¶
```sql
SELECT id, content,
       (embedding <-> :query_vec) * 0.6
       + (1 - ts_rank(tsv, plainto_tsquery(:query_text))) * 0.4 AS score
FROM docs
WHERE tsv @@ plainto_tsquery(:query_text)
ORDER BY score
LIMIT 10;
```
Weights tuned per query type. Pure vector for "find similar by meaning"; weighted hybrid for "find related text matching a query."
Backup¶
pgvector: regular Postgres backup. Vectors are just a column type. Milvus/Qdrant: native snapshot to S3 daily.
Slow vector queries¶
| Symptom | Cause |
|---|---|
| All queries slow | No index; brute-force scan |
| Cold cache slow, warm fast | First query loads the index from disk |
| Slow even on a warm cache | ef_search set too high; reduce it to trade recall for speed |
| Result count > 100 | Latency grows sharply past the top-N the index is tuned for |
| Filter on a high-cardinality field | Pre-filter blows up the candidate set |
```sql
EXPLAIN ANALYZE SELECT ... ORDER BY embedding <-> :q LIMIT 10;
-- Look for "Index Scan using docs_embedding_idx"
```
Recall low¶
Sampled recall < 90% — indexes lose accuracy under heavy modification:
- HNSW: rebuild after large delete or update
- IVF: re-cluster periodically
```sql
REINDEX INDEX docs_embedding_idx;  -- pgvector
```
Dimension mismatch¶
```
ERROR: expected 1536 dimensions, got 1024
```
A client embedded with a different model. The vector table is dimension-locked. Either re-embed the query with the correct model, or migrate the whole table to the new dimension.
Memory pressure (Milvus / Qdrant)¶
| Symptom | Fix |
|---|---|
| Slow queries, swap usage rising | Index doesn't fit in RAM — resize node |
| Frequent eviction | Increase cache size |
| OOM crashes | Hard upgrade required |
For pgvector: shared_buffers + work_mem tuning, same as regular Postgres.
Index build never finishes¶
For pgvector HNSW with millions of vectors: index build can take >24h. Workarounds:
- Build in parallel:
```sql
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 8;
```
- Use IVF_FLAT instead (faster build, slightly lower recall)
- Build on a temp larger instance, then migrate
Inconsistent results between runs¶
Vector search is approximate. Two queries with the same input may return slightly different top-10s:
- HNSW uses random graph construction
- Higher ef_search reduces variance
For exact match needs (legal "must contain this doc"): combine with keyword filter.
Re-embedding migration failures¶
| Cause | Fix |
|---|---|
| Embedding model API timing out | Batch smaller, retry |
| Cost spike | Embedding APIs charge per token; estimate |
| Dimension mismatch in new table | Verify table schema before bulk load |
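The "batch smaller, retry" fix can be wrapped in a helper that halves the batch whenever the embedding API times out. A sketch; `embed` is a hypothetical callable that raises `TimeoutError` on oversized requests:

```python
def embed_with_fallback(texts, embed, min_batch=1):
    """Embed texts, recursively halving the batch on timeout."""
    if not texts:
        return []
    try:
        return embed(texts)
    except TimeoutError:
        if len(texts) <= min_batch:
            raise  # cannot split further; surface the error
        mid = len(texts) // 2
        return (embed_with_fallback(texts[:mid], embed)
                + embed_with_fallback(texts[mid:], embed))
```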