
Vector Database

Service ownership

Owner: ai-platform (ai-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Standalone managed vector store — pick PGVector, Milvus, or Weaviate.

What it is

A managed deployment of one of three vector databases, picked at create time. Cloud Digit operates the engine; you operate schemas, indexes, and queries.

Engines

| Engine | Strengths | When to choose it |
|---|---|---|
| PGVector | SQL semantics, transactions, filter + ANN in one query | You already use Postgres and want vectors alongside relational data |
| Milvus | Pure-vector; HNSW + IVF + DiskANN + GPU index | Very large vector estates, billions of vectors |
| Weaviate | Hybrid (vector + keyword + filter), built-in reranking | RAG over knowledge bases with text + metadata |

Topologies

  • Single-node — dev / non-prod
  • 3-node HA — production default
  • Sharded — for Milvus, large-scale deployments

Use cases

  • RAG over corporate knowledge bases (pair with Inference Endpoints)
  • Semantic search
  • Recommendation systems (similar items)
  • De-duplication / fuzzy matching at scale

Pricing

See Pricing.

Operate this service

Managed vector search (pgvector / dedicated Milvus / Qdrant) for RAG, semantic search, and embedding-based recommendation.

Engine choice

| Engine | Strengths | Weaknesses |
|---|---|---|
| pgvector (Postgres extension) | Uses existing Postgres; transactional; hybrid search | Slower at scale (10M+ vectors) |
| Milvus | Built for billion-vector scale; GPU-capable indexes | Standalone service to manage |
| Qdrant | Lightweight; good filter performance | Less mature at extreme scale |

Default for most workloads: pgvector until you hit scale issues (>10M vectors or > 1k QPS).

IAM

Inherits from the underlying engine — pgvector uses Postgres roles, Milvus/Qdrant use their own RBAC. Cloud Digit issues credentials through the same IAM model.
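
For the pgvector flavour, that means plain Postgres grants. A minimal sketch, assuming hypothetical roles and an illustrative docs table (neither is part of the service):

```sql
-- Hypothetical read-only and read-write roles for a pgvector deployment.
CREATE ROLE rag_reader LOGIN PASSWORD 'change-me';
CREATE ROLE rag_writer LOGIN PASSWORD 'change-me';

GRANT SELECT ON docs TO rag_reader;
GRANT SELECT, INSERT, UPDATE, DELETE ON docs TO rag_writer;
```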

Index choice

| Index type | Build time | Query speed | Recall |
|---|---|---|---|
| Flat (no index) | None | Slow | 100% |
| IVF_FLAT | Fast | Medium | 90-95% |
| HNSW | Slow | Fast | 95-99% |

For most apps: HNSW with M=16, ef_construction=200. Tune ef_search per query for recall/latency tradeoff.
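
As a concrete starting point, the pgvector form of that index might look like the sketch below; the docs table and embedding column are illustrative, not part of the service:

```sql
-- HNSW index with the suggested defaults (m = 16, ef_construction = 200).
CREATE INDEX docs_embedding_idx
    ON docs USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);
```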

Embeddings dimension

Pre-decide the embedding model: changing dimensions later means re-embedding and re-indexing everything, because the dimension is fixed in the schema (see the sketch after this list):

  • OpenAI text-embedding-3-small: 1536-dim
  • Cohere embed-multilingual-v3: 1024-dim
  • Sentence-transformers all-MiniLM-L6-v2: 384-dim
  • Custom: depends
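
A minimal pgvector sketch, assuming text-embedding-3-small (1536-dim); the docs table and the generated tsvector column are illustrative choices, not service defaults:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- The dimension is baked into the column type at creation time.
CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    tsv       tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    embedding vector(1536) NOT NULL   -- matches text-embedding-3-small
);
```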

For RAG: combine vector similarity with keyword search (BM25-style ranking; Postgres full-text uses ts_rank). In Postgres this fits in one statement: a full-text WHERE clause plus ORDER BY embedding <-> :query (see the hybrid search pattern below).

Cost

pgvector: regular Postgres pricing + extra IOPS during index build. Milvus/Qdrant: dedicated cluster pricing, separate from DB.

Metrics

| Metric | Healthy | Alert |
|---|---|---|
| vector.query_latency_ms | p95 < 100 ms | > 500 ms |
| vector.recall (sampled) | > 95% | < 90% |
| vector.index_size_bytes | grows with data | sudden 10× jump (bug) |
| vector.queries_per_sec | varies | n/a |
| vector.cache_hit_ratio (Milvus) | > 80% | < 60% |
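
Sampled recall can be approximated on pgvector by comparing the indexed top-k with an exact scan for a few stored query vectors. A rough sketch, assuming the illustrative docs table and a :q query-vector placeholder:

```sql
-- Exact top-10: disable index scans so distances are computed brute-force.
SET enable_indexscan = off;
CREATE TEMP TABLE exact_top AS
  SELECT id FROM docs ORDER BY embedding <-> :q LIMIT 10;
RESET enable_indexscan;

-- Approximate top-10 via the HNSW index.
CREATE TEMP TABLE approx_top AS
  SELECT id FROM docs ORDER BY embedding <-> :q LIMIT 10;

-- Recall = |overlap| / k.
SELECT count(*)::float / 10 AS recall
FROM approx_top JOIN exact_top USING (id);
```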

Bulk ingest

For initial corpus load (millions of vectors):

```sql
-- pgvector: drop the index, bulk insert, then recreate the index
DROP INDEX docs_embedding_idx;
\copy docs(id, embedding, content) FROM 'data.csv' WITH CSV HEADER
CREATE INDEX docs_embedding_idx ON docs USING hnsw (embedding vector_cosine_ops);
```

Bulk insert is 10-100× faster without the index. Index build itself can take hours for large corpora — schedule off-hours.

Re-embedding migration

When changing embedding models, you must re-embed everything:

  1. Provision a parallel table / collection with the new dimensions (sketched below)
  2. Re-embed documents in batches
  3. Dual-write new documents to both during the migration
  4. Shift read traffic gradually
  5. Decommission the old table

Plan for this before picking your first embedding model.
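
A sketch of step 1, assuming a move to a 1024-dim model; the docs_v2 name and column list are illustrative:

```sql
-- Parallel table sized for the new embedding model.
CREATE TABLE docs_v2 (
    id        bigint PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1024) NOT NULL   -- new model's dimension
);

-- Re-embedding and dual-writes happen in the application layer;
-- build the HNSW index on docs_v2 only after the bulk backfill finishes.
```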

Query tuning

For HNSW, the ef_search parameter controls the speed/recall tradeoff per query:

```sql
SET hnsw.ef_search = 100;  -- fast but lower recall
SELECT * FROM docs ORDER BY embedding <-> :query LIMIT 10;
```

ef_search=40 → ~95% recall, fast. ef_search=200 → ~99% recall, slower. Tune per workload.

Hybrid search pattern

```sql
SELECT id, content,
       (embedding <-> :query_vec) * 0.6
         + (1 - ts_rank(tsv, plainto_tsquery(:query_text))) * 0.4 AS score
FROM docs
WHERE tsv @@ plainto_tsquery(:query_text)
ORDER BY score
LIMIT 10;
```

Weights tuned per query type. Pure vector for "find similar by meaning"; weighted hybrid for "find related text matching a query."

Backup

pgvector: regular Postgres backup. Vectors are just a column type. Milvus/Qdrant: native snapshot to S3 daily.

Slow vector queries

| Symptom | Cause / fix |
|---|---|
| All queries slow | No index; brute-force scan |
| Cold cache slow, warm fast | First query loads the index from disk |
| ef_search set too high | Reduce it for more speed at the cost of some recall |
| Result count > 100 | ANN search degrades sharply beyond a small top-N |
| Filter on a high-cardinality field | Pre-filtering blows up the candidate set |

```sql
EXPLAIN ANALYZE SELECT ... ORDER BY embedding <-> :q LIMIT 10;
-- Look for "Index Scan using docs_embedding_idx"
```

Recall low

Sampled recall < 90% — indexes lose accuracy under heavy modification:

  • HNSW: rebuild after large delete or update
  • IVF: re-cluster periodically

```sql
REINDEX INDEX docs_embedding_idx;  -- pgvector
```

Dimension mismatch

ERROR: expected 1536 dimensions, got 1024

A client embedded with a different model. The vector table is dimension-locked. Either re-embed the query with the correct model, or migrate the whole table to the new dimension.
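
On pgvector, a quick way to confirm what the table actually stores (using the illustrative docs table from earlier):

```sql
-- vector_dims() reports the stored dimension; every row should agree.
SELECT vector_dims(embedding) AS dims, count(*) AS rows
FROM docs
GROUP BY 1;
```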

Memory pressure (Milvus / Qdrant)

| Symptom | Fix |
|---|---|
| Slow queries, swap usage rising | Index doesn't fit in RAM; resize the node |
| Frequent eviction | Increase cache size |
| OOM crashes | Node is undersized; upgrading it is the only fix |

For pgvector: shared_buffers + work_mem tuning, same as regular Postgres.
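
A per-session illustration (values are placeholders; instance-level settings such as shared_buffers depend on the managed-Postgres tier and require a restart):

```sql
SHOW shared_buffers;      -- check what the instance currently allocates for caching
SET work_mem = '64MB';    -- per-operation sort/hash memory; helps the ranking step in hybrid queries
```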

Index build never finishes

For pgvector HNSW with millions of vectors: index build can take >24h. Workarounds:

  • Build in parallel: SET maintenance_work_mem='4GB'; SET max_parallel_maintenance_workers=8;
  • IVF_FLAT instead (faster build, slightly lower recall)
  • Build on a temp larger instance, then migrate

Inconsistent results between runs

Vector search is approximate. Two queries with the same input may return slightly different top-10s:

  • HNSW uses random graph construction
  • Higher ef_search reduces variance

For exact match needs (legal "must contain this doc"): combine with keyword filter.

Re-embedding migration failures

| Cause | Fix |
|---|---|
| Embedding model API timing out | Use smaller batches and retry |
| Cost spike | Embedding APIs charge per token; estimate cost before the run |
| Dimension mismatch in the new table | Verify the table schema before bulk load |