Managed MongoDB

Service ownership

Owner: data-platform (data-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Managed MongoDB 6 and 7, replica sets and sharded clusters.

What it is

MongoDB (Community Edition) provisioned and operated by Cloud Digit. Replica-set deployment by default; sharded clusters for high write throughput or large datasets.

Versions

MongoDB CE 6.0, 7.0. Atlas-style features (Search, Stream, Charts) are not part of the managed offering — for vector / search workloads, see Managed OpenSearch and Vector Database.

Topologies

| Topology | Use case |
| --- | --- |
| Replica set (3 nodes) | Standard production |
| Sharded cluster | Large datasets, high write throughput |
| Cross-region replica | DR / geo-read |

Backup & PITR

Daily snapshots plus an oplog archive for point-in-time recovery (PITR). Retention defaults to 7 days; configurable up to 35.

Connection

  • TLS-only
  • Connection string mongodb+srv://... resolves over Cloud Digit DNS
  • mongosh and the official drivers work as-is

Pricing

Compute by flavor + Provisioned-IOPS storage. See Pricing.

Operate this service

MongoDB 6.x / 7.x replica sets and sharded clusters with HA, authentication, and audit logging.

Topology

| Topology | Use |
| --- | --- |
| Replica set (3 members) | Standard production; primary + 2 secondaries |
| Sharded cluster | Data > 2 TB or write throughput > 50k ops/s |
| Replica set + hidden member | Adds a backup-only node |

Always run 3+ voting members. A 2-node replica set cannot elect a new primary after a single failure, because the surviving member is not a majority.
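That rule is just majority arithmetic; a minimal sketch (hypothetical helper, not platform code):

```javascript
// A primary election needs a strict majority of voting members:
// floor(voting / 2) + 1.
function canElectPrimary(votingMembers, failedMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return votingMembers - failedMembers >= majority;
}

console.log(canElectPrimary(3, 1)); // true: 2 of 3 is still a majority
console.log(canElectPrimary(2, 1)); // false: 1 of 2 is not
console.log(canElectPrimary(5, 2)); // true: 3 of 5 is a majority
```

This is also why voting-member counts are kept odd: going from 3 to 4 members raises the majority to 3 without tolerating any extra failures.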

IAM

| Role | Can do |
| --- | --- |
| mongo.viewer | Read cluster metadata, metrics |
| mongo.connector | Connect (creds issued) |
| mongo.dba-operator | Failover, parameter changes |
| mongo.cluster-admin | Create / delete / resize clusters, manage sharding |

In-database roles: the platform issues a clusterAdmin analog; you create app-scoped users with readWrite on specific databases.
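For example, from mongosh with the platform-issued admin credentials (the database and user names here are placeholders):

```javascript
// Create an app-scoped user with readWrite on a single database.
// "acme" and "acme-app" are illustrative names.
db.getSiblingDB("acme").createUser({
  user: "acme-app",
  pwd: passwordPrompt(),  // prompt instead of embedding the password
  roles: [{ role: "readWrite", db: "acme" }]
})
```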

Encryption

  • At rest: WiredTiger encryption with KMS-backed keys (platform-managed by default; CMK supported)
  • In transit: TLS required by default

Authentication

SCRAM-SHA-256 by default. For workloads integrated with corporate IdP, LDAP / SASL passthrough is supported (premium feature).

Sharding key choice

The biggest decision when sharding. Bad choices cause:

  • Hotspots: monotonically increasing keys send every write to one shard
  • Jumbo chunks: low-cardinality keys produce chunks too large to split or migrate

Generally: hashed shard key for uniform distribution; ranged shard key only when range queries dominate.

Backups

  • Continuous oplog tailing → PITR within retention window
  • Daily snapshot of underlying storage
  • Default 7-day retention; bump per workload
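A quick way to check whether a target time is still restorable (hypothetical helper; retention per the defaults above):

```javascript
// PITR is only possible inside the oplog-archive retention window.
function isRestorable(targetTime, retentionDays = 7, now = new Date()) {
  const windowStart = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  return targetTime >= windowStart && targetTime <= now;
}

const now = new Date("2026-05-11T12:00:00Z");
console.log(isRestorable(new Date("2026-05-08T00:00:00Z"), 7, now));  // true
console.log(isRestorable(new Date("2026-04-20T00:00:00Z"), 7, now));  // false: needs longer retention
console.log(isRestorable(new Date("2026-04-20T00:00:00Z"), 35, now)); // true with 35-day retention
```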

Metrics

| Metric | Healthy | Alert |
| --- | --- | --- |
| mongo.connections.current | < 80% of max | > 90% |
| mongo.replSet.lag_seconds | < 1 s | > 5 s |
| mongo.opLatencies.command_ms | p99 < 50 ms | p99 > 200 ms |
| mongo.opCounters.ops_per_sec | varies | |
| mongo.cache.evicted_clean_bytes | low | high (working set > cache) |
| mongo.slow_queries_per_min | < 10 | spike |
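The alert column translates directly into code (a sketch; the metric names with `_pct`/`_p99` suffixes are assumptions about how a collector might flatten the table above):

```javascript
// Threshold checks mirroring the alert column; returns names of firing alerts.
const THRESHOLDS = {
  "mongo.connections.current_pct": (v) => v > 90,   // % of max connections
  "mongo.replSet.lag_seconds": (v) => v > 5,
  "mongo.opLatencies.command_ms_p99": (v) => v > 200,
  "mongo.slow_queries_per_min": (v) => v > 10,      // sustained, not one spike
};

function firingAlerts(sample) {
  return Object.entries(sample)
    .filter(([name, value]) => THRESHOLDS[name]?.(value))
    .map(([name]) => name);
}

console.log(firingAlerts({
  "mongo.connections.current_pct": 95,
  "mongo.replSet.lag_seconds": 0.4,
}));
// → ["mongo.connections.current_pct"]
```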

Failover

Triggered automatically by election; to fail over manually:

```bash
cd db mongo stepDown --cluster acme-mongo-prod --primary <node>
```

RTO typically < 20 s.

Index management

Since MongoDB 4.2, all index builds use an optimized build process that blocks writes only briefly; the legacy background option is deprecated and ignored. A standard build:

```javascript
db.users.createIndex({ email: 1 }, { unique: true })
```

Builds still compete for CPU, I/O, and cache, so for very large collections use a rolling index build via the platform:

```bash
cd db mongo index build-rolling \
  --cluster acme-mongo-prod \
  --collection acme.users \
  --keys '{"email": 1}' \
  --options '{"unique": true}'
```

Builds on each secondary in turn, then the primary, with near-zero write impact.
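The sequencing can be sketched like this (illustrative, not the platform implementation):

```javascript
// Rolling build plan: each secondary is taken out, built, and rejoined;
// the primary is stepped down and built last.
function rollingBuildPlan(members) {
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  const primary = members.find((m) => m.state === "PRIMARY");
  const steps = [];
  for (const s of secondaries) {
    steps.push(`restart ${s.host} standalone`, `build index on ${s.host}`, `rejoin ${s.host}`);
  }
  steps.push(`stepDown ${primary.host}`, `build index on ${primary.host} (now a secondary)`);
  return steps;
}

console.log(rollingBuildPlan([
  { host: "m0", state: "PRIMARY" },
  { host: "m1", state: "SECONDARY" },
  { host: "m2", state: "SECONDARY" },
]));
```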

Sharding operations

Add a shard:

```bash
cd db mongo shard add --cluster acme-mongo-prod --instance-type mongo-r5.xlarge
```

The balancer rebalances chunks automatically; for large collections a rebalance can take hours. Watch mongo.balancer.chunks_moved_per_hour.

Backups

Trigger ad-hoc snapshot before risky migrations:

```bash
cd db mongo snapshot --cluster acme-mongo-prod --name pre-migration-2026-05-11
```

PITR restores:

```bash
cd db mongo restore --cluster acme-mongo-prod \
  --target-time "2026-05-11T08:00:00+06:00" \
  --target-cluster acme-mongo-restore
```

Major version upgrade

```bash
cd db mongo upgrade --cluster acme-mongo-prod --target-version 7.0
```

Rolling: each secondary is upgraded in turn, then the primary is stepped down and upgraded. Expect roughly 30 s of failover delay; everything else is online.

Slow queries spike after a deploy

| Cause | Check |
| --- | --- |
| New query missing an index | db.collection.explain("executionStats").find(...) |
| Working set > cache | mongo.cache.evicted_clean_bytes rising |
| App misusing operators ($or without indexes) | Profile with db.setProfilingLevel(1, 100) |

Use the slow profiler to find offenders:

```javascript
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(20)
```
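Profiler output can also be summarized by namespace to find the worst offender (a Node sketch over exported profiler documents; the `ns` and `millis` field names match system.profile):

```javascript
// Group slow operations by namespace and sum their time.
function slowestNamespaces(profileDocs) {
  const byNs = new Map();
  for (const doc of profileDocs) {
    byNs.set(doc.ns, (byNs.get(doc.ns) ?? 0) + doc.millis);
  }
  return [...byNs.entries()].sort((a, b) => b[1] - a[1]);
}

const sample = [
  { ns: "acme.users", millis: 340 },
  { ns: "acme.orders", millis: 120 },
  { ns: "acme.users", millis: 510 },
];
console.log(slowestNamespaces(sample)); // [["acme.users", 850], ["acme.orders", 120]]
```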

Election storm

If mongo.replSet.elections_24h > 5, suspect:

  • Heartbeat timeouts (network plane issue or congestion)
  • Disk I/O on the primary so slow the heartbeat can't complete
  • A flapping member; pull it from the set, re-add

```bash
cd db mongo replSet status --cluster acme-mongo-prod
```

Replication lag persists

If mongo.replSet.lag_seconds stays above 5, suspect:

  • Long-running operation on primary blocks oplog (rare; check currentOp)
  • Secondary's storage slow
  • Secondary's network plane saturated

```bash
cd db mongo currentOp --cluster acme-mongo-prod --filter '{op: "command", millis: {$gt: 1000}}'
```

Sharded query hits all shards (scatter-gather)

The query doesn't include the shard key, so mongos fans out to every shard. Symptom: every shard's mongo.opCounters ticks, and p99 latency is much higher than expected.

```javascript
db.collection.find({...}).explain("executionStats")
// Look at the shards in the plan: a targeted query should hit 1
```

Either add the shard key as a filter, or restructure the collection.
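Whether a query was targeted can be checked mechanically from the explain document (a sketch; it reads the `queryPlanner.winningPlan.shards` array that sharded explain results carry, though the exact shape varies across versions):

```javascript
// Count how many shards participated in the winning plan.
function shardsTouched(explainDoc) {
  const shards = explainDoc?.queryPlanner?.winningPlan?.shards ?? [];
  return shards.length;
}

const explainDoc = {
  queryPlanner: { winningPlan: { stage: "SHARD_MERGE", shards: [
    { shardName: "shard-01" }, { shardName: "shard-02" }, { shardName: "shard-03" },
  ]}},
};
console.log(shardsTouched(explainDoc)); // 3 → scatter-gather; a targeted query reports 1
```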

Chunk migration stuck

```
WARN: chunk migration from shard-01 to shard-02 stuck at 45%
```

Causes:

  • Active queries on the chunk's range hold cursors open
  • Target shard out of space
  • Network plane congestion

Check the current state:

```bash
cd db mongo balancer status
```

Pause the balancer if the migration causes user-visible latency; investigate; resume.

Out of disk

The platform auto-provisions disk based on mongo.storage.used_bytes growth, but rapid growth can still hit the ceiling. Options:

  • Drop unused indexes
  • Drop unused collections
  • Resize underlying compute node (also adds storage)
  • Shard horizontally if vertical scaling is tapped out
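To decide how urgent this is, extrapolate from the growth rate (hypothetical helper built on the usage metric above):

```javascript
// Rough days-until-full from current usage and a daily growth estimate.
function daysUntilFull(usedBytes, capacityBytes, bytesPerDay) {
  if (bytesPerDay <= 0) return Infinity;
  return (capacityBytes - usedBytes) / bytesPerDay;
}

const GiB = 1024 ** 3;
console.log(daysUntilFull(800 * GiB, 1000 * GiB, 20 * GiB)); // 10 days of headroom
```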

TLS / auth failures

Standard checks: certificate validity, hostname match, and a client driver that supports the server's SCRAM mechanism. Drivers released since SCRAM-SHA-256 support arrived (2018 onward) are generally fine; much older drivers that only speak SCRAM-SHA-1 or MONGODB-CR fail to authenticate.
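The first two checks can be scripted against an already-parsed certificate (a sketch; field names mirror what Node's tls getPeerCertificate() returns, and the hostnames are made up):

```javascript
// Sanity-check a parsed peer certificate: expiry and hostname match.
function certIssues(cert, hostname, now = new Date()) {
  const issues = [];
  if (new Date(cert.valid_to) < now) issues.push("certificate expired");
  const sans = (cert.subjectaltname ?? "")
    .split(", ")
    .filter((s) => s.startsWith("DNS:"))
    .map((s) => s.slice(4));
  const matches = sans.some((san) =>
    san.startsWith("*.")
      ? hostname.endsWith(san.slice(1)) &&
        hostname.split(".").length === san.split(".").length
      : san === hostname
  );
  if (!matches) issues.push(`hostname ${hostname} not in SAN list`);
  return issues;
}

const cert = {
  valid_to: "Dec 31 23:59:59 2026 GMT",
  subjectaltname: "DNS:*.mongo.clouddigit.ai",  // illustrative SAN
};
console.log(certIssues(cert, "acme-prod.mongo.clouddigit.ai", new Date("2026-05-11")));
// → [] (valid and matching)
```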