Backup-as-a-Service

Service ownership

Owner: security-platform (security-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Policy-driven backup orchestration for VMs, volumes, file shares, and managed databases — agent and agentless modes.

What it is

A single backup product that handles snapshots, retention policies, cross-region copies, and lifecycle to Object Archive — across the whole estate. You define policies once and apply them by tag, project, or service.

What it can back up

| Source | Mode |
| --- | --- |
| VMs | Agentless (snapshot) or agent (file-aware) |
| Block volumes | Agentless |
| File Storage shares | Agentless |
| Managed databases | Native + WAL/binlog archive |
| Object Storage | Cross-bucket replication / versioning |
| On-prem / non-Cloud-Digit workloads | Agent (Cloud Digit BaaS agent) |

Policy model

A backup policy looks like:

    name: prod-daily-7d-30d-monthly-1y
    schedule:
      daily: "02:00 Asia/Dhaka"
    retention:
      daily: 7
      weekly: 4
      monthly: 12
    copy:
      cross_region: bd-ctg-1
    target:
      storage_class: ARCHIVE
      object_lock:
        mode: GOVERNANCE
        days: 365

Apply by tag: every resource with tag backup-policy=prod-daily-7d-30d-monthly-1y is automatically protected.
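To make the `retention` block concrete, here is a minimal Python sketch of how `daily: 7`, `weekly: 4`, `monthly: 12` could decide which backups survive. The selection rule (keep the newest backup per day, per ISO week, and per calendar month) is an assumption for illustration; the service's actual pruning rule is not documented on this page, and `backups_to_keep` is a hypothetical helper.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the backup dates retained under an ASSUMED newest-per-bucket rule."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set()

    def keep_newest_per_bucket(bucket_of, limit):
        # Walk from newest to oldest; keep the first backup seen in each
        # bucket until `limit` distinct buckets have been covered.
        seen = []
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket in seen:
                continue
            if len(seen) == limit:
                break
            seen.append(bucket)
            keep.add(d)

    keep_newest_per_bucket(lambda d: d, daily)                       # one per day
    keep_newest_per_bucket(lambda d: d.isocalendar()[:2], weekly)    # one per ISO week
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)     # one per month
    return keep

# One backup per day for 400 days, ending on 2026-05-11.
today = date(2026, 5, 11)
history = [today - timedelta(days=n) for n in range(400)]
kept = backups_to_keep(history)
```

Under this assumed rule the three tiers overlap (the newest backup counts toward all of them), so the number of retained backups is lower than 7 + 4 + 12.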

Restore options

  • In-place — restore over the existing volume (with a confirmation gate)
  • New resource — restore to a new volume / VM / DB cluster
  • Cross-region — restore in another region directly from a copied backup
  • File-level — for agent-backed sources, mount a backup as a virtual filesystem and pull individual files

Compliance hooks

  • Pair with Object Lock in COMPLIANCE mode for WORM retention that survives root credentials
  • Backup integrity reports — daily checksum verification, signed
  • Backup-success metrics fed into SIEM

Pricing

  • Backup data — billed at Object Archive rates
  • Cross-region copy — at inter-region transfer rates
  • Restore — at standard egress (free domestic / metered international)

See Pricing.

Operate this service

Policy-driven, cross-region, encrypted backup for VMs, volumes, file shares, and databases.

Policy templates

| Template | RPO | Retention | Use case |
| --- | --- | --- | --- |
| critical-1h | 1 hour | 7d/30d/365d | Tier-1 production |
| standard-daily | 24 hours | 7d/30d | Standard production |
| weekly-light | 7 days | 4 weeks | Internal tools |
| compliance-7yr | 24 hours | 30d + 7yr archive | Regulated workloads |

IAM

| Role | Can do |
| --- | --- |
| baas.viewer | List backups, view restore tests |
| baas.builder | Configure policies, attach to workloads |
| baas.restore-operator | Trigger restores into a sandbox |
| baas.admin | All of the above, plus delete backups and modify retention |

Separate restore-operator from admin so restores are auditable and don't require destructive permissions.

Policy attachment

    cd baas policy attach \
      --policy critical-1h \
      --target-type vm --target-tag env=prod,tier=1

Tag-based attachment is the canonical pattern — works for new workloads automatically.
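As a rough illustration of tag-based attachment, the sketch below matches workloads against a selector such as `env=prod,tier=1`. It assumes the comma means logical AND (every listed tag must match); that semantics, and the helper and workload names, are assumptions rather than documented behaviour.

```python
def parse_selector(selector):
    """Parse 'env=prod,tier=1' into {'env': 'prod', 'tier': '1'}.

    Raises ValueError on an item without '=' (malformed selector).
    """
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}

def matches(workload_tags, selector):
    """True when the workload carries every tag in the selector (ASSUMED AND)."""
    wanted = parse_selector(selector)
    return all(workload_tags.get(key) == value for key, value in wanted.items())

# Hypothetical inventory: workload name -> tags.
workloads = {
    "vm-api-01": {"env": "prod", "tier": "1"},
    "vm-batch-07": {"env": "prod", "tier": "2"},
    "vm-dev-03": {"env": "dev", "tier": "1"},
}
covered = [name for name, tags in workloads.items()
           if matches(tags, "env=prod,tier=1")]
```

Because matching is evaluated against tags rather than a fixed list of resources, a new VM created with both tags would be picked up without any further attachment step.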

Encryption

  • At rest: AES-256-GCM with CMK from Key Manager
  • In transit: TLS 1.3
  • Cross-region: encrypted before leaving source region

Immutability

Compliance-tier backups can be immutable for the retention window — protects against ransomware:

    cd baas policy set --policy compliance-7yr --immutable true

Once written, an immutable backup can't be deleted, even by admins, until its retention expires.

Compliance reporting

Monthly attestation report:

  • Workloads with vs. without backup policy
  • Last successful backup per workload
  • Restore drills run in the period
  • Failed backups by reason
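The four report sections can be derived from plain inventory and backup records. The sketch below shows one way to aggregate them; the record shapes (`name`, `policy`, `workload`, `status`, `finished_at`, `reason`) are hypothetical and do not reflect the service's export format.

```python
from collections import Counter

def attestation_summary(workloads, backups, drills_run):
    """Build the four sections of a monthly attestation report."""
    with_policy = sorted(w["name"] for w in workloads if w.get("policy"))
    without_policy = sorted(w["name"] for w in workloads if not w.get("policy"))

    last_success = {}
    for b in backups:
        if b["status"] != "success":
            continue
        previous = last_success.get(b["workload"])
        # ISO-8601 UTC strings in a single format compare correctly as text.
        if previous is None or b["finished_at"] > previous:
            last_success[b["workload"]] = b["finished_at"]

    failures = Counter(b["reason"] for b in backups if b["status"] == "failed")
    return {
        "with_policy": with_policy,
        "without_policy": without_policy,
        "last_success": last_success,
        "drills_run": drills_run,
        "failures_by_reason": dict(failures),
    }

report = attestation_summary(
    workloads=[
        {"name": "vm-api-01", "policy": "critical-1h"},
        {"name": "vm-dev-03", "policy": None},
    ],
    backups=[
        {"workload": "vm-api-01", "status": "success",
         "finished_at": "2026-05-10T02:14:00Z"},
        {"workload": "vm-api-01", "status": "failed",
         "finished_at": "2026-05-11T02:03:00Z", "reason": "NetworkTimeout"},
    ],
    drills_run=1,
)
```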

Metrics

| Metric | Healthy | Alert |
| --- | --- | --- |
| baas.backups_24h.success_count | Matches attached policies | Drops below the expected count |
| baas.backups_24h.failure_count | 0 | > 0 |
| baas.policy.workloads_uncovered | 0 | > 0 (a tagged workload has no policy) |
| baas.storage.gb_consumed | Grows slowly | Sudden 2× jump (possible deduplication failure) |
| baas.restore.last_test_age_days | < 90 | > 90 (drill overdue) |
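The thresholds in this table translate directly into alert rules. The following is a minimal sketch of such a check over one metrics sample; the function, its arguments, and the way the "2× jump" is measured (against a single previous sample) are assumptions, not the service's alerting implementation.

```python
def evaluate_baas_metrics(sample, expected_successes, previous_gb):
    """Return alert strings for one metrics sample, using the table's thresholds."""
    alerts = []
    if sample["baas.backups_24h.success_count"] < expected_successes:
        alerts.append("success_count below expected")
    if sample["baas.backups_24h.failure_count"] > 0:
        alerts.append("backup failures in last 24h")
    if sample["baas.policy.workloads_uncovered"] > 0:
        alerts.append("tagged workload without policy")
    # ASSUMPTION: a "sudden 2x jump" is measured against the previous sample.
    if previous_gb > 0 and sample["baas.storage.gb_consumed"] >= 2 * previous_gb:
        alerts.append("storage doubled since previous sample")
    if sample["baas.restore.last_test_age_days"] > 90:
        alerts.append("restore drill overdue")
    return alerts

sample = {
    "baas.backups_24h.success_count": 41,
    "baas.backups_24h.failure_count": 1,
    "baas.policy.workloads_uncovered": 0,
    "baas.storage.gb_consumed": 5200,
    "baas.restore.last_test_age_days": 97,
}
alerts = evaluate_baas_metrics(sample, expected_successes=42, previous_gb=5100)
```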

Restore drills

Quarterly per workload class. Pattern:

    cd baas restore \
      --backup-id <id> \
      --target-project acme-restore-sandbox \
      --new-name restore-test-2026-05-11

Validate: app starts, data looks right, query a known record. Time-to-restore is the metric to track — improving from 4 h to 30 min is a real DR win.

Verifying backup integrity

A backup can complete and still be invalid (corrupt source, partial copy, encryption-key issue). Run a periodic integrity check:

```bash
cd baas verify --backup-id <id>
# Reads the backup, verifies checksums; does not restore
```

Monthly verification on a sampled subset is standard for compliance-tier workloads.
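For intuition, checksum verification and sampling can be sketched in a few lines of Python. This assumes a per-chunk SHA-256 manifest and a seeded random sample; the digest algorithm, manifest layout, and sampling method used by the service are not stated here, so treat all of these as illustrative.

```python
import hashlib
import random

def verify_chunks(chunks, manifest):
    """Return indexes of chunks whose SHA-256 digest differs from the manifest."""
    if len(chunks) != len(manifest):
        raise ValueError("chunk count differs from manifest (partial copy?)")
    return [index for index, (data, expected) in enumerate(zip(chunks, manifest))
            if hashlib.sha256(data).hexdigest() != expected]

def pick_sample(backup_ids, fraction, seed):
    """Pick a reproducible random subset of backups to verify this month."""
    if not backup_ids:
        return []
    count = max(1, round(len(backup_ids) * fraction))
    return random.Random(seed).sample(sorted(backup_ids), count)

# Hypothetical backup split into three chunks, with a manifest taken at write time.
chunks = [b"block-0", b"block-1", b"block-2"]
manifest = [hashlib.sha256(chunk).hexdigest() for chunk in chunks]
corrupted = [chunks[0], b"bit-rot", chunks[2]]

bad_indexes = verify_chunks(corrupted, manifest)
```

Reading and hashing every chunk detects corruption and partial copies without touching any restore target, which is why a verify pass is cheaper than a full drill.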

Cross-region replication

Default for compliance-tier policies: write to peer BD region. Monitor:

| Metric | Healthy | Alert |
| --- | --- | --- |
| baas.cross_region.lag_seconds | < 3600 (1 hour) | > 7200 |
| baas.cross_region.failures_24h | 0 | > 0 |

Retention pruning

Old backups auto-delete per policy. Verify with sample audit:

```bash
cd baas list --policy compliance-7yr --older-than 7y
# Should be empty
```
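The same audit can be expressed as date arithmetic. The sketch below flags backups whose retention has already expired but which still exist; the backup IDs and the calendar-year interpretation of "7y" are assumptions.

```python
from datetime import date

def add_years(day, years):
    """Shift a date by whole calendar years (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)

def overdue_for_pruning(backups, retention_years, today):
    """Return IDs of backups that should already have been auto-deleted."""
    return sorted(backup_id for backup_id, created in backups.items()
                  if add_years(created, retention_years) < today)

# Hypothetical backup IDs mapped to creation dates.
backups = {
    "bk-a": date(2019, 5, 10),   # expired on 2026-05-10
    "bk-b": date(2019, 5, 12),   # expires on 2026-05-12
    "bk-c": date(2020, 2, 29),   # expires on 2027-02-28
}
stale = overdue_for_pruning(backups, retention_years=7, today=date(2026, 5, 11))
```

A non-empty result means pruning is lagging or has failed for those backups.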

Recovery time targets

Document per workload:

| Workload class | RPO | RTO |
| --- | --- | --- |
| Tier-1 prod | 1 h | 4 h |
| Tier-2 prod | 24 h | 24 h |
| Internal | 7 d | 72 h |

Match policies to targets. Be wary of requests such as "we'd like 0 RPO" that come without a cost justification; they are common.
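One way to match policies to targets is to pick the least aggressive template whose RPO still meets the workload's target. The sketch below uses the RPO values from the policy templates table; the selection rule and the `regulated` flag are assumptions made for illustration.

```python
# RPO per template, in hours, taken from the policy templates table.
TEMPLATE_RPO_HOURS = {
    "critical-1h": 1,
    "standard-daily": 24,
    "weekly-light": 7 * 24,
    "compliance-7yr": 24,
}

def pick_template(target_rpo_hours, regulated=False):
    """Return the least aggressive template that meets the RPO target, or None."""
    if regulated:
        # ASSUMPTION: regulated workloads may only use the compliance template.
        meets = TEMPLATE_RPO_HOURS["compliance-7yr"] <= target_rpo_hours
        return "compliance-7yr" if meets else None
    candidates = {name: rpo for name, rpo in TEMPLATE_RPO_HOURS.items()
                  if name != "compliance-7yr" and rpo <= target_rpo_hours}
    if not candidates:
        return None  # no template is fast enough, e.g. a "0 RPO" request
    return max(candidates, key=candidates.get)

choices = {
    "Tier-1 prod": pick_template(1),
    "Tier-2 prod": pick_template(24),
    "Internal": pick_template(7 * 24),
}
```

A target of zero hours returns `None`: no scheduled-backup template can meet it, which is a prompt to revisit the requirement rather than the policy.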

Backup failed

```bash
cd baas backup show --backup-id <id>
# Look at status_reason
```

| Reason | Fix |
| --- | --- |
| SourceUnreachable | Source VM is stopped; start the VM, or pause backups for stopped VMs |
| SnapshotFailed | See Snapshot troubleshooting |
| EncryptionKeyAccessDenied | KMS key policy doesn't grant the BaaS principal kms:Encrypt; add the grant |
| StorageQuotaExceeded | BaaS storage quota reached; raise the quota |
| NetworkTimeout | Source network congestion; retry |
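The table above can drive simple automation: retry the transient reason with backoff and hand everything else to a human. This is a hedged sketch; treating only `NetworkTimeout` as retryable, the attempt limit, and the delays are assumptions, not service behaviour.

```python
# ASSUMPTION: only NetworkTimeout is safe to retry automatically.
RETRYABLE = {"NetworkTimeout"}

ACTIONS = {
    "SourceUnreachable": "start the VM or pause backups for stopped VMs",
    "SnapshotFailed": "follow Snapshot troubleshooting",
    "EncryptionKeyAccessDenied": "grant the BaaS principal kms:Encrypt",
    "StorageQuotaExceeded": "raise the BaaS storage quota",
    "NetworkTimeout": "investigate source network congestion",
}

def next_step(status_reason, attempt, max_attempts=3, base_delay_s=60):
    """Decide what to do after a failed backup.

    Returns ("retry", delay_seconds) with exponential backoff, or
    ("escalate", suggested_action) when a person needs to act.
    """
    if status_reason in RETRYABLE and attempt < max_attempts:
        return ("retry", base_delay_s * 2 ** attempt)
    return ("escalate", ACTIONS.get(status_reason, "inspect status_reason manually"))

first = next_step("NetworkTimeout", attempt=0)
quota = next_step("StorageQuotaExceeded", attempt=0)
```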

Restore is slow

| Cause | Mitigation |
| --- | --- |
| Cross-region restore | Use same-region restore for drills |
| Backup age > 30 days (Archive tier) | Thaw time adds hours |
| Volume size huge | Restore parallelism limited by destination IOPS |
| Compute capacity in target region | Test restore destination before relying on it |

Restored VM won't boot

The likely causes are the same as for snapshot restores: an image/flavor mismatch, or boot-disk corruption at backup time.

If application-consistent backups have been working, try restoring the previous backup.

Workload missing from coverage

baas.policy.workloads_uncovered > 0:

```bash
cd baas coverage report --project acme-prod
# Lists tagged workloads without an attached policy
```

Either:

  • The workload's tags don't match any policy's tag selector
  • The policy was attached, but the workload was created before the policy existed (depending on policy version)

Re-apply: cd baas policy reattach --policy critical-1h.

Cross-region replication lag

baas.cross_region.lag_seconds > 7200:

  • Source region produces backups faster than the inter-region link can replicate
  • Target region storage quota — backups land but can't write
  • Inter-region link congestion (rare)

    cd baas replication status --policy critical-1h
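The first cause is simple arithmetic: lag only shrinks when the link replicates faster than the source produces. A small worked sketch, with hypothetical throughput figures:

```python
def hours_to_drain(backlog_gb, produced_gb_per_hour, link_gb_per_hour):
    """Hours until the replication backlog clears, or None if it never does."""
    net_gb_per_hour = link_gb_per_hour - produced_gb_per_hour
    if net_gb_per_hour <= 0:
        return None  # backlog grows or stays flat; lag will not recover
    return backlog_gb / net_gb_per_hour

# 300 GB waiting, 50 GB/h of new backups, link sustains 200 GB/h.
recovering = hours_to_drain(300, produced_gb_per_hour=50, link_gb_per_hour=200)

# Same backlog, but the source now produces 250 GB/h.
stuck = hours_to_drain(300, produced_gb_per_hour=250, link_gb_per_hour=200)
```

In the first case the backlog clears in 300 / (200 - 50) = 2 hours; in the second the result is `None`, meaning lag keeps climbing until the backup volume drops or link capacity increases.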

"Immutable backup cannot be deleted"

Working as designed. Compliance-immutable backups cannot be deleted before retention expiry, even by admins.

Document the storage cost as a compliance cost; don't try to work around it.

Failed verify

cd baas verify returns invalid:

  • Source corruption at backup time (rare; SRE will investigate)
  • Encryption key version revoked (CMK rotation gone wrong)
  • Storage backend issue

Don't proceed to a restore from a failed-verify backup; try an earlier one.