Managed SIEM¶

Service ownership

Owner: security-platform (security-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Centralized log analytics and threat detection on top of Managed OpenSearch — sovereign-resident, with retention you can defend in audit.

What it is¶

A SIEM service that:

Collects events from Cloud Digit services (audit logs, VPC flow logs, WAF logs, LB access logs, K8s audit)
Collects from your workloads (syslog, agents, application logs)
Stores in OpenSearch with configurable retention tiers
Runs detections (Sigma rule-compatible) and raises alerts

Sources, out of the box¶

Source	Mode
Cloud Digit account audit log	Native (no setup)
VPC Flow Logs	Native
WAF logs	Native
Load Balancer access logs	Native
Managed Kubernetes audit	Native
Managed databases slow / error logs	Native
Custom: syslog, OTel, Fluent Bit	Agents shipped

Detection content¶

Sigma-rule library — open-source detection content, kept current
Cloud Digit-authored rules — for misuse of platform APIs (mass-snapshot-export, etc.)
Customer-authored rules — your own Sigma / DSL rules, version-controlled
Threat-intel feeds — community + private (Enterprise tier)

Retention tiers¶

Tier	Storage	Search latency	Use case
Hot	Provisioned IOPS	ms	Last 7–30 days, active hunt
Warm	NVMe HCI	100s of ms	30–365 days
Cold	Object Archive	hours (rehydrate)	> 1 year, compliance

Tier transition happens on a per-index lifecycle policy you set.
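As a rough illustration of a per-index lifecycle policy, the sketch below builds a hot, warm, cold transition document in the shape used by OpenSearch Index State Management (ISM). Whether Managed SIEM accepts an ISM document verbatim is an assumption, as are the function name `build_lifecycle_policy` and the choice to treat both thresholds as cumulative index ages.

```python
import json


def build_lifecycle_policy(hot_days: int, warm_days: int) -> dict:
    """Build an ISM-style policy that moves an index hot -> warm -> cold.

    hot_days and warm_days are cumulative index ages (an assumption),
    for example 30 and 365 to match the upper bounds in the tier table.
    """
    if not 0 < hot_days < warm_days:
        raise ValueError("expected 0 < hot_days < warm_days")
    return {
        "policy": {
            "description": f"hot until {hot_days}d, warm until {warm_days}d, then cold",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "warm",
                         "conditions": {"min_index_age": f"{hot_days}d"}}
                    ],
                },
                {
                    "name": "warm",
                    "actions": [],
                    "transitions": [
                        {"state_name": "cold",
                         "conditions": {"min_index_age": f"{warm_days}d"}}
                    ],
                },
                # Terminal state: no further transitions.
                {"name": "cold", "actions": [], "transitions": []},
            ],
        }
    }


policy = build_lifecycle_policy(hot_days=30, warm_days=365)
print(json.dumps(policy, indent=2))
```

The `actions` lists are left empty on purpose: what each tier does to the index (replica count, storage class) depends on the platform and is not described here.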

Alerting¶

Delivery to email, webhook, Slack, Microsoft Teams, or your ITSM (PagerDuty, Opsgenie, ServiceNow)
Alert grouping, deduplication, suppression windows
On-call runbooks attached to alerts

Pricing¶

Ingest — per GiB-day (low; we don't gouge ingest like off-shore vendors)
Storage — at the tier rate (Provisioned IOPS / NVMe / Archive)
Detection content — Sigma is included; premium feeds are an add-on

See Pricing.

Related¶

Managed OpenSearch — engine underneath
WAF, DDoS Premium, CSPM — major sources

Operate this service¶

The sections below cover administration, operation, and troubleshooting.

Security Information & Event Management — log aggregation, correlation, and alerting across Cloud Digit + your apps.

Architecture¶

Sources → Ingestion → Storage (hot/warm/cold) → Correlation → Alerts/Dashboards

Sources include:

- Audit logs (all CD API calls)
- WAF events
- VPC flow logs
- DDoS mitigation events
- CSPM findings
- Application logs (your apps)
- Endpoint logs (where applicable)

IAM¶

Role	Can do
`siem.viewer`	Search logs, view dashboards
`siem.analyst`	Create alerts, build dashboards
`siem.responder`	Acknowledge alerts, triage cases
`siem.admin`	Configure sources, retention, integrations

Retention tiers¶

Tier	Retention	Searchable	Cost
Hot	7 days	Instant	High
Warm	30 days	< 1 minute	Medium
Cold	365+ days	Hours (rehydrate)	Low (Archive class)

Standard: 7d hot, 30d warm, 365d cold (compliance default).

Detection content¶

Pre-built detections for:

- Common attack patterns (privilege escalation, lateral movement, data exfil)
- Bangladesh-specific threats
- Compliance violations (e.g., access to PII outside business hours)

Custom rules:

```sql
SELECT principal, count(*) AS attempts
FROM events
WHERE action = 'iam.login' AND result = 'failed'
GROUP BY principal
HAVING attempts > 10
WINDOW '5m'
```
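To make the rule's logic concrete, here is a minimal Python sketch of the same check run over in-memory events. It assumes `WINDOW '5m'` means fixed (tumbling) five-minute buckets, which the rule syntax above does not confirm, and it assumes each event is a dict with `principal`, `action`, `result`, and a naive `timestamp`. The function name `failed_login_bursts` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 10
EPOCH = datetime(1970, 1, 1)


def failed_login_bursts(events, window=WINDOW, threshold=THRESHOLD):
    """Return {(principal, window_start): attempts} for buckets above threshold."""
    counts = Counter()
    for event in events:
        if event["action"] == "iam.login" and event["result"] == "failed":
            # Floor the timestamp to the start of its tumbling window.
            bucket = EPOCH + ((event["timestamp"] - EPOCH) // window) * window
            counts[(event["principal"], bucket)] += 1
    # HAVING attempts > 10: strictly greater than the threshold.
    return {key: n for key, n in counts.items() if n > threshold}


start = datetime(2026, 5, 11, 12, 0, 0)
sample_events = [
    {"principal": "alice", "action": "iam.login", "result": "failed",
     "timestamp": start + timedelta(seconds=10 * i)}
    for i in range(11)
] + [
    {"principal": "bob", "action": "iam.login", "result": "failed",
     "timestamp": start + timedelta(seconds=i)}
    for i in range(3)
]
bursts = failed_login_bursts(sample_events)
```

With the sample data, only `alice` crosses the threshold (11 failed logins inside one window); `bob` stays below it. A sliding window would also catch bursts that straddle a bucket boundary, at the cost of more bookkeeping.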

Alerting¶

Tiered by severity:

- info: dashboard only
- low: email
- medium: email + Slack
- high: paging
- critical: paging + auto-escalation after 15 min
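The tiers map naturally onto a small routing table. The sketch below is illustrative only: the channel names, `channels_for`, and `needs_escalation` are hypothetical, and it assumes auto-escalation applies to critical alerts that are still unacknowledged after 15 minutes.

```python
from datetime import datetime, timedelta

# Severity -> delivery channels, following the tiers listed above.
ROUTES = {
    "info": ["dashboard"],
    "low": ["email"],
    "medium": ["email", "slack"],
    "high": ["pager"],
    "critical": ["pager"],
}
ESCALATE_AFTER = timedelta(minutes=15)


def channels_for(severity):
    """Return the delivery channels for a severity; KeyError if unknown."""
    return ROUTES[severity]


def needs_escalation(severity, raised_at, acked_at, now):
    """True for a critical alert left unacknowledged for 15 minutes or more."""
    return (
        severity == "critical"
        and acked_at is None
        and now - raised_at >= ESCALATE_AFTER
    )


raised = datetime(2026, 5, 11, 3, 0)
print(channels_for("medium"))
print(needs_escalation("critical", raised, None, raised + timedelta(minutes=16)))
```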

Metrics¶

Metric	Healthy	Alert
`siem.ingest.events_per_sec`	matches sources	sudden drop (ingestion broken)
`siem.ingest.bytes_per_day`	within plan	spikes (chatty source)
`siem.alerts.open_count`	< target	climbing
`siem.alerts.mean_time_to_ack_min`	< 15 (critical)	breach
`siem.search.query_latency_ms` p95	< 5000	> 30000
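For the latency row, a small sketch of how a p95 could be computed from raw samples and compared with the thresholds in the table. The nearest-rank percentile method and the intermediate "degraded" band between 5000 and 30000 ms are assumptions; the table only defines the healthy and alert ends.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def classify_latency(p95_ms):
    """Map a p95 latency to the bands used in the metrics table."""
    if p95_ms < 5000:
        return "healthy"
    if p95_ms > 30000:
        return "alert"
    return "degraded"  # assumed middle band, not defined in the table


latencies_ms = [i * 100 for i in range(1, 101)]  # 100 ms .. 10000 ms
print(p95(latencies_ms), classify_latency(p95(latencies_ms)))
```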

Daily SOC routine¶

Triage overnight alerts
Investigate high + critical
Sweep medium for patterns
Update detections based on findings

A SIEM with no daily attention is just expensive log storage.

Threat hunting¶

Periodic proactive search beyond alerts:

```bash
cd siem search --query '
  SELECT principal, action, resource, count(*) AS events
  FROM events
  WHERE source = "iam.role_assumption"
    AND timestamp > now() - interval 7 day
    AND principal NOT IN (SELECT principal FROM expected_assumptions)
  GROUP BY 1, 2, 3
'
```

Hypothesis-driven queries find issues the canned detections miss.
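The same hypothesis ("who assumed a role that we did not expect?") can be prototyped offline before it becomes a saved query. This is a sketch under assumptions: events are dicts with `source`, `principal`, `action`, `resource`, and `timestamp`, and `unexpected_assumptions` is a hypothetical helper, not part of the SIEM.

```python
from collections import Counter
from datetime import datetime, timedelta


def unexpected_assumptions(events, expected_principals, since):
    """Count role assumptions by principals outside the expected set.

    Returns [((principal, action, resource), events), ...], most frequent first.
    """
    counts = Counter(
        (e["principal"], e["action"], e["resource"])
        for e in events
        if e["source"] == "iam.role_assumption"
        and e["timestamp"] > since
        and e["principal"] not in expected_principals
    )
    return counts.most_common()


now = datetime(2026, 5, 11, 12, 0)
hunt_events = [
    {"source": "iam.role_assumption", "principal": "ci-bot", "action": "assume",
     "resource": "role/deploy", "timestamp": now - timedelta(days=1)},
    {"source": "iam.role_assumption", "principal": "mallory", "action": "assume",
     "resource": "role/admin", "timestamp": now - timedelta(days=2)},
    {"source": "iam.role_assumption", "principal": "mallory", "action": "assume",
     "resource": "role/admin", "timestamp": now - timedelta(days=3)},
    {"source": "iam.role_assumption", "principal": "mallory", "action": "assume",
     "resource": "role/admin", "timestamp": now - timedelta(days=30)},
]
hits = unexpected_assumptions(hunt_events, {"ci-bot"}, since=now - timedelta(days=7))
```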

Tuning detections¶

Per detection, weekly review:

- True-positive count
- False-positive count
- Mean time to resolution

```bash
cd siem detection stats --name 'unusual-login-location' --since 30d
```

Detections with > 50% false-positive rate need tuning or retirement.
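A minimal sketch of that check, assuming the false-positive rate is defined as the share of a detection's resolved alerts that turned out to be false (FP / (TP + FP)). The function names are hypothetical.

```python
def false_positive_rate(true_positives, false_positives):
    """Share of resolved alerts that were false positives, or None if no alerts."""
    total = true_positives + false_positives
    if total == 0:
        return None
    return false_positives / total


def needs_tuning(true_positives, false_positives, limit=0.5):
    """True when the false-positive rate is strictly above the limit."""
    rate = false_positive_rate(true_positives, false_positives)
    return rate is not None and rate > limit


print(false_positive_rate(3, 9), needs_tuning(3, 9))
```

A detection with no alerts at all returns `None` rather than 0: it is not noisy, but it may be silent for the wrong reasons and deserves a separate look.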

Cold tier search¶

For investigations spanning months:

```bash
cd siem search rehydrate --query '<query>' --since 2026-01-01 --until 2026-04-30
# Returns a job ID; result available in 1-4 hours
```

Rehydrate is expensive — use only for genuine investigations.

Compliance retention¶

Verify quarterly:

```bash
cd siem retention audit --tier all
# Should show no gaps in any tier per the configured policy
```

Ingestion dropped¶

WARN: siem.ingest.events_per_sec dropped from 12k to 4k

A source stopped sending. Diagnose:

```bash
cd siem source status --all
# Lists sources with last_received_at
```

Common causes:

- Source IAM principal credentials expired
- Source app crashed or stopped logging
- Network path broken (VPC route, SG)
- Source rate-limited by the SIEM ingest quota (raise the quota)
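Once the per-source `last_received_at` values are in hand, finding the silent source is a simple comparison. The sketch below assumes the values have already been parsed into timezone-aware datetimes keyed by source name; the 15-minute silence threshold and the name `stale_sources` are illustrative choices, not service defaults.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_received, now, max_silence=timedelta(minutes=15)):
    """Return [(source, silence), ...] for silent sources, longest silence first."""
    stale = [
        (name, now - ts)
        for name, ts in last_received.items()
        if now - ts > max_silence
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


now_utc = datetime(2026, 5, 11, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "waf": now_utc - timedelta(minutes=1),
    "vpc-flow": now_utc - timedelta(hours=2),
    "k8s-audit": now_utc - timedelta(minutes=40),
}
for name, silence in stale_sources(last_seen, now_utc):
    print(f"{name}: silent for {silence}")
```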

Alert flood¶

`siem.alerts.open_count` is climbing fast. Possible causes:

A detection misfiring (regex too broad)
Real attack in progress
Routine event mis-classified

Triage by priority and deduplicate via grouping: for example, grouping by principal can collapse 100 alerts into 1 case.
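Grouping is easy to picture in code. This sketch assumes alerts are dicts that carry a `principal` field and folds them into one case per principal; `group_into_cases` is a hypothetical name, not the SIEM's grouping feature.

```python
from collections import defaultdict


def group_into_cases(alerts, key="principal"):
    """Fold alerts that share the same key value into a single case."""
    cases = defaultdict(list)
    for alert in alerts:
        cases[alert[key]].append(alert)
    return dict(cases)


flood = [{"id": i, "principal": "svc-backup", "rule": "mass-snapshot-export"}
         for i in range(100)]
cases = group_into_cases(flood)
print(len(flood), "alerts ->", len(cases), "case(s)")
```

Grouping on a different key (rule name, source IP) gives a different view of the same flood, so it can help to try more than one before deciding what to suppress.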

Query timeouts¶

Searches > 30 s:

- Time range too wide
- Query missing index hints
- Hot/warm tier overloaded

Optimize:

- Narrow time range
- Use index columns (timestamp, principal, action)
- Pre-aggregate for dashboard queries

False negatives¶

A real attack happened but the SIEM did not alert. Possible causes:

- Source wasn't ingesting that event type
- Detection rule had a gap (specific user-agent or pattern)
- Alert routed but suppressed (over-aggressive deduplication)

Add a detection covering the specific pattern; replay historical events to verify the new rule would have caught it.

Storage cost spike¶

`siem.ingest.bytes_per_day` at 2× normal. Common causes:

- Chatty source (a new app emitting verbose logs)
- Debug-level logging accidentally enabled in production
- A loop in the app producing repeating logs

```bash
cd siem ingest top-sources --since 24h
```

Filter at source or sample heavily for verbose-but-low-value events.
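To decide which sources to filter or sample, it helps to compare each source's volume with its own baseline rather than with the total. The sketch below assumes per-source byte counts for the last 24 hours and for a typical day are available as dicts; the 2× factor mirrors the symptom above, and `chatty_sources` is a hypothetical helper.

```python
def chatty_sources(today_bytes, baseline_bytes, factor=2.0):
    """Return [(source, today, baseline), ...] for sources at or above factor x baseline.

    Sources with no baseline (new sources) are always reported.
    Results are sorted by today's volume, largest first.
    """
    flagged = []
    for source, today in today_bytes.items():
        baseline = baseline_bytes.get(source, 0)
        if baseline == 0 or today >= factor * baseline:
            flagged.append((source, today, baseline))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


today = {"app-a": 500, "waf": 100, "new-app": 50}
typical = {"app-a": 200, "waf": 90}
for source, now_bytes, base in chatty_sources(today, typical):
    print(f"{source}: {now_bytes} (baseline {base})")
```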

Cold-tier rehydrate slow¶

Rehydrate is bandwidth- and storage-limited; 1–4 h is normal for a 30-day window. To speed it up, narrow the query with more specific time, principal, or action filters.

Compliance retention gap¶

`cd siem retention audit` reports a gap (events missing from a window). Possible causes:

- Ingestion was down during that window
- A bug in the lifecycle policy moved data prematurely
- Account misconfiguration

Engage SRE. Rehydrating from the cold tier may recover the missing events; if it does not, document the gap for the auditor.
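When documenting a gap, it is useful to state exactly which days are missing. This sketch assumes you can export the set of calendar days that have at least one event in the audited window; `missing_days` is a hypothetical helper and day-level granularity is an assumption.

```python
from datetime import date, timedelta


def missing_days(days_with_events, start, end):
    """Return the days in [start, end] that have no events, in order."""
    present = set(days_with_events)
    gaps = []
    day = start
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


covered = [date(2026, 5, 1), date(2026, 5, 2), date(2026, 5, 4), date(2026, 5, 5)]
print(missing_days(covered, date(2026, 5, 1), date(2026, 5, 5)))
```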