Managed SIEM¶
Service ownership
Owner: security-platform (security-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Centralized log analytics and threat detection on top of Managed OpenSearch — sovereign-resident, with retention you can defend in audit.
What it is¶
A SIEM service that:
- Collects events from Cloud Digit services (audit logs, VPC flow logs, WAF logs, LB access logs, K8s audit)
- Collects from your workloads (syslog, agents, application logs)
- Stores in OpenSearch with configurable retention tiers
- Runs detections (Sigma rule-compatible) and raises alerts
Sources, out of the box¶
| Source | Mode |
|---|---|
| Cloud Digit account audit log | Native (no setup) |
| VPC Flow Logs | Native |
| WAF logs | Native |
| Load Balancer access logs | Native |
| Managed Kubernetes audit | Native |
| Managed databases slow / error logs | Native |
| Custom: syslog, OTel, Fluent Bit | Agents shipped |
Detection content¶
- Sigma-rule library — open-source detection content, kept current
- Cloud Digit-authored rules — for misuse of platform APIs (mass-snapshot-export, etc.)
- Customer-authored rules — your own Sigma / DSL rules, version-controlled
- Threat-intel feeds — community + private (Enterprise tier)
Retention tiers¶
| Tier | Storage | Search latency | Use case |
|---|---|---|---|
| Hot | Provisioned IOPS | ms | Last 7–30 days, active hunt |
| Warm | NVMe HCI | 100s of ms | 30–365 days |
| Cold | Object Archive | seconds (rehydrate) | > 1 year, compliance |
Tier transition happens on a per-index lifecycle policy you set.
Alerting¶
- Out to email / webhook / Slack / Microsoft Teams / your ITSM (PagerDuty, Opsgenie, ServiceNow)
- Alert grouping, deduplication, suppression windows
- On-call runbooks attached to alerts
Pricing¶
- Ingest — per GiB-day (low; we don't gouge ingest like off-shore vendors)
- Storage — at the tier rate (Provisioned IOPS / NVMe / Archive)
- Detection content — Sigma is included; premium feeds are an add-on
See Pricing.
Related¶
- Managed OpenSearch — engine underneath
- WAF, DDoS Premium, CSPM — major sources
Operate this service¶
Security Information & Event Management — log aggregation, correlation, and alerting across Cloud Digit + your apps.
Architecture¶
Sources → Ingestion → Storage (hot/warm/cold) → Correlation → Alerts/Dashboards
Sources include:
- Audit logs (all CD API calls)
- WAF events
- VPC flow logs
- DDoS mitigation events
- CSPM findings
- Application logs (your apps)
- Endpoint logs (where applicable)
IAM¶
| Role | Can do |
|---|---|
| siem.viewer | Search logs, view dashboards |
| siem.analyst | Create alerts, build dashboards |
| siem.responder | Acknowledge alerts, triage cases |
| siem.admin | Configure sources, retention, integrations |
Retention tiers¶
| Tier | Retention | Searchable | Cost |
|---|---|---|---|
| Hot | 7 days | Instant | High |
| Warm | 30 days | < 1 minute | Medium |
| Cold | 365+ days | Hours (rehydrate) | Low (Archive class) |
Standard: 7d hot, 30d warm, 365d cold (compliance default).
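To make the standard policy concrete, here is a minimal sketch of how an index's age could map to a tier. It assumes the 7d and 30d figures are cumulative cut-offs (hot until day 7, warm until day 30, cold afterwards); the function names are hypothetical and not part of the service.

```python
from datetime import date

# Assumed cumulative cut-offs read from the standard policy:
# hot until day 7, warm until day 30, cold from day 30 onwards.
HOT_DAYS = 7
WARM_DAYS = 30


def tier_for_age(age_days: int) -> str:
    """Return the tier an index of the given age (in days) would sit in."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    return "cold"


def tier_for_index(index_date: date, today: date) -> str:
    """Convenience wrapper: derive the age from the index's creation date."""
    return tier_for_age((today - index_date).days)
```

If your lifecycle policy treats the figures as per-tier durations instead, only the two constants need to change.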
Detection content¶
Pre-built detections for:
- Common attack patterns (privilege escalation, lateral movement, data exfil)
- Bangladesh-specific threats
- Compliance violations (e.g., access to PII outside business hours)
Custom rules:
```sql
SELECT principal, count(*) AS attempts
FROM events
WHERE action = 'iam.login' AND result = 'failed'
GROUP BY principal
HAVING attempts > 10
WINDOW '5m'
```
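The rule above counts failed logins per principal over five minutes. As a rough illustration of that logic outside the SIEM, the sketch below replays it over a list of events. It assumes a sliding window (the rule language may well use a different window type), and the event dictionary shape is hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 10  # alert when attempts exceed this, as in HAVING attempts > 10


def failed_login_alerts(events):
    """Yield (principal, timestamp, attempts) whenever a principal has more
    than THRESHOLD failed logins inside WINDOW.

    `events` is an iterable of dicts with the keys principal, action, result
    and timestamp (a datetime), sorted by timestamp.
    """
    recent = defaultdict(deque)  # principal -> timestamps still in the window
    for event in events:
        if event["action"] != "iam.login" or event["result"] != "failed":
            continue
        window = recent[event["principal"]]
        window.append(event["timestamp"])
        # Drop attempts that are older than the window.
        while window and event["timestamp"] - window[0] > WINDOW:
            window.popleft()
        if len(window) > THRESHOLD:
            yield event["principal"], event["timestamp"], len(window)
```

A sketch like this can be handy for checking a threshold against exported events before committing the rule.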
Alerting¶
Tiered:
- info: dashboard only
- low: email
- medium: email + Slack
- high: paging
- critical: paging + auto-escalation after 15 min
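The tiers above amount to a small routing table. The sketch below expresses them as data, assuming that auto-escalation applies only to critical alerts that are still unacknowledged after 15 minutes; the channel names and function names are hypothetical.

```python
from datetime import timedelta

# Severity -> delivery channels, mirroring the tier list.
ROUTES = {
    "info": ["dashboard"],
    "low": ["email"],
    "medium": ["email", "slack"],
    "high": ["pager"],
    "critical": ["pager"],
}
ESCALATE_AFTER = timedelta(minutes=15)


def channels_for(severity: str) -> list[str]:
    """Return the channels an alert of this severity is sent to."""
    return list(ROUTES[severity])


def should_escalate(severity: str, age: timedelta, acknowledged: bool) -> bool:
    """Assumed rule: escalate critical alerts left unacknowledged for 15 min."""
    return severity == "critical" and not acknowledged and age >= ESCALATE_AFTER
```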
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
| siem.ingest.events_per_sec | matches sources | sudden drop (ingestion broken) |
| siem.ingest.bytes_per_day | within plan | spikes (chatty source) |
| siem.alerts.open_count | < target | climbing |
| siem.alerts.mean_time_to_ack_min | < 15 (critical) | breach |
| siem.search.query_latency_ms p95 | < 5000 | > 30000 |
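As an illustration of the last row, the sketch below computes a nearest-rank p95 from raw latency samples and classifies it against the two thresholds. The "degraded" label for values between 5000 and 30000 ms is an assumption, since the table does not name that band, and the function names are hypothetical.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_state(samples_ms):
    """Classify p95 query latency using the thresholds from the table."""
    value = p95(samples_ms)
    if value < 5000:
        return "healthy"
    if value > 30000:
        return "alert"
    return "degraded"  # assumed name for the band the table leaves unnamed
```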
Daily SOC routine¶
- Triage overnight alerts
- Investigate high + critical
- Sweep medium for patterns
- Update detections based on findings
A SIEM with no daily attention is just expensive log storage.
Threat hunting¶
Periodic proactive search beyond alerts:
```bash
cd siem search --query '
  SELECT principal, action, resource, count(*) AS events
  FROM events
  WHERE source = "iam.role_assumption"
    AND timestamp > now() - interval 7 day
    AND principal NOT IN (SELECT principal FROM expected_assumptions)
  GROUP BY 1, 2, 3
'
```
Hypothesis-driven queries find issues the canned detections miss.
Tuning detections¶
Per detection, weekly review:
- True-positive count
- False-positive count
- Mean time to resolution
```bash
cd siem detection stats --name 'unusual-login-location' --since 30d
```
Detections with > 50% false-positive rate need tuning or retirement.
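The 50% rule can be applied mechanically once the weekly counts are collected. This is a minimal sketch that assumes the false-positive rate means false positives divided by all triaged alerts for that detection; the input shape and function names are hypothetical.

```python
def false_positive_rate(true_positives: int, false_positives: int):
    """Share of triaged alerts that were false positives, or None if no alerts."""
    total = true_positives + false_positives
    if total == 0:
        return None
    return false_positives / total


def needs_tuning(stats: dict, limit: float = 0.5) -> list:
    """Return detection names whose false-positive rate exceeds `limit`.

    `stats` maps a detection name to a (true_positives, false_positives) pair.
    """
    flagged = []
    for name, (tp, fp) in stats.items():
        rate = false_positive_rate(tp, fp)
        if rate is not None and rate > limit:
            flagged.append(name)
    return sorted(flagged)
```

Detections with no alerts at all are skipped rather than flagged; whether a silent detection deserves its own review is a separate question.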
Cold tier search¶
For investigations spanning months:
```bash
cd siem search rehydrate --query --since 2026-01-01 --until 2026-04-30
# Returns a job ID; result available in 1-4 hours
```
Rehydrate is expensive — use only for genuine investigations.
Compliance retention¶
Verify quarterly:
```bash
cd siem retention audit --tier all
# Should show no gaps in any tier per the configured policy
```
Ingestion dropped¶
WARN: siem.ingest.events_per_sec dropped from 12k to 4k
A source stopped sending. Diagnose:
```bash
cd siem source status --all
# Lists sources with last_received_at
```
Common causes:
- Source IAM principal credentials expired
- Source app crashed or stopped logging
- Network path broken (VPC route, SG)
- Source rate-limited by the SIEM ingest quota (raise the quota)
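Once you have each source's last_received_at value, finding the silent ones is a simple comparison. The sketch below assumes the values have already been parsed into a dictionary of timezone-aware datetimes; the 15-minute silence threshold and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_received: dict, now: datetime,
                  max_silence: timedelta = timedelta(minutes=15)) -> list:
    """Return (source, silence) pairs for sources quiet longer than max_silence.

    `last_received` maps a source name to its last_received_at datetime.
    Results are ordered with the longest silence first.
    """
    stale = [
        (name, now - seen)
        for name, seen in last_received.items()
        if now - seen > max_silence
    ]
    return sorted(stale, key=lambda pair: pair[1], reverse=True)
```

Sources that normally log rarely would need a longer per-source threshold than a busy one such as VPC flow logs.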
Alert flood¶
siem.alerts.open_count climbing fast:
- A detection misfiring (regex too broad)
- Real attack in progress
- Routine event mis-classified
Triage by priority; deduplicate via grouping (group by principal reduces 100 alerts to 1 case).
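Grouping is what turns a flood into a workable queue. As a minimal sketch of the idea, the function below buckets alerts by one field; the alert dictionary shape and the function name are hypothetical.

```python
from collections import defaultdict


def group_into_cases(alerts, key="principal"):
    """Group alert dicts into cases keyed by the value of `key`."""
    cases = defaultdict(list)
    for alert in alerts:
        cases[alert[key]].append(alert)
    return dict(cases)
```

With this grouping, 100 alerts that share one principal collapse into a single case holding all 100 alerts, which matches the reduction described above.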
Query timeouts¶
Searches > 30 s:
- Time range too wide
- Query missing index hints
- Hot/warm tier overloaded
Optimize:
- Narrow time range
- Use index columns (timestamp, principal, action)
- Pre-aggregate for dashboard queries
False negatives¶
Real attack happened but SIEM didn't alert:
- Source wasn't ingesting that event type
- Detection rule had a gap (specific user-agent or pattern)
- Alert routed but suppressed (over-aggressive deduplication)
Add a detection covering the specific pattern; replay historical events to verify the new rule would have caught it.
Storage cost spike¶
siem.ingest.bytes_per_day 2× normal:
- Chatty source (a new app emitting verbose logs)
- Debug-level logging accidentally enabled in production
- A loop in the app producing repeating logs
```bash
cd siem ingest top-sources --since 24h
```
Filter at source or sample heavily for verbose-but-low-value events.
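One way to sample at the source is to decide per event with a hash, so the same event always gets the same decision and retries do not skew the sample. This is a sketch under that assumption; the function name and the use of an event ID as the hash input are hypothetical.

```python
import hashlib


def keep_event(event_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of events (0.0 to 1.0)."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 10,000 evenly sized buckets.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sample_rate * 10_000
```

Apply a filter like this only to verbose, low-value event types; security-relevant events such as audit logs should not be sampled.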
Cold-tier rehydrate slow¶
Rehydrate is bandwidth- and storage-limited; 1–4 h is normal for a 30-day window. To speed it up, narrow the job with more specific time, principal, or action filters.
Compliance retention gap¶
cd siem retention audit reports a gap (events missing from a window):
- Ingestion was down during that window
- Bug in lifecycle policy moved data prematurely
- Account misconfiguration
Engage SRE. Rehydrating from the cold tier may recover the missing events; if it does not, document the gap for the auditor.
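When documenting a gap, it helps to state its exact boundaries. The sketch below finds missing windows from a list of time buckets that do contain events. It assumes hourly buckets aligned to the start of the range, and it does not reflect how the audit command itself works; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(bucket_starts, start: datetime, end: datetime,
              step: timedelta = timedelta(hours=1)) -> list:
    """Return (gap_start, gap_end) ranges within [start, end) with no events.

    `bucket_starts` holds the start time of every bucket that has events.
    """
    present = set(bucket_starts)
    gaps = []
    gap_start = None
    current = start
    while current < end:
        if current not in present:
            if gap_start is None:
                gap_start = current  # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, current))  # the gap closes at this bucket
            gap_start = None
        current += step
    if gap_start is not None:
        gaps.append((gap_start, end))  # the gap runs to the end of the range
    return gaps
```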