Serverless Containers¶
Service ownership
Owner: container-platform (k8s-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Run containers without provisioning nodes — billed per request and per CPU-second of execution.
What it is¶
A serverless platform that runs OCI containers in response to HTTP requests. Bring an image, get a URL. Concurrency, autoscaling, and zero-instance scaling are managed for you.
When to pick this over Managed Kubernetes¶
| Workload pattern | Pick |
|---|---|
| Sporadic / low-volume HTTP API | Serverless Containers |
| Steady-state long-running services | Managed Kubernetes |
| Bursty event-driven workers | Serverless Containers |
| Anything needing custom CRDs / operators / admission webhooks | Managed Kubernetes |
| Anything needing GPUs | Managed K8s + GPU pool |
Container contract¶
- Listens on `$PORT` (set by the platform)
- Stateless (any state goes to managed services — DB, Redis, S3)
- Boots in < 10 s (longer is supported but counts against cold-start latency)
- Logs to stdout / stderr (auto-captured)
- 1 vCPU + 512 MiB RAM minimum, scaling up to 8 vCPU + 32 GiB RAM
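The contract above fits in a few lines of code. A minimal sketch in Python (stdlib only; the handler body and the 8080 local-dev fallback are illustrative, not platform requirements):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Stateless: nothing is kept between requests
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n")

    def log_message(self, fmt, *args):
        # Write access logs to stdout so the platform auto-captures them
        print("%s - %s" % (self.address_string(), fmt % args))

if __name__ == "__main__":
    # Honor the platform-assigned $PORT; 8080 only for local runs
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), Handler).serve_forever()
```

Fast boot follows from doing nothing at import time: all work happens per-request, so the instance is ready as soon as the socket is bound.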
Concurrency model¶
- Per-instance concurrency knob (default 80 in-flight requests per container)
- Auto-scale to zero when idle (configurable minimum)
- Scale up to 1,000 instances per service by default
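Little's law gives a quick way to reason about these knobs: in-flight requests ≈ request rate × latency, and the platform adds an instance whenever in-flight work exceeds instances × per-instance concurrency. A back-of-the-envelope sketch (the function name is ours, not a platform API):

```python
import math

def estimated_instances(rps, avg_latency_s, concurrency=80, max_instances=1000):
    """Steady-state instance count: in-flight = rps * latency (Little's law),
    divided by the per-instance concurrency limit, clamped to the scale cap."""
    in_flight = rps * avg_latency_s
    return min(max_instances, max(1, math.ceil(in_flight / concurrency)))
```

For example, 2,000 req/s at 200 ms average latency is ~400 requests in flight, which at the default concurrency of 80 lands around 5 instances.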
Pricing¶
| Component | Pricing |
|---|---|
| Per request | Billed per million requests |
| Per CPU-second | Pro-rated, billed in 100 ms increments |
| Per RAM-GiB-second | Pro-rated |
| Idle (scale-to-zero) | Free |
| Min-instance reservation | Hourly per instance |
See Pricing.
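The 100 ms rounding matters for very short requests. A sketch of the cost shape — the unit rates below are hypothetical placeholders, not Cloud Digit's actual prices (see the Pricing page for those):

```python
import math

# Hypothetical unit rates for illustration only — NOT actual Cloud Digit prices.
REQ_PER_MILLION = 0.40          # $ per million requests (assumed)
CPU_PER_SECOND = 0.000024       # $ per vCPU-second (assumed)
RAM_PER_GIB_SECOND = 0.0000025  # $ per GiB-second (assumed)

def billable_seconds(duration_s):
    # CPU-seconds are billed in 100 ms increments, rounded up
    return math.ceil(duration_s / 0.1) * 0.1

def monthly_cost(requests, avg_duration_s, vcpu, ram_gib):
    secs = requests * billable_seconds(avg_duration_s)
    return (requests / 1e6 * REQ_PER_MILLION
            + secs * vcpu * CPU_PER_SECOND
            + secs * ram_gib * RAM_PER_GIB_SECOND)
```

Note that a 50 ms request is billed as 100 ms, so workloads with many tiny requests effectively pay double on the compute component.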
Related¶
Operate this service¶
Run containers without managing a cluster — Cloud Digit handles scheduling, scaling, and isolation.
When this fits¶
- Stateless HTTP workloads with spiky traffic
- Event-driven jobs (queue consumers, scheduled tasks)
- Internal APIs that scale to zero between requests
Not for: stateful workloads, anything needing persistent local disk (only ephemeral scratch space is provided), or workloads needing host-level customization.
IAM¶
| Role | Can do |
|---|---|
| serverless.viewer | List services, view metrics |
| serverless.deployer | Deploy / update services |
| serverless.admin | Above + delete services, configure custom domains, scaling policies |
Deployment¶
```bash
cd serverless deploy \
  --name acme-api \
  --image registry.cloudigit.bd/acme/api:v1.2.3 \
  --port 8080 \
  --env DATABASE_URL=secret://acme/db-url \
  --min-instances 1 --max-instances 50 \
  --memory 512Mi --cpu 1
```
Returns a stable HTTPS endpoint immediately. Subsequent deploys are zero-downtime (new revision shadows old until healthy).
Networking¶
Two modes:

- Public — Internet-exposed HTTPS endpoint (default)
- VPC-internal — accessible only from a specified VPC
For workloads that call private databases / internal APIs: enable VPC connectivity:
```bash
cd serverless network set --service acme-api --vpc acme-prod-vpc --subnet private-a
```
Concurrency¶
Per-instance concurrency = max simultaneous requests one instance handles. Default 80; tune based on app:
- Light requests (e.g., API gateway forwarding): 100+
- Heavy compute per request: 1–10
- CPU-bound: usually 1
Per-instance saturation triggers a new instance.
Cost shape¶
Billed per request count + per-second of vCPU/RAM while handling requests. Idle instances (when min-instances > 0) bill at a reduced "keep-warm" rate.
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
| serverless.requests_per_sec | varies | |
| serverless.instance_count | scales with load | stuck at max for hours |
| serverless.cold_start_latency_ms p95 | < 800 ms | > 2 s |
| serverless.request_latency_ms p95 | within app SLO | breach |
| serverless.errors_per_min | < target | spike |
| serverless.cpu.utilization_pct | 40–70% | sustained > 80% |
Cold start mitigation¶
- `min-instances >= 1` keeps a warm pool (idle billing applies)
- Use a slim base image (alpine, distroless)
- Avoid heavy initialization in the entry point
- Pre-warm via scheduled "ping" probes for predictable traffic patterns
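The "ping" probe from the last bullet can be as small as a periodic GET against the service URL. A sketch (the helper names and 5-minute interval are ours; in practice you would run this from a scheduled task rather than a long-lived loop):

```python
import time
import urllib.request

def ping(url, timeout_s=5.0):
    """Issue one keep-warm GET; return (status, elapsed_ms)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()
        return resp.status, (time.monotonic() - start) * 1000.0

def keep_warm(url, interval_s=300):
    # Ping every interval_s seconds so an instance stays resident
    while True:
        status, ms = ping(url)
        print(f"warm ping: {status} in {ms:.0f} ms")
        time.sleep(interval_s)
```

Logging the elapsed time also gives you a cheap cold-vs-warm latency signal: pings that land on a cold instance show up as outliers.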
Rollout strategy¶
```bash
cd serverless rollout --service acme-api --revision v124 --strategy canary --percent 10

# 10% of traffic to v124; monitor; promote or roll back
cd serverless rollout promote --service acme-api --revision v124
```
Automatic rollback on error threshold:
```bash
cd serverless rollout --service acme-api --auto-rollback-on "error-rate > 1%"
```
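The `error-rate > 1%` condition amounts to comparing the canary revision's error fraction against the threshold over a monitoring window. A minimal sketch of that decision (function and parameter names are ours, not the platform's):

```python
def should_roll_back(errors, requests, threshold_pct=1.0):
    """True when the windowed error rate exceeds the rollback threshold."""
    if requests == 0:
        return False  # no traffic yet — not enough signal to judge
    return (errors / requests) * 100.0 > threshold_pct
```

The zero-traffic guard matters for canaries at low percentages: a revision that has served no requests should neither be promoted nor rolled back.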
Custom domains¶
```bash
cd serverless domain add --service acme-api --domain api.acme.com

# Returns DNS records to add at your domain
```
After DNS propagation, cert is auto-issued (ACM).
Logs¶
```bash
cd serverless logs --service acme-api --tail
cd serverless logs --service acme-api --filter "level=error" --since 1h
```
Logs ship to S3 (compliance retention) and to a real-time stream for the UI.
Secret consumption¶
Reference secrets in env vars, not the values directly:
```bash
--env DATABASE_URL=secret://acme/db-url
```
The platform injects the value at runtime; rotations are picked up on next instance start (cold restart needed).
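On the app side, the injected secret is an ordinary environment variable by the time the container starts. One pattern worth sketching: read it once at boot and fail fast if it is missing, so a misconfigured revision never becomes healthy (the helper name is ours):

```python
import os

def require_env(name):
    """Read a platform-injected env var; crash at boot (not mid-request)
    if it's absent, so the bad revision fails its health check."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required env var: {name}")
    return value

# DATABASE_URL arrives already resolved — the secret:// reference is
# dereferenced by the platform before the container starts, e.g.:
# DATABASE_URL = require_env("DATABASE_URL")
```

Combined with zero-downtime deploys, crashing at startup means the old revision keeps serving while the broken one is rejected.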
High cold-start latency¶
P95 cold start > 2 s:
- Image size > 500 MB — switch to slim base
- App's startup logic doing IO (loading large model, DB connection) — defer or pre-warm
- VPC connectivity enabled — adds ~200 ms cold start (the ENI attach)
Mitigation:

- `min-instances >= 1` for latency-sensitive APIs
- Snapshot-restore-based starts (Cloud Digit feature for some runtimes) — sub-100 ms cold start
Service stuck at max instances¶
| Cause | Check |
|---|---|
| Real traffic spike | requests_per_sec metric |
| Per-instance concurrency too low | Raise; reduces instance count for same load |
| Downstream dependency slow | Requests pile up; each instance holds longer |
| Error-retry storm | Logs show repeating errors |
5xx errors¶
```bash
cd serverless logs --service acme-api --filter "status>=500" --tail
```
| Status | Likely cause |
|---|---|
| 500 | App-level exception |
| 502 | App crashed / TLS misconfig in container |
| 503 | Cold start in progress; client may retry |
| 504 | App slow / hung; exceeded request timeout |
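Since a 503 can simply mean a cold start is in progress, clients should retry it with backoff while treating 500s as real failures. A sketch with the transport injected as a callable so the retry logic stays library-agnostic (names are ours):

```python
import time

RETRYABLE = {503}  # cold start in progress — safe to retry

def request_with_retry(send, attempts=4, base_delay_s=0.2):
    """Call send() (which returns an HTTP status); retry retryable
    statuses with exponential backoff; return the final status."""
    for i in range(attempts):
        status = send()
        if status not in RETRYABLE:
            return status
        if i < attempts - 1:
            time.sleep(base_delay_s * (2 ** i))  # 0.2s, 0.4s, 0.8s, ...
    return status
```

504s are deliberately not retried here: a hung request that already exceeded the timeout is unlikely to succeed on an immediate retry and may not be idempotent.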
Deploy failed: ImagePullFailed¶
The platform couldn't pull the image. Same checks as K8s ImagePullBackOff: registry auth, image existence, network.
Request timeout¶
Default 60 s. For long-running requests (large file processing, AI inference):
```bash
cd serverless config --service acme-api --timeout 300
```
Maximum 900 s. For longer jobs, use a different primitive (queue + worker, scheduled task).
VPC connectivity broken¶
After enabling VPC connectivity, requests to in-VPC services fail:
- Does the ENI subnet have an IGW/NAT route for outbound traffic?
- Does the security group allow the service's traffic?
- Is DNS resolution working (private hostnames need VPC DNS)?
```bash
cd serverless network show --service acme-api
```
Cost spike¶
Check serverless.instance_count and serverless.requests_per_sec over the spike window:

- Real traffic (legitimate spike or DDoS — check `cdn.requests_per_sec` upstream)
- Bug producing a hot loop (a client retrying excessively)
- Recently bumped `min-instances` for a quiet service
Cap with `--max-instances` to prevent runaway costs.