Load Balancer

Service ownership

Owner: network-platform (network-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Managed L4 (TCP/UDP) and L7 (HTTP/HTTPS) load balancing with health checks, sticky sessions, and integration with Auto Scaling Groups.

What it is

Two flavors:

| Flavor | Layer | Protocols | Use case |
| --- | --- | --- | --- |
| Network LB | L4 | TCP, UDP, TCP_PROXY | High-throughput, lowest latency, non-HTTP |
| Application LB | L7 | HTTP/1.1, HTTP/2, HTTPS, gRPC | Routing on path/host/headers; TLS termination; WAF integration |

Application LB features

  • TLS termination — pin a certificate from your store or use Cloud Digit-managed (auto-renewing)
  • Path / host routing — route by URL prefix, host header, query, headers, source IP
  • Sticky sessions — cookie-based or source-IP-based
  • HTTP/2 + gRPC to the backend
  • Integration with WAF — attach a WAF policy at the LB
  • DDoS protection — every LB is fronted by DDoS Basic; upgrade to Premium per LB
  • Access logs to Object Storage or SIEM

Network LB features

  • Static or pre-warmed — both supported; pre-warm for known traffic spikes
  • TCP_PROXY to preserve client IP without HTTP-level termination
  • UDP load balancing for game servers, VoIP, custom protocols
  • TLS passthrough for SNI-based routing without termination

Health checks

| Check | Where it makes sense |
| --- | --- |
| TCP | Anything, lowest cost |
| HTTP/HTTPS | Web/app servers |
| gRPC | gRPC services |
| Custom (script) | Available on request via support |

Performance

| Property | Value |
| --- | --- |
| Throughput per LB | Up to 100 Gbps |
| Connections per second | 1M (Application LB), 5M (Network LB) |
| Backends per target group | 200 default, 1,000 cap |

Pricing

Hourly LB charge + per-LCU charge (load balancer capacity unit, similar to the AWS unit of the same name) + per-GB international egress. Domestic egress over BDIX is free. See Pricing.

Operate this service

L4 + L7 load balancers with TLS termination, health checks, and target-group routing.

LB types

| Type | Layer | Use |
| --- | --- | --- |
| lb-net | L4 | TCP/UDP passthrough, raw throughput |
| lb-app | L7 | HTTP/HTTPS, path/host-based routing |
| lb-gw | L3/L4 | Inline traffic inspection (paired with WAF) |

Pick lb-app for anything that speaks HTTP. lb-net is for non-HTTP (databases, custom protocols).

IAM

| Role | Can do |
| --- | --- |
| lb.viewer | List LBs, view metrics |
| lb.builder | Create / modify LB, listeners, target groups |
| lb.cert-admin | Upload / manage TLS certificates |
| lb.admin | Above + delete LBs, modify cross-zone policies |

lb.cert-admin is a separate role because cert mishandling has audit implications.

TLS posture

  • TLS 1.3 default; TLS 1.2 allowed; TLS 1.0/1.1 disabled.
  • Certificates from Cloud Digit ACM (free, auto-renewal) or BYO (PEM upload).
  • For regulated workloads: enable mTLS with a customer-controlled CA.

```bash
cd lb listener create \
  --lb acme-web-lb \
  --port 443 \
  --protocol https \
  --tls-policy modern-tls-only \
  --cert-arn arn:cd:acm:::cert/abcd
```

Cross-zone load balancing

Default: ON. Traffic is distributed across targets in all AZs, not only targets in the AZ that received the request.

Turn OFF for:

  • Stateful workloads with sticky requirements
  • Cost optimization (cross-AZ traffic is metered)

Target group hygiene

  • Health check path: /health, returning 200 only when ready
  • Healthy threshold: 2 (avoid flap on single bad sample)
  • Unhealthy threshold: 3
  • Timeout: > app's worst-case response time
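The two thresholds above behave like a small hysteresis state machine. The sketch below simulates that behaviour under one assumption the text does not state: that the thresholds count consecutive probe results. The function name `simulate_health` is hypothetical and is not part of the cd CLI.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of health-check hysteresis, not the LB's actual code.
# Assumption: thresholds count *consecutive* probe results.
HEALTHY_THRESHOLD=2
UNHEALTHY_THRESHOLD=3

# simulate_health INITIAL_STATE PROBE...
# Each PROBE is "ok" or "fail". Prints the target state after the last probe.
simulate_health() {
  local state=$1; shift
  local ok_streak=0 fail_streak=0 probe
  for probe in "$@"; do
    if [ "$probe" = "ok" ]; then
      ok_streak=$((ok_streak + 1)); fail_streak=0
      if [ "$ok_streak" -ge "$HEALTHY_THRESHOLD" ]; then state=healthy; fi
    else
      fail_streak=$((fail_streak + 1)); ok_streak=0
      if [ "$fail_streak" -ge "$UNHEALTHY_THRESHOLD" ]; then state=unhealthy; fi
    fi
  done
  echo "$state"
}

simulate_health healthy ok fail ok       # prints "healthy": one bad sample is ignored
simulate_health healthy fail fail fail   # prints "unhealthy": three failures in a row
simulate_health unhealthy ok ok          # prints "healthy": two passes in a row
```

With these values a target leaves rotation only after three failed probes in a row, and returns only after two passing probes in a row.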

Metrics

| Metric | Healthy | Alert |
| --- | --- | --- |
| lb.request_count_per_target | balanced ± 20% | one target 3× others (placement) |
| lb.target_5xx_per_min | 0 | > 0 |
| lb.target_response_time_p99 | < 500 ms | > 2 s |
| lb.healthy_target_count | matches expected | drops |
| lb.tls_handshake_failures | 0 | > 0 (cert / protocol mismatch) |
| lb.connection_resets | low | climbing |
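The "one target 3× others" alert on lb.request_count_per_target can be checked with plain integer arithmetic. This is only a sketch: it interprets "3× others" as "at least three times the mean of the remaining targets", which is an assumption, and the function name and target IDs are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical skew check; the counts would come from lb.request_count_per_target.
# find_hot_targets FACTOR NAME=COUNT...
# Prints every target whose count is >= FACTOR times the mean of the other targets.
find_hot_targets() {
  local factor=$1; shift
  local n=$# total=0 pair count
  if [ "$n" -lt 2 ]; then return 0; fi
  for pair in "$@"; do
    total=$((total + ${pair#*=}))
  done
  for pair in "$@"; do
    count=${pair#*=}
    # count >= factor * (total - count) / (n - 1), rearranged to avoid division
    if [ "$count" -gt 0 ] && [ $((count * (n - 1))) -ge $((factor * (total - count))) ]; then
      echo "${pair%%=*}"
    fi
  done
}

find_hot_targets 3 i-a=3100 i-b=1000 i-c=950   # prints "i-a"
```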

Certificate rotation

ACM certs auto-renew at 60 days remaining. BYO certs require manual rotation:

```bash
cd lb cert upload --cert-pem cert.pem --key-pem key.pem --chain-pem chain.pem
cd lb listener update --lb acme-web-lb --port 443 --cert-arn

# Brief overlap; old cert stays valid until removed

cd lb cert delete --cert-arn
```

Set a calendar reminder 30 days before each BYO cert expires. A lapsed cert means a full outage.
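The 30-day reminder can also be scripted. The sketch below works on the certificate's expiry time as a Unix epoch. The helper names are hypothetical, and the comment showing how to derive the epoch assumes GNU `date` alongside `openssl`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a BYO cert expiry reminder.
# One way to get the expiry epoch (assumes GNU date is available):
#   date -d "$(openssl x509 -enddate -noout -in cert.pem | cut -d= -f2)" +%s

# days_until NOT_AFTER_EPOCH [NOW_EPOCH]: whole days left before expiry
days_until() {
  local not_after=$1 now=${2:-$(date +%s)}
  echo $(( (not_after - now) / 86400 ))
}

# reminder_due NOT_AFTER_EPOCH [NOW_EPOCH] [WINDOW_DAYS]
# Succeeds (exit 0) when the cert expires within WINDOW_DAYS (default 30).
reminder_due() {
  local not_after=$1 now=${2:-$(date +%s)} window=${3:-30}
  [ "$(days_until "$not_after" "$now")" -le "$window" ]
}

now=$(date +%s)
not_after=$((now + 20 * 86400))   # pretend the cert expires in 20 days
if reminder_due "$not_after" "$now"; then
  echo "rotate soon: $(days_until "$not_after" "$now") days left"
fi
```

Running this from a daily scheduled job is one option; the window is a parameter so the same check can fire earlier for certs that need a change-approval cycle.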

Connection draining

Before terminating an instance:

```bash
cd lb target deregister --tg --target

# Existing connections served; new connections rejected; default drain 30s
```

For long-lived connections (WebSockets, gRPC streams), tune the drain timeout up.

Sticky sessions

Options:

  • Application-controlled: the app sets a cookie and the LB respects it (HTTP only)
  • Duration-based: LB-issued cookie, 1h-24h sticky window
  • None (default): round-robin

Avoid stickiness unless you have to. It interferes with scaling and recovery.

Cross-region LB

A single LB is region-local. For multi-region failover, fail over at the DNS level with a health-checked DNS record.

Access logging

```bash
cd lb access-logs enable --lb acme-web-lb --destination s3://acme-lb-logs/
```

5-minute batched delivery to S3. Combine with a lifecycle rule for retention.

Target showing unhealthy

Diagnostic order:

  1. Is the app actually healthy? SSH in, curl the health endpoint
  2. Health check path — does the LB use /health (correct) or / (often 302-redirect, fails strict check)?
  3. Health check port — same as the target port?
  4. Security group — allows LB CIDR on the health-check port?
  5. Health check response code — LB expects 200 by default; configure to allow others if your app returns 204

```bash
cd lb target health --tg --target

# Returns: status, last-check-time, reason for failure
```
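Steps 1, 2 and 5 of the diagnostic order can be reproduced by hand from a host that can reach the target. The sketch below uses `curl` to fetch the status code without following redirects and then applies the same "200 only unless configured otherwise" rule. The function names and the address in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that mimic a strict HTTP health check by hand.

# probe URL: print the HTTP status code; curl does not follow redirects by default
probe() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# health_code_passes CODE [ALLOWED...]
# Succeeds when CODE is in the allowed list; the list defaults to 200 only.
health_code_passes() {
  local code=$1; shift
  local allowed
  if [ $# -eq 0 ]; then set -- 200; fi
  for allowed in "$@"; do
    if [ "$code" = "$allowed" ]; then return 0; fi
  done
  return 1
}

# Usage (address is illustrative):
#   health_code_passes "$(probe http://10.0.1.12:8080/health)" && echo pass
#   health_code_passes "$(probe http://10.0.1.12:8080/)" 200 204 && echo pass
```

A 302 from / fails the default check here, as does a 204 unless it is explicitly listed as an allowed code.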

All targets unhealthy

Probably config drift in the health check or the security group.

If it happens after a deploy: the new image broke the /health endpoint. Roll back.

If it happens at scale: a downstream dependency (DB, cache) went down — health check rightly fails.

504 Gateway Timeout

LB couldn't get a response from target within idle-timeout:

  • Increase listener idle-timeout (default 60s; up to 4000s)
  • Or fix the slow target (app perf, DB query)

502 Bad Gateway

Different from 504. The target accepted the connection, then closed it unexpectedly:

  • App is crashing on request
  • App killed by OOM
  • TLS handshake fails (LB → target with TLS enabled, target's cert misconfigured)

Use cd lb access-logs query to filter the logs for 502 events.

Uneven traffic distribution

lb.request_count_per_target shows one target taking 5× others:

  • Sticky sessions — disable temporarily to verify
  • Cross-zone LB disabled + uneven targets per AZ
  • Slow target — fast targets process more requests in the same window. Check lb.target_response_time per target.
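The second cause above is easy to quantify. The sketch below computes each target's share of total traffic with cross-zone on and off, assuming incoming requests are split evenly across AZs (an assumption; the real split depends on client DNS resolution). The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical arithmetic sketch; assumes requests are split evenly across AZs.
# per_target_share CROSS_ZONE AZ_INDEX TARGETS_PER_AZ...
# Prints the traffic share of one target in AZ_INDEX (0-based), in basis
# points (1/100 of a percent) of total traffic, using integer division.
per_target_share() {
  local cross_zone=$1 az=$2; shift 2
  local counts=("$@")
  local total=0 c
  for c in "${counts[@]}"; do total=$((total + c)); done
  if [ "$cross_zone" = "on" ]; then
    # every target in every AZ gets an equal slice
    echo $((10000 / total))
  else
    # each AZ keeps its own slice and divides it among its local targets
    echo $((10000 / ${#counts[@]} / counts[az]))
  fi
}

per_target_share off 0 2 8   # prints 2500: each of the 2 targets in AZ 0 gets 25%
per_target_share off 1 2 8   # prints 625:  each of the 8 targets in AZ 1 gets 6.25%
per_target_share on 0 2 8    # prints 1000: every target gets 10%
```

With 2 targets in one AZ and 8 in the other, disabling cross-zone makes the targets in the small AZ take 4× the traffic of those in the large one.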

TLS handshake failures

WARN: lb.tls_handshake_failures > 0

Causes:

  • Client speaks TLS 1.0/1.1 (disabled by default)
  • Cipher mismatch (rare with modern-tls-only)
  • SNI mismatch (client sends wrong server name)
  • Expired cert (verify with cd lb listener show)

For detail, enable lb.detailed_logging and inspect the per-request handshake errors.

ACM cert renewal failed

ACM auto-renews at 60 days remaining. Common causes of renewal failure:

  • Domain ownership changed (DNS validation broke)
  • Cert is on a domain no longer pointed at the LB

Logs: console LB → Certificates → Renewal history. Re-validate domain ownership and retry.