Troubleshooting — Performance & latency

When the Console is sluggish, when API calls feel slow, when objects take forever to download — work through this page. Most performance issues come down to: where is the user, where is the resource, and what's on the path between them.

Mental model: where's the latency budget going

```mermaid
graph LR
    User[User device]
    ISP[Customer ISP]
    BDIX((BDIX))
    Edge[Cloud Digit edge]
    Region[Region: bd-dha-1 / bd-ctg-1 / bd-syl-1]
    Service[Service backend]

    User --> ISP --> BDIX --> Edge --> Region --> Service
```

Typical budgets for a domestic Bangladeshi user:

| Hop | Typical latency | What dominates it |
|---|---|---|
| User → ISP | 2–10 ms | Local loop |
| ISP → BDIX | 1–5 ms | ISP backbone |
| BDIX → Cloud Digit edge | < 1 ms | Cross-connect |
| Edge → Region | 1–5 ms | DC interconnect |
| Region → Service | < 1 ms | Internal LAN |
| Total round-trip | ~10–30 ms | Snappy interactions |
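
To check where your own path diverges from this budget, trace it hop by hop from the client. A minimal sketch with mtr, using the bd-dha-1 API hostname that appears in the examples further down:

```bash
# Per-hop latency and loss toward the regional API endpoint (20 cycles).
# Expect low single-digit ms once the path reaches the BDIX cross-connect.
mtr --report --report-cycles 20 api.bd-dha-1.clouddigit.ai
```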

If you're seeing 200+ ms, something's wrong on the path. The diagnostic flow:

```mermaid
graph TD
    A[Console feels slow] --> B[Open DevTools Network tab]
    B --> C{First-byte time}
    C -->|"< 100 ms"| D[Frontend issue]
    C -->|"100-500 ms"| E[Edge / region issue]
    C -->|"> 500 ms"| F[Routing or service issue]

    D --> D1[CPU / extensions / dev console open]
    E --> E1[Region selection / CDN warm-up]
    F --> F1[International transit / service incident]
```
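
The same first-byte triage works from the CLI if you'd rather not open DevTools; a minimal sketch, assuming a Console hostname (substitute the URL your browser actually loads):

```bash
# First-byte time only: < 100 ms suggests a frontend issue, > 500 ms
# points at routing or the service itself (hostname is illustrative)
curl -s -o /dev/null -w "first byte: %{time_starttransfer}s\n" \
  https://console.clouddigit.ai/
```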

Console UI feels slow

| Symptom | Likely cause | Fix |
|---|---|---|
| Spinner on every page load | Slow API | Check the Status page |
| Pages render but interactions lag | Browser CPU at 100% (likely extensions or DevTools open) | Close DevTools when not debugging; disable heavy extensions |
| Smooth on desktop, choppy on phone | Mobile data + far CDN edge | Test on Wi-Fi; or use the BDIX-direct path |
| First page after sign-in slow, subsequent fast | Cold start of the SPA bundle (~1 MiB JS) | Expected on first load; cached afterwards |
| Search results take seconds | Project has thousands of resources | Use filters to narrow before searching |
| Chart redraws slowly | Long time range × many resources × hourly granularity | Switch to daily / monthly granularity in Cost Explorer |

API calls are slow

For API consumers (CLI, Terraform, your own apps):

| Symptom | Likely cause | Fix |
|---|---|---|
| First call slow, rest fast | TLS handshake + connection setup | Use a long-lived HTTPS client with connection pooling (see the sketch below) |
| Every call adds 200+ ms | International transit (you're not on BDIX) | Move clients onto a BDIX-connected network; or use the closer regional endpoint |
| Burst of calls suddenly slow | Hit the rate limit, getting throttled | Respect Retry-After; back off; use bulk endpoints where available |
| List operations very slow | Listing a project with 10k+ objects | Use pagination + filters; don't list-all |
| Inconsistent latencies | DNS lookup variance | Pre-resolve and pin; or use an SDK with built-in DNS caching |
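
Connection pooling matters because every fresh HTTPS connection pays DNS + TCP + TLS before the first byte. A rough way to see the difference with plain curl (endpoint as in the measuring example below; giving the URL twice in one invocation lets curl re-use the connection for the second transfer):

```bash
URL="https://api.bd-dha-1.clouddigit.ai/v1/compute/servers"
HDR="Authorization: Bearer $CD_API_TOKEN"

# Two separate invocations: each pays DNS + TCP + TLS
for i in 1 2; do
  curl -s -H "$HDR" -o /dev/null \
    -w "separate:  total %{time_total}s  tls %{time_appconnect}s\n" "$URL"
done

# One invocation, two transfers: the second should show near-zero tls time
curl -s -H "$HDR" \
  -w "same-conn: total %{time_total}s  tls %{time_appconnect}s\n" \
  -o /dev/null "$URL" -o /dev/null "$URL"
```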

Measuring from your side

```bash
# Write the timing template curl will fill in, phase by phase
cat > curl-format.txt <<'EOF'
   time_namelookup:  %{time_namelookup}\n
      time_connect:  %{time_connect}\n
   time_appconnect:  %{time_appconnect}\n
  time_pretransfer:  %{time_pretransfer}\n
     time_redirect:  %{time_redirect}\n
time_starttransfer:  %{time_starttransfer}\n
        time_total:  %{time_total}\n
EOF

# Time a single API call end-to-end
curl -w "@curl-format.txt" -o /dev/null -s \
  -H "Authorization: Bearer $CD_API_TOKEN" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
```

Compare `time_starttransfer` to the typical-budget table above. If your total is consistently above 100 ms on a purely domestic path, escalate.

Object Storage uploads/downloads

| Symptom | Likely cause | Fix |
|---|---|---|
| Single-file PUT very slow for large files | No multipart | Use multipart upload (≥ 5 MiB parts) — every modern S3 client does this automatically above a threshold |
| Multipart PUT slow despite parts | Sequential part upload | Parallelize parts (the AWS CLI default; tune with `aws configure set s3.max_concurrent_requests 20` — see the sketch below) |
| GET very slow from outside BD | International transit | Use the regional endpoint closest to you; or front with CDN |
| Mixed performance, region-dependent | Inter-region transfers | Read from the bucket's own region |
| Slow list-bucket | Bucket has millions of objects | Use prefix-based listing; don't enumerate everything |
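
For the multipart rows above, the AWS CLI knobs look like this; the bucket name and endpoint URL are illustrative, and other S3-compatible clients expose equivalent settings:

```bash
# Multipart tuning for the AWS CLI (values are reasonable starting points)
aws configure set s3.multipart_threshold 64MB     # switch to multipart above this size
aws configure set s3.multipart_chunksize 16MB     # per-part size
aws configure set s3.max_concurrent_requests 20   # parts uploaded in parallel

# Upload via the bucket's own regional endpoint (names are hypothetical)
aws s3 cp ./backup.tar.gz s3://my-bucket/backups/ \
  --endpoint-url https://objects.bd-dha-1.clouddigit.ai
```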

VM / Kubernetes networking

| Symptom | Likely cause | Fix |
|---|---|---|
| VM pings other VM, but app-level traffic very slow | TCP window scaling not enabled; or MTU mismatch | Confirm `sysctl net.ipv4.tcp_window_scaling=1`; check MTU on the interface (see the checks below) |
| K8s LoadBalancer Service slow to converge | New LB warm-up | Hit it a few times; LBs scale on traffic |
| Cross-AZ latency > 5 ms | Same region but different AZs | Confirm AZ pairing; this is usually < 5 ms — open a ticket if persistently higher |
| Pods on different nodes slower than same-node | CNI overhead | Use IPVS over iptables for kube-proxy; consider Cilium with eBPF |
| HTTP request slow but TCP fast | TLS handshake dominating; or DNS lookup inside the pod | Re-use connections; cache DNS at the pod |
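
For the window-scaling and MTU row, the checks from inside the VM are short; the interface name and peer address here are illustrative:

```bash
# TCP window scaling should be on (prints "... = 1")
sysctl net.ipv4.tcp_window_scaling

# MTU on the interface; compare both ends of the slow path
ip link show eth0 | grep -o 'mtu [0-9]*'

# Probe path MTU with don't-fragment set: 1472 B payload + 28 B of
# ICMP/IP headers = 1500. If this fails but smaller payloads pass,
# something on the path has a smaller MTU.
ping -M do -s 1472 -c 3 10.0.10.5
```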

Database

| Symptom | Likely cause | Fix |
|---|---|---|
| Postgres query slow despite small DB | Missing index | `EXPLAIN ANALYZE` the query; add the index (see the sketch below) |
| Same query slower than last month | Stats out of date | `VACUUM ANALYZE`; check autovacuum is running |
| Read replica lag growing | Replication can't keep up with writes | Larger replica; or split read traffic |
| Connection pool exhausted | Too many short-lived connections | Use pgbouncer in front; reuse connections |
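
For the missing-index row, the whole loop fits in one psql session; table and column names here are hypothetical:

```bash
psql "$DATABASE_URL" <<'SQL'
-- Look for "Seq Scan" with a large actual time in the plan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- If the scan is the culprit, add the index without blocking writes
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id
    ON orders (customer_id);
SQL
```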

For deeper DB ops, see Managed DBA.

"From inside Bangladesh it's fast, from outside it's slow"

Cloud Digit is sovereign-resident — services are in BD. Reaching them from outside BD adds international transit latency. Patterns:

| Where the user is | Typical added latency |
|---|---|
| India (Mumbai/Chennai) | 30–80 ms |
| Singapore | 60–120 ms |
| EU | 200–300 ms |
| US East | 250–350 ms |
| US West | 350–500 ms |

This is inherent — no fix on Cloud Digit's side. Mitigations:

  • Front public-facing services with CDN for international reads
  • For interactive admin work, accept the latency; for bulk transfers, schedule overnight
  • Cache aggressively in app layer

When to open a ticket

Open a ticket when:

  • Performance is materially worse than the budgets above and reproducible
  • You've ruled out client-side issues (close DevTools, test in incognito, test from another machine)
  • Multiple users / multiple resources are affected
  • The performance has changed recently without changes on your end

Include:

  • Trace IDs of slow API calls (X-Cd-Trace-Id response header)
  • Timestamps + region(s) affected
  • curl -w output for representative calls
  • Network path (where users / clients are; ISP names)
  • Whether the issue is constant or intermittent (with intermittent pattern if any)
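
To grab a trace ID for the ticket, dump the response headers of a representative slow call (endpoint as in the measuring example above):

```bash
# Print only the X-Cd-Trace-Id response header of one call
curl -s -D - -o /dev/null -H "Authorization: Bearer $CD_API_TOKEN" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers \
  | grep -i '^x-cd-trace-id'
```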