# Troubleshooting — Performance & latency
When the Console is sluggish, when API calls feel slow, when objects take forever to download — work through this page. Most performance issues come down to: where is the user, where is the resource, and what's on the path between them.
## Mental model: where's the latency budget going
```mermaid
graph LR
  User[User device]
  ISP[Customer ISP]
  BDIX((BDIX))
  Edge[Cloud Digit edge]
  Region["Region: bd-dha-1 / bd-ctg-1 / bd-syl-1"]
  Service[Service backend]
  User --> ISP --> BDIX --> Edge --> Region --> Service
```

Typical budgets for a domestic Bangladeshi user:
| Hop | Typical latency | What dominates it |
|---|---|---|
| User → ISP | 2–10 ms | Local loop |
| ISP → BDIX | 1–5 ms | ISP backbone |
| BDIX → Cloud Digit edge | < 1 ms | Cross-connect |
| Edge → Region | 1–5 ms | DC interconnect |
| Region → Service | < 1 ms | Internal LAN |
| Total round-trip | ~10–30 ms | Target for snappy interactions |
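To see where your own path sits against this budget, measure it hop by hop. A minimal sketch with standard tools (assumes mtr is installed; the hostname matches the API examples further down, substitute your region):

```bash
# Round-trip time to the regional API endpoint (ICMP).
ping -c 10 api.bd-dha-1.clouddigit.ai

# Per-hop latency and loss along the path; look for the hop
# where latency jumps (ISP backbone vs. BDIX vs. edge).
mtr --report --report-cycles 20 api.bd-dha-1.clouddigit.ai
```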
If you're seeing 200+ ms, something's wrong on the path. The diagnostic flow:
```mermaid
graph TD
  A[Console feels slow] --> B[Open DevTools Network tab]
  B --> C{First-byte time}
  C -->|"< 100 ms"| D[Frontend issue]
  C -->|"100-500 ms"| E[Edge / region issue]
  C -->|"> 500 ms"| F[Routing or service issue]
  D --> D1[CPU / extensions / dev console open]
  E --> E1[Region selection / CDN warm-up]
  F --> F1[International transit / service incident]
```

## Console UI feels slow
| Symptom | Likely cause | Fix |
|---|---|---|
| Spinner on every page load | Slow API responses | Check the Status page for an ongoing incident |
| Pages render but interactions lag | Browser CPU at 100% (likely extensions or DevTools open) | Close DevTools when not debugging; disable heavy extensions |
| Smooth on desktop, choppy on phone | Mobile data + far CDN edge | Test on Wi-Fi; or use the BDIX-direct path |
| First page after sign-in slow, subsequent fast | Cold start of the SPA bundle (~ 1 MiB JS) | Expected on first load; cached after |
| Search results take seconds | Project has thousands of resources | Use filters to narrow before searching |
| Chart redraws slowly | Long time range × many resources × hourly granularity | Switch to daily / monthly granularity in Cost Explorer |
## API calls are slow
For API consumers (CLI, Terraform, your own apps):
| Symptom | Likely cause | Fix |
|---|---|---|
| First call slow, rest fast | TLS handshake + connection setup | Use a long-lived HTTPS client with connection pooling |
| Every call adds 200+ ms | International transit (you're not on BDIX) | Move clients onto BDIX network; or use the closer regional endpoint |
| Burst of calls suddenly slow | Hit the rate limit, getting throttled | Respect Retry-After; back off; use bulk endpoints where available |
| List operations very slow | Listing a project with 10k+ objects | Use pagination + filters; don't list-all |
| Inconsistent latencies | DNS lookup variance | Pre-resolve and pin; or use SDK with built-in DNS caching |
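Two of those fixes are easy to sanity-check with curl alone: a single invocation re-uses one TLS connection for every URL it is given, and --retry backs off automatically, honouring Retry-After on 429 responses. A sketch (the /v1/compute/images path is illustrative, not a documented endpoint):

```bash
# Connection reuse: both requests share one TLS connection, so the
# second connect/tls figure should be ~0 (handshake already paid).
curl -s -o /dev/null -H "Authorization: Bearer $CD_API_TOKEN" \
  -w "total %{time_total}s (connect %{time_connect}s, tls %{time_appconnect}s)\n" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/images

# Throttling: let curl back off on 429/5xx instead of hammering.
curl -s --retry 5 --retry-max-time 120 \
  -H "Authorization: Bearer $CD_API_TOKEN" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
```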
## Measuring from your side
```bash
# Time a single API call end-to-end
curl -w "@curl-format.txt" -o /dev/null -s \
  -H "Authorization: Bearer $CD_API_TOKEN" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
```

Where curl-format.txt is:

```
time_namelookup:    %{time_namelookup}\n
time_connect:       %{time_connect}\n
time_appconnect:    %{time_appconnect}\n
time_pretransfer:   %{time_pretransfer}\n
time_redirect:      %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
time_total:         %{time_total}\n
```
Compare time_starttransfer to the typical-budget table above. If your total is consistently above 100 ms while the path stays domestic, escalate.
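A single sample is noisy; DNS, TLS, and cache warm-up all land on the first call. Averaging over a handful of requests gives a steadier number to compare against the budget:

```bash
# Average time_starttransfer over 10 calls; the first call carries
# the DNS + TLS cost, the rest show steady-state latency.
for i in $(seq 10); do
  curl -s -o /dev/null -w "%{time_starttransfer}\n" \
    -H "Authorization: Bearer $CD_API_TOKEN" \
    https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
done | awk '{s += $1} END {printf "avg TTFB over %d calls: %.3fs\n", NR, s / NR}'
```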
## Object Storage uploads/downloads
| Symptom | Likely cause | Fix |
|---|---|---|
| Single-file PUT very slow for large files | No multipart | Use multipart upload (≥ 5 MiB parts) — every modern S3 client does this automatically above a threshold |
| Multipart PUT slow despite parts | Sequential upload | Parallelize parts (default in AWS CLI; configure with aws configure set s3.max_concurrent_requests 20) |
| GET very slow from outside BD | International transit | Use the regional endpoint closest to you; or front with CDN |
| Mixed performance, region-dependent | Inter-region transfers | Read from the bucket's own region |
| Slow list-bucket | Bucket has millions of objects | Use prefix-based listing; don't enumerate everything |
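For the multipart rows, the AWS CLI's S3 transfer settings are the usual knobs. A sketch with reasonable starting values; the endpoint hostname is an assumption, use the one for your bucket's region:

```bash
# Multipart kicks in above the threshold; parts upload in parallel.
aws configure set s3.multipart_threshold 64MB
aws configure set s3.multipart_chunksize 16MB
aws configure set s3.max_concurrent_requests 20

# Upload against the bucket's own regional endpoint (hostname assumed).
aws s3 cp ./backup.tar.gz s3://my-bucket/ \
  --endpoint-url https://objects.bd-dha-1.clouddigit.ai
```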
## VM / Kubernetes networking
| Symptom | Likely cause | Fix |
|---|---|---|
| VM pings other VM, but app-level very slow | TCP window scaling not enabled; or MTU mismatch | Confirm sysctl net.ipv4.tcp_window_scaling=1; check MTU on the interface |
| K8s LoadBalancer Service slow to converge | New LB warm-up | Hit it a few times; LBs scale on traffic |
| Cross-AZ latency > 5 ms | Same region but different AZs | Confirm AZ pairing; this is usually < 5 ms — open a ticket if persistently higher |
| Pods on different nodes slower than same-node | CNI overhead | Use IPVS over iptables for kube-proxy; consider Cilium with eBPF |
| HTTP request slow but TCP fast | TLS handshake dominating; or DNS lookup inside the pod | Re-use connections; cache DNS at the pod |
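For the first row, both suspects can be checked from inside the VM in under a minute (assumes Linux and an interface named eth0):

```bash
# TCP window scaling should be on (= 1) for high-throughput flows.
sysctl net.ipv4.tcp_window_scaling

# Interface MTU; compare against the VPC / overlay MTU.
ip link show eth0 | grep -o 'mtu [0-9]*'

# Probe path MTU to a peer VM: DF bit set, 1472 bytes payload
# + 28 bytes of headers = a full 1500-byte packet. If this fails
# but smaller payloads pass, something on the path has a lower MTU.
ping -M do -s 1472 -c 3 <peer-vm-ip>
```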
## Database
| Symptom | Likely cause | Fix |
|---|---|---|
| Postgres query slow despite small DB | Missing index | EXPLAIN ANALYZE the query; add the index |
| Same query slower than last month | Stats out of date | VACUUM ANALYZE; check autovacuum is running |
| Read replicas lag growing | Replication can't keep up with writes | Larger replica; or split read traffic |
| Connection pool exhausted | Too many short-lived connections | Use pgbouncer in front; reuse connections |
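For the first two rows, a minimal sketch of the usual Postgres checks via psql; table and column names here are hypothetical:

```bash
# Is the planner doing a sequential scan where an index should exist?
psql "$DATABASE_URL" -c \
  "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;"

# If so, add the index without locking writes:
psql "$DATABASE_URL" -c \
  "CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);"

# Refresh planner statistics (and check that autovacuum is running):
psql "$DATABASE_URL" -c "VACUUM ANALYZE orders;"
```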
For deeper DB ops, see Managed DBA.
"From inside Bangladesh it's fast, from outside it's slow"¶
Cloud Digit is sovereign-resident — services are in BD. Reaching them from outside BD adds international transit latency. Patterns:
| Where the user is | Typical added latency |
|---|---|
| India (Mumbai/Chennai) | 30–80 ms |
| Singapore | 60–120 ms |
| EU | 200–300 ms |
| US East | 250–350 ms |
| US West | 350–500 ms |
This is inherent — no fix on Cloud Digit's side. Mitigations:
- Front public-facing services with CDN for international reads
- For interactive admin work, accept the latency; for bulk transfers, schedule overnight
- Cache aggressively in app layer
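Whether the CDN is actually absorbing international reads shows up in response headers. Header names vary by CDN; these are common ones, and the hostname is a placeholder:

```bash
# A cache hit at the edge typically shows a HIT status or non-zero Age.
curl -sI https://cdn.example.com/static/app.js | \
  grep -iE '^(age|x-cache|cf-cache-status|cache-control):'
```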
## When to open a ticket
Open a ticket when:
- Performance is materially worse than the budgets above and reproducible
- You've ruled out client-side issues (close DevTools, test in incognito, test from another machine)
- Multiple users / multiple resources are affected
- The performance has changed recently without changes on your end
Include:
- Trace IDs of slow API calls (X-Cd-Trace-Id response header)
- Timestamps + region(s) affected
- curl -w output for representative calls
- Network path (where users / clients are; ISP names)
- Whether the issue is constant or intermittent (and the pattern, if intermittent)
## Related
- Browsers, sessions & SPA quirks
- Troubleshoot — Browser & cache
- Networking — VPC + security-group setup
- BDIX Peering Direct Connect — for sub-millisecond domestic latency
- CDN — for international fan-out
- Status — platform-wide latency reports
- Managed Monitoring / SIEM — for application-side observability