# Troubleshooting — Performance & latency
When the Console is sluggish, when API calls feel slow, when objects take forever to download — work through this page. Most performance issues come down to: where is the user, where is the resource, and what's on the path between them.
## Mental model: where's the latency budget going
```mermaid
graph LR
  User[User device]
  ISP[Customer ISP]
  BDIX((BDIX))
  Edge[Cloud Digit edge]
  Region["Region: bd-dha-1 / bd-ctg-1 / bd-syl-1"]
  Service[Service backend]
  User --> ISP --> BDIX --> Edge --> Region --> Service
```

Typical budgets for a domestic Bangladeshi user:
| Hop | Typical latency | What dominates it |
|---|---|---|
| User → ISP | 2–10 ms | Local loop |
| ISP → BDIX | 1–5 ms | ISP backbone |
| BDIX → Cloud Digit edge | < 1 ms | Cross-connect |
| Edge → Region | 1–5 ms | DC interconnect |
| Region → Service | < 1 ms | Internal LAN |
| Total round-trip | ~10–30 ms | Target for snappy interactions |
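To see where your own path sits against this budget, measure it hop by hop. A minimal sketch with standard tools (assumes mtr is installed; the hostname matches the API examples further down, substitute your region):

```bash
# Round-trip time to the regional API endpoint (ICMP).
ping -c 10 api.bd-dha-1.clouddigit.ai

# Per-hop latency and loss along the path; look for the hop
# where latency jumps (ISP backbone vs. BDIX vs. edge).
mtr --report --report-cycles 20 api.bd-dha-1.clouddigit.ai
```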
If you're seeing 200+ ms, something's wrong on the path. The diagnostic flow:
```mermaid
graph TD
  A[Console feels slow] --> B[Open DevTools Network tab]
  B --> C{First-byte time}
  C -->|"< 100 ms"| D[Frontend issue]
  C -->|"100-500 ms"| E[Edge / region issue]
  C -->|"> 500 ms"| F[Routing or service issue]
  D --> D1[CPU / extensions / dev console open]
  E --> E1[Region selection / CDN warm-up]
  F --> F1[International transit / service incident]
```

## Console UI feels slow
| Symptom | Likely cause | Fix |
|---|---|---|
| Spinner on every page load | Slow API responses | Check the Status page for an ongoing incident |
| Pages render but interactions lag | Browser CPU at 100% (likely extensions or DevTools open) | Close DevTools when not debugging; disable heavy extensions |
| Smooth on desktop, choppy on phone | Mobile data + far CDN edge | Test on Wi-Fi; or use the BDIX-direct path |
| First page after sign-in slow, subsequent fast | Cold start of the SPA bundle (~ 1 MiB JS) | Expected on first load; cached after |
| Search results take seconds | Project has thousands of resources | Use filters to narrow before searching |
| Chart redraws slowly | Long time range × many resources × hourly granularity | Switch to daily / monthly granularity in Cost Explorer |
## API calls are slow
For API consumers (CLI, Terraform, your own apps):
| Symptom | Likely cause | Fix |
|---|---|---|
| First call slow, rest fast | TLS handshake + connection setup | Use a long-lived HTTPS client with connection pooling |
| Every call adds 200+ ms | International transit (you're not on BDIX) | Move clients onto BDIX network; or use the closer regional endpoint |
| Burst of calls suddenly slow | Hit the rate limit, getting throttled | Respect Retry-After; back off; use bulk endpoints where available |
| List operations very slow | Listing a project with 10k+ objects | Use pagination + filters; don't list-all |
| Inconsistent latencies | DNS lookup variance | Pre-resolve and pin; or use SDK with built-in DNS caching |
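Two of those fixes are easy to sanity-check with curl alone: a single invocation re-uses one TLS connection for every URL it is given, and --retry backs off automatically, honouring Retry-After on 429 responses. A sketch (the /v1/compute/images path is illustrative, not a documented endpoint):

```bash
# Connection reuse: both requests share one TLS connection, so the
# second connect/tls figure should be ~0 (handshake already paid).
curl -s -o /dev/null -H "Authorization: Bearer $CD_API_TOKEN" \
  -w "total %{time_total}s (connect %{time_connect}s, tls %{time_appconnect}s)\n" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/images

# Throttling: let curl back off on 429/5xx instead of hammering.
curl -s --retry 5 --retry-max-time 120 \
  -H "Authorization: Bearer $CD_API_TOKEN" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
```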
## Measuring from your side
```bash
# Time a single API call end-to-end
curl -w "@curl-format.txt" -o /dev/null -s \
  -H "Authorization: Bearer $CD_API_TOKEN" \
  https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
```

Where curl-format.txt is:

```
time_namelookup:    %{time_namelookup}\n
time_connect:       %{time_connect}\n
time_appconnect:    %{time_appconnect}\n
time_pretransfer:   %{time_pretransfer}\n
time_redirect:      %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
time_total:         %{time_total}\n
```
Compare time_starttransfer to the typical-budget table above. If your total is consistently above 100 ms while the path stays domestic, escalate.
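A single sample is noisy; DNS, TLS, and cache warm-up all land on the first call. Averaging over a handful of requests gives a steadier number to compare against the budget:

```bash
# Average time_starttransfer over 10 calls; the first call carries
# the DNS + TLS cost, the rest show steady-state latency.
for i in $(seq 10); do
  curl -s -o /dev/null -w "%{time_starttransfer}\n" \
    -H "Authorization: Bearer $CD_API_TOKEN" \
    https://api.bd-dha-1.clouddigit.ai/v1/compute/servers
done | awk '{s += $1} END {printf "avg TTFB over %d calls: %.3fs\n", NR, s / NR}'
```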
## Object Storage uploads/downloads
| Symptom | Likely cause | Fix |
|---|---|---|
| Single-file PUT very slow for large files | No multipart | Use multipart upload (≥ 5 MiB parts) — every modern S3 client does this automatically above a threshold |
| Multipart PUT slow despite parts | Sequential upload | Parallelize parts (default in AWS CLI; configure with aws configure set s3.max_concurrent_requests 20) |
| GET very slow from outside BD | International transit | Use the regional endpoint closest to you; or front with CDN |
| Mixed performance, region-dependent | Inter-region transfers | Read from the bucket's own region |
| Slow list-bucket | Bucket has millions of objects | Use prefix-based listing; don't enumerate everything |
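For the multipart rows, the AWS CLI's S3 transfer settings are the usual knobs. A sketch with reasonable starting values; the endpoint hostname is an assumption, use the one for your bucket's region:

```bash
# Multipart kicks in above the threshold; parts upload in parallel.
aws configure set s3.multipart_threshold 64MB
aws configure set s3.multipart_chunksize 16MB
aws configure set s3.max_concurrent_requests 20

# Upload against the bucket's own regional endpoint (hostname assumed).
aws s3 cp ./backup.tar.gz s3://my-bucket/ \
  --endpoint-url https://objects.bd-dha-1.clouddigit.ai
```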
## VM / Kubernetes networking
| Symptom | Likely cause | Fix |
|---|---|---|
| VM pings other VM, but app-level very slow | TCP window scaling not enabled; or MTU mismatch | Confirm sysctl net.ipv4.tcp_window_scaling=1; check MTU on the interface |
| K8s LoadBalancer Service slow to converge | New LB warm-up | Hit it a few times; LBs scale on traffic |
| Cross-AZ latency > 5 ms | Same region but different AZs | Confirm AZ pairing; this is usually < 5 ms — open a ticket if persistently higher |
| Pods on different nodes slower than same-node | CNI overhead | Use IPVS over iptables for kube-proxy; consider Cilium with eBPF |
| HTTP request slow but TCP fast | TLS handshake dominating; or DNS lookup inside the pod | Re-use connections; cache DNS at the pod |
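For the first row, both suspects can be checked from inside the VM in under a minute (assumes Linux and an interface named eth0):

```bash
# TCP window scaling should be on (= 1) for high-throughput flows.
sysctl net.ipv4.tcp_window_scaling

# Interface MTU; compare against the VPC / overlay MTU.
ip link show eth0 | grep -o 'mtu [0-9]*'

# Probe path MTU to a peer VM: DF bit set, 1472 bytes payload
# + 28 bytes of headers = a full 1500-byte packet. If this fails
# but smaller payloads pass, something on the path has a lower MTU.
ping -M do -s 1472 -c 3 <peer-vm-ip>
```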
## Database
| Symptom | Likely cause | Fix |
|---|---|---|
| Postgres query slow despite small DB | Missing index | EXPLAIN ANALYZE the query; add the index |
| Same query slower than last month | Stats out of date | VACUUM ANALYZE; check autovacuum is running |
| Read replicas lag growing | Replication can't keep up with writes | Larger replica; or split read traffic |
| Connection pool exhausted | Too many short-lived connections | Use pgbouncer in front; reuse connections |
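For the first two rows, a minimal sketch of the usual Postgres checks via psql; table and column names here are hypothetical:

```bash
# Is the planner doing a sequential scan where an index should exist?
psql "$DATABASE_URL" -c \
  "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;"

# If so, add the index without locking writes:
psql "$DATABASE_URL" -c \
  "CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);"

# Refresh planner statistics (and check that autovacuum is running):
psql "$DATABASE_URL" -c "VACUUM ANALYZE orders;"
```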
For deeper DB ops, see Managed DBA.
"From inside Bangladesh it's fast, from outside it's slow"¶
Cloud Digit is sovereign-resident — services are in BD. Reaching them from outside BD adds international transit latency. Patterns:
| Where the user is | Typical added latency |
|---|---|
| India (Mumbai/Chennai) | 30–80 ms |
| Singapore | 60–120 ms |
| EU | 200–300 ms |
| US East | 250–350 ms |
| US West | 350–500 ms |
This is inherent — no fix on Cloud Digit's side. Mitigations:
- Front public-facing services with CDN for international reads
- For interactive admin work, accept the latency; for bulk transfers, schedule overnight
- Cache aggressively in app layer
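Whether the CDN is actually absorbing international reads shows up in response headers. Header names vary by CDN; these are common ones, and the hostname is a placeholder:

```bash
# A cache hit at the edge typically shows a HIT status or non-zero Age.
curl -sI https://cdn.example.com/static/app.js | \
  grep -iE '^(age|x-cache|cf-cache-status|cache-control):'
```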
## When to open a ticket
Open a ticket when:
- Performance is materially worse than the budgets above and reproducible
- You've ruled out client-side issues (close DevTools, test in incognito, test from another machine)
- Multiple users / multiple resources are affected
- The performance has changed recently without changes on your end
Include:
- Trace IDs of slow API calls (X-Cd-Trace-Id response header)
- Timestamps + region(s) affected
- curl -w output for representative calls
- Network path (where users / clients are; ISP names)
- Whether the issue is constant or intermittent (and the pattern, if intermittent)
## Related
- Browsers, sessions & SPA quirks
- Troubleshoot — Browser & cache
- Networking — VPC + security-group setup
- BDIX Peering Direct Connect — for sub-millisecond domestic latency
- CDN — for international fan-out
- Status — platform-wide latency reports
- Managed Monitoring / SIEM — for application-side observability