Power Overage (Metered)¶
Service ownership
Owner: dc-operations (colo-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
When your draw exceeds your committed kW, the difference is billed metered.
How it works¶
- Each rack / cage has a committed kW (the flat amount you bought)
- Smart-PDUs meter actual draw in 5-minute intervals
- Monthly billing shows: committed-kW × 730 h flat + (sum of overage-kWh) × overage rate
The overage rate is intentionally higher than the rate baked into the committed amount — it's not the line-item to optimise for, it's the safety valve for unplanned bursts.
When you'd hit it¶
- Seasonal commerce traffic doubling your traffic / compute draw
- A new GPU node that pushed the cabinet over its commitment
- Maintenance migration that briefly doubles draw
How to size correctly¶
Cloud Digit will work with you to right-size committed kW based on measured 95th-percentile draw plus headroom. After 3 months of operation, we recommend reviewing the commitment.
Pricing¶
See Pricing. Overage is in kWh, billed monthly.
Related¶
Operate this service¶
Pay-as-you-go power for usage above your contracted baseline — useful when you can't predict peak load precisely.
How it works¶
- Rack contract specifies a committed kW per circuit
- Real usage is metered per minute
- Below committed: included in monthly contract
- Above committed: billed per metered kWh at overage rate (~1.5-2× standard)
IAM¶
| Role | Can do |
|---|---|
colo.power.viewer | View power usage and overage charges |
colo.power.admin | Configure overage policy, set alerts |
When overage fits¶
- Workload peak is rare but real (event-driven)
- Right-sizing committed-kW would mean over-paying baseline
- Burst-tolerant workloads where capacity > BDT savings
When NOT to use: steady-state workloads above committed kW — pay for a higher commit, not overage.
Hard limits¶
Even with overage enabled, the circuit breaker is physical: - 32A circuit: ~7.6 kW absolute max - 63A circuit: ~15 kW absolute max
Exceed = breaker trips = workload offline. Overage doesn't extend physical limits.
Alerts¶
bash cd colo power alert set \ --rack rack-acme-bd-dha-1-r042 \ --threshold 85% \ --notify slack-noc
Alert at 85% of breaker (well above committed); set hard cap if you want to avoid surprise overage bills.
Cost capping¶
Optional monthly cap:
```bash cd colo power overage-cap set --rack
When cap hit, the platform notifies; if exceeded, breaker trip protects you¶
```
Trade-off: trips kill workload; no-cap means surprise bill.
Related¶
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
colo.rack.power_kw (current) | < committed | > committed (overage active) |
colo.rack.power_kwh_overage_mtd | small | climbing |
colo.rack.psu_imbalance_pct | < 20% | > 30% (one leg overloaded) |
colo.rack.breaker_proximity_pct | < 80% | > 90% (breaker risk) |
Daily review¶
Power-aware ops: - Morning check: yesterday's peak, any overage? - Identify the cause (cron jobs, schedule alignment) - Plan: tolerate, shift, or commit higher
Load distribution¶
Within a rack: balance load across PSUs. A rack with 60% on PSU-A and 5% on PSU-B is misbalanced: - Lose PSU-A → workload exceeds PSU-B → cascading shutdown
Use intelligent PDUs to monitor + alert per-outlet.
Cost vs commit tradeoff¶
Monthly: compare overage cost vs the cost of upgrading commit:
| Monthly overage | Commit upgrade cost | Decision |
|---|---|---|
| 5,000 BDT | 15,000 BDT | Stay with overage |
| 25,000 BDT | 15,000 BDT | Upgrade commit |
Re-evaluate every 6 months.
Optimization¶
- Stagger cron jobs to spread peak across hours
- Use GPU MIG for inference to load more workloads per chassis
- Right-size servers; idle servers still draw 30-50% TDP
Related¶
Surprise overage bill¶
Monthly bill higher than expected:
bash cd colo power usage --rack <id> --month current --granularity hour
Identify peak hours/days; correlate with workload. Common surprises: - Backup window concentrated power use - Cron-aligned tasks all firing at midnight - New deploy added 8 servers; total draw exceeded commit
Mitigation: stagger, upgrade commit, or accept ongoing overage.
Breaker tripped¶
Workload offline: 1. Smart-hands resets breaker (10-30 min response) 2. Document the peak that caused it 3. Either reduce load or upgrade circuit (32A → 63A)
Trip is hard limit; overage budget doesn't help.
PSU imbalance trip¶
WARN: Rack PSU-A drew 65% of total; lost PSU-B redundancy buffer
One leg overloaded; if that leg trips, workload exceeds the other leg → cascade. Rebalance via PDU re-cabling (smart-hands).
Overage cap hit¶
If you set a monthly-cap: - 80% → alert - 100% → CD pauses metering; you decide manually - If hit physical breaker too → workload offline
Raise cap mid-month if business-critical; else accept the trip.
Metering disputed¶
You think the meter is wrong:
bash cd colo power audit --rack <id> --month 2026-04
Returns minute-level samples; compare against your DCIM if you log power separately. Discrepancies > 1%: ticket for CD power-team review.
Right-sizing commit¶
After 6 months of usage data: - p95 of hourly peak = good commit baseline - p50 = under-committed, lots of overage - p99.9 = over-committed, wasted base cost
Set commit slightly above p95.