
Power Overage (Metered)

Service ownership

Owner: dc-operations (colo-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

When your draw exceeds your committed kW, the difference is billed metered.

How it works

  • Each rack / cage has a committed kW (the flat amount you bought)
  • Smart-PDUs meter actual draw in 5-minute intervals
  • Monthly billing shows: committed-kW × 730 h flat + (sum of overage-kWh) × overage rate

The overage rate is intentionally higher than the rate baked into the committed amount — it's not the line-item to optimise for, it's the safety valve for unplanned bursts.
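The billing formula above can be sketched in a few lines. This is an illustration only: it assumes each 5-minute sample is the average kW over its interval, that overage is counted per interval (intervals below the commitment do not offset intervals above it), and that `committed_rate` and `overage_rate` are hypothetical per-kWh prices, not published Cloud Digit rates.

```python
from typing import Sequence

HOURS_PER_MONTH = 730      # flat month length used on the bill
INTERVAL_HOURS = 5 / 60    # smart-PDU metering interval (5 minutes)


def overage_kwh(samples_kw: Sequence[float], committed_kw: float) -> float:
    """Sum the energy drawn above the commitment across all intervals."""
    return sum(max(kw - committed_kw, 0.0) * INTERVAL_HOURS for kw in samples_kw)


def monthly_bill(samples_kw: Sequence[float], committed_kw: float,
                 committed_rate: float, overage_rate: float) -> float:
    """Flat committed charge plus metered overage (rates are assumptions)."""
    flat = committed_kw * HOURS_PER_MONTH * committed_rate
    return flat + overage_kwh(samples_kw, committed_kw) * overage_rate


# One hour at 6 kW against a 5 kW commitment is about 1 kWh of overage.
one_hour_at_6kw = [6.0] * 12
print(round(overage_kwh(one_hour_at_6kw, committed_kw=5.0), 6))
```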

When you'd hit it

  • Seasonal commerce peaks that double your traffic and compute draw
  • A new GPU node that pushed the cabinet over its commitment
  • Maintenance migration that briefly doubles draw

How to size correctly

Cloud Digit will work with you to right-size committed kW based on measured 95th-percentile draw plus headroom. After 3 months of operation, we recommend reviewing the commitment.

Pricing

See Pricing. Overage is in kWh, billed monthly.

Operate this service

Pay-as-you-go power for usage above your contracted baseline — useful when you can't predict peak load precisely.

How it works

  • Rack contract specifies a committed kW per circuit
  • Real usage is metered per minute
  • Below committed: included in monthly contract
  • Above committed: billed per metered kWh at overage rate (~1.5-2× standard)

IAM

| Role | Can do |
| --- | --- |
| colo.power.viewer | View power usage and overage charges |
| colo.power.admin | Configure overage policy, set alerts |

When overage fits

  • Workload peak is rare but real (event-driven)
  • Right-sizing committed-kW would mean over-paying baseline
  • Burst-tolerant workloads where having the extra capacity matters more than the BDT saved

When NOT to use: steady-state workloads above committed kW — pay for a higher commit, not overage.

Hard limits

Even with overage enabled, the circuit breaker is physical:

  • 32A circuit: ~7.6 kW absolute max
  • 63A circuit: ~15 kW absolute max

Exceed = breaker trips = workload offline. Overage doesn't extend physical limits.
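The breaker ceilings follow from current times voltage. The sketch below reproduces them under the assumption of a roughly 240 V single-phase feed and a power factor of 1; neither value is stated in this page, so check your circuit's actual rating before relying on the numbers.

```python
def breaker_max_kw(amps: float, volts: float = 240.0,
                   power_factor: float = 1.0) -> float:
    """Absolute ceiling of a circuit in kW (240 V and PF 1.0 are assumptions)."""
    return amps * volts * power_factor / 1000.0


def breaker_proximity_pct(draw_kw: float, amps: float,
                          volts: float = 240.0) -> float:
    """Current draw as a percentage of the breaker ceiling."""
    return 100.0 * draw_kw / breaker_max_kw(amps, volts)


for amps in (32, 63):
    print(f"{amps}A circuit: {breaker_max_kw(amps):.2f} kW")
```

Under these assumptions a 32A circuit tops out near 7.7 kW and a 63A circuit near 15.1 kW, which is consistent with the approximate figures quoted above.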

Alerts

```bash
cd colo power alert set \
  --rack rack-acme-bd-dha-1-r042 \
  --threshold 85% \
  --notify slack-noc
```

This alerts at 85% of the breaker rating, which is well above the committed kW. Set a hard cap as well if you want to avoid surprise overage bills.

Cost capping

Optional monthly cap:

```bash
cd colo power overage-cap set --rack <id> --monthly-cap-bdt 50000

# When cap hit, the platform notifies; if exceeded, breaker trip protects you
```

Trade-off: trips kill workload; no-cap means surprise bill.

Metrics

| Metric | Healthy | Alert |
| --- | --- | --- |
| colo.rack.power_kw (current) | < committed | > committed (overage active) |
| colo.rack.power_kwh_overage_mtd | small | climbing |
| colo.rack.psu_imbalance_pct | < 20% | > 30% (one leg overloaded) |
| colo.rack.breaker_proximity_pct | < 80% | > 90% (breaker risk) |
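The two percentage thresholds in the table above can be checked mechanically. The following is a minimal sketch, not a Cloud Digit API: the `classify` and `review` helpers are hypothetical, and the `"watch"` label for values between the healthy and alert bounds is an assumption, since the table does not name that band.

```python
from typing import Dict

# (healthy_below, alert_above) taken from the metrics table
THRESHOLDS = {
    "colo.rack.psu_imbalance_pct": (20.0, 30.0),
    "colo.rack.breaker_proximity_pct": (80.0, 90.0),
}


def classify(value: float, healthy_below: float, alert_above: float) -> str:
    """Map a metric value to healthy / watch / alert."""
    if value > alert_above:
        return "alert"
    if value < healthy_below:
        return "healthy"
    return "watch"  # hypothetical label for the unnamed middle band


def review(metrics: Dict[str, float]) -> Dict[str, str]:
    """Classify every known metric present in the snapshot."""
    return {
        name: classify(metrics[name], *bounds)
        for name, bounds in THRESHOLDS.items()
        if name in metrics
    }


print(review({"colo.rack.psu_imbalance_pct": 25.0,
              "colo.rack.breaker_proximity_pct": 93.0}))
```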

Daily review

Power-aware ops:

  • Morning check: yesterday's peak, any overage?
  • Identify the cause (cron jobs, schedule alignment)
  • Plan: tolerate, shift, or commit higher

Load distribution

Within a rack: balance load across PSUs. A rack with 60% on PSU-A and 5% on PSU-B is imbalanced:

  • Lose PSU-A → workload exceeds PSU-B → cascading shutdown

Use intelligent PDUs to monitor + alert per-outlet.

Cost vs commit tradeoff

Monthly: compare overage cost vs the cost of upgrading commit:

| Monthly overage | Commit upgrade cost | Decision |
| --- | --- | --- |
| 5,000 BDT | 15,000 BDT | Stay with overage |
| 25,000 BDT | 15,000 BDT | Upgrade commit |

Re-evaluate every 6 months.
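The comparison in the table reduces to a single rule. The sketch below averages several recent months before comparing; that averaging step is an assumption added so one unusual month does not drive the decision, and `commit_decision` is a hypothetical helper name.

```python
from statistics import mean
from typing import Sequence


def commit_decision(monthly_overage_bdt: Sequence[float],
                    commit_upgrade_bdt: float) -> str:
    """Compare typical monthly overage with the monthly cost of a higher commit."""
    typical = mean(monthly_overage_bdt)  # averaging is an assumption
    if typical > commit_upgrade_bdt:
        return "Upgrade commit"
    return "Stay with overage"


print(commit_decision([5_000], 15_000))
print(commit_decision([25_000], 15_000))
```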

Optimization

  • Stagger cron jobs to spread peak across hours
  • Use GPU MIG for inference to load more workloads per chassis
  • Right-size servers; idle servers still draw 30-50% TDP
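Staggering cron jobs can be as simple as spacing their start times evenly across a window. This sketch assumes a window that starts at midnight and generates crontab lines; the job commands and the six-hour window are placeholders, not recommendations.

```python
from typing import List, Tuple


def staggered_starts(n_jobs: int, window_hours: int = 6) -> List[Tuple[int, int]]:
    """Return (hour, minute) start times spread evenly from 00:00."""
    step_minutes = window_hours * 60 / n_jobs
    starts = []
    for i in range(n_jobs):
        offset = int(i * step_minutes)
        starts.append((offset // 60, offset % 60))
    return starts


def crontab_lines(commands: List[str], window_hours: int = 6) -> List[str]:
    """Pair each command with its staggered daily schedule."""
    starts = staggered_starts(len(commands), window_hours)
    return [f"{m} {h} * * * {cmd}" for (h, m), cmd in zip(starts, commands)]


# Hypothetical job names, for illustration only.
for line in crontab_lines(["backup.sh", "reindex.sh", "report.sh", "cleanup.sh"]):
    print(line)
```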

Surprise overage bill

Monthly bill higher than expected:

```bash
cd colo power usage --rack <id> --month current --granularity hour
```

Identify peak hours/days; correlate with workload. Common surprises:

  • Backup window concentrated power use
  • Cron-aligned tasks all firing at midnight
  • New deploy added 8 servers; total draw exceeded commit

Mitigation: stagger, upgrade commit, or accept ongoing overage.
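Once you have hourly readings, grouping the overage by hour of day makes cron-aligned or backup-window peaks stand out. The sketch assumes you have already parsed the usage output into `(ISO timestamp, kW)` pairs; the CLI's actual output format is not documented here, so that shape is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, List, Tuple


def overage_by_hour_of_day(readings: Iterable[Tuple[str, float]],
                           committed_kw: float) -> List[Tuple[int, float]]:
    """Total kWh above commit per hour of day, largest first.

    Each reading is treated as the average kW for one hour, so the
    excess in kW equals the excess in kWh for that hour.
    """
    totals = defaultdict(float)
    for timestamp, kw in readings:
        excess = kw - committed_kw
        if excess > 0:
            totals[datetime.fromisoformat(timestamp).hour] += excess
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


sample = [
    ("2026-04-01T00:00", 6.0),
    ("2026-04-01T12:00", 4.0),
    ("2026-04-02T00:00", 7.0),
    ("2026-04-02T03:00", 5.5),
]
print(overage_by_hour_of_day(sample, committed_kw=5.0))
```

In this made-up sample, midnight dominates the overage, which would point at cron-aligned tasks.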

Breaker tripped

Workload offline:

  1. Smart-hands resets breaker (10-30 min response)
  2. Document the peak that caused it
  3. Either reduce load or upgrade circuit (32A → 63A)

Trip is hard limit; overage budget doesn't help.

PSU imbalance trip

WARN: Rack PSU-A drew 65% of total; lost PSU-B redundancy buffer

One leg overloaded; if that leg trips, workload exceeds the other leg → cascade. Rebalance via PDU re-cabling (smart-hands).

Overage cap hit

If you set a monthly cap:

  • 80% → alert
  • 100% → CD pauses metering; you decide manually
  • If the physical breaker also trips → workload offline

Raise cap mid-month if business-critical; else accept the trip.

Metering disputed

You think the meter is wrong:

```bash
cd colo power audit --rack <id> --month 2026-04
```

Returns minute-level samples; compare against your DCIM if you log power separately. Discrepancies > 1%: ticket for CD power-team review.
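A quick way to decide whether a ticket is warranted is to total both sets of samples and compare them. The sketch assumes both sources give average kW per minute over the same period and measures the discrepancy relative to your own DCIM total; the page does not say which side is the reference, so treat that choice as an assumption.

```python
from typing import Sequence


def energy_kwh(samples_kw: Sequence[float], interval_minutes: float = 1.0) -> float:
    """Convert average-kW samples into total kWh."""
    return sum(samples_kw) * interval_minutes / 60.0


def discrepancy_pct(cd_kwh: float, dcim_kwh: float) -> float:
    """Absolute difference as a percentage of the DCIM total."""
    return abs(cd_kwh - dcim_kwh) / dcim_kwh * 100.0


def needs_ticket(cd_kwh: float, dcim_kwh: float, threshold_pct: float = 1.0) -> bool:
    """True when the discrepancy exceeds the 1% review threshold."""
    return discrepancy_pct(cd_kwh, dcim_kwh) > threshold_pct


print(needs_ticket(cd_kwh=1015.0, dcim_kwh=1000.0))
```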

Right-sizing commit

After 6 months of usage data:

  • p95 of hourly peak = good commit baseline
  • p50 = under-committed, lots of overage
  • p99.9 = over-committed, wasted base cost

Set commit slightly above p95.
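As a rough illustration of the sizing rule, the sketch below computes a nearest-rank percentile over hourly peaks and adds a small margin. The nearest-rank method and the 5% default headroom are assumptions; Cloud Digit's own sizing method is not specified on this page.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value covering pct% of the samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggested_commit_kw(hourly_peaks_kw: Sequence[float],
                        headroom: float = 0.05) -> float:
    """p95 of hourly peaks plus a small margin (5% is an assumed default)."""
    return percentile(hourly_peaks_kw, 95) * (1 + headroom)


# Synthetic data for illustration: hourly peaks from 0.1 kW to 10.0 kW.
peaks = [i / 10 for i in range(1, 101)]
print(percentile(peaks, 50), percentile(peaks, 95))
print(round(suggested_commit_kw(peaks), 3))
```

Comparing the p50 and p95 values for your own data shows how much overage a median-based commit would leave on the table.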