Kubernetes¶
The Kubernetes tab manages your project's managed Kubernetes clusters — provision the control plane, size node pools, get a kubeconfig, run upgrades. Underlying service: Managed Kubernetes (CaaS).
Overview¶
Kubernetes tab — clusters in this project, with their version, node-pool summary, and health.
Columns:
| Column | What it shows |
|---|---|
| Name | Cluster name |
| Version | Kubernetes minor (e.g. 1.30, 1.29) |
| Node pools | Count + total node count |
| Status | Active, Updating, Failed, Deleting |
| Region | Where the control plane lives |
| Created | Relative timestamp |
Click a cluster row to drill into its detail view: control-plane info, node-pool list, addons, kubeconfig download, upgrade history.
Administration¶
Quotas¶
Project Settings → Quota → Kubernetes:
- Clusters per region per project — default 25 (bumpable)
- Nodes per cluster — default 1,000
- Node pools per cluster — default 30
Worker nodes themselves consume the project's compute quota (vCPU + RAM), so cluster sizing is also gated by Project Settings → Quota → Compute.
Defaults¶
- Default Kubernetes version for new clusters — defaults to "Recommended" (currently 1.30); can pin to a specific minor for organizations that batch upgrades quarterly
- Default ingress controller — NGINX (default) or Traefik (per-cluster choice)
- Default CNI — Calico (default) or Cilium
Maintenance windows¶
Per-cluster — set the day-of-week + time window when minor patches are applied. Outside the window, only emergency CVEs land.
Operation¶
Creating a cluster¶
+ Create Cluster:
- Name — also becomes part of node DNS names
- Region — control plane lives here; node pools can span AZs in this region
- Kubernetes version — Recommended / specific minor
- VPC + subnet — the cluster's worker nodes live here; pod CIDR must not overlap the VPC (default `10.244.0.0/16`)
- Service CIDR — default `10.96.0.0/12`
- CNI — Calico / Cilium (Cilium for advanced features like Hubble)
- Ingress controller — NGINX / Traefik
- Pod-to-pod encryption — opt-in (small CPU cost)
- Initial node pool — flavor, autoscale envelope, taints, labels
- Tags
The control plane provisions in ~5 minutes; the initial node pool comes up after that. Total time-to-ready is ~7–10 minutes for a single-pool cluster.
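Once the status flips to Active, a quick smoke test from your workstation confirms the control plane and the initial pool are healthy (the kubeconfig path below assumes the download step described under Getting kubeconfig):

```bash
# Point kubectl at the freshly downloaded kubeconfig
# (see "Getting kubeconfig" below).
export KUBECONFIG=~/Downloads/kubeconfig-mycluster.yaml

# The server version should match the minor selected at creation.
kubectl version

# Every node in the initial pool should reach Ready.
kubectl get nodes -o wide
```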
Adding a node pool¶
Cluster detail → Node Pools → + Add pool:
- Name
- Flavor — `std-*`, `mem-*`, `cpu-*`, or `gpu-*` for GPU pools
- Autoscale — `min`, `max`, `desired`
- Subnet — within the cluster's VPC
- Taints / labels — for workload pinning (see the scheduling sketch below)
- Tags
Node pools with GPU flavors must be in regions that have GPU capacity (currently bd-dha-1 GA, others preview).
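The taints and labels set on a pool are standard Kubernetes scheduling primitives. As a sketch (the pool label `pool=gpu`, the taint `dedicated=gpu:NoSchedule`, and the image are illustrative, not platform defaults), a workload pinned to a GPU pool tolerates the taint and selects the label:

```bash
# Illustrative only: assumes a pool created with label pool=gpu and
# taint dedicated=gpu:NoSchedule; substitute your own pool's values.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  nodeSelector:
    pool: gpu                 # matches the pool's label
  tolerations:
    - key: dedicated          # matches the pool's taint
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: main
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # one GPU from the device plugin
  restartPolicy: Never
EOF
```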
Getting kubeconfig¶
Cluster detail → Access → Download kubeconfig. The downloaded file embeds an OIDC token issued for your user identity, scoped to the cluster's namespace policies.
```bash
export KUBECONFIG=~/Downloads/kubeconfig-mycluster.yaml
kubectl get nodes
```
For CI/CD, issue a service-account kubeconfig instead: cluster detail → Access → + New CI kubeconfig → name + RBAC scope.
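A minimal CI step using such a kubeconfig might look like the sketch below; the file name, namespace, and deployment are placeholders for whatever your pipeline provides:

```bash
# Hypothetical CI step: the service-account kubeconfig generated in the
# Console is injected into the job as a file (file name is illustrative).
export KUBECONFIG=./ci-kubeconfig.yaml

# Deploy and wait for the rollout; the RBAC scope chosen when the
# kubeconfig was created must allow these verbs in the namespace.
kubectl -n my-app apply -f manifests/
kubectl -n my-app rollout status deployment/my-app --timeout=120s
```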
Upgrading¶
Cluster detail → Version → Upgrade to `<version>`. The upgrade:
- Drains the control plane (HA — no downtime)
- Upgrades workers one node-pool at a time, with PDB-respecting drains
- Reports per-step progress
You can pause at any step. Version upgrades are explicit and proceed one minor at a time; Cloud Digit does not auto-upgrade across minors.
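Because worker drains respect PodDisruptionBudgets, it is worth checking for blocking PDBs before starting: a PDB that currently allows zero disruptions will stall the rolling node-pool upgrade (see the first troubleshooting row below).

```bash
# List PDBs cluster-wide; any row with ALLOWED DISRUPTIONS = 0 will
# block node drains and leave the upgrade waiting at that pool.
kubectl get pdb --all-namespaces
```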
Deleting¶
Cluster detail → Settings → Danger zone → Delete cluster. Asks for the cluster name as confirmation. Worker nodes terminate, control plane is destroyed, and any LoadBalancer Services release their floating IPs. Persistent volumes survive (they're separate Block Storage objects).
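Since persistent volumes outlive the cluster, a quick inventory before deleting tells you which Block Storage volumes will remain (and keep accruing cost) afterwards:

```bash
# List PVs and the Block Storage-backed capacity that will survive
# cluster deletion.
kubectl get pv -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name,SIZE:.spec.capacity.storage,STORAGECLASS:.spec.storageClassName
```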
Troubleshooting¶
| Symptom | Likely cause | Fix |
|---|---|---|
| Cluster stuck in Updating for > 30 min | Stuck PDB on a workload that won't drain | `kubectl describe pdb -A` from the cluster admin's machine; manually evict the offending pod |
| Node pool won't scale up | Compute quota exhausted; or no capacity for the flavor in this region | Check Project Settings → Quota → Compute; try a different flavor |
| `kubectl` returns Unauthorized | OIDC token expired (kubeconfig has a 12h embedded token) | Re-download the kubeconfig, or run `kubectl oidc-login` if you've configured the helper |
| LoadBalancer Service stuck `<pending>` | Floating IP quota exhausted; or LB controller not running | Check the cluster events; bump the floating IP quota |
| Pod scheduling fails: "no nodes available" | Taints / node selectors don't match any pool | Review pool labels and pod nodeSelectors; add a matching pool if needed |
| Persistent volume claim stuck Pending | CSI driver not provisioning; or storage class misconfigured | Check `kubectl describe pvc <name>`; a common issue is a wrong `storageClassName` (use `nvme-hci` or `provisioned-iops`) |
| Container Registry pulls fail | imagePullSecret missing or wrong | Use the cluster's built-in registry credential helper instead of imagePullSecrets — it uses cluster identity |
| Cluster events show MemoryPressure on a node | Workload over-committed | Inspect with `kubectl top nodes`; resize the pool or scale it up |
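For the first row (an upgrade drain stuck on a PDB), the usual sequence is to locate the PDB that allows zero disruptions and then remove its pod by hand; the namespace and pod name below are placeholders:

```bash
# Find PDBs that currently allow zero disruptions (these block drains).
kubectl get pdb -A

# Deleting the pod directly bypasses the eviction API's PDB check;
# the workload's controller recreates it on another node.
kubectl -n <namespace> delete pod <pod-name>
```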
What the platform manages vs what you manage¶
| Layer | Managed by Cloud Digit | You manage |
|---|---|---|
| Control plane (etcd, API, scheduler, controller-manager) | ✓ | — |
| K8s minor version upgrades | ✓ (you trigger via Console) | — |
| Worker OS patching | ✓ (in maintenance window) | — |
| System addons (CoreDNS, CNI, ingress, metrics-server, CSI) | ✓ | — |
| Worker node sizing + count | — | ✓ |
| Workloads (Deployments, Services, etc.) | — | ✓ |
| RBAC inside the cluster | — | ✓ |
| In-cluster observability (Prometheus, Grafana, etc.) | — | ✓ (or use Managed K8s Operations) |
Related¶
- Managed Kubernetes (CaaS) — full service docs
- Container Registry
- Serverless Containers
- Servers — the underlying VMs
- Networking — VPC + LoadBalancer Services
- Volumes — persistent volume backend
- Managed Kubernetes Operations — day-2 ops as a service