Kubernetes

The Kubernetes tab manages your project's managed Kubernetes clusters — provision the control plane, size node pools, get a kubeconfig, run upgrades. Underlying service: Managed Kubernetes (CaaS).

Overview

The Kubernetes tab lists this project's clusters, with their version, node-pool summary, and health.

Columns:

| Column | What it shows |
| --- | --- |
| Name | Cluster name |
| Version | Kubernetes minor (e.g. 1.30, 1.29) |
| Node pools | Count + total node count |
| Status | Active, Updating, Failed, Deleting |
| Region | Where the control plane lives |
| Created | Relative timestamp |

Click a cluster row to drill into its detail view: control-plane info, node-pool list, addons, kubeconfig download, upgrade history.

Administration

Quotas

Project Settings → Quota → Kubernetes:

  • Clusters per region per project — default 25 (bumpable)
  • Nodes per cluster — default 1,000
  • Node pools per cluster — default 30

Worker nodes themselves consume the project's compute quota (vCPU + RAM), so cluster sizing is also gated by Project Settings → Quota → Compute.

Defaults

  • Default Kubernetes version for new clusters — defaults to "Recommended" (currently 1.30); can pin to a specific minor for organizations that batch upgrades quarterly
  • Default ingress controller — NGINX (default) or Traefik (per-cluster choice)
  • Default CNI — Calico (default) or Cilium

Maintenance windows

Per-cluster — set the day-of-week + time window when minor patches are applied. Outside the window, only emergency CVEs land.

Operation

Creating a cluster

+ Create Cluster:

  1. Name — also becomes part of node DNS names
  2. Region — control plane lives here; node pools can span AZs in this region
  3. Kubernetes version — Recommended / specific minor
  4. VPC + subnet — the cluster's worker nodes live here; pod CIDR is non-overlapping (default 10.244.0.0/16)
  5. Service CIDR — default 10.96.0.0/12
  6. CNI — Calico / Cilium (Cilium for advanced features like Hubble)
  7. Ingress controller — NGINX / Traefik
  8. Pod-to-pod encryption — opt-in (small CPU cost)
  9. Initial node pool — flavor, autoscale envelope, taints, labels
  10. Tags

The control plane provisions in ~5 minutes. Node pool comes up after that. Total time-to-ready ~7–10 minutes for a single-pool cluster.
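Once the cluster reaches Active, a quick readiness check from your workstation confirms the pool and addons came up (the kubeconfig path below is illustrative; see Getting kubeconfig):

```bash
# Point kubectl at the new cluster (path is illustrative)
export KUBECONFIG=~/Downloads/kubeconfig-mycluster.yaml

# All nodes from the initial pool should reach Ready
kubectl get nodes -o wide

# System addons (CoreDNS, CNI, ingress, metrics-server, CSI) run in kube-system
kubectl get pods -n kube-system

# Confirm the server version matches the minor you selected
kubectl version
```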

Adding a node pool

Cluster detail → Node Pools → + Add pool:

  • Name
  • Flavor — std-*, mem-*, cpu-*, or gpu-* for GPU pools
  • Autoscale — min, max, desired
  • Subnet — within the cluster's VPC
  • Taints / labels — for workload pinning
  • Tags

Node pools with GPU flavors must be in regions that have GPU capacity (currently bd-dha-1 GA, others preview).
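The taints and labels set on a pool are what drive workload pinning. As an illustrative sketch (the pool=gpu label, the gpu=true:NoSchedule taint, and the image tag are assumptions, not a documented Cloud Digit schema), a pod targets a GPU pool like this:

```bash
# Hypothetical GPU pool created with label pool=gpu and taint gpu=true:NoSchedule.
# A workload opts in with a matching nodeSelector + toleration:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  nodeSelector:
    pool: gpu              # matches the pool's label
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"   # tolerates the pool's taint
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1  # requests one GPU from the device plugin
EOF
```

Pods without the toleration are repelled by the taint, so the GPU pool stays reserved for workloads that explicitly opt in.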

Getting kubeconfig

Cluster detail → Access → Download kubeconfig. The downloaded file embeds an OIDC token issued for your user identity, scoped to the cluster's namespace policies.

```bash
export KUBECONFIG=~/Downloads/kubeconfig-mycluster.yaml
kubectl get nodes
```

For CI/CD, issue a service-account kubeconfig instead: cluster detail → Access → + New CI kubeconfig → name + RBAC scope.
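In a pipeline, that CI kubeconfig is typically stored as a secret and materialized at job start; a minimal sketch (the CI_KUBECONFIG variable name is an assumption, not a platform convention):

```bash
# CI job step: the service-account kubeconfig is stored as a pipeline
# secret and written to disk at runtime (variable name is illustrative).
printf '%s' "$CI_KUBECONFIG" > kubeconfig-ci.yaml
chmod 600 kubeconfig-ci.yaml
export KUBECONFIG="$PWD/kubeconfig-ci.yaml"

# Unlike the 12h user OIDC token, the service-account credential is
# long-lived but limited to the RBAC scope chosen at creation; verify it:
kubectl auth can-i create deployments -n my-app
kubectl auth can-i delete nodes
```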

Upgrading

Cluster detail → Version → Upgrade to <target version>. The wizard:

  1. Drains the control plane (HA — no downtime)
  2. Upgrades workers one node-pool at a time, with PDB-respecting drains
  3. Reports per-step progress

You can pause at any step. Version upgrades proceed one minor at a time and must be triggered explicitly; Cloud Digit does not auto-upgrade across minors.
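Before triggering an upgrade, it can help to confirm that no PodDisruptionBudget will stall the worker drains in step 2; a quick pre-flight sketch:

```bash
# A PDB whose ALLOWED DISRUPTIONS column shows 0 will stall the
# PDB-respecting drain during the worker upgrade.
kubectl get pdb -A

# Inspect a suspicious budget (namespace/name are placeholders)
kubectl describe pdb my-pdb -n my-namespace

# Check the current server version so you know which minor is next
kubectl version
```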

Deleting

Cluster detail → Settings → Danger zone → Delete cluster. Asks for the cluster name as confirmation. Worker nodes terminate, control plane is destroyed, and any LoadBalancer Services release their floating IPs. Persistent volumes survive (they're separate Block Storage objects).
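Because persistent volumes outlive the cluster, consider inventorying them, and any floating IPs held by LoadBalancer Services, before deleting; a sketch (column picks are illustrative and assume CSI-provisioned volumes):

```bash
# Record which Block Storage volumes back your PVs so you can
# reattach or clean them up after the cluster is gone.
kubectl get pv -o custom-columns=\
NAME:.metadata.name,\
CLAIM:.spec.claimRef.name,\
SC:.spec.storageClassName,\
VOLUME:.spec.csi.volumeHandle

# LoadBalancer Services release their floating IPs on delete; list them
# first if you want to re-reserve any addresses.
kubectl get svc -A | grep LoadBalancer
```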

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Cluster stuck in Updating for > 30 min | Stuck PDB on a workload that won't drain | Run kubectl describe pdb -A from the cluster admin's machine; manually evict the offending pod |
| Node pool won't scale up | Compute quota exhausted; or no capacity for the flavor in this region | Check Project Settings → Quota → Compute; try a different flavor |
| kubectl returns Unauthorized | OIDC token expired (kubeconfig has a 12h embedded token) | Re-download kubeconfig, or run kubectl oidc-login if you've configured the helper |
| LoadBalancer Service stuck <pending> | Floating IP quota exhausted; or LB controller not running | Check the cluster events; bump floating IP quota |
| Pod scheduling fails: "no nodes available" | Taints / node selectors don't match any pool | Review pool labels and pod nodeSelectors; add a matching pool if needed |
| Persistent volume claim stuck Pending | CSI driver not provisioning; or storage class misconfigured | Check kubectl describe pvc <name>; common issue is wrong storageClassName (use nvme-hci or provisioned-iops) |
| Container Registry pulls fail | imagePullSecret missing or wrong | Use the cluster's built-in registry credential helper instead of imagePullSecrets — it uses cluster identity |
| Cluster events show MemoryPressure on a node | Workload over-committed | Inspect with kubectl top nodes; resize pool or scale up |
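A few of the fixes above as concrete commands (resource names are placeholders):

```bash
# Pending PVC: the Events section usually names the provisioning error
kubectl describe pvc my-claim -n my-app

# Pending LoadBalancer: look for quota or LB-controller errors in events
kubectl describe svc my-service -n my-app
kubectl get events -n my-app --sort-by=.lastTimestamp

# Unschedulable pod: compare its nodeSelector/tolerations with pool labels
kubectl get nodes --show-labels
kubectl describe pod my-pod -n my-app

# MemoryPressure: find the hot nodes (requires metrics-server)
kubectl top nodes
```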

What the platform manages vs what you manage

| Layer | Managed by Cloud Digit | You manage |
| --- | --- | --- |
| Control plane (etcd, API, scheduler, controller-manager) | ✓ | |
| K8s minor version upgrades | ✓ (you trigger via Console) | |
| Worker OS patching | ✓ (in maintenance window) | |
| System addons (CoreDNS, CNI, ingress, metrics-server, CSI) | ✓ | |
| Worker node sizing + count | | ✓ |
| Workloads (Deployments, Services, etc.) | | ✓ |
| RBAC inside the cluster | | ✓ |
| In-cluster observability (Prometheus, Grafana, etc.) | | ✓ (or use Managed K8s Operations) |