Kubernetes

The Kubernetes tab manages your project's managed Kubernetes clusters — provision the control plane, size node pools, get a kubeconfig, run upgrades. Underlying service: Managed Kubernetes (CaaS).

Overview

The Kubernetes tab lists this project's clusters, with their version, node-pool summary, and health.

Columns:

| Column | What it shows |
| --- | --- |
| Name | Cluster name |
| Version | Kubernetes minor (e.g. 1.30, 1.29) |
| Node pools | Count + total node count |
| Status | Active, Updating, Failed, Deleting |
| Region | Where the control plane lives |
| Created | Relative timestamp |

Click a cluster row to drill into its detail view: control-plane info, node-pool list, addons, kubeconfig download, upgrade history.

Administration

Quotas

Project Settings → Quota → Kubernetes:

  • Clusters per region per project — default 25 (bumpable)
  • Nodes per cluster — default 1,000
  • Node pools per cluster — default 30

Worker nodes themselves consume the project's compute quota (vCPU + RAM), so cluster sizing is also gated by Project Settings → Quota → Compute.

Defaults

  • Default Kubernetes version for new clusters — defaults to "Recommended" (currently 1.30); can pin to a specific minor for organizations that batch upgrades quarterly
  • Default ingress controller — NGINX (default) or Traefik (per-cluster choice)
  • Default CNI — Calico (default) or Cilium

Maintenance windows

Per-cluster — set the day-of-week + time window when minor patches are applied. Outside the window, only emergency CVEs land.

Operation

Creating a cluster

+ Create Cluster:

  1. Name — also becomes part of node DNS names
  2. Region — control plane lives here; node pools can span AZs in this region
  3. Kubernetes version — Recommended / specific minor
  4. VPC + subnet — the cluster's worker nodes live here; pod CIDR is non-overlapping (default 10.244.0.0/16)
  5. Service CIDR — default 10.96.0.0/12
  6. CNI — Calico / Cilium (Cilium for advanced features like Hubble)
  7. Ingress controller — NGINX / Traefik
  8. Pod-to-pod encryption — opt-in (small CPU cost)
  9. Initial node pool — flavor, autoscale envelope, taints, labels
  10. Tags

The control plane provisions in ~5 minutes. Node pool comes up after that. Total time-to-ready ~7–10 minutes for a single-pool cluster.
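Once the cluster reaches Active, a quick readiness check from your workstation confirms the pool and addons came up (the kubeconfig path below is illustrative; see Getting kubeconfig):

```bash
# Point kubectl at the new cluster (path is illustrative)
export KUBECONFIG=~/Downloads/kubeconfig-mycluster.yaml

# All nodes from the initial pool should reach Ready
kubectl get nodes -o wide

# System addons (CoreDNS, CNI, ingress, metrics-server, CSI) run in kube-system
kubectl get pods -n kube-system

# Confirm the server version matches the minor you selected
kubectl version
```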

Adding a node pool

Cluster detail → Node Pools → + Add pool:

  • Name
  • Flavor — std-*, mem-*, cpu-*, or gpu-* for GPU pools
  • Autoscale — min, max, desired
  • Subnet — within the cluster's VPC
  • Taints / labels — for workload pinning
  • Tags

Node pools with GPU flavors must be in regions that have GPU capacity (currently bd-dha-1 GA, others preview).
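The taints and labels set on a pool are what drive workload pinning. As an illustrative sketch (the pool=gpu label, the gpu=true:NoSchedule taint, and the image tag are assumptions, not a documented Cloud Digit schema), a pod targets a GPU pool like this:

```bash
# Hypothetical GPU pool created with label pool=gpu and taint gpu=true:NoSchedule.
# A workload opts in with a matching nodeSelector + toleration:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  nodeSelector:
    pool: gpu              # matches the pool's label
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"   # tolerates the pool's taint
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1  # requests one GPU from the device plugin
EOF
```

Pods without the toleration are repelled by the taint, so the GPU pool stays reserved for workloads that explicitly opt in.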

Getting kubeconfig

Cluster detail → Access → Download kubeconfig. The downloaded file embeds an OIDC token issued for your user identity, scoped to the cluster's namespace policies.

```bash
export KUBECONFIG=~/Downloads/kubeconfig-mycluster.yaml
kubectl get nodes
```

For CI/CD, issue a service-account kubeconfig instead: cluster detail → Access → + New CI kubeconfig → name + RBAC scope.
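In a pipeline, that CI kubeconfig is typically stored as a secret and materialized at job start; a minimal sketch (the CI_KUBECONFIG variable name is an assumption, not a platform convention):

```bash
# CI job step: the service-account kubeconfig is stored as a pipeline
# secret and written to disk at runtime (variable name is illustrative).
printf '%s' "$CI_KUBECONFIG" > kubeconfig-ci.yaml
chmod 600 kubeconfig-ci.yaml
export KUBECONFIG="$PWD/kubeconfig-ci.yaml"

# Unlike the 12h user OIDC token, the service-account credential is
# long-lived but limited to the RBAC scope chosen at creation; verify it:
kubectl auth can-i create deployments -n my-app
kubectl auth can-i delete nodes
```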

Upgrading

Cluster detail → Version → Upgrade to <target version>. The wizard:

  1. Drains the control plane (HA — no downtime)
  2. Upgrades workers one node-pool at a time, with PDB-respecting drains
  3. Reports per-step progress

You can pause at any step. Version upgrades proceed one minor at a time and must be triggered explicitly; Cloud Digit does not auto-upgrade across minors.
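Before triggering an upgrade, it can help to confirm that no PodDisruptionBudget will stall the worker drains in step 2; a quick pre-flight sketch:

```bash
# A PDB whose ALLOWED DISRUPTIONS column shows 0 will stall the
# PDB-respecting drain during the worker upgrade.
kubectl get pdb -A

# Inspect a suspicious budget (namespace/name are placeholders)
kubectl describe pdb my-pdb -n my-namespace

# Check the current server version so you know which minor is next
kubectl version
```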

Deleting

Cluster detail → Settings → Danger zone → Delete cluster. Asks for the cluster name as confirmation. Worker nodes terminate, control plane is destroyed, and any LoadBalancer Services release their floating IPs. Persistent volumes survive (they're separate Block Storage objects).
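Because persistent volumes outlive the cluster, consider inventorying them, and any floating IPs held by LoadBalancer Services, before deleting; a sketch (column picks are illustrative and assume CSI-provisioned volumes):

```bash
# Record which Block Storage volumes back your PVs so you can
# reattach or clean them up after the cluster is gone.
kubectl get pv -o custom-columns=\
NAME:.metadata.name,\
CLAIM:.spec.claimRef.name,\
SC:.spec.storageClassName,\
VOLUME:.spec.csi.volumeHandle

# LoadBalancer Services release their floating IPs on delete; list them
# first if you want to re-reserve any addresses.
kubectl get svc -A | grep LoadBalancer
```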

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Cluster stuck in Updating for > 30 min | Stuck PDB on a workload that won't drain | Run kubectl describe pdb -A from the cluster admin's machine; manually evict the offending pod |
| Node pool won't scale up | Compute quota exhausted; or no capacity for the flavor in this region | Check Project Settings → Quota → Compute; try a different flavor |
| kubectl returns Unauthorized | OIDC token expired (kubeconfig has a 12h embedded token) | Re-download kubeconfig, or run kubectl oidc-login if you've configured the helper |
| LoadBalancer Service stuck <pending> | Floating IP quota exhausted; or LB controller not running | Check the cluster events; bump floating IP quota |
| Pod scheduling fails: "no nodes available" | Taints / node selectors don't match any pool | Review pool labels and pod nodeSelectors; add a matching pool if needed |
| Persistent volume claim stuck Pending | CSI driver not provisioning; or storage class misconfigured | Check kubectl describe pvc <name>; common issue is wrong storageClassName (use nvme-hci or provisioned-iops) |
| Container Registry pulls fail | imagePullSecret missing or wrong | Use the cluster's built-in registry credential helper instead of imagePullSecrets — it uses cluster identity |
| Cluster events show MemoryPressure on a node | Workload over-committed | Inspect with kubectl top nodes; resize pool or scale up |
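A few of the fixes above as concrete commands (resource names are placeholders):

```bash
# Pending PVC: the Events section usually names the provisioning error
kubectl describe pvc my-claim -n my-app

# Pending LoadBalancer: look for quota or LB-controller errors in events
kubectl describe svc my-service -n my-app
kubectl get events -n my-app --sort-by=.lastTimestamp

# Unschedulable pod: compare its nodeSelector/tolerations with pool labels
kubectl get nodes --show-labels
kubectl describe pod my-pod -n my-app

# MemoryPressure: find the hot nodes (requires metrics-server)
kubectl top nodes
```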

What the platform manages vs what you manage

| Layer | Managed by Cloud Digit | You manage |
| --- | --- | --- |
| Control plane (etcd, API, scheduler, controller-manager) | ✓ | |
| K8s minor version upgrades | ✓ (you trigger via Console) | |
| Worker OS patching | ✓ (in maintenance window) | |
| System addons (CoreDNS, CNI, ingress, metrics-server, CSI) | ✓ | |
| Worker node sizing + count | | ✓ |
| Workloads (Deployments, Services, etc.) | | ✓ |
| RBAC inside the cluster | | ✓ |
| In-cluster observability (Prometheus, Grafana, etc.) | | ✓ (or use Managed K8s Operations) |