AI Notebook (Jupyter / VS Code)

Service ownership

Owner: ai-platform (ai-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Managed JupyterLab and VS Code for Web — one-click attach to a GPU, pre-installed ML stack, persistent home directory.

What it is

A managed notebook environment that abstracts the VM lifecycle. Pick a flavor and, optionally, a GPU, and you get a JupyterLab or VS Code workspace at https://nb-<name>.<region>.notebook.clouddigit.ai/ in under a minute.

Editors

  • JupyterLab 4 — for the data-science workflow, notebooks-and-Python-first
  • VS Code for Web (code-server) — for general-purpose engineering

Both share a per-user home directory backed by File Storage, so switching editors doesn't lose state.

Pre-installed stack

  • Python 3.11 + 3.12, with virtualenv per kernel
  • PyTorch, JAX, Transformers, Datasets, Accelerate, vLLM
  • TensorFlow, scikit-learn, XGBoost
  • Pandas, NumPy, Polars, DuckDB
  • aws-cli and mc configured to talk to Object S3
  • git, gh, make, gcc, build essentials
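To sanity-check which parts of the stack are actually visible to your kernel, a small stdlib sketch works (distribution names below match the list above; anything missing reports as not installed):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Check a few of the distributions named above
for name in ["torch", "jax", "transformers", "pandas", "duckdb"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```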

Compute

| Flavor | When |
| --- | --- |
| nb-cpu-2 | Light EDA, no GPU |
| nb-cpu-8 | Medium-CPU work |
| nb-gpu-l4 | Single-GPU prototyping |
| nb-gpu-l40s | Single-GPU fine-tune of small models |
| nb-gpu-h100 | Larger fine-tunes, multi-modal |

Flavor changes rebuild the runtime but preserve the home directory.
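After a flavor change it's worth confirming the rebuilt runtime actually sees the GPU. A hedged sketch (falls back to "cpu" when PyTorch or CUDA is absent):

```python
def detect_device() -> str:
    """Report the accelerator visible to this kernel, defaulting to 'cpu'."""
    try:
        import torch  # part of the pre-installed stack on notebook images
        if torch.cuda.is_available():
            return f"cuda:{torch.cuda.get_device_name(0)}"
    except ImportError:
        pass
    return "cpu"

print(detect_device())
```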

Idle and shutdown

  • Auto-suspend after configurable idle window (default 60 min)
  • Resume preserves kernel state where the kernel supports it (not all kernels are restartable)
  • Hard delete at user request only

Pricing

Per-hour by flavor, plus persistent home-directory storage. Idle suspend stops the compute charge but keeps the storage charge running. See Pricing.
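The billing model above reduces to simple arithmetic: compute accrues only while active, storage accrues always. A sketch with hypothetical rates (the real per-flavor rates live on the Pricing page):

```python
def monthly_cost(active_hours: float, hourly_rate: float,
                 storage_gb: float, gb_month_rate: float) -> float:
    """Compute charge plus storage charge; storage accrues even while suspended."""
    return active_hours * hourly_rate + storage_gb * gb_month_rate

# Hypothetical: 60 GPU-hours at $2.50/h, 100 GB home dir at $0.10/GB-month
print(monthly_cost(60, 2.50, 100, 0.10))  # 160.0
```

Note that a fully idle month still bills the storage term, which is why cleaning up the home directory matters.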

Operate this service

Managed JupyterLab / VS Code Server for data scientists — no infra knowledge required.

Profile model

Notebooks are templated:

| Profile | Flavor | Use |
| --- | --- | --- |
| notebook-cpu-small | std-2x8 | Exploration, small data |
| notebook-cpu-large | mem-4x32 | Pandas on real data |
| notebook-gpu-t4 | gpu-t4-1x | Inference experiments |
| notebook-gpu-a100 | gpu-a100-1x | Real training |

Each profile defines image, flavor, max idle time, max session duration.
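Those four fields are all a profile is. A hypothetical in-code representation (field names are illustrative, not the service's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NotebookProfile:
    name: str
    image: str            # container image the session boots from
    flavor: str           # compute flavor
    idle_stop_min: int    # max idle time before auto-stop
    max_session_h: int    # hard cap on session duration

gpu_a100 = NotebookProfile(
    name="notebook-gpu-a100",
    image="cloudigit/notebook-base:cuda-12.6",
    flavor="gpu-a100-1x",
    idle_stop_min=30,
    max_session_h=8,
)
print(gpu_a100.flavor)  # gpu-a100-1x
```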

IAM

| Role | Can do |
| --- | --- |
| notebook.user | Launch own notebooks from approved profiles |
| notebook.admin | Manage profiles, set quotas, view all sessions |

Users with notebook.user see only their own sessions; notebook.admin sees all (for cost and security audit).

Auto-stop

Default: 30-min idle stop, 8-h max-session. Tune per profile:

```bash
cd notebook profile set notebook-gpu-a100 \
  --idle-stop 15min --max-session 6h
```

Auto-stop snapshots the workspace; user can resume.

Persistent workspaces

Each user gets a personal volume mounted at /home/user. Survives notebook stops; bills as block storage.

Pre-installed libraries

Base images include:

  • Python 3.11, R 4.3
  • PyTorch 2.5, TensorFlow 2.16, JAX
  • pandas, numpy, scikit-learn, polars
  • jupyter, jupyterlab, vscode-server

Custom images supported for additional packages — push to the registry, set as the profile's image.

SSO and audit

Login via SSO/SAML/OIDC. Every session start, stop, and idle-stop is audit-logged.

For PII workloads, an admin can configure data-residency profiles that restrict the notebook's egress (no public internet; only internal endpoints).

Metrics (admin view)

| Metric | Notes |
| --- | --- |
| notebook.active_sessions | Current count, by profile |
| notebook.idle_sessions | Sessions that should auto-stop |
| notebook.gpu_hours.mtd | Track for cost |
| notebook.workspace_size_gb (per user) | Storage growth |

A daily per-user report is emailed to admins.
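The notebook.gpu_hours.mtd figure in that report is just a per-user sum over sessions. A sketch of the aggregation (the session records here are hypothetical; the real data comes from the service):

```python
from collections import defaultdict

def gpu_hours_by_user(sessions: list[dict]) -> dict[str, float]:
    """Sum GPU hours per user across sessions that used a GPU flavor."""
    totals: dict[str, float] = defaultdict(float)
    for s in sessions:
        if s.get("gpu"):
            totals[s["user"]] += s["hours"]
    return dict(totals)

sessions = [
    {"user": "jane@acme.com", "gpu": True, "hours": 3.5},
    {"user": "jane@acme.com", "gpu": True, "hours": 1.0},
    {"user": "bob@acme.com", "gpu": False, "hours": 6.0},
]
print(gpu_hours_by_user(sessions))  # {'jane@acme.com': 4.5}
```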

User onboarding

```bash
cd notebook user invite --email jane@acme.com --quota gpu-a100=8h/day
# Sends a signup link
```

First-time users get a guided tour, an empty workspace, and access to the profiles their quota allows.

Custom image

Add team-specific packages:

```Dockerfile
FROM cloudigit/notebook-base:cuda-12.6
RUN pip install --no-cache-dir transformers==4.45 datasets accelerate
```

```bash
docker push registry.cloudigit.bd/acme-ml/notebook:v3
cd notebook profile create \
  --name notebook-acme-llm \
  --image registry.cloudigit.bd/acme-ml/notebook:v3 \
  --flavor gpu-a100-1x
```

Quotas & cost

Per-user quotas across all profiles:

```bash
cd notebook quota set --user jane@acme.com --gpu-a100-hours-month 40
```

Soft warn at 80%; hard cut at 100%. Admin override available.
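The soft-warn / hard-cut behavior maps to two thresholds. A sketch of the logic (the 80% and 100% cutoffs come from the text; the function and state names are illustrative):

```python
def quota_state(used_hours: float, quota_hours: float) -> str:
    """'ok' below 80%, 'warn' from 80%, 'blocked' at or past 100%."""
    if used_hours >= quota_hours:
        return "blocked"   # hard cut; admin override required
    if used_hours >= 0.8 * quota_hours:
        return "warn"      # soft warning to the user
    return "ok"

print(quota_state(32, 40))  # warn (32/40 = 80%)
print(quota_state(40, 40))  # blocked
```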

Sharing notebooks

  • For collaboration: nbgrader / jupyterlab-collaboration extension (if enabled by admin)
  • For results: export to HTML/PDF; share via internal portal
  • For pipelines: extract code into a versioned repo

Don't use notebooks as the source-of-truth for production code.

Egress restrictions

For PII profiles:

```bash
cd notebook profile set notebook-pii \
  --egress-allowlist 'internal-data.acme.local,registry.cloudigit.bd' \
  --no-public-internet
```

Users can still use VPC-internal services; can't reach public package repos. Pre-bake required packages in the image.
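Conceptually the allowlist is a host match on outbound connections. A toy illustration of what the restriction enforces (the real enforcement happens at the network layer, not in user code):

```python
from urllib.parse import urlparse

# Hosts taken from the allowlist example above
ALLOWLIST = {"internal-data.acme.local", "registry.cloudigit.bd"}

def egress_allowed(url: str) -> bool:
    """True only when the URL's host is on the profile's allowlist."""
    return urlparse(url).hostname in ALLOWLIST

print(egress_allowed("https://registry.cloudigit.bd/v2/"))  # True
print(egress_allowed("https://pypi.org/simple/"))           # False
```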

Notebook won't start

| Cause | Check |
| --- | --- |
| Quota exhausted | `cd notebook quota show --user <email>` |
| No capacity for profile's flavor | Wait, or pick a smaller profile |
| Custom image broken (won't pull) | See Registry troubleshooting |
| Workspace volume offline | Ticket |

`cd notebook session diagnose --session <id>` shows the failure reason.

Kernel keeps dying

| Reason | Action |
| --- | --- |
| OOM (out of RAM) | Bigger profile, or work on a subset |
| GPU OOM | Bigger GPU profile, smaller batch |
| Long-running cell killed by idle-stop | Tune idle-stop on the profile |
| Network drop during pip install | Retry; check egress config |

"No space left on device"

Workspace volume full:

```bash
df -h /home/user
du -sh /home/user/*
```

Common culprits:

  • ~/.cache/huggingface (model downloads accumulate)
  • Stray Conda environments
  • Untracked experiment outputs

Workspace volumes can be resized by admins.

VS Code extensions missing

VS Code Server runs extensions on the server, not in your local IDE. The pre-installed image has a baseline; missing extensions must be added by an admin to the custom image (not per-user via the marketplace, which is blocked in egress-restricted profiles).

Slow notebook UI

| Symptom | Cause |
| --- | --- |
| Slow to render long outputs | Notebook with > 10 MB of output; truncate / clear outputs |
| Slow file browser | Workspace has 100k+ small files; clean up |
| Slow Python imports | Cold cache; pre-import in profile startup script |
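To check whether cold imports are actually the bottleneck before touching the startup script, a small timing sketch (module names are placeholders; on a notebook the heavy ML imports are the usual suspects):

```python
import importlib
import time

def time_import(module_name: str) -> float:
    """Import a module and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

for mod in ["json", "decimal"]:  # swap in e.g. "torch" inside a notebook
    print(f"{mod}: {time_import(mod):.3f}s")
```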

Auto-stop killed my work

Workspace is persisted; code is saved (Jupyter auto-saves every minute). Notebook kernel state (variables in memory) is lost.

Mitigations:

  • Use %store to persist variables across sessions
  • Save intermediate results to disk frequently
  • Use a longer idle-stop for genuinely long-running computations
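"Save intermediate results to disk" can be as simple as pickling into the persistent workspace, which survives auto-stop. A sketch (the directory and helper names are illustrative):

```python
import pickle
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")  # keep under /home/user so it survives auto-stop

def save_checkpoint(name: str, obj) -> Path:
    """Pickle obj into the persistent workspace and return the path."""
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    path = CHECKPOINT_DIR / f"{name}.pkl"
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path

def load_checkpoint(name: str):
    """Load a previously saved object, e.g. after a resume."""
    with (CHECKPOINT_DIR / f"{name}.pkl").open("rb") as f:
        return pickle.load(f)

save_checkpoint("step_100", {"loss": 0.42, "epoch": 3})
print(load_checkpoint("step_100"))  # {'loss': 0.42, 'epoch': 3}
```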

SSO login loop

```
Failed to authenticate: invalid_state
```

Cookie / session issue. Usually:

  • Browser blocking third-party cookies — allow for the IdP domain
  • Clock skew on user's device > 5 min — sync NTP
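The 5-minute limit above maps to simple arithmetic: the IdP rejects state whose timestamp drifts too far from its own clock. A toy illustration (the check runs server-side; timestamps here are made up):

```python
MAX_SKEW_S = 300  # the 5-minute tolerance from the list above

def skew_ok(client_ts: float, server_ts: float) -> bool:
    """True when the two clocks differ by at most the allowed skew."""
    return abs(client_ts - server_ts) <= MAX_SKEW_S

print(skew_ok(1_700_000_000, 1_700_000_120))  # True: 2 min apart
print(skew_ok(1_700_000_000, 1_700_000_400))  # False: ~6.7 min apart
```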