AI Notebook (Jupyter / VS Code)¶
Service ownership
Owner: ai-platform (ai-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Managed JupyterLab and VS Code for Web — one-click attach to a GPU, pre-installed ML stack, persistent home directory.
What it is¶
A managed notebook environment that abstracts the VM lifecycle. Pick a flavor, pick (optionally) a GPU, and you have a JupyterLab or VS Code workspace at https://nb-<name>.<region>.notebook.clouddigit.ai/ in under a minute.
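The URL pattern above can be sketched as a small helper, for example to generate links in onboarding scripts. This is a minimal illustration: the name `churn-eda` and region `region-1` are made-up values, and the validation rule (lowercase DNS labels) is an assumption rather than a documented constraint of the service.

```python
import re

# Assumption: workspace names and regions are lowercase DNS labels.
# The service's real naming rules may be stricter or looser.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def workspace_url(name: str, region: str) -> str:
    """Build a workspace URL following the nb-<name>.<region> pattern."""
    for label in (name, region):
        if not _LABEL.match(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    return f"https://nb-{name}.{region}.notebook.clouddigit.ai/"


# Hypothetical name and region, for illustration only.
print(workspace_url("churn-eda", "region-1"))
```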
Editors¶
- JupyterLab 4 — for the data-science workflow, notebooks-and-Python-first
- VS Code for Web (code-server) — for general-purpose engineering
Both share a per-user home directory backed by File Storage, so switching editors doesn't lose state.
Pre-installed stack¶
- Python 3.11 + 3.12, with virtualenv per kernel
- PyTorch, JAX, Transformers, Datasets, Accelerate, vLLM
- TensorFlow, scikit-learn, XGBoost
- Pandas, NumPy, Polars, DuckDB
- aws-cli and mc configured to talk to Object S3
- git, gh, make, gcc, build essentials
Compute¶
| Flavor | When |
|---|---|
| nb-cpu-2 | Light EDA, no GPU |
| nb-cpu-8 | Medium-CPU work |
| nb-gpu-l4 | Single-GPU prototyping |
| nb-gpu-l40s | Single-GPU fine-tune of small models |
| nb-gpu-h100 | Larger fine-tunes, multi-modal |
Flavor changes rebuild the runtime but preserve the home directory.
Idle and shutdown¶
- Auto-suspend after configurable idle window (default 60 min)
- Resume preserves kernel state where the kernel is restartable; this does not apply to all kernels
- Hard delete at user request only
Pricing¶
Per-hour by flavor, plus persistent home-directory storage. Idle suspend stops the compute charge but keeps the storage charge running. See Pricing.
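The billing model above (compute charged only while running, storage charged continuously) can be sketched as a rough monthly estimate. All rates in this example are placeholders, not published prices, and the per-GB-month storage unit is an assumption; see Pricing for real figures.

```python
def estimate_monthly_cost(
    active_hours: float,
    hourly_rate: float,
    home_gb: float,
    storage_rate_per_gb_month: float,
) -> float:
    """Rough monthly estimate: compute for running hours, storage for the whole month.

    Suspended hours do not appear here because idle suspend stops the
    compute charge; the home-directory storage charge keeps running.
    """
    compute = active_hours * hourly_rate
    storage = home_gb * storage_rate_per_gb_month
    return compute + storage


# Placeholder rates for illustration only.
print(estimate_monthly_cost(active_hours=40, hourly_rate=2.0,
                            home_gb=100, storage_rate_per_gb_month=0.25))
```

The point of the sketch is that a workspace suspended for the whole month still costs `home_gb * storage_rate_per_gb_month`.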
Related¶
- GPU VMs
- Inference Endpoints — promote a notebook-trained model
- Vector Database
Operate this service¶
Managed JupyterLab / VS Code Server for data scientists — no infra knowledge required.
Profile model¶
Notebooks are templated:
| Profile | Flavor | Use |
|---|---|---|
| notebook-cpu-small | std-2x8 | Exploration, small data |
| notebook-cpu-large | mem-4x32 | Pandas on real data |
| notebook-gpu-t4 | gpu-t4-1x | Inference experiments |
| notebook-gpu-a100 | gpu-a100-1x | Real training |
Each profile defines image, flavor, max idle time, max session duration.
IAM¶
| Role | Can do |
|---|---|
| notebook.user | Launch own notebooks from approved profiles |
| notebook.admin | Manage profiles, set quotas, view all sessions |
Users with the notebook.user role see only their own sessions; admins see all sessions, for cost and security audits.
Auto-stop¶
Default: 30-min idle stop, 8-h max-session. Tune per profile:
```bash
cd notebook profile set notebook-gpu-a100 \
  --idle-stop 15min --max-session 6h
```
Auto-stop snapshots the workspace; user can resume.
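The two limits interact as follows: a session stops when either its idle window or its maximum duration is reached. The sketch below illustrates that decision with the default values above. How the service actually measures "activity", and which limit takes precedence when both apply, are assumptions here.

```python
from datetime import datetime, timedelta
from typing import Optional


def stop_reason(
    started_at: datetime,
    last_activity_at: datetime,
    now: datetime,
    idle_stop: timedelta = timedelta(minutes=30),
    max_session: timedelta = timedelta(hours=8),
) -> Optional[str]:
    """Return why a session would be stopped, or None if it keeps running."""
    # Assumption: the max-session limit is checked first.
    if now - started_at >= max_session:
        return "max-session"
    if now - last_activity_at >= idle_stop:
        return "idle-stop"
    return None


start = datetime(2026, 1, 1, 9, 0)
# Last activity 50 minutes ago, session one hour old: stopped for idleness.
print(stop_reason(start, start + timedelta(minutes=10), start + timedelta(hours=1)))
```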
Persistent workspaces¶
Each user gets a personal volume mounted at /home/user. The volume survives notebook stops and is billed as block storage.
Pre-installed libraries¶
Base images include:
- Python 3.11, R 4.3
- PyTorch 2.5, TensorFlow 2.16, JAX
- pandas, numpy, scikit-learn, polars
- jupyter, jupyterlab, vscode-server
Custom images are supported for additional packages: push the image to the registry, then set it as the profile's image.
SSO and audit¶
Login via SSO/SAML/OIDC. Every session start, stop, and idle-stop is audit-logged.
For PII workloads, an admin can configure data-residency profiles that restrict the notebook's egress (no public internet; only internal endpoints).
Metrics (admin view)¶
| Metric | Notes |
|---|---|
| notebook.active_sessions | Current count, by profile |
| notebook.idle_sessions | Should auto-stop |
| notebook.gpu_hours.mtd | Track for cost |
| notebook.workspace_size_gb (per user) | Storage growth |
A daily per-user report is emailed to admins.
User onboarding¶
```bash
cd notebook user invite --email jane@acme.com --quota gpu-a100=8h/day
# Sends signup link
```
First-time users get a guided tour, an empty workspace, and access to the profiles their quota allows.
Custom image¶
Add team-specific packages:
```Dockerfile
FROM cloudigit/notebook-base:cuda-12.6
RUN pip install --no-cache-dir transformers==4.45 datasets accelerate
```

```bash
docker push registry.cloudigit.bd/acme-ml/notebook:v3
cd notebook profile create \
  --name notebook-acme-llm \
  --image registry.cloudigit.bd/acme-ml/notebook:v3 \
  --flavor gpu-a100-1x
```
Quotas & cost¶
Per-user quotas across all profiles:
```bash
cd notebook quota set --user jane@acme.com --gpu-a100-hours-month 40
```
Users get a soft warning at 80% of quota and a hard cut-off at 100%. An admin override is available.
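The thresholds can be expressed as a small classification function, which may be handy when reproducing the behaviour in a cost dashboard. This is a sketch under stated assumptions: the state names are invented for illustration, and whether the service treats exactly 80% or 100% as inclusive is not documented here.

```python
def quota_state(used_hours: float, limit_hours: float) -> str:
    """Classify usage against a quota: 'ok', 'warn' (>= 80%) or 'cut' (>= 100%)."""
    if limit_hours <= 0:
        raise ValueError("limit_hours must be positive")
    ratio = used_hours / limit_hours
    # Assumption: both thresholds are inclusive.
    if ratio >= 1.0:
        return "cut"
    if ratio >= 0.8:
        return "warn"
    return "ok"


# With a 40-hour monthly GPU quota:
for used in (10, 32, 40):
    print(used, quota_state(used, 40))
```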
Sharing notebooks¶
- For collaboration: nbgrader / jupyterlab-collaboration extension (if enabled by admin)
- For results: export to HTML/PDF; share via internal portal
- For pipelines: extract code into a versioned repo
Don't use notebooks as the source-of-truth for production code.
Egress restrictions¶
For PII profiles:
```bash
cd notebook profile set notebook-pii \
  --egress-allowlist 'internal-data.acme.local,registry.cloudigit.bd' \
  --no-public-internet
```
Users can still reach VPC-internal services but not public package repositories, so pre-bake required packages into the image.
Notebook won't start¶
| Cause | Check |
|---|---|
| Quota exhausted | cd notebook quota show --user <email> |
| No capacity for profile's flavor | Wait, or pick a smaller profile |
| Custom image broken (won't pull) | See Registry troubleshooting |
| Workspace volume offline | Ticket |
cd notebook session diagnose --session <id> shows the failure reason.
Kernel keeps dying¶
| Reason | Action |
|---|---|
| OOM (out of RAM) | Bigger profile, or work on a subset |
| GPU OOM | Bigger GPU profile, smaller batch |
| Long-running cell killed by idle-stop | Tune idle-stop on the profile |
| Network drop during pip install | Retry; check egress config |
"No space left on device"¶
Workspace volume full:
```bash
df -h /home/user
du -sh /home/user/*
```
Common culprits:
- ~/.cache/huggingface (model downloads accumulate)
- Conda envs cluttering
- Untracked experiment outputs
Workspace volumes can be resized by admins.
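To find what is filling the volume from inside a notebook, a short script can rank the top-level entries of the home directory by size. The sketch below uses only the standard library. It sums apparent file sizes rather than allocated disk blocks, so the totals may differ slightly from `du`.

```python
import os
from pathlib import Path


def dir_sizes(root: Path) -> list[tuple[int, Path]]:
    """Return (bytes, path) for each top-level entry under root, largest first."""
    results = []
    for entry in root.iterdir():
        if entry.is_symlink():
            continue  # do not follow links out of the workspace
        if entry.is_file():
            total = entry.stat().st_size
        else:
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry):
                for fname in filenames:
                    fpath = os.path.join(dirpath, fname)
                    if os.path.islink(fpath):
                        continue
                    try:
                        total += os.path.getsize(fpath)
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
        results.append((total, entry))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

Calling `dir_sizes(Path("/home/user"))[:10]` lists the ten largest entries. Unlike a shell glob such as `/home/user/*`, which skips dotfiles by default, `iterdir()` includes hidden directories like `.cache`.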
VS Code extensions missing¶
VS Code Server runs the extensions in the server, not your local IDE. The pre-installed image has a baseline; missing extensions must be added by an admin to the custom image (not per-user via the marketplace, which is blocked in egress-restricted profiles).
Slow notebook UI¶
| Symptom | Cause |
|---|---|
| Slow to render long outputs | Notebook with > 10MB of output; truncate / clear outputs |
| Slow file browser | Workspace has 100k+ small files; clean up |
| Slow Python imports | Cold cache; pre-import in profile startup script |
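For the first row, outputs can be cleared from the notebook UI, or in bulk with a script. The sketch below edits the `.ipynb` JSON directly and assumes the nbformat v4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). It rewrites the file in place, so work on a copy first if the outputs matter.

```python
import json
from pathlib import Path


def clear_outputs(notebook_path: Path) -> int:
    """Strip outputs from every code cell in place; return how many cells were cleared."""
    nb = json.loads(notebook_path.read_text(encoding="utf-8"))
    cleared = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code" and cell.get("outputs"):
            cell["outputs"] = []
            cell["execution_count"] = None
            cleared += 1
    notebook_path.write_text(
        json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return cleared
```

Where nbconvert is available, `jupyter nbconvert --clear-output --inplace <notebook>` does the same job without custom code.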
Auto-stop killed my work¶
The workspace is persisted and code is saved (Jupyter auto-saves every minute), but notebook kernel state (variables in memory) is lost.
Mitigations:
- Use %store to persist variables across sessions
- Save intermediate results to disk frequently
- Use a longer idle-stop for genuinely long-running computations
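For the "save intermediate results to disk" mitigation, a pair of checkpoint helpers is usually enough. The sketch below uses `pickle` and writes via a temporary file, so a stop that lands mid-write does not leave a truncated checkpoint. The helper names and the checkpoint path are illustrative, not part of the service.

```python
import os
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(obj, path: Path) -> None:
    """Pickle obj to path atomically: write a temp file, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(obj, fh)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial temp file
        raise


def load_checkpoint(path: Path, default=None):
    """Return the unpickled object, or default if no checkpoint exists yet."""
    if not path.exists():
        return default
    with path.open("rb") as fh:
        return pickle.load(fh)
```

A long-running loop might call `save_checkpoint(state, Path("/home/user/checkpoints/run1.pkl"))` every N iterations and start with `state = load_checkpoint(...)` after a resume. Only unpickle files you wrote yourself, since `pickle` can execute code on load.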
SSO login loop¶
```
Failed to authenticate: invalid_state
```
Cookie or session issue. Usually:
- Browser blocking third-party cookies: allow them for the IdP domain
- Clock skew on the user's device > 5 min: sync NTP