VPN-as-a-Service¶
Service ownership
Owner: network-platform (network-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Managed IPsec site-to-site VPN and IKEv2 client VPN, terminated on a redundant Cloud Digit gateway pair per region.
What it is¶
A VPN gateway attached to your VPC. Terminate up to two IKE/IPsec tunnels per gateway (active/active or active/standby) for site-to-site, or expose IKEv2 + EAP for client VPN.
Modes¶
| Mode | Use case |
|---|---|
| Site-to-site IPsec | Connect a customer DC, branch office, or another cloud |
| Client VPN (IKEv2) | Remote workforce, vendor access |
| BGP over IPsec | Dynamic routing into your network |
Crypto baseline¶
| Phase | Defaults |
|---|---|
| IKEv2 | AES-256-GCM, SHA-384, DH group 19/20/21, lifetime 8 h |
| ESP | AES-256-GCM, SHA-384, PFS group 19, lifetime 1 h, anti-replay on |
Authentication is PSK or certificate-based; prefer certificates for FI and other regulated workloads.
Throughput¶
| Tier | Aggregate throughput |
|---|---|
| Small | 500 Mbps |
| Medium | 1.5 Gbps |
| Large | 5 Gbps |
| XL | 10 Gbps |
For higher / dedicated throughput, pair with a BDIX Peering Direct Connect.
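As a rough sizing aid, the tier table can be expressed as a lookup that returns the smallest tier covering a required aggregate throughput. This is only a sketch: the function name `smallest_tier` is hypothetical, and the figures are copied from the table above. A `None` result means no tier is large enough, which is the case where Direct Connect applies.

```python
from typing import Optional

# Aggregate throughput per tier in Mbps, copied from the tier table.
TIER_MBPS = {"Small": 500, "Medium": 1500, "Large": 5000, "XL": 10000}


def smallest_tier(required_mbps: float) -> Optional[str]:
    """Return the smallest tier that covers required_mbps, or None if none does."""
    for name, capacity in sorted(TIER_MBPS.items(), key=lambda kv: kv[1]):
        if required_mbps <= capacity:
            return name
    return None
```

For example, a requirement of 2 Gbps (`smallest_tier(2000)`) lands on the Large tier.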
High availability¶
The gateway is deployed as a redundant pair across two AZs. Failover is sub-second, provided both peer IPs are configured on your far-end device.
Pricing¶
Per-gateway-hour (by tier) + per-GB egress over the tunnel (international only — domestic over BDIX is free). See Pricing.
Related¶
- Virtual Private Cloud
- BDIX Peering Direct Connect — alternative for very large or always-on private flows
Operate this service¶
Managed site-to-site IPsec and client (IKEv2) VPNs.
Topology choice¶
| Need | Type |
|---|---|
| Connect on-prem to VPC | Site-to-site IPsec |
| Remote employees access internal services | Client VPN (IKEv2) |
| Cross-region cloud-to-cloud (intra-BD) | Use VPC peering or transit instead |
IAM¶
| Role | Can do |
|---|---|
| `vpn.viewer` | List VPN connections, view metrics |
| `vpn.builder` | Create / modify VPN gateways and connections |
| `vpn.client-admin` | Manage client VPN users, certificates |
| `vpn.admin` | Above + cryptographic policy, audit |
Site-to-site setup¶
Two-side configuration (Cloud Digit + on-prem):
```bash
cd vpn s2s create \
  --name acme-onprem \
  --vpc acme-prod-vpc \
  --remote-cidr 192.168.0.0/16 \
  --remote-asn 65001 \
  --pre-shared-key-secret openbao://acme-vpn/psk
# Returns Cloud Digit endpoint + config snippet for on-prem
```
Use BGP-routed site-to-site when both sides support it — eliminates static-route maintenance.
Cryptographic policy¶
Default policy enforces:
- IKEv2 only
- AES-256-GCM
- DH Group 19+ (256-bit ECC)
- Perfect Forward Secrecy
Older devices may need legacy-compat policy — log it as tech debt.
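Before raising a change on an older device, it can help to check its proposal against the default policy offline. The sketch below is illustrative only: `policy_violations` and its parameters are hypothetical names, and it assumes "Group 19+" means groups 19, 20, and 21 as listed in the crypto baseline.

```python
from typing import List

# Assumption: "DH Group 19+" is read as the groups in the crypto baseline table.
ALLOWED_DH_GROUPS = {19, 20, 21}
REQUIRED_CIPHER = "AES-256-GCM"


def policy_violations(ike_version: int, cipher: str, dh_group: int, pfs: bool) -> List[str]:
    """List the ways a peer proposal falls short of the default policy."""
    problems = []
    if ike_version != 2:
        problems.append("IKEv2 only")
    if cipher.upper() != REQUIRED_CIPHER:
        problems.append("cipher must be AES-256-GCM")
    if dh_group not in ALLOWED_DH_GROUPS:
        problems.append("DH group must be 19, 20, or 21")
    if not pfs:
        problems.append("PFS must be enabled")
    return problems
```

An empty list means the proposal fits the default policy; anything else is a candidate for the legacy-compat policy and a tech-debt entry.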
Client VPN¶
Issue certificates via console; users self-onboard with:
```bash
cd vpn client config --user jane@acme.com --output jane.conf
```
Integrate with SSO/SAML/OIDC for user lifecycle automation.
Audit¶
VPN connections (site and client) log to Audit logs. Critical for compliance — every external-network connection is recorded.
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
| `vpn.tunnel_state` | up | down or flapping |
| `vpn.s2s.bgp_state` (BGP-routed) | established | idle / connect |
| `vpn.bytes_in/out` | matches traffic | sudden drop |
| `vpn.ike_rekey_count_24h` | < 12 (every 2h) | > 24 (excessive renegotiation) |
| `vpn.client.active_sessions` | varies | sudden drop |
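The rekey thresholds in the table translate into a simple classifier. This is a sketch, not part of the service: the function name is hypothetical, and the "watch" band for counts from 12 to 24 is an assumption, since the table only defines the healthy and alert ranges.

```python
def classify_rekey_count(count_24h: int) -> str:
    """Map a vpn.ike_rekey_count_24h reading onto the table's thresholds."""
    if count_24h > 24:
        return "alert"    # excessive renegotiation
    if count_24h < 12:
        return "healthy"
    return "watch"        # assumption: the table leaves 12..24 undefined
```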
Tunnel HA¶
Site-to-site VPNs always have two tunnels (active/standby) — for free. Always configure both on the on-prem side; the second is your hot spare.
Rekey strategy¶
Default: IKE rekey at 8h, Child SA rekey at 1h. Don't shorten unless required — adds CPU + brief packet loss at every rekey.
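To see what shortening a lifetime costs, count the rekeys it produces per day. The helper below is a minimal sketch with a hypothetical name; it assumes exactly one rekey per lifetime, whereas real implementations usually rekey slightly before expiry, so observed counts can be marginally higher.

```python
def rekeys_per_day(lifetime_hours: float) -> int:
    """Rekeys in a 24-hour window, assuming one rekey per full lifetime."""
    return int(24 // lifetime_hours)
```

With the defaults this gives 3 IKE rekeys and 24 Child SA rekeys per tunnel per day; halving the IKE lifetime to 4 h doubles the IKE figure to 6.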
Client VPN user lifecycle¶
Integrated with SSO: when a user is removed from the IdP, the VPN cert is revoked on next sync (typically < 15 min).
Manual revoke:
```bash
cd vpn client revoke --user jane@acme.com --reason "left company"
```
Revoked certs are added to a CRL; active sessions are dropped within 60s.
Cryptographic policy migration¶
When upgrading to a stricter policy:
- Create the new policy on the new tunnel (next-gen)
- Migrate clients/peers one at a time
- Once all migrated, retire the legacy policy
Don't switch the policy on an active tunnel without coordination — peers may break.
Audit reports¶
Quarterly:
- All VPN users
- Last access timestamp per user
- Certificate expiry; revoke any client cert not used in 90 days
- IKE failures by source IP (potential brute-force)

```bash
cd vpn audit report --start-date 2026-04-01 --end-date 2026-04-30
```
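The 90-day rule is straightforward to apply once last-access timestamps are available. The sketch below assumes they have already been loaded into a dict keyed by user; that data shape and the function name are hypothetical, and the report's actual output format is not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_cert_users(
    last_access: Dict[str, Optional[datetime]],
    now: datetime,
    max_idle_days: int = 90,
) -> List[str]:
    """Users whose client cert has not been used within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    # Assumption: a user with no recorded access (None) counts as stale.
    return sorted(
        user for user, seen in last_access.items() if seen is None or seen < cutoff
    )
```

Each user returned is a candidate for the manual revoke command.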
Site-to-site tunnel down¶
| Symptom | Likely cause |
|---|---|
| Both tunnels down | On-prem device down, or PSK mismatch |
| One tunnel down, one up | Normal (HA) — investigate the down one |
| Tunnel up but no traffic | Routing or SG issue |
| Tunnel flapping every few min | Crypto policy mismatch or MTU/MSS issue |
`cd vpn s2s diagnose --name acme-onprem` runs a packet trace and reports the failing phase.
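For runbook automation, the symptom table can be encoded as a small decision function. This is a sketch with hypothetical names; it covers only the first three rows, because flapping needs a time series rather than a single snapshot.

```python
def likely_cause(tunnel_a_up: bool, tunnel_b_up: bool, traffic_flowing: bool) -> str:
    """Map a snapshot of tunnel state onto the symptom table's likely cause."""
    if not tunnel_a_up and not tunnel_b_up:
        return "on-prem device down, or PSK mismatch"
    if tunnel_a_up != tunnel_b_up:
        return "normal HA state; investigate the down tunnel"
    if not traffic_flowing:
        return "routing or SG issue"
    return "no fault indicated"
```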
BGP not establishing¶
Site-to-site with BGP, `vpn.s2s.bgp_state` = idle:
- Confirm the IPsec tunnel is up first (BGP rides over IPsec)
- Verify ASN match (Cloud Digit's ASN as remote on the on-prem side)
- Verify BGP MD5 password matches both sides
- Firewall on either side blocking TCP 179?
```bash
cd vpn s2s bgp status --name acme-onprem
```
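The checklist is ordered: there is no point checking ASNs while the tunnel itself is down. A minimal sketch of that ordering, with hypothetical names and boolean inputs you would gather by hand or from your own tooling:

```python
from typing import Optional


def first_bgp_blocker(
    ipsec_up: bool, asn_matches: bool, md5_matches: bool, tcp_179_open: bool
) -> Optional[str]:
    """Return the first failing check in checklist order, or None if all pass."""
    checks = [
        (ipsec_up, "IPsec tunnel is down; BGP rides over it"),
        (asn_matches, "ASN mismatch"),
        (md5_matches, "BGP MD5 password mismatch"),
        (tcp_179_open, "TCP 179 blocked by a firewall"),
    ]
    for ok, message in checks:
        if not ok:
            return message
    return None
```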
MTU / MSS issues¶
Tunnel is up, simple pings work, but TLS handshake fails or large packets drop:
- IPsec adds ~50–80 bytes; effective MTU drops to ~1400
- Lower on-prem MSS to 1360 (`ip tcp adjust-mss 1360` on Cisco, equivalent elsewhere)
- Cloud Digit side auto-adjusts MSS on the VPC gateway
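The 1360 figure follows from the effective MTU minus the IPv4 and TCP headers. The sketch below works through that arithmetic; the helper names are hypothetical, and the 1500-byte link MTU is an assumption (standard Ethernet).

```python
IP_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def tunnel_mtu(link_mtu: int, ipsec_overhead: int) -> int:
    """MTU left for inner packets after IPsec encapsulation."""
    return link_mtu - ipsec_overhead


def mss_for(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_TCP_HEADERS
```

With 80 bytes of overhead on a 1500-byte link, `tunnel_mtu(1500, 80)` is 1420; rounding down to a conservative 1400 gives `mss_for(1400)`, which is 1360.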
Client VPN: "Authentication failed"¶
| Cause | Fix |
|---|---|
| User removed from SSO | Re-add (cert auto-issues on sync) or manually re-issue |
| Cert expired | Re-issue via console |
| MFA enrolled but not provided | Confirm MFA on this session |
| Clock skew on client > 5 min | Sync client NTP |
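The clock-skew row is easy to verify on a client if you can compare its clock against a trusted reference. A minimal sketch, assuming both timestamps are timezone-aware and that you supply the reference time yourself (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # limit from the table


def skew_exceeds_limit(
    client_time: datetime, reference_time: datetime, limit: timedelta = MAX_SKEW
) -> bool:
    """True if the client clock is off by more than the limit in either direction."""
    return abs(client_time - reference_time) > limit
```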
Client connects but can't reach VPC services¶
- DNS not pushed to client — verify VPN config includes Cloud Digit DNS
- Split-tunnel configured to exclude target CIDR
- VPC routes don't include the client CIDR
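The two routing checks above can be run offline with Python's standard `ipaddress` module. This is a sketch under assumptions: the function name and inputs are hypothetical, split tunneling is modelled as a list of included CIDRs, and all addresses are the same IP version. The DNS check is not covered.

```python
import ipaddress
from typing import List


def diagnose_reachability(
    target_ip: str,
    client_cidr: str,
    split_tunnel_includes: List[str],
    vpc_route_cidrs: List[str],
) -> List[str]:
    """Return the routing problems that would stop a client reaching target_ip."""
    findings = []
    target = ipaddress.ip_address(target_ip)
    # The target must fall inside a CIDR the client sends through the tunnel.
    if not any(target in ipaddress.ip_network(c) for c in split_tunnel_includes):
        findings.append("split tunnel does not carry the target")
    # Return traffic needs a VPC route that covers the client address pool.
    client_net = ipaddress.ip_network(client_cidr)
    if not any(client_net.subnet_of(ipaddress.ip_network(r)) for r in vpc_route_cidrs):
        findings.append("VPC routes do not cover the client CIDR")
    return findings
```

An empty result rules out both routing causes and leaves DNS as the remaining item to check.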
Rekey storms¶
`vpn.ike_rekey_count_24h` > 24:
- DPD timeout too aggressive (on-prem) — extend
- Path MTU discovery broken — fragmenting IKE packets, retries cascade
- Crypto policy mismatch causing renegotiation