Managed MySQL / MariaDB¶
Service ownership
Owner: data-platform (data-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11
Managed MySQL 8 and MariaDB 10/11 clusters with HA and replicas.
What it is¶
Managed MySQL or MariaDB. We provision and operate the engine; you operate the schema and SQL. Same operational model as Managed PostgreSQL: HA via semi-sync or async replication, PITR via binlog archive, multiple read replicas.
Versions¶
| Engine | Versions |
|---|---|
| MySQL | 8.0, 8.4 (LTS) |
| MariaDB | 10.6 (LTS), 10.11 (LTS), 11.4 (LTS) |
Topologies¶
| Topology | Use case |
|---|---|
| Single instance | Dev / non-prod |
| HA (primary + replica with auto-failover) | Production default |
| HA + read replicas | Read scaling |
| Cross-region replica | DR / geo-read |
| Group replication (MySQL InnoDB Cluster, MariaDB Galera) | Multi-writer (advanced) |
Compute & storage¶
Same flavor families as Managed PostgreSQL. Storage on Provisioned IOPS by default for OLTP; NVMe HCI option for cost-sensitive workloads.
Backup & PITR¶
- Daily snapshots, retained 7 days default (configurable to 35)
- Binlog archived continuously for PITR
- Manual snapshots, cross-region copy
Maintenance & upgrades¶
Weekly 4-hour window for minor patches. Major upgrades opt-in.
Pricing¶
Compute + Provisioned-IOPS storage rates; see Pricing.
Operate this service¶
InnoDB-backed MySQL 8.x and MariaDB 11.x clusters.
Engine choice¶
| MySQL 8.x | MariaDB 11.x |
|---|---|
| GTID-based replication | GTID + multi-source replication |
| JSON functions, window functions | JSON, window, sequence engine |
| Default for most apps | Some Drupal/legacy stacks prefer it |
Topology¶
| Topology | RPO | RTO | Cost |
|---|---|---|---|
| Single instance | 24h (backup) | minutes | 1× |
| Primary + semi-sync replica | < 1 s | < 30 s | 2.1× |
| Primary + 2 replicas (1 sync, 1 async) | < 1 s | < 30 s + read scale | 3×+ |
IAM¶
Same shape as PostgreSQL: viewer, connector, dba-operator, cluster-admin.
In-database: the platform provisions an admin role (acme_admin) with GRANT OPTION; you create app-scoped users from there. Root is never exposed.
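As a sketch of the "app-scoped users" step, the helper below prints SQL you could review and then run while connected as the admin role. The function name `app_user_sql`, the user `orders_app`, and the privilege list are illustrative assumptions, not platform defaults; `<password>` is a placeholder to replace.

```bash
# Print SQL that creates a least-privilege application user.
# Hypothetical helper: the names and the privilege set are assumptions.
app_user_sql() {
  db="$1"; user="$2"
  cat <<EOF
CREATE USER '${user}'@'%' IDENTIFIED BY '<password>' REQUIRE SSL;
GRANT SELECT, INSERT, UPDATE, DELETE ON \`${db}\`.* TO '${user}'@'%';
EOF
}

app_user_sql acme orders_app
```

Granting on a single schema rather than `*.*` keeps each application confined to its own data.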
Parameter groups¶
```bash
cd db mysql param-group create \
  --name acme-prod \
  --params innodb_buffer_pool_size=12G,max_connections=500,innodb_log_file_size=1G
```
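Set innodb_buffer_pool_size to roughly 70% of instance RAM. A larger innodb_log_file_size improves write throughput but lengthens crash recovery. On MySQL 8.0.30 and later, innodb_log_file_size is deprecated in favor of innodb_redo_log_capacity.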
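The 70% guideline can be turned into a number with a little arithmetic. This is a minimal sketch: the helper name `buffer_pool_mib` is hypothetical, and rounding down to a multiple of 128 MiB assumes MySQL's default `innodb_buffer_pool_chunk_size`.

```bash
# Suggest innodb_buffer_pool_size in MiB as ~70% of instance RAM,
# rounded down to a multiple of 128 MiB (assumed default chunk size).
buffer_pool_mib() {
  ram_mib="$1"
  target=$(( ram_mib * 70 / 100 ))
  echo $(( target / 128 * 128 ))
}

buffer_pool_mib 16384   # 16 GiB instance, prints 11392
```

Pass the result with an `M` suffix (for example `innodb_buffer_pool_size=11392M`) when building the `--params` list.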
Backups & PITR¶
- Continuous binary-log archival → PITR within retention
- Nightly full backup via xtrabackup (no lock for InnoDB)
- Default retention 7 days; bump per workload
SSL/TLS¶
Required by default for client connections:
```bash
mysql --ssl-mode=REQUIRED --ssl-ca=cd-ca.pem -h cluster-acme.bd-dha-1 -u acme -p
```
Disable per-cluster (not recommended) via parameter group.
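To confirm that a session is actually encrypted, you can check the `Ssl_cipher` session status variable, which is empty on unencrypted connections. The helper below is a sketch (the name `tls_in_use` is hypothetical) that reads the tab-separated output of the `mysql` client in batch mode. Note that with the MySQL client, `--ssl-mode=REQUIRED` encrypts the connection but does not validate the server certificate; `VERIFY_CA` or `VERIFY_IDENTITY` is needed for that.

```bash
# Succeeds if the "Ssl_cipher<TAB>value" line on stdin has a non-empty value.
tls_in_use() {
  awk -F '\t' '$1 == "Ssl_cipher" && $2 != "" { ok = 1 } END { exit !ok }'
}

# Usage against a live cluster (host and user as in the connection command above):
# mysql --ssl-mode=REQUIRED --ssl-ca=cd-ca.pem -h cluster-acme.bd-dha-1 -u acme -p \
#   -N -B -e "SHOW SESSION STATUS LIKE 'Ssl_cipher'" | tls_in_use && echo "TLS active"
```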
Metrics¶
| Metric | Healthy | Alert |
|---|---|---|
| mysql.connections.threads_connected | < 80% of max | > 90% |
| mysql.replication.seconds_behind_master | < 1 s | > 5 s |
| mysql.innodb.buffer_pool.hit_pct | > 99% | < 95% |
| mysql.innodb.row_lock_waits | varies | spike |
| mysql.innodb.deadlocks_per_min | 0 | > 0 |
| mysql.slow_queries_per_min | varies | spike from baseline |
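The connection thresholds in the table can be checked by hand from `Threads_connected` and `max_connections`. The sketch below is illustrative only: `conn_status` is a hypothetical helper, and the "warn" band between 80% and 90% is an assumption, since the table defines only the healthy and alert ends.

```bash
# Classify connection usage: healthy below 80% of max_connections,
# alert above 90%, and an assumed "warn" band in between.
conn_status() {
  connected="$1"; max="$2"
  pct=$(( connected * 100 / max ))
  if [ "$pct" -gt 90 ]; then echo "alert ${pct}%"
  elif [ "$pct" -lt 80 ]; then echo "healthy ${pct}%"
  else echo "warn ${pct}%"
  fi
}

conn_status 460 500   # prints "alert 92%"
```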
Failover¶
```bash
cd db mysql failover --cluster acme-prod
```
Promotes the semi-sync replica to primary. RTO < 30 s. Apps reconnect through the cluster endpoint, which redirects to the new primary automatically.
Slow query analysis¶
Enable slow log:
```bash
cd db mysql param set --cluster acme-prod --slow_query_log=1 --long_query_time=1
```
Stream to an S3 bucket; analyze with mysqldumpslow or pt-query-digest. Add indexes for the top offenders.
Schema migrations¶
For tables > 1 GB, a direct ALTER TABLE can block writes for as long as the table rebuild runs. Use pt-online-schema-change or gh-ost (recommended):
```bash
gh-ost \
  --host=cluster-acme.bd-dha-1 \
  --database=acme \
  --table=orders \
  --alter="ADD COLUMN customer_segment VARCHAR(32)" \
  --execute
```
Both tools copy rows into a shadow table and swap it in at the end, giving near-zero downtime.
Read replicas¶
```bash
cd db mysql replica create --cluster acme-prod --az bd-dha-1-az3
```
Async; lag typically < 1 s. Route read-only queries via a separate connection string. Apps must understand the consistency tradeoff.
Major version upgrades¶
```bash
cd db mysql upgrade --cluster acme-prod --target-version 8.4 --window <ts>
```
In-place for minor versions; logical-replication-based for majors (low downtime, same as Postgres upgrade flow).
Too many connections¶
ERROR 1040 (08004): Too many connections
- Connection pooler (ProxySQL, recommended): apps go through the pooler, not the raw cluster
- Drop idle connections: KILL <id> for sessions idle > 1h
- Raise max_connections (RAM-costly)
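Finding the sessions that have been idle for more than an hour can be scripted from `SHOW PROCESSLIST`. The sketch below assumes the `mysql` client's batch output (`-N -B`), where columns are tab-separated in the order Id, User, Host, db, Command, Time. The helper name `idle_kill_sql` is hypothetical; it only prints `KILL` statements so you can review them before running anything.

```bash
# Read SHOW PROCESSLIST rows on stdin and print a KILL statement for each
# session that has been in Sleep longer than the threshold (default 3600 s).
idle_kill_sql() {
  awk -F '\t' -v limit="${1:-3600}" \
    '$5 == "Sleep" && $6 + 0 > limit + 0 { print "KILL " $1 ";" }'
}

# Usage against a live cluster:
# mysql -h cluster-acme.bd-dha-1 -u acme -p -N -B -e "SHOW PROCESSLIST" | idle_kill_sql 3600
```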
Replication broken¶
mysql.replication.seconds_behind_master = NULL:
The replica IO/SQL thread stopped. Check:
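```sql
SHOW REPLICA STATUS\G
-- Look at Last_Errno and Last_Error.
-- SHOW SLAVE STATUS was removed in MySQL 8.4; SHOW REPLICA STATUS works on MySQL 8.0.22+ and MariaDB 10.5+.
```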
Common causes:
- A row missing on the replica (PK conflict), usually from a manual delete. Skip the event or reinitialize.
- Schema drift: the replica's table doesn't match the primary
- Storage full on the replica
```bash
cd db mysql replica restart --cluster acme-prod --replica <id>
```
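The vertical status output is long, so it can help to filter it down to the thread states and the last error before deciding whether to skip, reinitialize, or restart. This is a sketch with a hypothetical helper name (`replica_errors`); it accepts both the `Replica_*` and the older `Slave_*` field names.

```bash
# Print only the thread-state and error fields from vertical (\G)
# replica status output read on stdin.
replica_errors() {
  awk '/^ *(Replica|Slave)_(IO|SQL)_Running:|^ *Last_(Errno|Error):/ {
         sub(/^ */, ""); print
       }'
}

# Usage on the replica:
# mysql -h <replica-host> -u acme -p -e 'SHOW REPLICA STATUS\G' | replica_errors
```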
Long-running ALTER blocks queries¶
The straightforward fix: don't run ALTER directly on busy tables. Use gh-ost or pt-online-schema-change. If you must run ALTER directly:
- Run during off-hours
- Kill the query if the queue builds; the platform has a query timeout configurable per parameter group
InnoDB deadlocks¶
```bash
cd db mysql innodb status --cluster acme-prod | grep -A 20 "LATEST DETECTED DEADLOCK"
```
Most common: app code that updates rows in different orders across transactions. Fix at the app — consistent lock ordering, smaller transactions, or SELECT ... FOR UPDATE with ordered iteration.
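To illustrate consistent lock ordering, the sketch below sorts a set of row ids before locking them, so that two transactions touching overlapping rows request the locks in the same sequence. The helper `ordered_lock_sql` is hypothetical, and the `orders` table with an `id` primary key is an assumption; it only prints the SQL.

```bash
# Emit a transaction that locks the given ids in ascending order.
ordered_lock_sql() {
  ids="$(printf '%s\n' "$@" | sort -n | paste -sd, -)"
  printf '%s\n' 'START TRANSACTION;'
  printf 'SELECT id FROM orders WHERE id IN (%s) ORDER BY id FOR UPDATE;\n' "$ids"
  printf '%s\n' '-- apply updates in the same id order here'
  printf '%s\n' 'COMMIT;'
}

ordered_lock_sql 42 7 19
```

The same idea applies in application code: sort the keys first, then iterate, whatever the language or ORM.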
Buffer pool hit ratio < 95%¶
Cold cache after restart is normal — should climb within hours.
Persistent low:
- Working set > innodb_buffer_pool_size. Resize the cluster or shrink the working set.
- New query pattern scanning large tables. Add indexes.
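The hit ratio can be cross-checked from two InnoDB status counters: `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (logical reads that had to go to disk). The sketch below uses the conventional formula; whether the platform metric is computed exactly this way is an assumption, and `hit_pct` is a hypothetical helper name.

```bash
# Hit ratio (%) = 100 * (1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
hit_pct() {
  awk -v reads="$1" -v requests="$2" 'BEGIN {
    if (requests == 0) { print "n/a"; exit }
    printf "%.2f\n", 100 * (1 - reads / requests)
  }'
}

hit_pct 5000 1000000   # prints 99.50
```

Both counters are cumulative since server start, so for a current reading take the difference between two samples rather than the raw values.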
Slow query log filling disk¶
Slow query log can grow fast under bad-query bursts. Rotate:
```bash
cd db mysql slowlog rotate --cluster acme-prod
```
Better: stream to S3 instead of local disk via the slow-log shipper.
Replication lag during backup¶
Backup window can cause replica IO contention → replication lag. Schedule backups during low-traffic windows; consider taking backups from a dedicated backup replica.