Managed MySQL / MariaDB¶

Service ownership

Owner: data-platform (data-pm@clouddigit.ai) — Status: GA — Last audited: 2026-05-11

Managed MySQL 8 and MariaDB 10/11 clusters with HA and replicas.

What it is¶

Managed MySQL or MariaDB. We provision and operate the engine; you own the schema and SQL. The operational model matches Managed PostgreSQL: HA via semi-synchronous or asynchronous replication, PITR via binlog archive, and multiple read replicas.

Versions¶

Engine	Versions
MySQL	8.0, 8.4 (LTS)
MariaDB	10.6 (LTS), 10.11 (LTS), 11.4 (LTS)

Topologies¶

Topology	Use case
Single instance	Dev / non-prod
HA (primary + replica with auto-failover)	Production default
HA + read replicas	Read scaling
Cross-region replica	DR / geo-read
Group replication (MySQL InnoDB Cluster, MariaDB Galera)	Multi-writer (advanced)

Compute & storage¶

Same flavor families as Managed PostgreSQL. Storage defaults to Provisioned IOPS for OLTP; an NVMe HCI option is available for cost-sensitive workloads.

Backup & PITR¶

Daily snapshots, retained for 7 days by default (configurable up to 35)
Binlog archived continuously for PITR
Manual snapshots, cross-region copy

Maintenance & upgrades¶

Minor patches are applied in a weekly 4-hour maintenance window. Major upgrades are opt-in.

Pricing¶

Compute + Provisioned-IOPS storage rates; see Pricing.

Operate this service¶

Administration | Operation | Troubleshooting

InnoDB-backed MySQL 8.x and MariaDB 10.x/11.x clusters.

Engine choice¶

MySQL 8.x	MariaDB 11.x
GTID-based replication	GTID + multi-source replication
JSON functions, window functions	JSON, window, sequence engine
Default for most apps	Some Drupal/legacy stacks prefer it

Topology¶

Topology	RPO	RTO	Cost
Single instance	24 h (backup)	minutes	1×
Primary + semi-sync replica	< 1 s	< 30 s	2.1×
Primary + 2 replicas (1 sync, 1 async)	< 1 s	< 30 s + read scale	3×+

IAM¶

Same role set as Managed PostgreSQL: viewer, connector, dba-operator, cluster-admin.

In-database: the platform provisions an admin role (acme_admin) with GRANT OPTION; you create app-scoped users from there. Root is never exposed.

Parameter groups¶

```bash
cd db mysql param-group create \
  --name acme-prod \
  --params innodb_buffer_pool_size=12G,max_connections=500,innodb_log_file_size=1G
```

Set innodb_buffer_pool_size to roughly 70% of instance RAM. A larger innodb_log_file_size improves write throughput but lengthens crash recovery.
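The 70% guideline can be turned into a starting value with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes MySQL's default buffer pool chunk size of 128 MiB and that the parameter group accepts an `M`-suffixed value.

```python
# Sketch: derive a starting innodb_buffer_pool_size from instance RAM.
# Assumptions: helper names are hypothetical; 128 MiB is MySQL's default
# innodb_buffer_pool_chunk_size; the "M" suffix is assumed to be accepted.

MIB = 1024 ** 2
GIB = 1024 ** 3


def suggested_buffer_pool_bytes(ram_bytes: int, fraction: float = 0.70,
                                chunk_bytes: int = 128 * MIB) -> int:
    """Return roughly `fraction` of RAM, rounded down to a whole chunk."""
    if ram_bytes <= 0 or not 0 < fraction < 1:
        raise ValueError("ram_bytes must be positive and fraction in (0, 1)")
    target = int(ram_bytes * fraction)
    # Round down so the result never exceeds the requested share of RAM.
    chunks = max(1, target // chunk_bytes)
    return chunks * chunk_bytes


def as_param(value_bytes: int) -> str:
    """Format a byte count as a MiB-suffixed parameter value."""
    return f"{value_bytes // MIB}M"


if __name__ == "__main__":
    print(as_param(suggested_buffer_pool_bytes(16 * GIB)))
```

Treat the result as a first guess; the right value still depends on connection count and other per-session memory.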

Backups & PITR¶

Continuous binary-log archival → PITR within retention
Nightly full backup via xtrabackup (no lock for InnoDB)
Default retention is 7 days; increase it per workload

SSL/TLS¶

Required by default for client connections:

```bash
mysql --ssl-mode=REQUIRED --ssl-ca=cd-ca.pem -h cluster-acme.bd-dha-1 -u acme -p
```

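With the MySQL client, --ssl-mode=REQUIRED encrypts the connection but does not validate the server certificate; use --ssl-mode=VERIFY_CA to check it against cd-ca.pem.

TLS can be disabled per cluster through the parameter group, but this is not recommended.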

Metrics¶

Metric	Healthy	Alert
`mysql.connections.threads_connected`	< 80% of max	> 90%
`mysql.replication.seconds_behind_master`	< 1 s	> 5 s
`mysql.innodb.buffer_pool.hit_pct`	> 99%	< 95%
`mysql.innodb.row_lock_waits`	varies	spike
`mysql.innodb.deadlocks_per_min`	0	> 0
`mysql.slow_queries_per_min`	varies	spike from baseline
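The fixed thresholds in the table can be checked mechanically. This is a minimal sketch, assuming metric samples arrive as a plain dictionary keyed by the metric names above; the function name and the sample shape are hypothetical, and the "spike" rows are left out because they need a baseline.

```python
from typing import Dict, List, Optional

CONNECTED = "mysql.connections.threads_connected"
LAG = "mysql.replication.seconds_behind_master"
HIT_PCT = "mysql.innodb.buffer_pool.hit_pct"
DEADLOCKS = "mysql.innodb.deadlocks_per_min"


def breached_alerts(sample: Dict[str, Optional[float]],
                    max_connections: int) -> List[str]:
    """Return the metric names whose alert threshold is crossed."""
    alerts: List[str] = []

    connected = sample.get(CONNECTED)
    if connected is not None and connected > 0.90 * max_connections:
        alerts.append(CONNECTED)

    # A reported value of None (NULL) means replication has stopped,
    # so it is treated as an alert rather than as missing data.
    if LAG in sample:
        lag = sample[LAG]
        if lag is None or lag > 5:
            alerts.append(LAG)

    hit_pct = sample.get(HIT_PCT)
    if hit_pct is not None and hit_pct < 95:
        alerts.append(HIT_PCT)

    deadlocks = sample.get(DEADLOCKS)
    if deadlocks is not None and deadlocks > 0:
        alerts.append(DEADLOCKS)

    return alerts
```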

Failover¶

```bash
cd db mysql failover --cluster acme-prod
```

Promotes the semi-sync replica to primary; RTO is under 30 s. Applications reconnect through the cluster endpoint, which redirects to the new primary automatically.
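Because connections drop during the promotion, clients need to retry through the cluster endpoint. The following is a sketch of one way to do that with capped exponential backoff. The `connect` callable stands in for whatever your driver provides, and catching `ConnectionError` is an assumption: real drivers raise their own exception types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       attempts: int = 8,
                       base_delay: float = 0.5,
                       max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:  # assumption: driver-specific in practice
            last_error = exc
            if attempt == attempts - 1:
                break
            # Delays grow 0.5, 1, 2, 4, 8, 8, 8 s with the defaults.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

With the defaults the waits add up to 31.5 s, slightly longer than the stated 30 s RTO, so a client that starts retrying at the moment of failover should outlast it.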

Slow query analysis¶

Enable slow log:

```bash
cd db mysql param set --cluster acme-prod --slow_query_log=1 --long_query_time=1
```

Stream the log to an S3 bucket and analyze it with mysqldumpslow or pt-query-digest. Add indexes for the top offenders.
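To show what "top offenders" means in practice, here is a rough sketch of the grouping those tools perform: queries are reduced to a fingerprint with literals replaced, then ranked by total time. It assumes the standard slow-log layout (a `# Query_time:` header followed by the statement) and is not a substitute for pt-query-digest; the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Tuple

QUERY_TIME = re.compile(r"^# Query_time: (\d+(?:\.\d+)?)")


def fingerprint(sql: str) -> str:
    """Replace string and numeric literals so similar queries group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)
    return re.sub(r"\s+", " ", sql).strip().rstrip(";")


def digest(lines: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (fingerprint, count, total seconds), slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    query_time = None
    statement: List[str] = []

    def flush() -> None:
        nonlocal query_time, statement
        if query_time is not None and statement:
            entry = totals[fingerprint(" ".join(statement))]
            entry[0] += 1
            entry[1] += query_time
        query_time, statement = None, []

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if statement:  # a new header closes the previous entry
                flush()
            match = QUERY_TIME.match(line)
            if match:
                query_time = float(match.group(1))
            continue
        skip = line.startswith("SET timestamp=") or line.lower().startswith("use ")
        if query_time is None or skip:
            continue
        statement.append(line)
    flush()
    rows = [(fp, n, total) for fp, (n, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```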

Schema migrations¶

On tables larger than 1 GB, a direct ALTER TABLE can block queries on the table for the duration of the change. Use pt-online-schema-change or gh-ost (recommended):

```bash
gh-ost \
  --host=cluster-acme.bd-dha-1 \
  --database=acme \
  --table=orders \
  --alter="ADD COLUMN customer_segment VARCHAR(32)" \
  --execute
```

Both tools copy rows into a shadow table and then swap it with the original, so the change is near-zero-downtime.

Read replicas¶

```bash
cd db mysql replica create --cluster acme-prod --az bd-dha-1-az3
```

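Read replicas are asynchronous; lag is typically under 1 s. Route read-only queries through a separate connection string. Applications must accept the consistency tradeoff: a read sent to a replica may not yet reflect a write that just committed on the primary.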
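One common way to live with that tradeoff is to pin a session's reads to the primary for a short period after it writes. The sketch below illustrates the idea only; the class name, the connection strings and the 2-second pin window are all hypothetical and should be tuned against observed replica lag.

```python
import time
from typing import Callable, Optional


class ReadWriteRouter:
    """Pick a connection string per query, pinning reads after a write."""

    def __init__(self, primary_dsn: str, replica_dsn: str,
                 pin_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._primary = primary_dsn
        self._replica = replica_dsn
        self._pin = pin_seconds
        self._clock = clock
        self._last_write: Optional[float] = None

    def dsn_for(self, read_only: bool) -> str:
        now = self._clock()
        if not read_only:
            # Writes always go to the primary and start the pin window.
            self._last_write = now
            return self._primary
        recently_wrote = (self._last_write is not None
                          and now - self._last_write < self._pin)
        # Reads inside the window stay on the primary to see the write.
        return self._primary if recently_wrote else self._replica
```

A router like this is kept per user session or per request, so that one client's write does not push every other client's reads onto the primary.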

Major version upgrades¶

```bash
cd db mysql upgrade --cluster acme-prod --target-version 8.4 --window <ts>
```

Minor versions are upgraded in place. Major versions use a logical-replication-based flow for low downtime, the same as the Managed PostgreSQL upgrade flow.

Too many connections¶

ERROR 1040 (08004): Too many connections

Put a connection pooler in front of the cluster (ProxySQL recommended) so apps connect through the pooler, not directly
Drop idle connections: KILL <id> for sessions idle > 1h
Raise max_connections (RAM-costly)
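The idle-session cleanup can be scripted from the process list. This sketch assumes rows shaped like `information_schema.PROCESSLIST` (columns `ID`, `USER`, `COMMAND`, `TIME`) have already been fetched as dictionaries; the function name and the set of protected users are hypothetical. It only builds the statements, so they can be reviewed before anything is killed.

```python
from typing import Dict, Iterable, List

# Internal threads that should never be killed (assumed list; extend as needed).
PROTECTED_USERS = {"system user", "event_scheduler"}


def kill_statements(processlist: Iterable[Dict[str, object]],
                    idle_seconds: int = 3600) -> List[str]:
    """Return KILL statements for client sessions idle longer than the limit."""
    statements: List[str] = []
    for row in processlist:
        # Idle sessions report COMMAND = 'Sleep'; TIME is seconds in that state.
        idle = row["COMMAND"] == "Sleep" and int(row["TIME"]) > idle_seconds
        if idle and row["USER"] not in PROTECTED_USERS:
            statements.append(f"KILL {int(row['ID'])};")
    return statements
```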

Replication broken¶

mysql.replication.seconds_behind_master = NULL:

The replica IO/SQL thread stopped. Check:

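```sql
-- SHOW SLAVE STATUS was removed in MySQL 8.4; SHOW REPLICA STATUS
-- works on MySQL 8.0.22+ and MariaDB 10.5+.
SHOW REPLICA STATUS\G
-- Look at Last_Errno and Last_Error
```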

Common causes:
- A row is missing on the replica, or an insert conflicts on a primary key, usually after a manual change made directly on the replica. Skip the event or reinitialize the replica.
- Schema drift: the replica's table doesn't match the primary's.
- Storage full on the replica.

```bash
cd db mysql replica restart --cluster acme-prod --replica <id>
```

Long-running ALTER blocks queries¶

The straightforward fix: don't run ALTER directly on busy tables. Use gh-ost or pt-online-schema-change. If you must run ALTER directly:

Run during off-hours
Kill the query if the queue builds; the platform has a query timeout configurable per parameter group

InnoDB deadlocks¶

```bash
cd db mysql innodb status --cluster acme-prod | grep -A 20 "LATEST DETECTED DEADLOCK"
```

Most common: app code that updates rows in different orders across transactions. Fix at the app — consistent lock ordering, smaller transactions, or SELECT ... FOR UPDATE with ordered iteration.
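Consistent lock ordering usually comes down to sorting the rows by key before touching them. A minimal sketch, assuming a hypothetical `accounts` table and a driver that takes `%s` placeholders: whatever order the caller supplies the changes in, the statements come out in ascending `id` order, so two transactions over the same rows acquire locks in the same sequence.

```python
from typing import Dict, List, Tuple

# Hypothetical table and column names, for illustration only.
UPDATE_SQL = "UPDATE accounts SET balance = balance + %s WHERE id = %s"


def ordered_updates(deltas: Dict[int, int]) -> List[Tuple[str, Tuple[int, int]]]:
    """Build (sql, params) pairs sorted by primary key."""
    return [(UPDATE_SQL, (delta, row_id))
            for row_id, delta in sorted(deltas.items())]


if __name__ == "__main__":
    # A transfer from account 7 to account 3, and one in the other direction,
    # both lock row 3 first and row 7 second.
    for sql, params in ordered_updates({7: -10, 3: 10}):
        print(sql, params)
```

Run the returned statements inside a single transaction; the ordering only helps if every code path that touches these rows follows the same rule.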

Buffer pool hit ratio < 95%¶

Cold cache after restart is normal — should climb within hours.

Persistently low:
- Working set > innodb_buffer_pool_size. Resize the cluster or shrink the working set.
- New query pattern scanning large tables. Add indexes.
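To tell a cold cache from a persistently low ratio, it helps to compute the hit percentage over a recent interval, not since server start. The sketch below assumes two snapshots of the InnoDB status counters `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that went to disk), taken some minutes apart. The function name is hypothetical, and how the platform derives its own `hit_pct` metric is not documented here.

```python
from typing import Dict, Optional

REQUESTS = "Innodb_buffer_pool_read_requests"
DISK_READS = "Innodb_buffer_pool_reads"


def interval_hit_pct(before: Dict[str, int],
                     after: Dict[str, int]) -> Optional[float]:
    """Buffer pool hit percentage between two counter snapshots."""
    requests = after[REQUESTS] - before[REQUESTS]
    disk_reads = after[DISK_READS] - before[DISK_READS]
    if requests <= 0:
        # No read traffic in the interval (or counters reset by a restart).
        return None
    return 100.0 * (requests - disk_reads) / requests
```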

Slow query log filling disk¶

Slow query log can grow fast under bad-query bursts. Rotate:

```bash
cd db mysql slowlog rotate --cluster acme-prod
```

Better: stream to S3 instead of local disk via the slow-log shipper.

Replication lag during backup¶

Backups can contend with the replica for I/O, which shows up as replication lag. Schedule backups during low-traffic windows, and consider taking them from a dedicated backup replica.