A locality string is a promise: this many replicas, of these types, in these zones. The cluster's job is to keep that promise as units rebalance, locality changes, and the occasional server goes dark, all without ever pausing writes. This post is the machinery behind that promise: the background tasks the Root Service issues, the manual commands that do the same work by hand, the self-heal path when a replica falls behind, and the views that show you all of it.
Every background data movement in this post follows the same shape: one layer decides what should happen, another does the copy. Hold that split in mind, and the rest of the post — the verbs the Root Service issues, the SQL you can run by hand, the rebuild path that fires on its own — reads as variations on one pattern, not three unrelated mechanisms.
| Layer | Component | Role | Granularity |
| --- | --- | --- | --- |
| Control | Root Service — Disaster Recovery (DR) service | Diffs current state against locality; generates and schedules DR tasks; prioritizes; retries | Per log-stream (LS) replica |
| Control | Root Service — unit / resource balancer | Decides which OBServer hosts each resource unit; moving a unit drags its replicas along | Per unit |
| Execution | OBServer — storage HA layer | Reads macro blocks and clog from a source, builds the replica locally, joins the member list | Per tablet / LS |
The unit of work at the control layer is the log-stream replica, not the table or the partition. Since OceanBase 4.x, every partition a tenant has in a zone rides inside a log stream, and HA operates on the LS. "Migrate a replica" always means "migrate one LS replica from server A to server B" — and every partition in that LS comes along for free.
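To make the "LS replica" granularity concrete, here is a minimal sketch that lists where each log-stream replica of the current tenant lives. It assumes the DBA_OB_LS_LOCATIONS view and the column names below as they appear in 4.x; check them against your version before relying on the output.

```sql
-- Sketch: one row per log-stream replica, grouped visually by LS.
-- Assumption: view and column names follow the 4.x DBA_OB_LS_LOCATIONS layout.
SELECT ls_id,          -- the log stream this replica belongs to
       zone,           -- zone that hosts the replica
       svr_ip,
       svr_port,
       role,           -- LEADER or FOLLOWER
       replica_type    -- e.g. a full (F) or read-only (R) replica
FROM oceanbase.DBA_OB_LS_LOCATIONS
ORDER BY ls_id, zone;
```

Each row here is the thing a DR task acts on; tables and partitions never appear in the task views directly.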
When current state drifts from declared locality, the DR service queues a task, a concrete action against a single log-stream replica, to close the gap. The TASK_TYPE column of DBA_OB_LS_REPLICA_TASKS names each one. Three types carry most of the story: MIGRATE REPLICA, ADD REPLICA, and TYPE TRANSFORM.
A few more handle removals and quorum adjustments. REMOVE PAXOS REPLICA and REMOVE NON PAXOS REPLICA drop a voter or a non-voter, respectively. MODIFY PAXOS REPLICA NUMBER changes only the quorum size without touching data — for example, adjusting from 3F to 4F — and is constrained to a delta of one per task: a 3F → 5F change runs as a 3 → 4 → 5 sequence, not a single jump.
Two distinctions matter:
Migrate moves; add creates. Migrate copies an existing replica to a new server, then retires the original. Add has no source replica of its own — it picks a healthy peer as a data source and builds a new copy, leaving the others untouched. "Replication" in the everyday sense — going from N copies to N+1 — is ADD REPLICA.
Type transform moves no data. Turning an R into an F doesn't re-copy the dataset — the bytes are already there. The task promotes a Learner into a Paxos voter (or the reverse). But only F replicas vote, so every type transform changes the quorum: locality type changes always carry consensus implications. (C replicas, the columnstore type, can't be converted to or from F/R.)
Replicas don't float freely. Each one lives inside a resource unit — its CPU/memory box — on some OBServer. Move the unit, and its replicas have to move with it. That's the most common reason a MIGRATE REPLICA task appears without anyone touching locality: the balancer relocated a unit to even out load, and the LS replicas inside it followed. When you read the task's COMMENT, you'll see strings like migrate replica due to unit group not match.
You can also move units by hand — useful for draining a host before maintenance:
-- Move unit 1001 to a specific OBServer; replicas migrate in the background
ALTER SYSTEM MIGRATE UNIT = 1001 DESTINATION = '11.0.0.5:2882';
-- Change your mind before it finishes
ALTER SYSTEM CANCEL MIGRATE UNIT 1001;
The statement returns immediately. The LS replica copy runs asynchronously, and you watch it through the same task views as everything else.
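Because the statement returns before any data moves, it can help to check the unit itself as well as the task views. The sketch below assumes the sys-tenant DBA_OB_UNITS view and its 4.x column names; unit 1001 is simply the ID used in the example above.

```sql
-- Sketch: is unit 1001 still migrating, and from where?
-- Assumption: DBA_OB_UNITS exposes these columns (4.x layout, sys tenant).
SELECT unit_id,
       zone,
       svr_ip,                 -- where the unit lives (the destination during a move)
       svr_port,
       migrate_from_svr_ip,    -- expected to be populated only while a migration is in progress
       migrate_from_svr_port,
       manual_migrate,         -- distinguishes an operator-issued move from a balancer move
       status
FROM oceanbase.DBA_OB_UNITS
WHERE unit_id = 1001;
```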
For surgical control — usually during operations or testing — OceanBase exposes per-LS-replica commands that map one-to-one onto the DR verbs above. From the sys tenant, scoped with LS=, SERVER=, and TENANT=:
-- ADD REPLICA: create a new replica on a target server (DATA_SOURCE is an optional hint)
ALTER SYSTEM ADD REPLICA LS = 1001 SERVER = '11.0.0.7:2882'
REPLICA_TYPE = 'R' DATA_SOURCE = '11.0.0.3:2882' TENANT = 'tt1';
-- MIGRATE REPLICA: move an existing replica (same zone only)
ALTER SYSTEM MIGRATE REPLICA LS = 1001
SOURCE = '11.0.0.7:2882' DESTINATION = '11.0.0.8:2882' TENANT = 'tt1';
-- TYPE TRANSFORM: change a replica's type in place
-- PAXOS_REPLICA_NUM is optional; if set, it must be current quorum ± 1
-- (current + 1 when promoting R → F, current − 1 when demoting F → R)
ALTER SYSTEM MODIFY REPLICA LS = 1001 SERVER = '11.0.0.7:2882'
REPLICA_TYPE = 'F' TENANT = 'tt1';
-- REMOVE REPLICA: drop a replica
ALTER SYSTEM REMOVE REPLICA LS = 1001 SERVER = '11.0.0.8:2882' TENANT = 'tt1';
In normal operation you rarely type these; you change locality and let the Root Service emit the equivalent tasks. They exist for the cases where you need to drive a single replica directly: forcing a placement, unsticking a stalled task, or staging a controlled rebalance.
A Follower or Learner normally stays current by replaying clog incrementally. But logs get recycled. If a replica is down or partitioned long enough that the log it still needs has been pruned on its peers, incremental replay is impossible — there's a hole it can't fill by catching up.
That's when rebuild fires. The lagging replica discards its stale state, re-copies a recent baseline — macro blocks plus a fresh log start point — from a healthy peer, then resumes normal replay. Unlike migrate / add / remove, rebuild is not a DR task and isn't scheduled by the Root Service. It's detected and driven at the storage layer on the OBServer holding the lagging replica — local trigger, local execution.
Three properties are worth pinning down:
ALTER SYSTEM REBUILD for routine use.
One wrinkle for C (columnstore) replicas: the baseline arrives row-formatted, and a background row-to-column task runs after. During that window, the replica still serves weak reads, just against the row-format baseline.
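Since rebuild never shows up as a DR task, the task views will not tell you it is happening. One place to look, as a hedged sketch: recent 4.x versions appear to expose a REBUILD flag on DBA_OB_LS_LOCATIONS. Treat both the column and its value format as assumptions to verify on your version.

```sql
-- Sketch: surface replicas that report themselves as rebuilding.
-- Assumption: DBA_OB_LS_LOCATIONS has a REBUILD column in your version;
-- its exact value format is not confirmed here, so inspect it rather than filter on it.
SELECT ls_id,
       zone,
       svr_ip,
       svr_port,
       replica_type,
       rebuild
FROM oceanbase.DBA_OB_LS_LOCATIONS
ORDER BY ls_id, zone;
```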
Every DR task is recorded, so progress and failures are queryable rather than guessed at. Three views cover the lifecycle:
DBA_OB_LS_REPLICA_TASKS (tenant) / CDB_OB_LS_REPLICA_TASKS (sys): tasks currently waiting or in flight.
DBA_OB_LS_REPLICA_TASK_HISTORY / CDB_OB_LS_REPLICA_TASK_HISTORY: tasks that have finished, success or failure.
V$OB_LS_REPLICA_TASK_PLAN (sys): tasks the DR service has planned but not yet issued. Useful when you want to know what it intends to do next.
A practical pattern after any locality change or unit move:
-- in-flight: what is happening right now, and why
SELECT ls_id, task_type, task_status, comment
FROM oceanbase.DBA_OB_LS_REPLICA_TASKS;
-- recently finished, newest first
SELECT ls_id, task_type, task_status, execute_result, finish_time
FROM oceanbase.DBA_OB_LS_REPLICA_TASK_HISTORY
ORDER BY finish_time DESC LIMIT 20;
TASK_STATUS moves WAITING → INPROGRESS → COMPLETED, or lands in FAILED / CANCELED. The history view additionally keeps EXECUTE_RESULT, which holds the actual return code: OB_SUCCESS for a clean finish, or an error code like OB_TIMEOUT for a failed one. Filter the unhappy path with WHERE execute_result NOT LIKE '%OB_SUCCESS%'.
The COMMENT field explains why the task was generated. Strings like migrate replica due to unit group not match answer the most common operational question — "did I trigger this, or did the balancer?" — at a glance. When both queries come back clean, convergence is done: the cluster matches its contract again.
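Put together, a convergence check might look like the sketch below. It only uses the views and columns already named in this post; the alias tasks_in_flight is made up for illustration, and SELECT * is used on the plan view because its column list is not covered here.

```sql
-- 1. Anything still waiting or running? Zero means the queue has drained.
SELECT COUNT(*) AS tasks_in_flight
FROM oceanbase.DBA_OB_LS_REPLICA_TASKS;

-- 2. Did anything finish badly? Same filter as described above.
SELECT ls_id, task_type, task_status, execute_result, finish_time
FROM oceanbase.DBA_OB_LS_REPLICA_TASK_HISTORY
WHERE execute_result NOT LIKE '%OB_SUCCESS%'
ORDER BY finish_time DESC
LIMIT 20;

-- 3. From the sys tenant: is the DR service planning more work?
SELECT *
FROM oceanbase.V$OB_LS_REPLICA_TASK_PLAN;
```

An empty in-flight view alone is not proof of health; a task that failed and was not retried also leaves it empty, which is why the history filter belongs in the same check.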
Online convergence is a throttling problem: copy fast enough to heal quickly, slow enough not to starve foreground traffic. Three cluster parameters shape it:
enable_rereplication: master switch for automatic re-replication of lost replicas. On by default; turn off only for controlled maintenance.
sys_bkgd_net_percentage: caps the share of network bandwidth that background tasks may use.
data_disk_usage_limit_percentage: refuses to place or migrate a replica onto a server past that disk-usage threshold. A safety brake on convergence under pressure.
These replace the older data_copy_concurrency family of knobs, removed in 4.0 in favor of tenant-level IO isolation.
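A minimal sketch of inspecting and toggling these from the sys tenant follows. The statements are standard OceanBase parameter commands; the maintenance scenario around them is illustrative, and whether you should change any of them depends on your workload.

```sql
-- Inspect the current values on every server.
SHOW PARAMETERS LIKE 'enable_rereplication';
SHOW PARAMETERS LIKE 'sys_bkgd_net_percentage';
SHOW PARAMETERS LIKE 'data_disk_usage_limit_percentage';

-- Illustrative maintenance window: pause automatic re-replication
-- so a planned restart does not trigger replica rebuilding elsewhere.
ALTER SYSTEM SET enable_rereplication = false;

-- ... perform the maintenance ...

-- Turn it back on so lost replicas are repaired automatically again.
ALTER SYSTEM SET enable_rereplication = true;
```

Leaving the switch off after maintenance is the failure mode to watch for: the cluster will keep serving, but a lost replica will stay lost.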
This post covered the expected drift: locality changed, the balancer moved a unit, or a replica fell behind — and the Root Service converged the cluster online. The harder case is the unexpected one: a server that goes permanently offline and isn't coming back. Then re-replication, ADD REPLICA, and rebuild combine into a self-healing flow that restores both the replica count and the quorum without operator intervention. That permanent-failure recovery path is the next post.
On a multi-zone OceanBase Community Edition tenant, change locality to add a read-only replica and watch the task system, not the data:
ALTER TENANT t1 LOCALITY = 'F@z1, F@z2, F@z3, R@z4';
-- Watch the ADD REPLICA task appear and run
SELECT ls_id, task_type, task_status, comment
FROM oceanbase.DBA_OB_LS_REPLICA_TASKS;
You'll see one ADD REPLICA task per log stream march from WAITING to COMPLETED. Then drop the R@z4 clause and watch the REMOVE NON PAXOS REPLICA tasks clean it up: every clause of the contract, enforced by a task you can see.
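The second half of the exercise can be sketched as follows. It assumes the same tenant t1 and zone names as the statement above, and that the tenant already has a resource unit in z4 (otherwise the first locality change would have had nowhere to place the R replica).

```sql
-- Drop the R@z4 clause: the contract now asks for three F replicas only.
ALTER TENANT t1 LOCALITY = 'F@z1, F@z2, F@z3';

-- Watch the removal tasks appear and drain.
SELECT ls_id, task_type, task_status, comment
FROM oceanbase.DBA_OB_LS_REPLICA_TASKS;

-- Afterwards, confirm how each task ended.
SELECT ls_id, task_type, task_status, execute_result, finish_time
FROM oceanbase.DBA_OB_LS_REPLICA_TASK_HISTORY
ORDER BY finish_time DESC
LIMIT 20;
```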

