OceanBase

AI agents writing directly to production tables is the new "pushing to main without a PR." seekdb 1.2.0 adds Git-like version control primitives — Fork, Diff, Merge — as native SQL statements.

FORK DATABASE creates an instant copy-on-write clone of an entire database with a single atomic snapshot. DIFF TABLE compares two tables row-by-row and outputs every difference. MERGE TABLE applies changes back with conflict resolution in one transaction.

The result: agents work on isolated copies, humans review the diff, and only approved changes reach production. One SQL statement replaces thirty minutes of hand-written JOINs.

oceanbase database

Your agent wrote some business logic. The output table has 500 rows. The expected-results table also has 500 rows. Which rows are correct? Which values are wrong? Are there extra rows, or missing ones?

You open a SQL editor and start writing LEFT JOIN ... CASE WHEN ... UNION ALL. You realize you forgot to handle NULLs. Fix that. The sort order is wrong. Fix that too. Thirty minutes later, you have a diff. You paste it into the agent's context and ask it to fix the code. It fixes the code. You run the whole comparison again.

Code has had git diff for this — one command, done. Data hasn't. You wouldn't let a developer push directly to main without a PR. So why is your agent writing directly to production tables?

seekdb 1.2.0 introduces three SQL primitives that close this gap:

DIFF TABLE actual_result AGAINST expected_result;

One statement. Every difference, down to the row and column. Combined with Fork Database (database-level copy-on-write cloning) and MERGE TABLE (conflict-aware merging), plus Fork Table from v1.1.0, seekdb now provides a complete data branching workflow — fork, change, diff, merge. The same loop you use with Git, applied to data.

What We Built

seekdb 1.2.0 adds three new SQL statements:

FORK DATABASE source_database TO destination_database — Instant copy-on-write clone of an entire database, all tables sharing the same atomic snapshot

DIFF TABLE incoming_db.incoming_t AGAINST current_db.current_t — Row-level comparison of two tables by primary key, outputting conflicts, additions, and deletions

MERGE TABLE incoming_db.incoming_t INTO current_db.current_t STRATEGY {FAIL | THEIRS | OURS} — Merge changes back with conflict resolution in a single transaction

All three work with standard MySQL clients — no special drivers or SDKs required

The Workflow: Fork → Change → Diff → Merge

Here's a concrete example. Say you have a coding agent that generates business logic, and you want to verify its output against expected results:

-- Baseline: input table + result template
CREATE TABLE test.orders_input (id INT PRIMARY KEY, amount DECIMAL(10,2), status VARCHAR(20));
INSERT INTO test.orders_input VALUES (1, 100, 'pending'), (2, 200, 'pending');

CREATE TABLE test.result_template (id INT PRIMARY KEY, final_amount DECIMAL(10,2));

-- Fork an isolated environment for each test run
FORK DATABASE test TO test_run;

-- Actual results produced by the agent's business logic
FORK TABLE test_run.result_template TO test_run.actual_result;
INSERT INTO test_run.actual_result
SELECT id, CASE WHEN amount >= 150 THEN amount * 0.9 ELSE amount END
FROM test_run.orders_input;

-- Expected results from human review or rule system
FORK TABLE test_run.result_template TO test_run.expected_result;
INSERT INTO test_run.expected_result VALUES (1, 100), (2, 180);

-- Compare actual results against expected results
DIFF TABLE test_run.actual_result AGAINST test_run.expected_result;

If DIFF returns nothing, the agent's logic matches expectations. If it returns rows, the differences are precise — which rows, which columns, what values. You can feed that directly back to the agent for another iteration. That's the write code → test → debug loop, closed at the data layer.

The same pattern works when an agent modifies data directly (labeling, cleaning, rule-based corrections): fork a copy, let the agent work on the copy, DIFF to review, MERGE to apply.

Fork Database: Atomic Snapshots Across Tables

Fork Table (shipped in seekdb 1.1.0) clones a single table via copy-on-write in milliseconds. But real workloads span multiple tables with foreign key relationships.

If you fork orders, users, and payments one at a time, each snapshot is taken at a slightly different moment. An order might reference a payment that hasn't been forked yet. Foreign keys break. Join results are inconsistent.

Fork Database solves this by snapshotting the entire database at a single point in time:

FORK DATABASE db_src TO db_fork;

All tables in db_fork share the same snapshot version. Foreign keys between tables within the database are preserved. The cloned database is fully readable and writable — changes don't affect the source.

Key properties:

Atomic snapshot — All tables share one fork point. Multi-table joins and foreign keys stay consistent.

Database properties inherited — Charset, collation, and other database-level settings carry over.

Intra-database foreign keys preserved — FK relationships within the database are maintained; cross-database FKs are skipped.

Multiple forks supported — Fork the same source to different targets, each with its own independent snapshot.

DIFF TABLE: See Exactly What Changed

DIFF TABLE incoming_db.incoming_t AGAINST current_db.current_t;

DIFF compares two tables row by row using primary keys. It's read-only — nothing gets modified.

The output categorizes every difference:

Category	Meaning
Conflict	Same primary key, different values — both rows are shown
Current-only	Row exists only in the current table
Incoming-only	Row exists only in the incoming table
Identical	Not shown

A concrete example makes this clearer:

CREATE TABLE demo.student_base (id INT PRIMARY KEY, name VARCHAR(20), score INT);
INSERT INTO demo.student_base VALUES (1, 'Alice', 80), (2, 'Bob', 90), (3, 'Charlie', 70);

FORK TABLE demo.student_base TO demo.t_current;
FORK TABLE demo.student_base TO demo.t_incoming;

UPDATE demo.t_incoming SET score = 85 WHERE id = 1;
DELETE FROM demo.t_incoming WHERE id = 2;
INSERT INTO demo.t_incoming VALUES (4, 'David', 95);

-- First, check the diff
DIFF TABLE demo.t_incoming AGAINST demo.t_current;

-- Then decide whether to merge; THEIRS means take the incoming version on conflict
MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY THEIRS;

-- Check the result after merge
SELECT * FROM demo.t_current ORDER BY id;

Output:

__table      __flag  id  name     score
t_current    INSERT   1  Alice    80
t_incoming   INSERT   1  Alice    85
t_current    INSERT   2  Bob      90
t_incoming   INSERT   4  David    95

Row id=3 (Charlie, 70) is identical in both tables, so it doesn't appear. Row id=1 shows as a conflict — score differs. Row id=2 is current-only (deleted in incoming). Row id=4 is incoming-only (new).

Requirements: both tables must have identical column definitions (name, type, order) and a primary key. NULL comparisons use <=> (NULL equals NULL).

MERGE TABLE: Bring Changes Back Safely

MERGE TABLE incoming_db.incoming_t INTO current_db.current_t  STRATEGY {FAIL | THEIRS | OURS};

MERGE applies incoming changes to the current table in a single transaction. If anything fails, the entire operation rolls back.

Non-conflict behavior is the same regardless of strategy: incoming-only rows get inserted into current; current-only rows stay untouched. MERGE never deletes rows.

The strategy only controls what happens when both tables have the same primary key with different values:

Strategy	On conflict	Use case
FAIL (default)	Rolls back the entire merge	Strict audit — no silent overwrites
THEIRS	Incoming overwrites current	"The agent's version is correct"
OURS	Current stays, incoming is ignored	"Just add new rows, don't touch existing data"

After running MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY THEIRS in the example above, t_current becomes:

id  name     score
1   Alice    85
2   Bob      90
3   Charlie  70
4   David    95

Where This Fits in Practice

The fork → diff → merge workflow maps to several real scenarios:

Agent code testing. Fork the database per test run. Run the agent's generated code against the forked copy. DIFF actual vs. expected results. Empty output means the test passed; non-empty output tells you exactly which rows and values are wrong. Feed that directly back to the agent. No shared test environments, no data pollution between runs.

Agent data modifications. Agent labels, cleans, or transforms data on a forked copy. DIFF to review what changed before anything touches production. MERGE with THEIRS to apply, or DROP the fork to discard. It's the data equivalent of a PR review.

Multi-branch development. Each developer forks the test database alongside their code branch. No more "your test data broke my test" — the data version that CI flakes on is your fork, not a shared environment. DIFF before merging back; MERGE when the feature ships.

Parallel team edits. Risk, ops, and ML teams each fork the same table. Work independently. MERGE sequentially with explicit conflict strategies — no more "who changed that row?" coordination overhead.

Data quality checks. DIFF an expected-results table against actual output. Conflict rows mean wrong values; unique rows mean missing or extra records. One SQL statement replaces a wall of JOINs and CASE WHENs.

Practical Tips

Start with a small table to validate the workflow, then expand to critical paths.

Treat DIFF output as review material — data changes get Code Review too.

Make sure key tables have a primary key. DIFF and MERGE rely on primary keys for row-level comparison.

Name forked databases so the purpose is obvious at a glance (e.g. prod_riskfix, exp_training_data_v2).

No More Hand-Written JOINs — Data Changes Get Code Review

Before	With seekdb
Edit production tables directly; roll back by hand when something breaks	Fork a copy first. The source table stays untouched. If something breaks, DROP the fork.
Write piles of JOINs + CASE WHENs to compare two tables	One `DIFF TABLE` statement outputs every difference, down to the row and value
Resolve conflicts by pulling people into a group chat and eyeballing rows	`MERGE TABLE` with three built-in strategies handles conflicts in a single SQL statement
AI Agent writes directly to production; when something goes wrong, nobody can explain what changed	Agent writes to a fork. A human reviews the DIFF, then MERGEs back.

From Fork Table in seekdb 1.1.0 to Fork Database + DIFF TABLE + MERGE TABLE in 1.2.0, seekdb is systematically building version control into the database engine. The design intent is straightforward:

Fork (Table / Database) provides branching — safely create an isolated copy.

DIFF TABLE provides comparison — see exactly what changed.

MERGE TABLE provides merging — bring changes back with a strategy you choose.

Together, they form the data equivalent of Git's branch / diff / merge loop. When data iterates as frequently as code, an observable, controllable change path is what makes data management safe, auditable, and collaborative.

Want to try it? seekdb D0 lets you experience Fork Database, DIFF TABLE, and MERGE TABLE hands-on — fork a database, change a few rows, DIFF to see differences, MERGE to bring them back. The whole workflow takes about five minutes.

Start at d0.seekdb.ai.

AI Databases Big Data Open Source

Content

What We Built

The Workflow: Fork → Change → Diff → Merge

Fork Database: Atomic Snapshots Across Tables

DIFF TABLE: See Exactly What Changed

MERGE TABLE: Bring Changes Back Safely

Where This Fits in Practice

Practical Tips

No More Hand-Written JOINs — Data Changes Get Code Review

Keep Reading

View all posts

PRODUCT

Exploring OceanBase 4.3: New Features and Enhancements

At the OceanBase DevCon 2024, we introduced the OceanBase 4.3.0 Beta, unveiling a brand new columnar engine. This release achieves near petabyte-scale, real-time analytics in seconds, and enhances the integration of TP and AP capabilities.

Ray YuJune 13, 2024

ENGINEERING

From Complex to Simple: How We Built seekdb for the AI Era

AI era doesn't need another heavy, complex enterprise database. It needs agility. It needs flexibility. We went back to the drawing board to understand what an AI application actually needs from a database. Our answer is OceanBase seekdb

Mike LiuNovember 28, 2025

PRODUCT

Dynamic AI-to-Database Queries with Claude, MCP, and OceanBase

Learn how to connect Claude to OceanBase using the Model Context Protocol (MCP) and enable dynamic schema discovery and natural-language SQL without brittle prompt engineering.

Nassal LEEHARDMay 8, 2026