

Your agent wrote some business logic. The output table has 500 rows. The expected-results table also has 500 rows. Which rows are correct? Which values are wrong? Are there extra rows, or missing ones?
You open a SQL editor and start writing LEFT JOIN ... CASE WHEN ... UNION ALL. You realize you forgot to handle NULLs. Fix that. The sort order is wrong. Fix that too. Thirty minutes later, you have a diff. You paste it into the agent's context and ask it to fix the code. It fixes the code. You run the whole comparison again.
Code has had git diff for this — one command, done. Data hasn't. You wouldn't let a developer push directly to main without a PR. So why is your agent writing directly to production tables?
seekdb 1.2.0 introduces three SQL primitives that close this gap:
DIFF TABLE actual_result AGAINST expected_result;

One statement. Every difference, down to the row and column. Combined with Fork Database (database-level copy-on-write cloning) and MERGE TABLE (conflict-aware merging), plus Fork Table from v1.1.0, seekdb now provides a complete data branching workflow — fork, change, diff, merge. The same loop you use with Git, applied to data.
seekdb 1.2.0 adds three new SQL statements:

- FORK DATABASE source_database TO destination_database — Instant copy-on-write clone of an entire database, all tables sharing the same atomic snapshot
- DIFF TABLE incoming_db.incoming_t AGAINST current_db.current_t — Row-level comparison of two tables by primary key, outputting conflicts, additions, and deletions
- MERGE TABLE incoming_db.incoming_t INTO current_db.current_t STRATEGY {FAIL | THEIRS | OURS} — Merge changes back with conflict resolution in a single transaction

Here's a concrete example. Say you have a coding agent that generates business logic, and you want to verify its output against expected results:
-- Baseline: input table + result template
CREATE TABLE test.orders_input (id INT PRIMARY KEY, amount DECIMAL(10,2), status VARCHAR(20));
INSERT INTO test.orders_input VALUES (1, 100, 'pending'), (2, 200, 'pending');
CREATE TABLE test.result_template (id INT PRIMARY KEY, final_amount DECIMAL(10,2));
-- Fork an isolated environment for each test run
FORK DATABASE test TO test_run;
-- Actual results produced by the agent's business logic
FORK TABLE test_run.result_template TO test_run.actual_result;
INSERT INTO test_run.actual_result
SELECT id, CASE WHEN amount >= 150 THEN amount * 0.9 ELSE amount END
FROM test_run.orders_input;
-- Expected results from human review or rule system
FORK TABLE test_run.result_template TO test_run.expected_result;
INSERT INTO test_run.expected_result VALUES (1, 100), (2, 180);
-- Compare actual results against expected results
DIFF TABLE test_run.actual_result AGAINST test_run.expected_result;

If DIFF returns nothing, the agent's logic matches expectations. If it returns rows, the differences are precise — which rows, which columns, what values. You can feed that directly back to the agent for another iteration. That's the write code → test → debug loop, closed at the data layer.
The same pattern works when an agent modifies data directly (labeling, cleaning, rule-based corrections): fork a copy, let the agent work on the copy, DIFF to review, MERGE to apply.
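As a sketch, that agent-on-a-fork loop might look like this (the prod database, agent_ws fork, and labels table are hypothetical names, not part of the release):

```sql
-- Hypothetical setup: prod.labels holds production labels
FORK DATABASE prod TO agent_ws;                     -- isolated copy for the agent

-- The agent works on the fork, never on prod
UPDATE agent_ws.labels SET label = 'spam' WHERE source = 'untrusted';

-- Human review: what exactly did the agent change?
DIFF TABLE agent_ws.labels AGAINST prod.labels;

-- Apply the changes (agent's version wins on conflict)...
MERGE TABLE agent_ws.labels INTO prod.labels STRATEGY THEIRS;
-- ...or discard the whole experiment instead:
-- DROP DATABASE agent_ws;
```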
Fork Table (shipped in seekdb 1.1.0) clones a single table via copy-on-write in milliseconds. But real workloads span multiple tables with foreign key relationships.
If you fork orders, users, and payments one at a time, each snapshot is taken at a slightly different moment. An order might reference a payment that hasn't been forked yet. Foreign keys break. Join results are inconsistent.
Fork Database solves this by snapshotting the entire database at a single point in time:
FORK DATABASE db_src TO db_fork;

All tables in db_fork share the same snapshot version. Foreign keys between tables within the database are preserved. The cloned database is fully readable and writable — changes don't affect the source.
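For instance, assuming a hypothetical shop database containing orders and payments tables, a join inside the fork sees one consistent point in time:

```sql
FORK DATABASE shop TO shop_fork;

-- Both tables come from the same atomic snapshot, so every order's
-- payment reference resolves — no torn reads across tables
SELECT o.id, o.amount, p.paid_at
FROM shop_fork.orders o
JOIN shop_fork.payments p ON p.order_id = o.id;
```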
Key properties:

- Atomic: every table is snapshotted at the same single point in time
- Copy-on-write: the clone is near-instant and shares storage with the source
- Independent: the fork is fully readable and writable without affecting the source
DIFF TABLE incoming_db.incoming_t AGAINST current_db.current_t;

DIFF compares two tables row by row using primary keys. It's read-only — nothing gets modified.
The output categorizes every difference:
| Category | Meaning |
|---|---|
| Conflict | Same primary key, different values — both rows are shown |
| Current-only | Row exists only in the current table |
| Incoming-only | Row exists only in the incoming table |
| Identical | Not shown |
A concrete example makes this clearer:
CREATE TABLE demo.student_base (id INT PRIMARY KEY, name VARCHAR(20), score INT);
INSERT INTO demo.student_base VALUES (1, 'Alice', 80), (2, 'Bob', 90), (3, 'Charlie', 70);
FORK TABLE demo.student_base TO demo.t_current;
FORK TABLE demo.student_base TO demo.t_incoming;
UPDATE demo.t_incoming SET score = 85 WHERE id = 1;
DELETE FROM demo.t_incoming WHERE id = 2;
INSERT INTO demo.t_incoming VALUES (4, 'David', 95);
-- First, check the diff
DIFF TABLE demo.t_incoming AGAINST demo.t_current;
-- Then decide whether to merge; THEIRS means take the incoming version on conflict
MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY THEIRS;
-- Check the result after merge
SELECT * FROM demo.t_current ORDER BY id;

The DIFF output:

| __table | __flag | id | name | score |
|---|---|---|---|---|
| t_current | INSERT | 1 | Alice | 80 |
| t_incoming | INSERT | 1 | Alice | 85 |
| t_current | INSERT | 2 | Bob | 90 |
| t_incoming | INSERT | 4 | David | 95 |

Row id=3 (Charlie, 70) is identical in both tables, so it doesn't appear. Row id=1 shows as a conflict — score differs. Row id=2 is current-only (deleted in incoming). Row id=4 is incoming-only (new).
Requirements: both tables must have identical column definitions (name, type, order) and a primary key. NULL comparisons use <=> (NULL equals NULL).
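A small sketch of the NULL semantics (the demo.null_base, n1, and n2 names are hypothetical, following the fork-from-a-template pattern used earlier):

```sql
CREATE TABLE demo.null_base (id INT PRIMARY KEY, note VARCHAR(10));
FORK TABLE demo.null_base TO demo.n1;
FORK TABLE demo.null_base TO demo.n2;
INSERT INTO demo.n1 VALUES (1, NULL), (2, 'x');
INSERT INTO demo.n2 VALUES (1, NULL), (2, NULL);

DIFF TABLE demo.n1 AGAINST demo.n2;
-- id=1 doesn't appear: NULL <=> NULL is true, so the rows count as identical
-- id=2 is reported as a conflict: 'x' vs NULL
```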
MERGE TABLE incoming_db.incoming_t INTO current_db.current_t STRATEGY {FAIL | THEIRS | OURS};

MERGE applies incoming changes to the current table in a single transaction. If anything fails, the entire operation rolls back.
Non-conflict behavior is the same regardless of strategy: incoming-only rows get inserted into current; current-only rows stay untouched. MERGE never deletes rows.
The strategy only controls what happens when both tables have the same primary key with different values:
| Strategy | On conflict | Use case |
|---|---|---|
| FAIL (default) | Rolls back the entire merge | Strict audit — no silent overwrites |
| THEIRS | Incoming overwrites current | "The agent's version is correct" |
| OURS | Current stays, incoming is ignored | "Just add new rows, don't touch existing data" |
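Continuing the student example, the other two strategies behave like this (a sketch; the outcomes follow from the rules above, and any error text is illustrative):

```sql
-- FAIL: id=1 conflicts (score 85 vs 80), so the whole merge rolls back;
-- nothing is applied, not even the non-conflicting insert of id=4
MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY FAIL;

-- OURS: id=1 keeps the current score (80); id=4 (David) is still inserted;
-- id=2 (Bob) stays, because MERGE never deletes rows
MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY OURS;
```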
After running MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY THEIRS in the example above, t_current becomes:
| id | name | score |
|---|---|---|
| 1 | Alice | 85 |
| 2 | Bob | 90 |
| 3 | Charlie | 70 |
| 4 | David | 95 |

The fork → diff → merge workflow maps to several real scenarios:
Agent code testing. Fork the database per test run. Run the agent's generated code against the forked copy. DIFF actual vs. expected results. Empty output means the test passed; non-empty output tells you exactly which rows and values are wrong. Feed that directly back to the agent. No shared test environments, no data pollution between runs.
Agent data modifications. Agent labels, cleans, or transforms data on a forked copy. DIFF to review what changed before anything touches production. MERGE with THEIRS to apply, or DROP the fork to discard. It's the data equivalent of a PR review.
Multi-branch development. Each developer forks the test database alongside their code branch. No more "your test data broke my test" — the data version that CI flakes on is your fork, not a shared environment. DIFF before merging back; MERGE when the feature ships.
Parallel team edits. Risk, ops, and ML teams each fork the same table. Work independently. MERGE sequentially with explicit conflict strategies — no more "who changed that row?" coordination overhead.
Data quality checks. DIFF an expected-results table against actual output. Conflict rows mean wrong values; unique rows mean missing or extra records. One SQL statement replaces a wall of JOINs and CASE WHENs.
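For contrast, here is roughly the manual query that DIFF TABLE replaces: a sketch for two hypothetical tables t_actual and t_expected, each with primary key id and a single value column val.

```sql
-- Conflicts: same key, different value (NULL-safe comparison via <=>)
SELECT 'conflict' AS category, a.id
FROM t_actual a JOIN t_expected e ON a.id = e.id
WHERE NOT (a.val <=> e.val)
UNION ALL
-- Rows only in actual (extra records)
SELECT 'actual-only', a.id
FROM t_actual a LEFT JOIN t_expected e ON a.id = e.id
WHERE e.id IS NULL
UNION ALL
-- Rows only in expected (missing records)
SELECT 'expected-only', e.id
FROM t_expected e LEFT JOIN t_actual a ON e.id = a.id
WHERE a.id IS NULL;
```

And this version still handles only one value column; DIFF TABLE covers every column automatically.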
| Before | With seekdb |
|---|---|
| Edit production tables directly; roll back by hand when something breaks | Fork a copy first. The source table stays untouched. If something breaks, DROP the fork. |
| Write piles of JOINs + CASE WHENs to compare two tables | One DIFF TABLE statement outputs every difference, down to the row and value |
| Resolve conflicts by pulling people into a group chat and eyeballing rows | MERGE TABLE with three built-in strategies handles conflicts in a single SQL statement |
| AI Agent writes directly to production; when something goes wrong, nobody can explain what changed | Agent writes to a fork. A human reviews the DIFF, then MERGEs back. |
From Fork Table in seekdb 1.1.0 to Fork Database + DIFF TABLE + MERGE TABLE in 1.2.0, seekdb is systematically building version control into the database engine. The design intent is straightforward: fork gives you isolation, diff gives you observability, and merge gives you a controlled path back.
Together, they form the data equivalent of Git's branch / diff / merge loop. When data iterates as frequently as code, an observable, controllable change path is what makes data management safe, auditable, and collaborative.
Want to try it? seekdb D0 lets you experience Fork Database, DIFF TABLE, and MERGE TABLE hands-on — fork a database, change a few rows, DIFF to see differences, MERGE to bring them back. The whole workflow takes about five minutes.
Start at d0.seekdb.ai.

