Stop Letting Agents Write Directly to Production: Git-Like Version Control for Your Data

Xingkun Fan
Xingkun Fan
Published on April 14, 2026
9 minute read
Key Takeaways
  • AI agents writing directly to production tables is the new "pushing to main without a PR." seekdb 1.2.0 adds Git-like version control primitives — Fork, Diff, Merge — as native SQL statements.
  • FORK DATABASE creates an instant copy-on-write clone of an entire database with a single atomic snapshot. DIFF TABLE compares two tables row-by-row and outputs every difference. MERGE TABLE applies changes back with conflict resolution in one transaction.
  • The result: agents work on isolated copies, humans review the diff, and only approved changes reach production. One SQL statement replaces thirty minutes of hand-written JOINs.
  • oceanbase database

    Your agent wrote some business logic. The output table has 500 rows. The expected-results table also has 500 rows. Which rows are correct? Which values are wrong? Are there extra rows, or missing ones?

    You open a SQL editor and start writing LEFT JOIN ... CASE WHEN ... UNION ALL. You realize you forgot to handle NULLs. Fix that. The sort order is wrong. Fix that too. Thirty minutes later, you have a diff. You paste it into the agent's context and ask it to fix the code. It fixes the code. You run the whole comparison again.

    Code has had git diff for this — one command, done. Data hasn't. You wouldn't let a developer push directly to main without a PR. So why is your agent writing directly to production tables?

    seekdb 1.2.0 introduces three SQL primitives that close this gap:

    DIFF TABLE actual_result AGAINST expected_result;

    One statement. Every difference, down to the row and column. Combined with Fork Database (database-level copy-on-write cloning) and MERGE TABLE (conflict-aware merging), plus Fork Table from v1.1.0, seekdb now provides a complete data branching workflow — fork, change, diff, merge. The same loop you use with Git, applied to data.

    What We Built

    seekdb 1.2.0 adds three new SQL statements:

  • FORK DATABASE source_database TO destination_database — Instant copy-on-write clone of an entire database, all tables sharing the same atomic snapshot
  • DIFF TABLE incoming_db.incoming_t AGAINST current_db.current_t — Row-level comparison of two tables by primary key, outputting conflicts, additions, and deletions
  • MERGE TABLE incoming_db.incoming_t INTO current_db.current_t STRATEGY {FAIL | THEIRS | OURS} — Merge changes back with conflict resolution in a single transaction
  • All three work with standard MySQL clients — no special drivers or SDKs required
  • The Workflow: Fork → Change → Diff → Merge

    Here's a concrete example. Say you have a coding agent that generates business logic, and you want to verify its output against expected results:

    -- Baseline: input table + result template
    CREATE TABLE test.orders_input (id INT PRIMARY KEY, amount DECIMAL(10,2), status VARCHAR(20));
    INSERT INTO test.orders_input VALUES (1, 100, 'pending'), (2, 200, 'pending');
    
    CREATE TABLE test.result_template (id INT PRIMARY KEY, final_amount DECIMAL(10,2));
    
    -- Fork an isolated environment for each test run
    FORK DATABASE test TO test_run;
    
    -- Actual results produced by the agent's business logic
    FORK TABLE test_run.result_template TO test_run.actual_result;
    INSERT INTO test_run.actual_result
    SELECT id, CASE WHEN amount >= 150 THEN amount * 0.9 ELSE amount END
    FROM test_run.orders_input;
    
    -- Expected results from human review or rule system
    FORK TABLE test_run.result_template TO test_run.expected_result;
    INSERT INTO test_run.expected_result VALUES (1, 100), (2, 180);
    
    -- Compare actual results against expected results
    DIFF TABLE test_run.actual_result AGAINST test_run.expected_result;

    If DIFF returns nothing, the agent's logic matches expectations. If it returns rows, the differences are precise — which rows, which columns, what values. You can feed that directly back to the agent for another iteration. That's the write code → test → debug loop, closed at the data layer.

    The same pattern works when an agent modifies data directly (labeling, cleaning, rule-based corrections): fork a copy, let the agent work on the copy, DIFF to review, MERGE to apply.

    Fork Database: Atomic Snapshots Across Tables

    Fork Table (shipped in seekdb 1.1.0) clones a single table via copy-on-write in milliseconds. But real workloads span multiple tables with foreign key relationships.

    If you fork orders, users, and payments one at a time, each snapshot is taken at a slightly different moment. An order might reference a payment that hasn't been forked yet. Foreign keys break. Join results are inconsistent.

    Fork Database solves this by snapshotting the entire database at a single point in time:

    FORK DATABASE db_src TO db_fork;

    All tables in db_fork share the same snapshot version. Foreign keys between tables within the database are preserved. The cloned database is fully readable and writable — changes don't affect the source.

    Key properties:

  • Atomic snapshot — All tables share one fork point. Multi-table joins and foreign keys stay consistent.
  • Database properties inherited — Charset, collation, and other database-level settings carry over.
  • Intra-database foreign keys preserved — FK relationships within the database are maintained; cross-database FKs are skipped.
  • Multiple forks supported — Fork the same source to different targets, each with its own independent snapshot.
  • DIFF TABLE: See Exactly What Changed

    DIFF TABLE incoming_db.incoming_t AGAINST current_db.current_t;

    DIFF compares two tables row by row using primary keys. It's read-only — nothing gets modified.

    The output categorizes every difference:

    Category
    Meaning
    ConflictSame primary key, different values — both rows are shown
    Current-onlyRow exists only in the current table
    Incoming-onlyRow exists only in the incoming table
    IdenticalNot shown

    A concrete example makes this clearer:

    CREATE TABLE demo.student_base (id INT PRIMARY KEY, name VARCHAR(20), score INT);
    INSERT INTO demo.student_base VALUES (1, 'Alice', 80), (2, 'Bob', 90), (3, 'Charlie', 70);
    
    FORK TABLE demo.student_base TO demo.t_current;
    FORK TABLE demo.student_base TO demo.t_incoming;
    
    UPDATE demo.t_incoming SET score = 85 WHERE id = 1;
    DELETE FROM demo.t_incoming WHERE id = 2;
    INSERT INTO demo.t_incoming VALUES (4, 'David', 95);
    
    -- First, check the diff
    DIFF TABLE demo.t_incoming AGAINST demo.t_current;
    
    -- Then decide whether to merge; THEIRS means take the incoming version on conflict
    MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY THEIRS;
    
    -- Check the result after merge
    SELECT * FROM demo.t_current ORDER BY id;

    Output:

    __table      __flag  id  name     score
    t_current    INSERT   1  Alice    80
    t_incoming   INSERT   1  Alice    85
    t_current    INSERT   2  Bob      90
    t_incoming   INSERT   4  David    95

    Row id=3 (Charlie, 70) is identical in both tables, so it doesn't appear. Row id=1 shows as a conflict — score differs. Row id=2 is current-only (deleted in incoming). Row id=4 is incoming-only (new).

    Requirements: both tables must have identical column definitions (name, type, order) and a primary key. NULL comparisons use <=> (NULL equals NULL).

    MERGE TABLE: Bring Changes Back Safely

    MERGE TABLE incoming_db.incoming_t INTO current_db.current_t  STRATEGY {FAIL | THEIRS | OURS};

    MERGE applies incoming changes to the current table in a single transaction. If anything fails, the entire operation rolls back.

    Non-conflict behavior is the same regardless of strategy: incoming-only rows get inserted into current; current-only rows stay untouched. MERGE never deletes rows.

    The strategy only controls what happens when both tables have the same primary key with different values:

    Strategy
    On conflict
    Use case
    FAIL (default)Rolls back the entire mergeStrict audit — no silent overwrites
    THEIRSIncoming overwrites current"The agent's version is correct"
    OURSCurrent stays, incoming is ignored"Just add new rows, don't touch existing data"

    After running MERGE TABLE demo.t_incoming INTO demo.t_current STRATEGY THEIRS in the example above, t_current becomes:

    id  name     score
    1   Alice    85
    2   Bob      90
    3   Charlie  70
    4   David    95

    Where This Fits in Practice

    The fork → diff → merge workflow maps to several real scenarios:

    Agent code testing. Fork the database per test run. Run the agent's generated code against the forked copy. DIFF actual vs. expected results. Empty output means the test passed; non-empty output tells you exactly which rows and values are wrong. Feed that directly back to the agent. No shared test environments, no data pollution between runs.

    Agent data modifications. Agent labels, cleans, or transforms data on a forked copy. DIFF to review what changed before anything touches production. MERGE with THEIRS to apply, or DROP the fork to discard. It's the data equivalent of a PR review.

    Multi-branch development. Each developer forks the test database alongside their code branch. No more "your test data broke my test" — the data version that CI flakes on is your fork, not a shared environment. DIFF before merging back; MERGE when the feature ships.

    Parallel team edits. Risk, ops, and ML teams each fork the same table. Work independently. MERGE sequentially with explicit conflict strategies — no more "who changed that row?" coordination overhead.

    Data quality checks. DIFF an expected-results table against actual output. Conflict rows mean wrong values; unique rows mean missing or extra records. One SQL statement replaces a wall of JOINs and CASE WHENs.

    Practical Tips

  • Start with a small table to validate the workflow, then expand to critical paths.
  • Treat DIFF output as review material — data changes get Code Review too.
  • Make sure key tables have a primary key. DIFF and MERGE rely on primary keys for row-level comparison.
  • Name forked databases so the purpose is obvious at a glance (e.g. prod_riskfix, exp_training_data_v2).
  • No More Hand-Written JOINs — Data Changes Get Code Review

    Before
    With seekdb
    Edit production tables directly; roll back by hand when something breaksFork a copy first. The source table stays untouched. If something breaks, DROP the fork.
    Write piles of JOINs + CASE WHENs to compare two tablesOne DIFF TABLE statement outputs every difference, down to the row and value
    Resolve conflicts by pulling people into a group chat and eyeballing rowsMERGE TABLE with three built-in strategies handles conflicts in a single SQL statement
    AI Agent writes directly to production; when something goes wrong, nobody can explain what changedAgent writes to a fork. A human reviews the DIFF, then MERGEs back.

    From Fork Table in seekdb 1.1.0 to Fork Database + DIFF TABLE + MERGE TABLE in 1.2.0, seekdb is systematically building version control into the database engine. The design intent is straightforward:

  • Fork (Table / Database) provides branching — safely create an isolated copy.
  • DIFF TABLE provides comparison — see exactly what changed.
  • MERGE TABLE provides merging — bring changes back with a strategy you choose.
  • Together, they form the data equivalent of Git's branch / diff / merge loop. When data iterates as frequently as code, an observable, controllable change path is what makes data management safe, auditable, and collaborative.

    Want to try it? seekdb D0 lets you experience Fork Database, DIFF TABLE, and MERGE TABLE hands-on — fork a database, change a few rows, DIFF to see differences, MERGE to bring them back. The whole workflow takes about five minutes.

    Start at d0.seekdb.ai.

    Share
    X
    linkedin
    mail