Fork Table: Safe, Instant Data Branching for AI Agents


Every database feature you rely on — transactions, locks, audit logs — was built for a human operator who thinks before querying, knows what they're changing, and can explain what happened when something breaks.

Agents operate differently. In a typical Vibe Coding session, an agent executes dozens of INSERT, UPDATE, and DELETE statements with no human supervision: high-frequency, unwatched writes. Each write is the output of probabilistic inference, so the agent has no certainty that it is correct. And when your data turns out to be wrong, the agent cannot tell you which step caused it.

This is the default operating mode for RAG pipelines, multi-agent systems, and AI-driven data workflows. seekdb's Fork Table addresses this directly — giving every agent its own isolated branch where it can write freely, with zero risk to production data.

The scenarios driving this

Recently, one question kept appearing in our user community:

"An AI modified my data. How do I get back to where I was?"

Here are three real situations:

  • Lost work. A data scientist spent weeks tuning a dataset through feature engineering. An agent ran a "harmless-looking" cleanup script during a Vibe Coding session. The tuned dataset was gone.
  • Polluted embeddings. A misbehaving agent contaminated a RAG knowledge base with low-quality embeddings. There was no way to identify when the contamination started, and every query now returned degraded results.
  • Write collisions. Two agents in a multi-agent system kept overwriting each other's outputs to a shared knowledge base, leaving the data in an unpredictable state.

The common thread: agents need sandboxes. Not backups. Not read-only snapshots. Complete, read-write, isolated private spaces where an agent can do whatever it wants, and you can discard the whole thing if it goes wrong, or merge it back if it works.

Why Git's insight applies to data

In 2005, Git shipped with one insight: branches should be cheap. When branch creation costs nothing, developer behavior shifts from "be careful on main" to "branch everything, experiment freely, merge the best result."

Data needs the same shift — and for the same reason. AI development today looks like this:

  • A data scientist validating three feature engineering approaches simultaneously
  • A team A/B testing two prompt strategies against the same production data
  • A multi-agent system where five agents concurrently read from and write to a shared knowledge base

All share one structure: the same dataset, multiple parallel evolution paths, and you need to pick the best outcome.

What are people actually doing? CREATE TABLE features_v1 AS SELECT ..., then features_v2, features_v2_final, features_v2_final_REAL. This isn't version control; it's digital hoarding. At terabyte scale, every "copy the table" operation means hours of waiting and doubled storage costs.

How Fork Table works

We built Fork Table into seekdb's storage engine rather than adding it as a layer on top. This enables three capabilities we designed for AI workloads:

Instant branching at any scale

Branch creation is O(1) regardless of table size: a 1TB table forks in the same time as a 1MB table, under a second. Copy-on-write means no data is duplicated until the branch actually diverges. This changes what's practical. When branching is this cheap, you can auto-snapshot before every agent write; when it's expensive, you skip it.
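Copy-on-write is what keeps the fork cost independent of table size. The toy in-memory model below illustrates the idea in Python; it is a hypothetical sketch (`CowTable` and its methods are invented names), not how seekdb's storage engine is implemented. Forking moves the pending write buffer into a shared, read-only layer instead of copying rows, and from then on each branch records only its own changes.

```python
from typing import Any, Dict, Tuple

_TOMBSTONE = object()  # marks a row deleted in a branch's private delta


class CowTable:
    """Toy copy-on-write table: branches share frozen layers, keep private deltas."""

    def __init__(self, layers: Tuple[Dict[Any, Any], ...] = ()) -> None:
        self._layers = layers               # shared between branches, never mutated
        self._delta: Dict[Any, Any] = {}    # this branch's own writes since the fork

    def fork(self) -> "CowTable":
        # Freeze pending writes by moving the dict (not its rows) into the shared
        # layers; the cost depends on the number of layers, not the row count.
        if self._delta:
            self._layers = self._layers + (self._delta,)
            self._delta = {}
        return CowTable(self._layers)

    def put(self, key: Any, value: Any) -> None:
        self._delta[key] = value            # the write lands only in this branch

    def delete(self, key: Any) -> None:
        self._delta[key] = _TOMBSTONE       # hide the row from this branch only

    def get(self, key: Any, default: Any = None) -> Any:
        # Check the private delta first, then the shared layers from newest to oldest.
        for layer in (self._delta, *reversed(self._layers)):
            if key in layer:
                value = layer[key]
                return default if value is _TOMBSTONE else value
        return default

    def private_rows(self) -> int:
        """Rows this branch has written or deleted since it was created or last forked."""
        return len(self._delta)


if __name__ == "__main__":
    main = CowTable()
    for i in range(100_000):
        main.put(i, {"id": i})
    branch = main.fork()                    # no rows are copied here
    branch.delete(42)
    print(main.get(42), branch.get(42), branch.private_rows())  # {'id': 42} None 1
```

The branch pays storage only for the one row it touched; the 100,000 shared rows exist once, which is the property that makes "fork before every agent write" affordable.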

Full read-write isolation

Each branch is a complete sandbox — not a read-only snapshot. Agents can INSERT, UPDATE, and DELETE freely. Two agents writing to separate branches cannot affect each other's data.

-- Create a branch
FORK TABLE main_table TO experiment_branch;

-- Work on the branch (any SQL operations)
UPDATE experiment_branch SET ...;
INSERT INTO experiment_branch ...;

-- Discard if failed
DROP TABLE experiment_branch;

-- Or promote if successful
RENAME TABLE main_table TO main_backup, experiment_branch TO main_table;
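The fork, discard, and promote steps above map naturally onto a wrapper an agent runner could call around each run. Below is a minimal Python sketch, assuming a MySQL-protocol DB-API driver such as PyMySQL (seekdb accepts MySQL clients) and assuming FORK TABLE, DROP TABLE, and RENAME TABLE behave as in the SQL example. `run_in_branch`, the `agent` and `validate` callables, and the `_backup` suffix are hypothetical names, not part of seekdb.

```python
import re
from typing import Any, Callable, Optional, Protocol, Sequence


class Cursor(Protocol):
    """The one DB-API 2.0 cursor method this sketch relies on."""

    def execute(self, query: str, args: Optional[Sequence[Any]] = None) -> Any: ...


_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    # Table names cannot be bound as query parameters, so accept only plain identifiers.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def run_in_branch(cur: Cursor, main: str, branch: str,
                  agent: Callable[[Cursor, str], None],
                  validate: Callable[[Cursor, str], bool]) -> bool:
    """Fork `main`, let the agent write to the branch, then promote or discard it."""
    main, branch = _ident(main), _ident(branch)
    backup = _ident(f"{main}_backup")
    cur.execute(f"FORK TABLE {main} TO {branch}")
    try:
        agent(cur, branch)                   # the agent is only told the branch name
        ok = validate(cur, branch)
    except Exception:
        cur.execute(f"DROP TABLE {branch}")  # crashed run: discard the branch
        raise
    if ok:
        # Swap in one RENAME statement; the previous main table is kept as a backup.
        cur.execute(f"RENAME TABLE {main} TO {backup}, {branch} TO {main}")
    else:
        cur.execute(f"DROP TABLE {branch}")  # run failed validation: discard it
    return ok
```

As written, a second promotion fails if `<main>_backup` already exists, so a real runner would drop or rotate the backup first. Given the current mutual exclusion with concurrent DDL, it should also serialize these statements per table.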

Apart from FORK TABLE itself, everything is standard SQL: there is no new syntax to learn.

Vector index sharing

When you fork a table with HNSW vector indexes, the indexes are shared across branches using copy-on-write. Forking a RAG knowledge base with millions of embeddings doesn't rebuild the index; the branches share it until modifications diverge. This is where our AI-native design matters: systems that bolt branching onto existing architectures can't share vector indexes efficiently, so they either rebuild them or skip indexing entirely.

Fork Table is not just about making experiments faster. It changes the safety model of agent-driven data systems. Instead of letting multiple agents mutate shared state and hoping logs are enough to recover, you give each run its own isolated branch and decide later what deserves to be promoted.

⚠️ Current limitations:
Fork Table was introduced as an experimental feature in seekdb V1.1.0. Its current limitations include mutual exclusion with concurrent DDL, no support for partitioned tables with global indexes, and no support for semantic, IVF, or spatial indexes. We're actively expanding coverage.

What changes for your workflow

For teams building AI applications, native data branching removes a category of operational risk:

  1. Agent development becomes safe by default. Instead of hoping agents don't corrupt shared data, you isolate them in branches. Bad run? Drop the branch. Good run? Promote it with an atomic rename.
  2. Parallel experimentation works. A/B tests run on two branches from a common snapshot, so comparisons are fair. Three feature engineering approaches can evolve simultaneously without duplicating terabytes.
  3. AI workflows become reproducible. Feature versions for model training get pinned to branch snapshots. When you need to debug why a model performed differently three weeks ago, the data state is still there.
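The parallel-experiment pattern in point 2 can be sketched the same way: fork one branch per strategy, apply each strategy to its own branch, score the results, promote the winner with an atomic rename, and drop the rest. This Python sketch assumes a DB-API cursor from a MySQL-protocol driver, that FORK TABLE, DROP TABLE, and RENAME TABLE behave as in the earlier SQL example, and that nothing else writes to the main table while the forks are taken, so every branch starts from the same data. `run_experiment`, the strategy callables, and the `score` function are hypothetical.

```python
import re
from typing import Any, Callable, Dict

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    # Identifiers cannot be bound as query parameters, so allow only plain names.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def run_experiment(cur: Any, main: str,
                   strategies: Dict[str, Callable[[Any, str], None]],
                   score: Callable[[Any, str], float]) -> str:
    """Fork one branch per strategy, score each, promote the best, drop the rest."""
    main = _check(main)
    branches = {name: _check(f"{main}_{name}") for name in strategies}
    # Take every fork back to back, before any strategy writes anything.
    for branch in branches.values():
        cur.execute(f"FORK TABLE {main} TO {branch}")
    scores = {}
    for name, apply in strategies.items():
        apply(cur, branches[name])            # each strategy touches only its branch
        scores[name] = score(cur, branches[name])
    best = max(scores, key=lambda name: scores[name])
    for name, branch in branches.items():
        if name != best:
            cur.execute(f"DROP TABLE {branch}")
    # Promote the winner atomically and keep the old table as a backup.
    cur.execute(f"RENAME TABLE {main} TO {main}_backup, {branches[best]} TO {main}")
    return best
```

If a strategy raises, the sketch leaves all branches in place for inspection; a production version would clean them up in a `finally` block.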

The pattern is the same one Git established for code: don't tell developers to be more careful; make the cost of being wrong approach zero.

Scaling to full databases

Fork Table handles single-table branching. But real AI applications rarely touch just one table: an agent's knowledge base might span structured data, vector indexes, and metadata tables that need to be forked together at a consistent snapshot point. For this case, Fork Database (available in seekdb V1.2.0) lets users create a complete, isolated copy of the entire database at a globally consistent point in time.


Try Fork Table now

Fork Table is available locally with no signup:

  1. Install seekdb:

# macOS
brew tap oceanbase/seekdb && brew install seekdb && seekdb-start

# Docker (publish port 2881 so the host can connect)
docker run -d --name seekdb -p 2881:2881 oceanbase/seekdb:latest

  2. Connect with any MySQL client:

mysql -h127.0.0.1 -uroot -P2881 -A -Dtest

  3. Start branching:

FORK TABLE my_table TO my_branch;

For documentation and online trial: oceanbase.ai/docs.


