OceanBase

FORK TABLE: Get Your Database Ready for Agents

AI agents write to databases at high frequency with no accountability. FORK TABLE in seekdb gives every agent its own isolated, read-write branch — created in under a second regardless of table size via copy-on-write.

Branches share underlying data and vector indexes until they diverge, so forking a terabyte table or a RAG knowledge base with millions of embeddings costs almost nothing. Bad run? Drop the branch. Good run? Promote it with an atomic rename.

This is Git's insight applied to data: when branching is cheap, the workflow shifts from "be careful on main" to "branch everything, experiment freely, merge the best result."

Many of the database primitives we rely on — transactions, locks, audit logs — were designed around a human operator who thinks before querying, understands what they are changing, and can usually reconstruct what happened when something breaks.

Agents operate very differently. In a typical vibe-coding or agentic workflow, an agent may execute dozens of INSERT, UPDATE, and DELETE statements with minimal human review — high-frequency writes generated from probabilistic decisions rather than deterministic business logic. And when the resulting data is wrong, it is often difficult to reconstruct which step introduced the error from conventional logs alone.

This is the default operating mode for RAG pipelines, multi-agent systems, and AI-driven data workflows. seekdb's Fork Table addresses this directly — giving every agent its own isolated branch where it can write freely, with zero risk to production data.

The scenarios driving this

Recently, one question kept appearing in our user community:

"An AI modified my data. How do I get back to where I was?"

Here are three real-life situations:

Lost work. A data scientist spent weeks tuning a dataset through feature engineering. An agent ran a "harmless-looking" cleanup script during a vibe-coding session. The tuned dataset was gone.

Polluted embeddings. A RAG knowledge base was polluted with low-quality embeddings by a misbehaving agent. The team could see retrieval quality degrading, but had no easy way to identify when the contamination started or isolate the affected state.

Write collisions. Two agents in a multi-agent system kept overwriting each other's outputs to a shared knowledge base, leaving the data in an unpredictable state.

The common thread is simple: agents need sandboxes. Not backups. Not read-only snapshots. They need complete, read-write, isolated workspaces where an agent can modify data freely, and where the entire branch can be discarded if the run fails or promoted if it succeeds.

Why Git's insight applies to data

In 2005, Git shipped with one insight: branches should be cheap. When branch creation costs nothing, developer behavior shifts from "be careful on main" to "branch everything, experiment freely, merge the best result."

Data needs the same shift — and for the same reason. AI development today looks like this:

A data scientist validating three feature engineering approaches simultaneously

A team A/B testing two prompt strategies against the same production data

A multi-agent system where several agents concurrently read from and write to a shared knowledge base

All share one structure: the same dataset, multiple parallel evolution paths, and you need to pick the best outcome.

What are people actually doing? Running CREATE TABLE features_v1 AS SELECT ..., and then doing it again for features_v2, features_v2_final, and features_v2_final_REAL. That is not version control; it is manual table sprawl. At terabyte scale, each full-table copy can mean long wait times, significant storage overhead, and operational friction that discourages experimentation.

How Fork Table works

Fork Table creates a new branch from a transactionally consistent snapshot of an existing table. The branch initially shares underlying data structures through copy-on-write, so creation is fast and storage-efficient. Once the branch diverges, only changed data and affected index state need to be materialized.

Instant branching at any scale

Branch creation is O(1) with respect to table size because the operation creates new branch metadata rather than copying table contents. In practice, a 1 TB table can be forked in roughly the same time as a 1 MB table — typically in under a second in our implementation. Copy-on-write means data is not duplicated until the branch actually diverges.

This changes what becomes operationally practical. When branching is this cheap, you can create an isolated writable branch before an agent run or before a risky transformation step. When branching is expensive, teams avoid doing it and accept shared-state risk instead.
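To make the constant-time claim concrete, here is a toy in-memory model of copy-on-write branching in Python. It is a sketch under simplifying assumptions, not seekdb's storage engine: the `Snapshot` and `Table` classes and their methods are hypothetical names, and rows are keyed by a single primary key. The point it illustrates is that `fork()` only creates new metadata, so its cost does not depend on how many rows the table holds.

```python
_DELETED = object()  # tombstone marking a row as deleted in some layer


class Snapshot:
    """Immutable layer of row changes stacked on an optional parent layer."""

    def __init__(self, parent, rows):
        self.parent = parent
        self.rows = rows  # never mutated after construction

    def lookup(self, key):
        # Walk from the newest layer to the oldest until the key is found.
        layer = self
        while layer is not None:
            if key in layer.rows:
                return layer.rows[key]
            layer = layer.parent
        return _DELETED


class Table:
    """Writable branch: a private delta on top of a shared snapshot chain."""

    def __init__(self, base=None):
        self.base = base
        self.delta = {}

    def put(self, key, row):
        self.delta[key] = row

    def delete(self, key):
        self.delta[key] = _DELETED

    def get(self, key):
        if key in self.delta:
            row = self.delta[key]
        elif self.base is not None:
            row = self.base.lookup(key)
        else:
            row = _DELETED
        return None if row is _DELETED else row

    def fork(self):
        # Freeze the current writes into a shared snapshot. No rows are
        # copied, so the cost is independent of table size.
        self.base = Snapshot(self.base, self.delta)
        self.delta = {}
        return Table(self.base)


main = Table()
main.put(1, "alpha")
main.put(2, "beta")

branch = main.fork()
branch.put(1, "ALPHA")   # diverges only in the branch
branch.delete(2)
main.put(3, "gamma")     # written after the fork, invisible to the branch

print(main.get(1), branch.get(1))  # alpha ALPHA
print(main.get(2), branch.get(2))  # beta None
print(main.get(3), branch.get(3))  # gamma None
```

In this toy model, reads get slower as the snapshot chain grows; a real engine would presumably compact or index the layers, which the sketch leaves out.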

Full read-write isolation

Each branch is a complete sandbox — not a read-only snapshot. Agents can INSERT, UPDATE, and DELETE freely inside the branch. Because writes are isolated at the branch level, two agents working on separate branches do not mutate each other’s visible state.

-- Create a branch
FORK TABLE main_table TO experiment_branch;
-- Work on the branch (any SQL operations)
UPDATE experiment_branch SET ...;
INSERT INTO experiment_branch ...;
-- Discard if failed
DROP TABLE experiment_branch;
-- Or promote if successful
RENAME TABLE main_table TO main_backup, experiment_branch TO main_table;

Standard SQL — no new syntax to learn beyond FORK TABLE.
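The fork, work, then drop-or-promote sequence above could be wrapped in a small helper for agent runs. The following Python sketch is one possible shape, not an official client API: `run_in_branch`, `RecordingCursor`, and the `reviewed` column are hypothetical names, and the only SQL it issues is the set of statements shown above. It accepts any DB-API style cursor with an `execute()` method; a recording stand-in is used here so the example runs without a server.

```python
def run_in_branch(cursor, main, branch, work, is_good, backup):
    """Fork `main`, let `work` modify the branch, then promote or discard it.

    Table names are interpolated directly into SQL, so they must be trusted
    identifiers and never raw agent or user input.
    """
    cursor.execute(f"FORK TABLE {main} TO {branch}")
    try:
        work(cursor, branch)
        promote = is_good(cursor, branch)
    except Exception:
        # The run failed part-way: discard the branch and re-raise.
        cursor.execute(f"DROP TABLE {branch}")
        raise
    if promote:
        cursor.execute(f"RENAME TABLE {main} TO {backup}, {branch} TO {main}")
    else:
        cursor.execute(f"DROP TABLE {branch}")
    return promote


class RecordingCursor:
    """Stand-in for a real MySQL cursor; it only records the statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


cursor = RecordingCursor()
promoted = run_in_branch(
    cursor,
    main="main_table",
    branch="experiment_branch",
    work=lambda cur, b: cur.execute(f"UPDATE {b} SET reviewed = 1"),
    is_good=lambda cur, b: True,  # replace with a real validation query
    backup="main_backup",
)
for statement in cursor.statements:
    print(statement)
```

With a real connection, `is_good` is where a row-count check or a retrieval-quality test would go before anything touches the main table.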

Vector index sharing

When you fork a table with HNSW vector indexes, the index structures are initially shared across branches using copy-on-write. That means forking a RAG knowledge base with millions of embeddings does not require an immediate full index rebuild. The branch reuses existing index state until data or index paths diverge, which keeps branching fast and avoids making experimentation prohibitively expensive.
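One way to picture shared index state is a branch that holds a reference to the base vectors plus its own private additions and removals. The Python sketch below is a conceptual illustration only and does not describe seekdb's internals: `BranchIndex` is a hypothetical class, and a brute-force distance scan stands in for the HNSW graph traversal.

```python
import math


class BranchIndex:
    """Toy vector index branch: shared base vectors plus private changes."""

    def __init__(self, base, added=None, removed=None):
        self.base = base                  # dict id -> vector, shared, never mutated
        self.added = dict(added or {})    # branch-local inserts and updates
        self.removed = set(removed or ()) # branch-local deletions

    def fork(self):
        # The base is shared by reference; only the small private
        # change sets are copied.
        return BranchIndex(self.base, self.added, self.removed)

    def upsert(self, doc_id, vector):
        self.added[doc_id] = vector
        self.removed.discard(doc_id)

    def delete(self, doc_id):
        self.added.pop(doc_id, None)
        self.removed.add(doc_id)

    def search(self, query, k=1):
        # Brute-force scan; a real index would traverse a graph instead.
        visible = {i: v for i, v in self.base.items() if i not in self.removed}
        visible.update(self.added)
        ranked = sorted(visible, key=lambda i: math.dist(query, visible[i]))
        return ranked[:k]


base = {"a": (0.0, 0.0), "b": (1.0, 1.0)}
main = BranchIndex(base)
branch = main.fork()
branch.upsert("c", (0.9, 0.9))
branch.delete("b")

print(main.search((1.0, 1.0)))    # ['b']
print(branch.search((1.0, 1.0)))  # ['c']
print(branch.base is main.base)   # True: no embeddings were copied
```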

This is where AI-oriented storage semantics start to matter. In systems where branching and vector indexing are introduced as separate layers, efficient index reuse can become difficult, and teams may face tradeoffs between rebuild cost, freshness, and branch isolation.

What changes for your workflow

Fork Table is not just about making experiments faster. It changes the safety model of agent-driven data systems. Instead of letting multiple agents mutate shared state and hoping logs are enough to recover, you give each run its own isolated branch and decide later what deserves to be promoted.

For teams building AI applications, native data branching removes a category of operational risk.

Agent development becomes safe by default. Instead of hoping agents don't corrupt shared data, you isolate them in branches. Bad run? Drop the branch. Good run? Promote it to main with an atomic rename.

Parallel experimentation works. A/B tests run on two branches from a common snapshot, so comparisons are fair. Three feature engineering approaches can evolve simultaneously without duplicating terabytes.

AI workflows become reproducible. Feature versions for model training get pinned to branch snapshots. When you need to explain why a model behaved differently three weeks ago, the relevant data state is still recoverable.
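The parallel-experimentation pattern can be sketched as a helper that forks one branch per candidate, scores each branch, promotes the winner, and drops the rest. This is an illustrative Python sketch rather than a seekdb API: `pick_best_branch`, `RecordingCursor`, and the branch names are hypothetical, and it assumes the main table is not written to while the branches are being created, so all candidates start from the same state.

```python
def pick_best_branch(cursor, main, candidates, score, backup):
    """Fork one branch per candidate, score each, and promote the best one.

    `candidates` maps a branch name to a callable that modifies that branch.
    `score` returns a number for a branch; higher is better.
    Table names are interpolated directly and must be trusted identifiers.
    """
    if not candidates:
        raise ValueError("at least one candidate branch is required")
    scores = {}
    for branch, transform in candidates.items():
        cursor.execute(f"FORK TABLE {main} TO {branch}")
        transform(cursor, branch)
        scores[branch] = score(cursor, branch)
    winner = max(scores, key=scores.get)
    for branch in scores:
        if branch != winner:
            cursor.execute(f"DROP TABLE {branch}")
    cursor.execute(f"RENAME TABLE {main} TO {backup}, {winner} TO {main}")
    return winner


class RecordingCursor:
    """Stand-in for a real MySQL cursor; it only records the statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


# Hypothetical evaluation results; in practice `score` would query the branch.
offline_scores = {"features_a": 0.71, "features_b": 0.88, "features_c": 0.64}

cursor = RecordingCursor()
winner = pick_best_branch(
    cursor,
    main="features",
    candidates={name: (lambda cur, b: None) for name in offline_scores},
    score=lambda cur, b: offline_scores[b],
    backup="features_backup",
)
print(winner)  # features_b
```

Keeping the losing branches around instead of dropping them is a reasonable variation when the comparison itself needs to stay reproducible.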

The pattern is the same one Git brought to code: do not rely on people being more careful. Reduce the cost of experimentation and the cost of being wrong.

Scaling to full databases

Fork Table handles single-table branching, but real AI applications rarely touch just one table: an agent's knowledge base might span structured data, vector indexes, and metadata tables that need to be forked together at a consistent snapshot point. For these cases, Fork Database (available in seekdb V1.2.0) lets developers create a complete, isolated copy of the entire database at a globally consistent point in time.

Try Fork Table now

seekdb's Fork Table is available locally on macOS or in a Docker environment.

# macOS
brew tap oceanbase/seekdb && brew install seekdb && seekdb-start
# Docker
docker run -d --name seekdb oceanbase/seekdb:latest

Connect with any MySQL client (mysql -h127.0.0.1 -uroot -P2881 -A -Dtest) and start branching.

For documentation and online trial environments, see oceanbase.ai.
