OceanBase Best Practices

Best practices for PDML processing in OceanBase Database

Last Updated: 2025-07-07 02:17:51
What is on this page
How PDML works
PDML use cases
Enable PDML
Use case: Data archiving
Challenges
Solution
PDML usage limitations
References


This topic provides best practices for performing large-scale DML operations (INSERT, UPDATE, and DELETE) in OceanBase Database, focusing on performance optimization strategies for Parallel Data Manipulation Language (PDML) and recommendations for database configuration. PDML is a high-performance data processing technology provided by OceanBase Database, used primarily to accelerate large-scale INSERT, UPDATE, and DELETE operations. It improves data operation throughput and efficiency by decomposing data manipulation tasks and distributing them across multiple parallel execution units.

How PDML works

PDML improves performance through the following mechanisms:

  • Task parallelism: Breaks down large DML operations into multiple independent subtasks
  • Data partition processing: For partitioned tables, each partition can be processed in parallel by different execution threads
  • Resource optimization: Makes full use of multi-core CPUs and distributed computing resources
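
You can verify whether a statement will actually run as PDML by inspecting its execution plan. The following is a minimal sketch in MySQL-compatible mode; the table name `t1` and column `status` are hypothetical, and the exact operator names in the plan vary by version:

```sql
-- Hypothetical table; EXPLAIN inspects the plan without executing the DML.
EXPLAIN DELETE /*+ enable_parallel_dml parallel(4) */ FROM t1 WHERE status = 'EXPIRED';
-- A parallel plan contains PX (parallel execution) operators such as
-- PX COORDINATOR; a serial plan shows only a plain DELETE operator.
```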

PDML use cases

  • Large-scale data operations (batch processing jobs)
  • Data warehouse ETL processes
  • Archiving or purging historical data
  • Large-scale data initialization loading

Enable PDML

OceanBase Database allows you to enable PDML in the following ways:

  • Specify the degree of parallelism by using SQL hints

    You can add hints to an SQL statement to specify the degree of parallelism.

    DELETE /*+ enable_parallel_dml parallel(4) */ FROM table_name WHERE condition;
    
    UPDATE /*+ enable_parallel_dml parallel(8) */ table_name SET column_name = value WHERE condition;
    
    INSERT /*+ enable_parallel_dml parallel(6) */ INTO target_table SELECT * FROM source_table WHERE condition;
    

    Here, enable_parallel_dml is used to enable PDML, and parallel(N) is used to specify the degree of parallelism.

    Scenarios: This method is suitable for temporarily optimizing the performance of a single SQL statement without affecting other operations. For example, you can use this method to quickly optimize a specific query or data modification operation.

  • Specify the degree of parallelism at the session level

    You can set the degree of parallelism at the session level, which affects all subsequent DML operations in the current session.

    In MySQL-compatible mode:

    SET _force_parallel_query_dop = 3;
    SET _force_parallel_dml_dop = 3;
    
    -- DML statements after this will automatically apply parallel processing.
    DELETE FROM table_name WHERE column_name = 'EXPIRED';
    

    In Oracle-compatible mode:

    ALTER SESSION FORCE PARALLEL QUERY PARALLEL 3;
    ALTER SESSION FORCE PARALLEL DML PARALLEL 3;
    
    -- DML statements after this will automatically apply parallel processing.
    DELETE FROM table_name WHERE column_name = 'EXPIRED';
    
    -- Disable parallel processing at the session level.
    ALTER SESSION DISABLE PARALLEL QUERY;
    

    Scenarios: This method is suitable for batch processing jobs that need to perform multiple parallel DML operations within a single session. For example, during ETL processes, you can perform multiple data conversion and loading operations within the same session.

  • Enable auto DOP

    You can enable auto DOP so that the database automatically adjusts the degree of parallelism based on system resources and workload.

    /* Enable auto DOP at the GLOBAL level */
    SET global parallel_degree_policy = AUTO;
    
    /* Enable auto DOP at the SESSION level */
    SET session parallel_degree_policy = AUTO;
    SET parallel_degree_policy = AUTO;
    
    /* Enable auto DOP at the query level by using a hint */
    SELECT /*+PARALLEL(AUTO)*/ * FROM ...;
    

    Scenarios: This method is suitable for scenarios where the database automatically optimizes the degree of parallelism without manual specification. For example, when resources are sufficient, the database can automatically increase the degree of parallelism to speed up queries.

  • Specify the degree of parallelism at the table level

    You can specify the default degree of parallelism when you create or modify a table.

    CREATE TABLE sales_history (
        id INT PRIMARY KEY,
        sale_date DATE,
        amount DECIMAL(10,2)
    ) parallel 8;
    
    -- Modify an existing table.
    ALTER TABLE customer_data parallel 6;
    

    When you perform DML operations on a table with a specified degree of parallelism and do not explicitly disable parallel processing, OceanBase Database will automatically execute those operations in parallel (PDML).

    Scenarios: This method is suitable for large tables that frequently require batch processing, as it can significantly improve DML performance over the long term. For example, for a large log table that often requires data cleansing and transformation, you can set the table-level degree of parallelism to enhance processing efficiency.

The parallelism settings are prioritized in the following order, from highest to lowest:

  • Hint-based parallelism at the table level
  • Hint-based parallelism at the global level
  • Session-level parallelism
  • Table-level parallelism
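
As a sketch of how this precedence plays out (the table and column names below are hypothetical), a statement-level hint overrides the table-level attribute:

```sql
-- Hypothetical table with a table-level DOP of 8.
CREATE TABLE t_orders (id INT PRIMARY KEY, status VARCHAR(16)) parallel 8;

-- The statement-level hint takes precedence: this DELETE runs with DOP 2.
DELETE /*+ enable_parallel_dml parallel(2) */ FROM t_orders WHERE status = 'EXPIRED';

-- With no hint and no session-level setting, the table-level DOP of 8 applies.
DELETE FROM t_orders WHERE status = 'EXPIRED';
```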

Use case: Data archiving

A platform generates a large amount of order data every day. As time goes by, the volume of historical order data increases, which leads to a decline in query performance. To improve query efficiency and reduce storage costs, the platform decides to archive order data older than one year into a history table.

Challenges

  • Large data volume: Millions of orders need to be archived daily.
  • Limited time window: The archiving operation needs to be completed during off-peak hours, leaving a very narrow window.
  • Minimal business impact: The archiving process must not significantly affect online operations.

Solution

Use the PDML feature of OceanBase Database to execute data archiving in parallel.

  1. Create a historical table: Create a historical table named order_history with the same schema as the orders table to store the archived data.

    CREATE TABLE order_history LIKE orders;
    
  2. Enable PDML: Set table-level parallelism to enable PDML for the orders table.

    ALTER TABLE orders parallel 8;
    
  3. Perform data archiving: Use the INSERT INTO ... SELECT statement to migrate historical data from the orders table to the history table, and use the DELETE statement to delete the historical data from the orders table.

    INSERT /*+ enable_parallel_dml parallel(8) */ INTO order_history
    SELECT * FROM orders WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 1 YEAR);
    
    DELETE /*+ enable_parallel_dml parallel(8) */ FROM orders
    WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 1 YEAR);
    
  4. Monitor the archiving progress: Use the monitoring tool of OceanBase Database to track PDML execution progress and resource usage.
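
Step 4 can be sketched with the SQL audit view. The view and column names below follow OceanBase V4.x system views and are an assumption to verify against your version's documentation:

```sql
-- Sketch: review the elapsed time and affected row counts of recent
-- archiving statements via the SQL audit view.
SELECT query_sql, elapsed_time, affected_rows
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE query_sql LIKE '%order_history%'
ORDER BY request_time DESC
LIMIT 10;
```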

PDML usage limitations

The following limitations apply to OceanBase Database V4.3.5 and V4.2.5.

Use batch optimization

  • Scenario: Use batch DML (such as multi-queries or array binding)
  • Cause: Enabling PDML and batch DML simultaneously can lead to lock conflicts and transaction isolation issues.

Unsupported DML statement types

  • Scenario: The statement type is not INSERT, DELETE, UPDATE, or MERGE; PDML supports only these operations.

INSERT FROM SELECT

  • Scenario: The values to be inserted are obtained from the SELECT clause.

Auto-increment columns for primary keys or partitioning keys

  • Scenario: The target table contains an auto-increment column for a primary key or a partitioning key.
  • Cause: PDML cannot guarantee monotonicity of auto-increment columns, so it does not support this feature.

IGNORE keyword

  • Scenario: DML statements that contain the IGNORE keyword (such as INSERT IGNORE or DELETE IGNORE)
  • Cause: The IGNORE keyword ignores errors such as unique key conflicts, but parallel data processing in PDML can cause data consistency issues.

Multi-table operations

  • Scenario: DML operations involving multiple tables, such as JOIN or subquery updates
  • Cause: PDML only supports single-table operations. Multi-table operations require distributed transaction coordination, which is complex.

ON DUPLICATE KEY UPDATE

  • Scenario: The INSERT statement contains ON DUPLICATE KEY UPDATE (that is, INSERT_UP)
  • Condition: The target table has a unique index or global index, or the primary key or unique key needs to be updated
  • Cause: Unique key conflict detection and updates require atomicity. Parallel execution in PDML may lead to inconsistent conflict detection.

Explicitly disable PDML for DML statements

  • Scenario: DML statements are marked for disabling PDML through optimizer rules or conversion logic.
  • Cause: Some optimization rules, such as triggers or foreign key constraints, rely on serial execution logic.

Cross-database operations

  • Scenario: DML operations involve DBLinks across databases.
  • Cause: Cross-database transactions require coordination with external resources, and parallel execution of PDML may introduce consistency risks.

Nested SQL

  • Scenario: DML statements contain foreign key constraints or triggers
  • Cause: The execution of foreign key constraints and triggers depends on the relationships between tables. Parallel execution of PDML may lead to data consistency issues.

Partition-specific operations

  • Scenario: A DELETE statement specifies particular partitions by using the PARTITION clause, such as DELETE FROM t1 PARTITION(p0)
  • Cause: The parallel execution after partition pruning may conflict with the partition scheduling strategy of PDML

Non-online DDL operations

  • Scenarios:
    • DML operations performed during non-online DDL operations (such as index creation or table schema changes)
    • UPDATE statements that specify partitions
    • Modifying the ON UPDATE CURRENT_TIMESTAMP column
  • Cause: PDML may interfere with internal state synchronization and cause timestamp inconsistencies.

Variable assignment

  • Scenario: Statements contain variable assignment operations, especially when these assignments appear in subqueries or nested statements.
  • Cause: Parallel execution in PDML requires statements to be deterministic (without side effects). Variable assignments may introduce runtime state dependencies (such as cross-shard variable modifications), which can compromise the atomicity and consistency of parallel execution.

Orthogonal scenarios

  • Scenarios:
    • Foreign key constraints
    • Triggers
    • SQL statements containing PL/UDFs

HINT disabled

  • Scenario: PDML is disabled by using hints


References

  • Set parallel execution parameters
  • Perform parameter tuning for PX
  • Set the degree of parallelism for parallel execution
  • Enable and disable parallel queries
  • Parallel execution methods and their priorities
  • Classification and optimization of parallel execution
  • Best practices for optimizing batch DML performance
  • Best practices for creating indexes on large tables
  • Best practices for handling slow queries
  • Diagnose parallel execution issues
