Quick start

2025-01-26 08:21:59  Updated

Parallel execution is a deep and complex subject. You need to have a proper understanding of the technical mechanism of parallel execution to make full use of its capabilities. In OceanBase Database V3.1 and later, a quick start guide is provided for beginners to quickly understand the essentials of parallel execution. The parameter values provided in this topic are not necessarily the best choices but can effectively avoid most unfavorable conditions, helping beginners easily use the parallel execution feature.

Initialize the environment

Complete the following settings in an online analytical processing (OLAP) tenant:

/* MySQL */
SET GLOBAL parallel_servers_target = MIN_CPU * 20;

/* Oracle */
ALTER SYSTEM SET parallel_servers_target = MIN_CPU * 20;

Collect statistics

In OceanBase Database V3.x, statistics collection is bound with major compactions. Therefore, after you import data, you must initiate a major compaction before you collect statistics.

In OceanBase Database V4.x, after you import data, you can directly call the database management system (DBMS) package for statistics collection to collect statistics.

For more information about statistics, see Overview of statistics.

Set a hint

Make sure that the maximum degree of parallelism (DOP) of an SQL statement does not exceed 1.5 times the number of physical CPU cores. Generally, if multiple SQL statements will not be executed in parallel, you can set the DOP of a single SQL statement to the number of CPU cores. For example, if the system has 32 physical CPU cores, you can set the hint as /*+ PARALLEL(32) */.

Perform performance tuning

  1. Run the top -H command to view the CPU utilization of the current tenant.

  2. If the performance of a single SQL statement is not as expected, query the sql_plan_monitor view for the performance report.

    -- `open_dt` indicates the interval from when the operator is opened to when it is closed.
    -- `row_dt` indicates the interval from when the operator generates the first row to when it generates OB_ITER_END.
    -- If the value of `row_dt` is NULL, the operator has not generated the first row or OB_ITER_END.
    
    -- If the SQL statement is considered a slow statement, add the hint /*+ monitor */ to the statement.
    -- Execute the slow SQL statement added with the hint. Replace `Yxxxxxxxxx` in the following statement with the obtained trace ID and then execute the following statement:
    
    -- MySQL tenant:
    -- Summary:
    SELECT op_id, op, rows, rescan, threads, (close_time - open_time) open_dt, (last_row_eof_time-first_row_time) row_dt,  open_time, close_time,  first_row_time, last_row_eof_time FROM
    (
    select plan_line_id op_id, concat(lpad('', plan_depth, ' '), plan_operation) op, sum(output_rows) rows, sum(STARTS) rescan, min(first_refresh_time) open_time, max(last_refresh_time) close_time, min(first_change_time) first_row_time, max(last_change_time) last_row_eof_time, count(1) threads from oceanbase.gv$sql_plan_monitor where trace_id = 'Yxxxxxxxxx' group by  plan_line_id, plan_operation order by plan_line_id
    ) a;
    
    -- Details
    SELECT op_id, thread, op, rows, rescan, (close_time - open_time) open_dt, (last_row_eof_time-first_row_time) row_dt, open_time, close_time, first_row_time, last_row_eof_time  FROM
    (
    select plan_line_id op_id, PROCESS_NAME thread, concat(lpad('', plan_depth, ' '), plan_operation) op, output_rows rows, STARTS rescan, first_refresh_time open_time, last_refresh_time close_time, first_change_time first_row_time, last_change_time last_row_eof_time from oceanbase.gv$sql_plan_monitor where trace_id = 'Yxxxxxxxxx' order by plan_line_id, PROCESS_NAME
    ) a;
    
    -- Oracle tenant:
    -- Note: `open_dt` indicates the interval from when the operator is opened to when the operator is closed.
    -- Note: `row_dt` indicates the interval from when the operator generates the first row to when the operator generates OB_ITER_END.
    -- Note: If the value of `row_dt` is NULL, the operator has not generated the first row or OB_ITER_END.
    
    -- Summary
    SELECT op_id, op, output_rows, rescan,threads ,(close_time - open_time) open_dt, (last_row_eof_time-first_row_time) row_dt,  open_time, close_time,  first_row_time, last_row_eof_time FROM
    (
    select plan_line_id op_id, concat(lpad(' ', max(plan_depth), ' '), plan_operation) op, sum(output_rows) output_rows, sum(STARTS) rescan, min(first_refresh_time) open_time, max(last_refresh_time) close_time, min(first_change_time) first_row_time, max(last_change_time) last_row_eof_time, count(1) threads from sys.gv$sql_plan_monitor where trace_id = 'Yxxxxxxxxx' group by  plan_line_id, plan_operation,plan_depth order by plan_line_id
    ) a;
    
    -- Details
    SELECT op_id, thread, op, output_rows, rescan, (close_time - open_time) open_dt, (last_row_eof_time-first_row_time) row_dt, open_time, close_time, first_row_time, last_row_eof_time FROM
    (
    select plan_line_id op_id, PROCESS_NAME thread, concat(lpad(' ', plan_depth, ' '), plan_operation) op, output_rows, STARTS rescan, first_refresh_time open_time, last_refresh_time close_time, first_change_time first_row_time, last_change_time last_row_eof_time  from sys.gv$sql_plan_monitor where trace_id = 'Yxxxxxxxxx' order by plan_line_id, process_name
    ) a;
    

FAQ

  1. What do I do if the query performance is not as expected while the CPU resources are not fully used?

    Execute the show variables like 'parallel_servers_target statement and check that the value of parallel_servers_target is not less than MIN_CPU × 20.

  2. What do I do if the PDML performance is not as expected?

    Execute the explain extended statement to verify whether PDML is used. If PDML is not used, the Note field at the bottom of the plan describes the reason. Generally, if the target table contains triggers, foreign keys, or local unique indexes, PDML will not be used.

    Keywords such as DISTRIBUTED INSERT, DISTRIBUTED UPDATE, and DISTRIBUTED DELETE indicate that PDML is not used.

  3. What do I do when the error -4138 OB_SNAPSHOT_DISCARDED is returned upon a PDML timeout?

    Set the undo_retention parameter to a value that is not less than the maximum execution time of a PDML statement. The default value of undo_retention is 30 minutes. If the execution time of a PDML statement exceeds 30 minutes, this error may be returned and the statement will be aborted and retried until it times out.

    This issue never occurs in OceanBase Database V4.1 and later. Therefore, you do not need to set the undo_retention parameter in OceanBase Database V4.1 and later.

  4. How do I enable parallel execution for business SQL statements without making any modifications to the business?

    OceanBase Database Proxy (ODP) provides a web UI for you to modify connection configurations to enable parallel execution. For example, you can set the DOP of all SQL statements in a read/write splitting connection to 2.

Contact Us