Description
This alert is triggered when the partitioning daemon of MetaDB and MonitorDB fails to create a partition.
Principle
The partitioning daemon is responsible for managing partitions of the corresponding tables. It runs a partitioning task once every hour. If it fails to create a partition, the alert is triggered.
Alert rule
| Metric | Default threshold | Duration | Detection cycle | Time before clearance |
|---|---|---|---|---|
| None | None | 0 seconds | 1 hour | 1 hour |
Alert information
| Trigger method | Alert level | Scope |
|---|---|---|
| Timed task of OCP | Warning | OCP |
Alert templates
Overview: ${alarm_target} ${alarm_name}
Details: ${alarm_target} ${alarm_name}. The creation of the (${partition_name}) partition has failed.
Overview example: datasource=Monitordb:table_name=ob_hist_sql_audit_stat Partition creation failed {datasource=Monitordb:table_name=ob_hist_sql_audit_stat}.
Details example: datasource=Monitordb:table_name=ob_hist_sql_audit_stat. Failed to create the (P20210902) partition.
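The placeholder substitution in these templates can be illustrated with a minimal sketch. It assumes nothing about how OCP renders alerts internally; it only uses Python's `string.Template`, which happens to share the `${name}` placeholder syntax. The field values are hypothetical and modeled on the examples above.

```python
from string import Template

# Template strings copied from the alert templates above.
OVERVIEW = Template("${alarm_target} ${alarm_name}")
DETAILS = Template(
    "${alarm_target} ${alarm_name}. "
    "The creation of the (${partition_name}) partition has failed."
)

# Hypothetical field values, modeled on the documented examples.
fields = {
    "alarm_target": "datasource=Monitordb:table_name=ob_hist_sql_audit_stat",
    "alarm_name": "Partition creation failed",
    "partition_name": "P20210902",
}

# substitute() ignores unused keys and raises KeyError if a placeholder is missing.
overview_text = OVERVIEW.substitute(fields)
details_text = DETAILS.substitute(fields)

print(overview_text)
print(details_text)
```

The partition name in the rendered details is the value to use later when the partition is created manually.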
Impact on the system
If a partition cannot be created, the system may be unable to save the data that belongs to that partition, which can lead to serious data loss.
Possible causes
The total number of partitions in the database exceeds the limit.
The partition value of an existing partition is greater than that of the new partition to be created.
Suggested solutions
Check whether the server where the MetaDB resides has sufficient disk space.
Log on to the MetaDB server and run the df -h command to view the usage of the /data/1 directory.
If the usage is greater than 95%, run the rm -rf filename command to delete unwanted files in this directory.
If the usage is normal, proceed to the next step.
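The disk check above can also be scripted. The following is a minimal sketch, not part of OCP: the helper names `usage_percent` and `needs_cleanup` are hypothetical, and the 95% threshold and the /data/1 path are taken from the step above. The percentage is computed as used / (used + available), which is how df derives its Use% column.

```python
import os
import shutil

DATA_DIR = "/data/1"   # directory named in the step above
THRESHOLD = 95.0       # usage limit, in percent, from the step above


def usage_percent(path: str) -> float:
    """Return the used share, in percent, of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return usage.used / denominator * 100


def needs_cleanup(percent: float, threshold: float = THRESHOLD) -> bool:
    """True when usage is strictly greater than the threshold."""
    return percent > threshold


if os.path.isdir(DATA_DIR):
    pct = usage_percent(DATA_DIR)
    state = "clean up unwanted files" if needs_cleanup(pct) else "usage is normal"
    print(f"{DATA_DIR}: {pct:.1f}% used, {state}")
else:
    print(f"{DATA_DIR} does not exist on this host")
```

The script only reports usage. Deleting files remains a manual decision.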
Check whether the partition value of an existing partition is greater than that of the new partition.
This problem is not likely to occur if all partitions of the table are created by the daemon process.
This problem may occur if some partitions are manually created by using the CLI tool. To solve this problem, perform the following steps:
Run the following command to view the partition values of the table and check whether it has any manually created partitions:
obclient> show create table ob_hist_sql_audit_stat;
CREATE TABLE `ob_hist_sql_audit_stat` (
... ...
partition by range columns(`end_interval_time`)
(partition DUMMY values less than (0),
partition P202106 values less than ( 1625097600000000 ),
partition P202107 values less than ( 1627776000000000 ),
partition P202108 values less than ( 1630454400000000 ),
partition P202109 values less than ( 1633046400000000 ))
In the returned message, numbers enclosed by parentheses after the phrase "less than" are partition values. Adjacent partitions created by the daemon process have fixed differences in partition values, because these partitions have the same sizes. An abnormal difference in the partition values of adjacent partitions often indicates the existence of a manually created partition. In this case, proceed to the next step.
Run the following command to manually create the partition that the daemon process failed to create. Note that you need to set the partition value to a number less than or equal to the original partition value of the partition that the daemon process failed to create.
ALTER TABLE metric_table_space_view ADD PARTITION (PARTITION P20210902 VALUES LESS THAN ( xxxxxxxxxxxxxx ));
P20210902 is the partition name, which can be viewed in alert details. xxxxxxxxxxxxxx is the partition value.
After the modification takes effect, manually clear the alert in the OCP console.
The manual creation of the partition ensures that the partition value of this partition is not greater than that of the next partition. This alert will not be triggered when the daemon process creates a partition the next time.
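To make the partition values from the SHOW CREATE TABLE output easier to compare, they can be converted to dates and the gaps between adjacent partitions computed. The following is a minimal sketch that assumes the values are microseconds since the Unix epoch. The source does not state this, but the sample values decode to the first day of consecutive months under that assumption. The helper names are hypothetical.

```python
from datetime import datetime, timezone

# Upper bounds copied from the SHOW CREATE TABLE output (DUMMY partition omitted).
bounds = {
    "P202106": 1625097600000000,
    "P202107": 1627776000000000,
    "P202108": 1630454400000000,
    "P202109": 1633046400000000,
}

MICROSECONDS_PER_DAY = 86_400_000_000


def to_utc(value_us: int) -> datetime:
    """Interpret a partition value as microseconds since the epoch (assumption)."""
    return datetime.fromtimestamp(value_us // 1_000_000, tz=timezone.utc)


def gaps_in_days(values: list[int]) -> list[float]:
    """Return the difference between each pair of adjacent values, in days."""
    return [(b - a) / MICROSECONDS_PER_DAY for a, b in zip(values, values[1:])]


for name, value in bounds.items():
    print(name, "values less than", to_utc(value).date())

gaps = gaps_in_days(list(bounds.values()))
print("gaps in days:", gaps)
```

For monthly partitions such as these, the gaps follow calendar month lengths (here 31, 31, and 30 days). A gap that does not match the regular pattern of the table suggests a manually created partition.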
Check whether the number of partitions exceeds the limit.
Run the following command to view the number of partitions in the MetaDB.
select count(*) from oceanbase.gv$partition where table_id in (select table_id from oceanbase.gv$table where tenant_name = 'ocp_meta24' and database_name = 'ocp_monitor');
If the returned value exceeds 3000, it indicates that the number of partitions is too large. To solve the problem, perform the following steps:
Run the following command to drop tables or partitions that are no longer in use.
# Drop a table.
DROP TABLE table_name;
# Drop a partition. In this example, P180625 is the name of the partition to be dropped.
alter table metric_table_space_perm drop partition( P180625 );
If the MetaDB is created in an OceanBase cluster of an earlier version, you can release partition resources by performing two rounds of daily compactions.
For more information about triggering the compaction on the CLI, see Major compaction triggering methods.