OceanBase

A unified distributed database ready for your transactional, analytical, and AI workloads.

OceanBase Binlog Service

V4.0.1

© OceanBase 2026. All rights reserved

Monitoring and alerting

Last updated: 2025-03-21 09:41:05

The binlog service exposes metrics in the Prometheus format on port 2984 by default. To query the current metric values, visit http://{IP address of the binlog server}:2984/metrics. To monitor and alert on these metrics, configure Prometheus to scrape this endpoint and evaluate the alert rules in this topic, and use Prometheus Alertmanager to route the resulting alerts.
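A minimal prometheus.yml sketch for this setup follows. Only port 2984 and the /metrics path come from this topic. The job name, the rule file name obbinlog_alerts.yml, and all addresses (taken from the 192.0.2.0/24 documentation range) are placeholders. The 10s intervals are an assumption, chosen to match the for: 10s clauses used by many of the rules below.

```yaml
# prometheus.yml (sketch): scrape the binlog service and load the alert rules
global:
  scrape_interval: 10s          # assumption: scrape as often as rules are evaluated
  evaluation_interval: 10s      # assumption: matches the "for: 10s" clauses below

scrape_configs:
  - job_name: "obbinlog"        # hypothetical job name
    metrics_path: /metrics      # path exposed by the binlog service
    static_configs:
      - targets:
          - "192.0.2.11:2984"   # placeholder binlog server addresses
          - "192.0.2.12:2984"

rule_files:
  - "obbinlog_alerts.yml"       # hypothetical file holding the alert rules in this topic

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["192.0.2.20:9093"]   # placeholder Alertmanager address
```

Once the rule file exists, promtool check config prometheus.yml validates this file and the rule files it references before you reload Prometheus.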

Host status

Monitoring metrics

Metrics tagged with host_name (the unique ID of the node) and ip (the IP address of the node):

  • binlog_cpu_count: the number of CPU cores.
  • binlog_cpu_used_ratio: the CPU utilization, as a fraction (0.9 means 90%).
  • binlog_disk_total_size_mb: the total disk space, in MB.
  • binlog_disk_used_ratio: the disk space usage, as a fraction.
  • binlog_mem_total_size_mb: the total memory, in MB.
  • binlog_mem_used_ratio: the memory usage, as a fraction.
  • binlog_mem_used_size_mb: the memory in use, in MB.
  • binlog_network_rx_bytes: the number of bytes received per second.
  • binlog_network_wx_bytes: the number of bytes sent per second.
  • binlog_load1: the load average (average number of running processes) over the last 1 minute.
  • binlog_load5: the load average over the last 5 minutes.
  • binlog_load15: the load average over the last 15 minutes.
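The *_ratio metrics are fractions (the alert rules below treat 0.9 as 90%), so they combine directly with the size metrics in PromQL. The following recording-rule sketch derives the used disk space in MB and the 1-minute load per CPU core. The group name and the binlog: rule names are hypothetical. Both operands of each expression carry the same host_name and ip labels, so Prometheus matches them one-to-one.

```yaml
groups:
  - name: obbinlog-host-derived          # hypothetical group name
    rules:
      # Used disk space in MB: total size multiplied by the used fraction.
      - record: binlog:disk_used_size_mb
        expr: binlog_disk_total_size_mb * binlog_disk_used_ratio
      # 1-minute load average divided by the number of CPU cores on the same node.
      - record: binlog:load1_per_core
        expr: binlog_load1 / binlog_cpu_count
```

A per-core load value can be easier to compare across hosts with different core counts than the fixed threshold of 16 used by HighLoad1. Whether to alert on it is a deployment decision.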

Alert rules

    • Alert related to CPU utilization

      - alert: HighCpuUsage
        expr: binlog_cpu_used_ratio > 0.9
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "CPU utilization exceeding the threshold"
          description: "The CPU utilization on the {{ $labels.ip }} node exceeds 90%."
      
    • Alert related to memory usage

      - alert: HighMemUsage
        expr: binlog_mem_used_ratio > 0.8
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Memory usage exceeding the threshold"
          description: "The memory usage on the {{ $labels.ip }} node exceeds 80%."
      
    • Alert related to the average memory usage in a cluster

      - alert: HighMemUsage
        expr: avg(binlog_mem_used_ratio) > 0.65
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Memory usage exceeding the threshold"
          description: "The average memory usage in the cluster exceeds 65%."
      
    • Alert related to the server load

      - alert: HighLoad1
        expr: binlog_load1 > 16
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Server load exceeding the threshold"
          description: "The server load on the {{ $labels.ip }} node exceeds 16."
      
    • Alert related to disk usage

      - alert: HighDiskUsage
        expr: binlog_disk_used_ratio > 0.8
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Disk usage exceeding the threshold"
          description: "The disk usage on the {{ $labels.ip }} node exceeds 80%."
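The rules in this topic are listed as individual rule entries. In a Prometheus rule file, they belong in the rules list of a rule group. The following sketch shows this for two of the host rules. The file name obbinlog_alerts.yml and the group name are hypothetical. Append the other rules in this topic to the same list, or to further groups, in the same way.

```yaml
# obbinlog_alerts.yml (sketch): rule entries from this topic wrapped in a rule group
groups:
  - name: obbinlog-host                   # hypothetical group name
    rules:
      - alert: HighCpuUsage
        expr: binlog_cpu_used_ratio > 0.9
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "CPU utilization exceeding the threshold"
          description: "The CPU utilization on the {{ $labels.ip }} node exceeds 90%."
      - alert: HighDiskUsage
        expr: binlog_disk_used_ratio > 0.8
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Disk usage exceeding the threshold"
          description: "The disk usage on the {{ $labels.ip }} node exceeds 80%."
```

Running promtool check rules obbinlog_alerts.yml validates the file, including the PromQL expressions, before Prometheus loads it.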
      

OBM status

Monitoring metrics

Metrics tagged with host_name (the unique ID of the node):

  • binlog_instance_num: the number of binlog instances.
  • binlog_manager_down_count: the number of times the OBM process has failed.

Metrics tagged with ob_cluster_name (the name of the cluster) and tenant_name (the name of the tenant):

  • binlog_create: the number of binlog tasks created.
  • binlog_release: the number of binlog tasks released.

Alert rules

    • Alert related to binlog task creation

      rules:
        - alert: BinlogCreateAlert
          expr: increase(binlog_create[1m]) > 0
          for: 1m
          labels:
            severity: info
          annotations:
            summary: "Binlog service enabled"
            description: "The binlog service is enabled for the {{ $labels.ob_cluster_name }}.{{ $labels.tenant_name }} tenant."
      
    • Alert related to binlog task release

      rules:
      - alert: BinlogReleaseAlert
        expr: increase(binlog_release[1m]) > 0
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "Binlog service disabled"
          description: "The binlog service is disabled for the {{ $labels.ob_cluster_name }}.{{ $labels.tenant_name }} tenant."
      
    • Alert related to OBM process failures

      - alert: OBMDownAlert
        expr: increase(binlog_manager_down_count[1m]) > 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "OBM process failed"
          description: "The OBM process on the {{ $labels.host_name }} node fails."
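The OBM rules mix informational events (severity: info) with failures (severity: critical). One way to keep the two apart is to route on the severity label in Alertmanager, as in the following alertmanager.yml sketch. The receiver names and webhook URLs are placeholders. The matchers syntax assumes Alertmanager 0.22 or later.

```yaml
# alertmanager.yml (sketch): route alerts by the severity label set in the rules
route:
  receiver: "binlog-oncall"                # default: critical and unmatched alerts
  group_by: ["alertname", "ob_cluster_name", "tenant_name"]
  routes:
    - matchers:
        - severity="info"                  # e.g. BinlogCreateAlert, BinlogReleaseAlert
      receiver: "binlog-events"

receivers:
  - name: "binlog-oncall"
    webhook_configs:
      - url: "http://alerts.example.com/oncall"   # placeholder endpoint
  - name: "binlog-events"
    webhook_configs:
      - url: "http://alerts.example.com/events"   # placeholder endpoint
```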
      

OBI instance status

Monitoring metrics

Metrics tagged with instance_id (the ID of the binlog instance), ob_cluster_name (the name of the cluster), and tenant_name (the name of the tenant):

  • binlog_allocate_node_fail_count: the number of failures to allocate a service node.
  • binlog_instance_gtid_inconsistent_count: the number of times OBI instances were found to have inconsistent global transaction identifiers (GTIDs).

Metrics tagged with host_name (the unique ID of the node), instance_id, ob_cluster_name, and tenant_name:

  • binlog_instance_master_switch_count: the number of times the primary OBI instance was switched.
  • binlog_instance_master_switch_failed_count: the number of failed attempts to switch the primary OBI instance.
  • binlog_instance_no_master_count: the number of times no primary OBI instance was available.
  • binlog_instance_down: the number of OBI instance failures.
  • binlog_instance_failover_fail_count: the number of times an OBI instance failed to start automatically after a failover.

Alert rules

    • Alert related to service node allocation failures

      - alert: BinlogAllocateFailedAlert
        expr: increase(binlog_allocate_node_fail_count[1m]) > 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Node allocation failure"
          description: "Failed to allocate a binlog service node to the {{ $labels.tenant_name }} tenant in the {{ $labels.ob_cluster_name }} cluster. Resolve the issue immediately."
      
    • Alert related to inconsistent GTIDs of OBI instances

      - alert: GtidInconsistentFailedAlert
        expr: increase(binlog_instance_gtid_inconsistent_count[1m]) > 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Inconsistent GTIDs detected during inspection"
          description: "The GTIDs of OBI instances in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster are inconsistent."
      
    • Alert related to primary OBI instance switching

      rules:
      - alert: MasterSwitchAlert
        expr: increase(binlog_instance_master_switch_count[1m]) > 0 
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "Primary OBI instance switching"
          description: "A primary OBI instance switching event occurred in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster."
      
    • Alert related to frequent primary OBI instance switching

      rules:
      - alert: MasterSwitchAlert
        expr: increase(binlog_instance_master_switch_count[1m]) > 2
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Frequent primary OBI instance switching"
          description: "Primary OBI instance switching frequently occurred in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster."
      
    • Alert related to primary OBI instance switching failures

      - alert: MasterSwitchFailedAlert
        expr: increase(binlog_instance_master_switch_failed_count[1m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Primary OBI instance switching failure"
          description: "A primary OBI instance switching failure occurred in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster. Resolve the issue immediately."
      
    • Alert related to the absence of a primary OBI instance

      - alert: NoMasterAlert
        expr: increase(binlog_instance_no_master_count[1m]) > 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Absence of a primary OBI instance"
          description: "No primary OBI instance exists in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster. Resolve the issue immediately."
      
    • Alert related to OBI instance failures

      - alert: InstanceDownAlert
        expr: changes(binlog_instance_down[15m]) > 0 or (binlog_instance_convert_delay==0)
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "OBI instance failure"
          description: "The {{ $labels.instance_id }} instance failed."
      
    • Alert related to a failure to automatically start an OBI instance after a failover

      - alert: FailoverFailedAlert
        expr: increase(binlog_instance_failover_fail_count[1m]) > 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Automatic instance startup failure after a failover"
          description: "Failed to automatically start the {{ $labels.instance_id }} instance in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster after a failover. Resolve the issue immediately."
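When no primary OBI instance exists, the switching alerts for the same tenant may fire at the same time. If that duplication is unwanted, an Alertmanager inhibition rule can suppress them while NoMasterAlert is active. The following sketch is one possible policy and is not part of the binlog service. It matches on the ob_cluster_name and tenant_name labels that these rules share, and it assumes Alertmanager 0.22 or later for the matcher syntax.

```yaml
# alertmanager.yml (sketch): mute switching alerts while NoMasterAlert fires for the same tenant
inhibit_rules:
  - source_matchers:
      - alertname="NoMasterAlert"
    target_matchers:
      - alertname=~"MasterSwitchAlert|MasterSwitchFailedAlert"
    equal: ["ob_cluster_name", "tenant_name"]
```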
      

OBI instance performance

Monitoring metrics

Metrics tagged with host_name (the unique ID of the node), instance_id (the ID of the binlog instance), ob_cluster_name (the name of the cluster), and tenant_name (the name of the tenant):

  • binlog_instance_convert_checkpoint: the security checkpoint of binlog conversion by the OBI instance, in microseconds.
  • binlog_instance_convert_delay: the binlog conversion delay of the OBI instance, in milliseconds.
  • binlog_instance_convert_fetch_rps: the RPS at which the OBI instance pulls clogs.
  • binlog_instance_convert_iops: the IOPS of binlog conversion by the OBI instance, in bytes.
  • binlog_instance_convert_storage_rps: the RPS at which the OBI instance writes binlogs to disk.
  • binlog_instance_dump_count: the number of subscriptions to the OBI instance.
  • binlog_instance_dump_error_count: the number of subscription errors on the OBI instance.

Metrics tagged with host_name, instance_id, ob_cluster_name, tenant_name, and trace_id (the trace ID of the connection):

  • binlog_instance_dump_checkpoint: the security checkpoint of the subscription connection, in microseconds.
  • binlog_instance_dump_rps: the RPS of the subscription connection.
  • binlog_instance_dump_delay: the delay of the subscription connection, in seconds.
  • binlog_instance_dump_heartbeat_rps: the heartbeat RPS of the subscription connection.
  • binlog_instance_dump_iops: the IOPS of the subscription connection, in bytes.
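The conversion delay is reported in milliseconds and the subscription delay in seconds. The following recording-rule sketch expresses both in seconds and keeps the largest subscription delay per tenant, which can make dashboards easier to read. The group name and the binlog: rule names are hypothetical.

```yaml
groups:
  - name: obbinlog-performance-derived    # hypothetical group name
    rules:
      # Conversion delay, converted from milliseconds to seconds.
      - record: binlog:convert_delay_seconds
        expr: binlog_instance_convert_delay / 1000
      # Largest subscription delay (already in seconds) across the connections of a tenant.
      - record: binlog:dump_delay_seconds:max
        expr: max by (ob_cluster_name, tenant_name) (binlog_instance_dump_delay)
```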

Alert rules

    • Alert related to the binlog conversion delay

      - alert: ConversionDelayAlert
        expr: |
          (binlog_instance_convert_delay > 120000) and
          (binlog_instance_convert_fetch_rps < 5) and
          (binlog_instance_convert_storage_rps < 5) or
          (time() * 1000 - binlog_instance_convert_checkpoint / 1000 - binlog_instance_convert_delay) > 120000
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Conversion delay exceeding the threshold"
          description: "The binlog conversion by the {{ $labels.instance_id }} instance in the {{ $labels.tenant_name }} tenant of the {{ $labels.ob_cluster_name }} cluster is delayed by {{ $value }} ms."
      
    • Alert related to the number of subscriptions

      - alert: InstanceDumpAlert
        expr: binlog_instance_dump_count > 100
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Binlog instance subscriptions exceeding the threshold"
          description: "The number of subscriptions to the {{ $labels.instance_id }} instance is {{ $value }}, which exceeds the threshold of 100."
      
      - alert: InstanceDumpResolved
        expr: binlog_instance_dump_count <= 100
        for: 1m
        labels:
          severity: normal
        annotations:
          summary: "Binlog instance subscriptions back within the threshold"
          description: "The number of subscriptions to the {{ $labels.instance_id }} instance is 100 or fewer."
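As written, InstanceDumpResolved fires whenever the subscription count is 100 or fewer, not only after InstanceDumpAlert has fired. If the goal is a notification when InstanceDumpAlert clears, Alertmanager can send one through the send_resolved option of a receiver integration. In the following receivers fragment, the receiver name and all addresses are placeholders. send_resolved defaults to false for email_configs and to true for webhook_configs.

```yaml
# alertmanager.yml (sketch): also notify when an alert stops firing
receivers:
  - name: "binlog-oncall"
    email_configs:
      - to: "dba-team@example.com"          # placeholder recipient
        from: "alertmanager@example.com"    # placeholder sender
        smarthost: "smtp.example.com:587"   # placeholder SMTP server
        send_resolved: true                 # send a notification when the alert resolves
```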
      
