Data flow is one of the most critical aspects in analytical business scenarios. This topic explains how to export data from OceanBase Database to a local file system or object storage service. For different data sources, this topic provides various data export solutions.
obdumper
obdumper is a Java-based client tool designed for OceanBase Database. It allows you to export database objects and table data to a local file system, HDFS, or object storage services (such as Alibaba Cloud Object Storage Service (OSS), Amazon Simple Storage Service (S3), Tencent Cloud Object Storage (COS), Huawei Cloud Object Storage Service (OBS), or Google Cloud Storage (GCS). You can use obdumper commands to export data in CSV, SQL, Apache ORC, and Apache Parquet formats.
Key features:
obdumper has the following features:
- Allows you to specify partition names to export only data in specified table partitions.
- Allows you to specify global filter conditions to export only data that meets the conditions.
- Allows you to specify custom query statements to export only the result sets of the queries.
- Allows you to specify a specific System Change Number (SCN) or timestamp to export historical snapshot data at the corresponding transaction point or time point.
Scenarios:
- Suitable for efficiently exporting large-scale data, such as in backup operations for analytical and compute clusters. It is also useful for preprocessing data, such as desensitization or encryption.
References:
- For more information, see What is obdumper?.
OUTFILE
You can use the SELECT INTO OUTFILE statement to export the query result to a file. The SELECT INTO OUTFILE statement allows you to specify the fields to be exported, making it suitable for scenarios where you do not need to export the primary key fields.
Key features:
- Allows you to export fields in the following formats: TXT, SQL, CSV, fixed-length, XML, and JSON.
- Allows you to specify the fields to be exported, with particular support for excluding primary key fields.
Scenarios:
- Suitable for exporting small-scale data, such as simple backup and migration tasks.
References:
- For more information, see Export data by using the OUTFILE statement.
DataX
Key features:
- Efficient data transmission
- Supports multiple types of data sources
- Flexible configurations
Scenarios:
- DataX is suitable for data migration tasks ranging from several GBs to several TBs, or even larger.
References:
- For more information, see DataX documentation.
Export solution selection
| Parameter | obdumper | OUTFILE | DataX |
|---|---|---|---|
| Recommended scenarios | Logical backup and data processing with high compression ratio | Simple data export | Strong data source support. Data sources can be running databases or message queues. Suitable for offline database migration. |
| Data sources | Static files | Static files | Structured and unstructured data sources |
| Supported file formats | Text formats: CSV (RFC-4180), INSERT SQL, delimited text, and fixed-length text Binary formats: Apache ORC, Apache Parquet, and Apache Avro | Text files | Relational databases, text files, HBase, and Kafka |
| Performance | High | General | High |
| Data compression | Supported | Not supported | Not supported |
| Conditional export | Supported | Not supported | Supported |
| DDL | Supported | Not supported | Not supported |
| Data preprocessing | Supported | Supported | Not supported |
| Real-time monitoring | Supported | Not supported | Supported |
| File encoding | User-defined encoding supported, provided that the system supports it | System default encoding | UTF-8 |