This document provides an overview of the file formats and encoding formats supported by Data Import in OceanBase Cloud. Through this document, you will learn how to prepare files that meet the requirements for smooth data import into OceanBase Cloud.
File encoding formats
Data Import in OceanBase Cloud supports these file encoding formats at present:
- UTF8
- UTF16
- UTF32
- ISO-8859-1
- ASCII
- GB2312
- GBK
- GB18030
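Before importing, it can help to confirm locally that a file actually decodes with the encoding you intend to select. The following is a minimal sketch, not part of Data Import itself: the codec strings are Python's names for the encodings listed above, and the helper names `decodes_as` and `reencode` are hypothetical.

```python
# Maps the encodings listed above to Python codec names (Python's spellings,
# not identifiers used by OceanBase Cloud).
SUPPORTED_CODECS = {
    "UTF8": "utf-8",
    "UTF16": "utf-16",
    "UTF32": "utf-32",
    "ISO-8859-1": "iso-8859-1",
    "ASCII": "ascii",
    "GB2312": "gb2312",
    "GBK": "gbk",
    "GB18030": "gb18030",
}


def decodes_as(data: bytes, encoding_name: str) -> bool:
    """Return True if `data` decodes without errors in the named encoding."""
    codec = SUPPORTED_CODECS[encoding_name]
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


def reencode(data: bytes, source: str, target: str = "UTF8") -> bytes:
    """Convert `data` from one listed encoding to another."""
    text = data.decode(SUPPORTED_CODECS[source])
    return text.encode(SUPPORTED_CODECS[target])
```

A clean decode does not prove the encoding is the right one: ISO-8859-1, for example, accepts every byte sequence, so this check only rules encodings out.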
File formats
Data Import in OceanBase Cloud supports these file formats at present:
- CSV (RFC-4180)
- INSERT SQL
- Apache Parquet (currently supported only in OceanBase MySQL-compatible mode)
- Apache ORC (currently supported only in OceanBase MySQL-compatible mode)
You can use Data Import to import data from a single file into a single table in OceanBase Cloud. For the specific restrictions and requirements of each file format, see the sections below.
CSV files
Data Import in OceanBase Cloud has these requirements for CSV files:
The data file must strictly adhere to the RFC 4180 specification. For details, see the RFC 4180 specification.
- Each record in the file occupies a single line.
- The first record can contain field names.
- Fields are separated by commas, and spaces around the commas are ignored.
- Fields containing commas, newline characters, spaces, or double quotes must be enclosed in double quotes ("").
- Only files with the .csv and .txt extensions are supported.
- Binary data is not supported.
CSV files can be up to 1 GB when uploaded and imported directly, and up to 10 GB when imported from object storage.
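One way to produce a file that follows these quoting rules is Python's standard `csv` module. The sketch below is an illustration under assumptions: the function name `to_rfc4180` is hypothetical, and `QUOTE_NONNUMERIC` is chosen so that every text field is quoted (which covers fields containing spaces, commas, newlines, or double quotes) while numbers stay unquoted.

```python
import csv
import io


def to_rfc4180(header, rows):
    """Serialize a header and rows as CSV text with CRLF record separators."""
    buffer = io.StringIO(newline="")  # newline="" prevents newline translation
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
        doublequote=True,              # an embedded " is written as ""
        lineterminator="\r\n",         # RFC 4180 record separator
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    text = to_rfc4180(
        ["id", "note"],
        [[1, 'say "hi", then leave'], [2, "line one\nline two"]],
    )
    # Write the bytes in one of the supported encodings, for example UTF-8.
    with open("orders.csv", "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
```

The file name `orders.csv` is only an example. Avoid `None` values in the rows if empty fields matter to you, because the writer emits them as quoted empty strings.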
INSERT SQL files
Data Import in OceanBase Cloud has these requirements for INSERT SQL files:
- Only files with the .sql and .txt extensions are supported.
- SQL files can be up to 1 GB when uploaded and imported directly, and up to 10 GB when imported from object storage.
- The file content should only contain INSERT ... VALUES statements.
- All statements in the file must reference a single table name.
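The content requirements above can be checked roughly before import. The following sketch is a simplified local check, not the parser that OceanBase Cloud uses: the function names are hypothetical, the statement splitter only understands single-quoted literals with `''` escapes, and SQL comments or backslash escapes are not handled.

```python
import re

# Matches "INSERT INTO <table> [(columns)] VALUES"; a deliberately narrow pattern.
_INSERT_RE = re.compile(
    r"""^\s*INSERT\s+INTO\s+
        (?P<table>[`"]?[\w.$]+[`"]?)   # table name, optionally quoted
        \s*(?:\([^)]*\))?              # optional column list
        \s*VALUES\b""",
    re.IGNORECASE | re.VERBOSE,
)


def split_statements(sql: str):
    """Split on semicolons that sit outside single-quoted string literals."""
    statements, current, in_string = [], [], False
    for ch in sql:
        if ch == "'":
            in_string = not in_string  # a doubled '' toggles twice, so state is kept
        if ch == ";" and not in_string:
            statements.append("".join(current))
            current = []
        else:
            current.append(ch)
    statements.append("".join(current))
    return [s.strip() for s in statements if s.strip()]


def check_insert_file(sql: str) -> str:
    """Return the single target table name, or raise ValueError."""
    tables = set()
    for stmt in split_statements(sql):
        match = _INSERT_RE.match(stmt)
        if match is None:
            raise ValueError(f"not an INSERT ... VALUES statement: {stmt[:40]!r}")
        tables.add(match.group("table").strip('`"'))
    if len(tables) != 1:
        raise ValueError(f"expected exactly one table, found {sorted(tables)}")
    return tables.pop()
```

Table names are compared exactly as written, so `t1` and `T1` count as different names here; whether the target database treats them as the same table depends on its configuration.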
OceanBase Cloud automatically identifies the table name from the SQL content. If the table does not exist, OceanBase Cloud creates a new table for the import. By default, OceanBase Cloud does not set a primary key, but you can set one or more columns as the primary key. Columns set as the primary key are given a non-null constraint by default.
| Mode | Field Content | Constraint | Field Type |
|---|---|---|---|
| OBMySQL | Numeric | Allows null by default | int |
| OBMySQL | Non-numeric | Allows null by default | varchar(N), where N is the character length |
| OBOracle | Numeric | Allows null by default | NUMBER |
| OBOracle | Non-numeric | Allows null by default | varchar(N), where N is the character length |
Apache Parquet
Data Import in OceanBase Cloud has these requirements for Apache Parquet files:
- Supports importing a single Parquet file into a single table.
- Parquet files can be up to 1 GB when uploaded and imported directly, and up to 10 GB when imported from object storage.
Apache ORC
Data Import in OceanBase Cloud has these requirements for Apache ORC files:
- Supports importing a single ORC file into a single table.
- ORC files can be up to 1 GB when uploaded and imported directly, and up to 10 GB when imported from object storage.
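The size and extension limits described in this document can be checked locally before starting an import. This is a minimal sketch under stated assumptions: the function name `precheck` and the keys `"upload"` and `"object_storage"` are hypothetical, "GB" is interpreted as 2**30 bytes (the document does not say whether it means binary or decimal units), and extensions are compared case-sensitively.

```python
import os

GIB = 1024 ** 3  # assumption: "GB" in the limits means 2**30 bytes

# Size limits per import source, as described in this document.
MAX_BYTES = {"upload": 1 * GIB, "object_storage": 10 * GIB}

# Extensions are stated only for CSV and INSERT SQL files;
# none are listed for Parquet or ORC, so those are not checked.
ALLOWED_EXTENSIONS = {"csv": {".csv", ".txt"}, "sql": {".sql", ".txt"}}


def precheck(filename: str, size_bytes: int, file_format: str, source: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    limit = MAX_BYTES[source]
    if size_bytes > limit:
        problems.append(f"{filename}: {size_bytes} bytes exceeds the {limit}-byte limit for {source}")
    allowed = ALLOWED_EXTENSIONS.get(file_format)
    extension = os.path.splitext(filename)[1]
    if allowed is not None and extension not in allowed:
        problems.append(f"{filename}: extension {extension!r} is not one of {sorted(allowed)}")
    return problems
```

For a file on disk, `os.path.getsize(path)` supplies `size_bytes`.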