Data type mapping defines how each source data type is converted to a target data type when data is exported from OceanBase Database in Parquet, ORC, or MaxCompute (ODPS) format. OceanBase Database V4.3.5 provides mapping tables from the data types of MySQL and Oracle databases to the data types supported by Parquet, ORC, and MaxCompute (ODPS), so that exported values are not lost, do not overflow, and keep their original meaning.
Note
In exported CSV files, all values are written as strings.
Parquet format
| Parquet physical type | Parquet logical type | Hive data type | Data type under Oracle tenant | Remarks |
|---|---|---|---|---|
| FLOAT | NONE | FLOAT | BINARY_FLOAT | |
| DOUBLE | NONE | DOUBLE | BINARY_DOUBLE | |
| FIXED_LEN_BYTE_ARRAY | DECIMAL | DECIMAL | NUMBER | You must specify precision and scale. |
| BYTE_ARRAY | STRING | CHAR | CHAR | Parquet string types are UTF-8 encoded. |
| BYTE_ARRAY | STRING | VARCHAR | VARCHAR2 | |
| BYTE_ARRAY | STRING | STRING | RAW, BLOB, CLOB | |
| INT64 | TIMESTAMP(is_adjusted_to_utc=false, parquet::LogicalType::TimeUnit::MICROS) | TIMESTAMP | DATE | |
| INT96 | NONE | TIMESTAMP | TIMESTAMP, TIMESTAMP WITH LOCAL TIME ZONE | |
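
To make the physical types in the table above more concrete, the following is a minimal sketch of how raw Parquet values could be turned back into Python values. It relies on the Parquet conventions for these types: a DECIMAL stored as FIXED_LEN_BYTE_ARRAY holds the unscaled value as a big-endian two's-complement integer, an INT64 TIMESTAMP with MICROS counts microseconds since 1970-01-01, and an INT96 timestamp is conventionally 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. That the exported files follow exactly these layouts is an assumption, and the function names are hypothetical. In practice a Parquet reader library does this decoding for you.

```python
import struct
from datetime import datetime, timedelta
from decimal import Decimal

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
UNIX_EPOCH = datetime(1970, 1, 1)


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a DECIMAL stored as FIXED_LEN_BYTE_ARRAY (hypothetical helper).

    The bytes are the unscaled value as a big-endian two's-complement integer.
    Building the Decimal from a tuple keeps every digit, whatever the precision.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


def decode_int64_micros(value: int) -> datetime:
    """Decode INT64 TIMESTAMP(MICROS, is_adjusted_to_utc=false) as a naive datetime."""
    return UNIX_EPOCH + timedelta(microseconds=value)


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (assumed Hive/Impala-style layout)."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # "<qi": little-endian 8-byte nanoseconds of day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes carry microseconds, so sub-microsecond digits are dropped.
    return UNIX_EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)
```

Note that `decode_int96` drops anything finer than a microsecond, because Python's `datetime` cannot represent nanoseconds.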
ORC format
| ORC type | Hive data type | Data type under Oracle tenant |
|---|---|---|
| FLOAT | FLOAT | BINARY_FLOAT |
| DOUBLE | DOUBLE | BINARY_DOUBLE |
| DECIMAL | DECIMAL | NUMBER |
| CHAR | CHAR | CHAR |
| VARCHAR | VARCHAR | VARCHAR2 |
| STRING | STRING | CLOB |
| BINARY | BINARY | BLOB/RAW |
| TIMESTAMP | TIMESTAMP | DATE/TIMESTAMP/TIMESTAMP WITH LOCAL TIME ZONE |
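
When you define a Hive table over exported ORC files, the table above can be applied mechanically. The sketch below encodes it as a lookup and strips length, precision, and scale arguments before matching. The names `ORACLE_TO_HIVE_FOR_ORC` and `hive_type_for_orc` are hypothetical, and the helper returns only the base Hive type; carrying over lengths or precision and scale is left to the caller.

```python
import re

# Oracle tenant data type -> Hive data type, copied from the ORC table above.
ORACLE_TO_HIVE_FOR_ORC = {
    "BINARY_FLOAT": "FLOAT",
    "BINARY_DOUBLE": "DOUBLE",
    "NUMBER": "DECIMAL",
    "CHAR": "CHAR",
    "VARCHAR2": "VARCHAR",
    "CLOB": "STRING",
    "BLOB": "BINARY",
    "RAW": "BINARY",
    "DATE": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
    "TIMESTAMP WITH LOCAL TIME ZONE": "TIMESTAMP",
}


def hive_type_for_orc(oracle_type: str) -> str:
    """Return the base Hive type for an Oracle tenant column type (hypothetical helper)."""
    # Drop parenthesized arguments such as (10,2) or (6), then normalize spacing and case.
    base = re.sub(r"\([^)]*\)", " ", oracle_type)
    base = " ".join(base.upper().split())
    if base not in ORACLE_TO_HIVE_FOR_ORC:
        raise ValueError(f"no ORC mapping listed for {oracle_type!r}")
    return ORACLE_TO_HIVE_FOR_ORC[base]
```

For example, `hive_type_for_orc("TIMESTAMP(6) WITH LOCAL TIME ZONE")` returns `"TIMESTAMP"`, while a type that the table does not list raises `ValueError` instead of guessing.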
MaxCompute (ODPS) format
| MaxCompute (ODPS) data type | Oracle tenant data type |
|---|---|
| BOOLEAN | NUMBER(1,0) |
| TINYINT | NUMBER(3,0) |
| SMALLINT | NUMBER(5,0) |
| INT | NUMBER(10,0) |
| BIGINT | NUMBER(20,0) |
| FLOAT | BINARY_FLOAT |
| DOUBLE | BINARY_DOUBLE |
| DECIMAL | NUMBER(P,S) (requires the user to explicitly specify P and S) |
| CHAR (maximum length: 255 bytes; padded with trailing spaces as needed) | CHAR |
| VARCHAR(MAX) | VARCHAR |
| STRING (length limit: 8 MB) | VARCHAR / CLOB |
| BINARY (length ≤ 8 MB) | RAW, BLOB. A BLOB is a binary large object: a stream of bits with no character set semantics. Its maximum length is 536,870,910 bytes. |
| TIMESTAMP (stores UTC time with up to 9 fractional digits, that is, nanosecond precision; displayed in the local time zone) | TIMESTAMP(9) WITH LOCAL TIME ZONE |
| TIMESTAMP_NTZ (stores UTC time with up to 9 fractional digits, that is, nanosecond precision; not affected by the time zone) | TIMESTAMP |
| DATE (in the yyyy-mm-dd format, with a value range from 0001-01-01 to 9999-12-31) | DATE |
| DATETIME (accurate to milliseconds, with up to 3 digits after the decimal point, range: 0001-01-01 00:00:00.000 to 9999-12-31 23:59:59.999) | TIMESTAMP |
| ARRAY | Not supported. |
| MAP | Not currently supported. |
| STRUCT | Not supported. |
| JSON | Not supported. |
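
The integer rows above pair each MaxCompute integer type with a NUMBER(p,0) column, but the decimal precision admits larger values than the integer type can hold: NUMBER(3,0) can store 999, which is outside an 8-bit range. The sketch below shows a range check you could run on values before export. It assumes the standard signed two's-complement range for each bit width; check the MaxCompute documentation for the exact bounds, and treat all names here as hypothetical. How OceanBase Database itself handles an out-of-range value is not covered by the table.

```python
# Bit width of each MaxCompute integer type (assumed standard signed integers).
ODPS_INTEGER_BITS = {"TINYINT": 8, "SMALLINT": 16, "INT": 32, "BIGINT": 64}

# NUMBER(p,0) precision paired with each type in the table above.
ORACLE_NUMBER_PRECISION = {"TINYINT": 3, "SMALLINT": 5, "INT": 10, "BIGINT": 20}


def odps_integer_range(odps_type: str) -> tuple[int, int]:
    """Return the (low, high) bounds assumed for a MaxCompute integer type."""
    bits = ODPS_INTEGER_BITS[odps_type]
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_odps_integer(value: int, odps_type: str) -> bool:
    """Check whether an integer value fits the target MaxCompute type."""
    low, high = odps_integer_range(odps_type)
    return low <= value <= high


def max_number_value(precision: int) -> int:
    """Largest integer that NUMBER(precision, 0) can hold."""
    return 10 ** precision - 1


if __name__ == "__main__":
    for odps_type, precision in ORACLE_NUMBER_PRECISION.items():
        largest = max_number_value(precision)
        print(odps_type, largest, fits_odps_integer(largest, odps_type))
```

Under these assumptions, the largest NUMBER(p,0) value does not fit its target type in any of the four rows, so a check of this kind is worth running when column values may approach the declared precision.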
