Data type mapping ensures that data is accurately mapped from the original data type to the target data type when data is exported from OceanBase Database in Parquet, ORC, MaxCompute (ODPS), or CSV format. In OceanBase Database V4.3.5, a mapping table is provided to map the data types of MySQL and Oracle databases to the data types supported by Parquet, ORC, and MaxCompute (ODPS). This ensures that data is not lost, overloaded, or semantically altered during the export process.
Note
After the export, a CSV file contains data of the string type.
Parquet format
Parquet physical type |
Parquet logical type |
Hive data type |
Data type under MySQL tenant |
Remarks |
|---|---|---|---|---|
| INT32 | INT(8,TRUE) | TINYINT | TINYINT | |
| INT32 | INT(16,TRUE) | SMALLINT | SMALLINT | |
| INT32 | INT(32,TRUE) | INT | INT | |
| INT64 | INT(64,TRUE) | BIGINT | BIGINT | |
| INT32 | INT(8,FALSE) | TINYINT(overflow value is NULL) | TINYINT UNSIGNED | |
| INT32 | INT(16,FALSE) | SMALLINT(overflow value is NULL) | SMALLINT UNSIGNED | |
| INT32 | INT(32,FALSE) | INT(overflow value is NULL) | INT UNSIGNED | |
| INT64 | INT(64,FALSE) | BIGINT(overflow value is NULL) | BIGINT UNSIGNED | |
| FLOAT | NONE | FLOAT | FLOAT | |
| DOUBLE | NONE | DOUBLE | DOUBLE | |
| FIXED_LEN_BYTE_ARRAY | DECIMAL | DECIMAL | DECIMAL, DECIMAL UNSIGNED | You must specify the precision and scale. |
| BYTE_ARRAY | STRING | CHAR | CHAR, BINARY | The string type in Parquet is encoded in UTF-8. |
| BYTE_ARRAY | STRING | VARCHAR | VARCHAR, VARBINARY | |
| BYTE_ARRAY | STRING | STRING | TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB | |
| INT64 | TIMESTAMP(is_adjusted_to_utc=true, parquet::LogicalType::TimeUnit::MICROS) | TIMESTAMP | TIMESTAMP | |
| INT64 | TIMESTAMP(is_adjusted_to_utc=false, parquet::LogicalType::TimeUnit::MICROS) | TIMESTAMP | DATETIME | |
| INT32 | DATE | DATE | DATE | |
| INT64 | TIME | \ | TIME | |
| INT32 | INT(8,FALSE) | \ | YEAR |
ORC format
ORC type |
Hive data type |
Data type under MySQL tenant |
|---|---|---|
| BYTE | TINYINT | TINYINT |
| SHORT | SMALLINT | SMALLINT |
| INT | INT | INT |
| LONG | BIGINT | BIGINT |
| FLOAT | FLOAT | FLOAT |
| DOUBLE | DOUBLE | DOUBLE |
| DECIMAL | DECIMAL | DECIMAL |
| CHAR | CHAR | CHAR |
| VARCHAR | VARCHAR | VARCHAR |
| STRING | STRING | TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT |
| BINARY | BINARY | TINYBLOB/BLOB/MEDIUMBLOB/LONGBLOB/BINARY/VARBINARY |
| DATE | DATE | DATE |
| TIMESTAMP | TIMESTAMP | DATETIME/TIMESTAMP |
Export to the MaxCompute (ODPS) format
MaxCompute (ODPS) data type |
MySQL tenant data type |
Remarks |
|---|---|---|
| BOOLEAN | TINYINT (a value of 0 indicates an error, and a value of 1 indicates a success.) | |
| TINYINT | TINYINT | |
| SMALLINT | SMALLINT | |
| INT | INT | |
| BIGINT | BIGINT | |
| FLOAT | FLOAT | Precision specified by users will result in an error. |
| DOUBLE | DOUBLE | Precision specified by users will result in an error. |
| DECIMAL | DECIMAL(M,D) | Users must explicitly specify M and D to ensure consistency with the precision and scale (P/S) in MaxCompute (ODPS). Otherwise, an error will be reported during execution. |
| CHAR (length, limited to 255 bytes, with trailing spaces padded if necessary) | CHAR (length between 0 and 255) | The specified length must be greater than or equal to that of the corresponding column in MaxCompute (ODPS). Otherwise, an error will be returned during execution. |
| VARCHAR (max length: 65535) | VARCHAR (length: 0-262144) | The length specified by users must be greater than or equal to the length of the corresponding column in MaxCompute (ODPS). Otherwise, an error will be reported at execution time. |
| STRING (length up to 8 MB) | VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, or LONGTEXT (The MEDIUMTEXT type is a TEXT type that supports a character set and has a maximum length of 16,777,215 bytes.) | You need to know the maximum length of STRING data in MaxCompute (ODPS) and map it to a compatible type in OceanBase Database. If the length of data in OceanBase Database is insufficient to accommodate the data in MaxCompute (ODPS), an error will be reported at execution time. |
| BINARY (length limit: 8 MB) | VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, or LONGBLOB A BLOB value is treated as a binary string (a byte string) with a binary character set and collating sequence. Therefore, it is compared and sorted based on the numerical values of its bytes. A MEDIUMBLOB is a BLOB type with a maximum length of 16,777,215 bytes. | You must specify the maximum length of BINARY data in MaxCompute (ODPS). Then, you can decide what type to map it to in OceanBase Database. If the length of a BLOB value in MaxCompute (ODPS) exceeds the maximum length of the target type in OceanBase Database, an error is reported at runtime. |
| TIMESTAMP (stores UTC time with up to 9 digits after the decimal point, but displays the time in the current time zone) | TIMESTAMP | Only TIMESTAMP(6) is supported. Other precision settings are not supported. The precision of the TIMESTAMP data type is up to 6 digits after the decimal point, so there is a loss of precision. |
| TIMESTAMP_NTZ (stores UTC time as nanoseconds with nine digits after the decimal point, and is zone-independent) | DATETIME | Only DATETIME(6) is supported. Other DATETIME types are not supported. TIMESTAMP is zone-dependent, so it is not supported. |
| DATE (yyyy-mm-dd, 0001-01-01 to 9999-12-31) | DATE | Only DATE is supported. |
| DATETIME (accurate to milliseconds, with 3 digits after the decimal point, range: 0001-01-01 00:00:00.000 to 9999-12-31 23:59:59.999) | DATETIME | Only DATETIME(3) and DATETIME(6) are supported. DATETIME(4) or DATETIME(5) is not supported. |
| ARRAY | Not supported | Not supported |
| MAP | Not supported | Not supported |
| STRUCT | Not supported | Not supported |
| JSON | JSON | Not supported. |
