Data type mapping ensures that when data is exported from OceanBase Database to Parquet, ORC, MaxCompute (ODPS), or CSV format, the original data types are accurately mapped to the target format data types. OceanBase Database uses explicit mapping tables to associate various data types of MySQL and Oracle with the data types supported by Parquet, ORC, and MaxCompute (ODPS), ensuring that data is not lost, overloaded, or semantically altered during the export process.
Note
Exported CSV files use the string type.
| Parquet physical type |
Parquet logical type |
Hive data type |
Data type under MySQL-compatible tenant |
Remarks |
| INT32 |
INT(8,TRUE) |
TINYINT |
TINYINT |
|
| INT32 |
INT(16,TRUE) |
SMALLINT |
SMALLINT |
|
| INT32 |
INT(32,TRUE) |
INT |
INT |
|
| INT64 |
INT(64,TRUE) |
BIGINT |
BIGINT |
|
| INT32 |
INT(8,FALSE) |
TINYINT(overflow value is NULL) |
TINYINT UNSIGNED |
|
| INT32 |
INT(16,FALSE) |
SMALLINT(overflow value is NULL) |
SMALLINT UNSIGNED |
|
| INT32 |
INT(32,FALSE) |
INT(overflow value is NULL) |
INT UNSIGNED |
|
| INT64 |
INT(64,FALSE) |
BIGINT(overflow value is NULL) |
BIGINT UNSIGNED |
|
| FLOAT |
NONE |
FLOAT |
FLOAT |
|
| DOUBLE |
NONE |
DOUBLE |
DOUBLE |
|
| FIXED_LEN_BYTE_ARRAY |
DECIMAL |
DECIMAL |
DECIMAL, DECIMAL UNSIGNED |
You must specify the precision and scale. |
| BYTE_ARRAY |
STRING |
CHAR |
CHAR, BINARY |
The string type in Parquet is encoded in UTF-8. |
| BYTE_ARRAY |
STRING |
VARCHAR |
VARCHAR, VARBINARY |
|
| BYTE_ARRAY |
STRING |
STRING |
TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB |
|
| INT64 |
TIMESTAMP(is_adjusted_to_utc=true, parquet::LogicalType::TimeUnit::MICROS) |
TIMESTAMP |
TIMESTAMP |
|
| INT64 |
TIMESTAMP(is_adjusted_to_utc=false, parquet::LogicalType::TimeUnit::MICROS) |
TIMESTAMP |
DATETIME |
|
| INT32 |
DATE |
DATE |
DATE |
|
| INT64 |
TIME |
\ |
TIME |
|
| INT32 |
INT(8,FALSE) |
\ |
YEAR |
|
| ORC type |
Hive data type |
Data type under MySQL-compatible tenant |
| BYTE |
TINYINT |
TINYINT |
| SHORT |
SMALLINT |
SMALLINT |
| INT |
INT |
INT |
| LONG |
BIGINT |
BIGINT |
| FLOAT |
FLOAT |
FLOAT |
| DOUBLE |
DOUBLE |
DOUBLE |
| DECIMAL |
DECIMAL |
DECIMAL |
| CHAR |
CHAR |
CHAR |
| VARCHAR |
VARCHAR |
VARCHAR |
| STRING |
STRING |
TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT |
| BINARY |
BINARY |
TINYBLOB/BLOB/MEDIUMBLOB/LONGBLOB/BINARY/VARBINARY |
| DATE |
DATE |
DATE |
| TIMESTAMP |
TIMESTAMP |
DATETIME/TIMESTAMP |
| MaxCompute (ODPS) data type |
Data type under MySQL-compatible tenant |
Remarks |
| BOOLEAN |
TINYINT (0 indicates false, 1 indicates true) |
| TINYINT |
TINYINT |
| SMALLINT |
SMALLINT |
| INT |
INT |
| BIGINT |
BIGINT |
| FLOAT |
FLOAT |
Users cannot specify precision; otherwise an error is returned. |
| DOUBLE |
DOUBLE |
Users cannot specify precision; otherwise an error is returned. |
| DECIMAL |
DECIMAL(M,D) |
Users must explicitly specify M and D to match P/S on MaxCompute (ODPS); otherwise a runtime error occurs. |
| CHAR (max length 255 bytes, padded with spaces if shorter) |
CHAR (length 0 to 255) |
The length specified by the user must be greater than or equal to the length of the corresponding column on MaxCompute (ODPS); otherwise a runtime error occurs. |
| VARCHAR (max length 65535) |
VARCHAR (length 0 to 262144) |
The length specified by the user must be greater than or equal to the length of the corresponding column on MaxCompute (ODPS); otherwise a runtime error occurs. |
| STRING (max length 8MB bytes) |
VARCHAR/TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT (MEDIUMTEXT has a max length of 16,777,215 bytes, with character set) |
Users must know the maximum length of STRING data on MaxCompute (ODPS) to decide the mapping. If the OceanBase Database type length cannot accommodate MaxCompute (ODPS) data, a runtime error occurs. |
| BINARY (max length 8MB) |
VARBINARY/TINYBLOB/BLOB/MEDIUMBLOB/LONGBLOB (BLOB values are treated as binary strings with binary charset and collation; MEDIUMBLOB has a max length of 16,777,215 bytes) |
Users must know the maximum length of BINARY data on MaxCompute (ODPS) to decide the mapping. If the OceanBase Database type length cannot accommodate MaxCompute (ODPS) data, a runtime error occurs. |
| TIMESTAMP (stores UTC time with nanosecond precision to 9 decimal places, displayed in current timezone) |
TIMESTAMP |
Only TIMESTAMP(6) is supported; others are not. TIMESTAMP has a max precision of 6 decimal places, so precision loss may occur. |
| TIMESTAMP_NTZ (stores UTC time with nanosecond precision to 9 decimal places, timezone-independent) |
DATETIME |
Only DATETIME(6) is supported; others are not. TIMESTAMP is timezone-dependent, so it is not supported. |
| DATE (format yyyy-mm-dd, range 0001-01-01 to 9999-12-31) |
DATE |
Only DATE is supported; others are not. |
| DATETIME (millisecond precision, 3 decimal places, range 0001-01-01 00:00:00.000 to 9999-12-31 23:59:59.999) |
DATETIME |
Only DATETIME(3) and higher decimal places (e.g., DATETIME(4)/DATETIME(5)/DATETIME(6)) are supported; others are not. |
| ARRAY |
ARRAY |
For more information, see OceanBase Database ARRAY type to ODPS mapping below. |
| MAP |
Not supported |
Not supported |
| STRUCT |
Not supported |
Not supported |
| JSON |
JSON |
Not supported |
OceanBase Database ARRAY type to ODPS mapping
The mapping between OceanBase Database ARRAY types and ODPS types is as follows:
| OceanBase Database |
ODPS |
| ARRAY(tinyint) |
ARRAY<tinyint> |
| ARRAY(tinyint) |
ARRAY<boolean> |
| ARRAY(smallint) |
ARRAY<smallint> |
| ARRAY(int) |
ARRAY<int> |
| ARRAY(bigint) |
ARRAY<bigint> |
| ARRAY(float) |
ARRAY<float> |
| ARRAY(double) |
ARRAY<double> |
| ARRAY(varchar) |
ARRAY<varchar> |
| ARRAY(varchar) |
ARRAY<string> |
| ARRAY(array()) |
ARRAY<array<>> |
References
SELECT INTO