Type conversion during expression evaluation
Data type conversion occurs when an operator in an expression is used with operands of different types to make the operands compatible.
Conversion methods
OceanBase Database supports explicit data type conversion and implicit data type conversion.
Implicit data type conversion occurs when an operation requires a parameter of a specific data type, but the actual value of the statement parameter is not of the specified data type. In this case, OceanBase Database converts the actual parameter value to the specified data type and then performs the operation.
For example, OceanBase Database automatically converts strings to numbers and vice versa as needed.
obclient> SELECT 1+'1';
+-------+
| 1+'1' |
+-------+
| 2 |
+-------+
1 row in set
Explicit data type conversion in OceanBase Database is implemented by using the CAST function. Here is an example:
obclient> SELECT 31.4, CAST(31.4 AS TIME);
+------+--------------------+
| 31.4 | CAST(31.4 AS TIME) |
+------+--------------------+
| 31.4 | 00:00:31 |
+------+--------------------+
1 row in set
For more information about the CAST function, see CAST.
Conversion rules
Conversion rules for comparison operations
If one or both parameters are
NULL, the result isNULL, except for the<=>comparison operator. ForNULL <=> NULL, the result is true, and no conversion is needed.If both parameters in a comparison operation are strings, they are compared as strings.
If both parameters are integers, they are compared as integers.
If the parameters are not compared with numbers, hexadecimal values are treated as binary strings.
A single-row subquery that involves one or more tables is not considered a constant. For example, if an integer returned by the subquery is compared with a
DATETIMEvalue, both operands are treated as integers. Integers cannot be converted to time values, so to compare the operands asDATETIMEvalues, you must explicitly convert the subquery value toDATETIMEusingCAST().If one of the operands is a
TIMESTAMPorDATETIMEcolumn and the other is a constant, the constant is converted to a timestamp before the comparison is performed. For safety, always use complete date and time strings when comparing withBETWEENand date or time values. For example, when usingBETWEENwith date or time values, you can useCAST()to explicitly convert the values to the desired data type to obtain the best results.If one of the operands is a decimal value, the comparison type depends on the other operand. If the other operand is a decimal or integer value, the operands are compared as decimal values. If the other operand is a floating-point value, the operands are compared as floating-point values.
In all other cases, the operands are compared as floating-point (real) numbers. For example, strings and numeric operands are compared as floating-point numbers.
Comparisons between large values of the
INTEGERtype and floating-point numbers are approximate because integers are converted to double-precision floating-point numbers before the comparison and cannot accurately represent all 64-bit integers.
Type degradation
Constants are aligned with column types as much as possible to ensure consistent comparison semantics, thereby improving comparison efficiency. Most comparisons between columns and constants are supported under standard semantics. For unsupported comparisons, a CAST operation is performed on the column. These cases are described as follows:
- An
intoruintcolumn can be compared with a character,decimal, or floating-point (doubleorfloat) constant. - A
decimalcolumn can be compared with a floating-point (doubleorfloat) constant. - A
yearcolumn can be compared with a numeric, character, or time (date,datetime,timestamp, ortime) constant. - A
datecolumn can be compared with a character,datetime,timestamp, ortimeconstant. - A
timestampcolumn can be compared with adatetimeconstant.
A CAST operation is performed on the column in the following scenarios (excluding complex data types such as arrays, XML, and JSON):
- A
floatcolumn: Comparisons for afloatcolumn are always performed usingdouble. Therefore, acastoperation is performed when afloatcolumn is compared with any constant. - A fixed-precision floating-point column: When the number of decimal places in the constant is greater than that specified in the
psparameter in the CREATE TABLE statement, the comparison is performed based on the actual number of decimal places. In this case, acastoperation is performed on the column. - A character column: When a character is compared with a numeric value, the character column is
castinto a numeric data type. In addition, the comparison result may vary depending on the character set used, which may also require acastoperation on the column.
Non-standard comparison
The logic of non-standard comparison is similar to that of type degradation, which only handles the comparison between constants and columns. If the two operands are columns, the comparison is not processed. At present, only comparisons between integer constants and character columns are supported. Note that if you set the parameter to range, the sorting result may not conform to the standard semantics. Therefore, you need to select a proper value based on the data characteristics.
The following examples explain the parameter settings and the corresponding execution plans.
A table named
t1is created. The table has abiz_daycolumn of theVARCHAR(10)data type.obclient> CREATE TABLE t1 (biz_day varchar(10)); Query OK, 0 rows affected (0.216 sec)View the execution plan without any parameter settings.
obclient> EXPLAIN SELECT * FROM t1 WHERE biz_day BETWEEN 20200101 AND 20200201; +-------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------ | | |0 |COLUMN TABLE FULL SCAN|T1 |2 |3 | | | ====================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([T1.BIZ_DAY]), filter([cast(T1.BIZ_DAY, NUMBER(-1, -85)) >= 20200101], [cast(T1.BIZ_DAY, NUMBER(-1, -85)) <= 20200201]), rowset=16 | | access([T1.BIZ_DAY]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false,false], | | range_key([T1.__pk_increment]), range(MIN ; MAX)always true | +-------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in set (0.004 sec)- Query plan: By default, the query optimizer converts the
biz_daycolumn from the string type to the number type and then compares the values. - Filter condition:
filter([cast(T1.BIZ_DAY, NUMBER(-1, -85)) >= 20200101], [cast(T1.BIZ_DAY, NUMBER(-1, -85)) <= 20200201])indicates that thebiz_daycolumn is compared in a range after it is converted to the number type.
- Query plan: By default, the query optimizer converts the
Set
non_standard_comparison_leveltoequaland view the execution plan.obclient> EXPLAIN SELECT /*+opt_param('non_standard_comparison_level', 'equal')*/ * FROM t1 WHERE biz_day BETWEEN 20200101 AND 20200201; +----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +----------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------ | | |0 |COLUMN TABLE FULL SCAN|t1 |1 |3 | | | ====================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t1.biz_day]), filter([cast(t1.biz_day, DECIMAL(-1, -1)) >= cast(20200101, DECIMAL(20, 0))], [cast(t1.biz_day, DECIMAL(-1, -1)) <= cast(20200201, | | DECIMAL(20, 0))]), rowset=16 | | access([t1.biz_day]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false,false], | | range_key([t1.__pk_increment]), range(MIN ; MAX)always true | +----------------------------------------------------------------------------------------------------------------------------------------------------------------+ 12 rows in set (0.003 sec)- Query plan: When you set
non_standard_comparison_leveltoequal, the query optimizer converts thebiz_daycolumn from the string type to the decimal type and then compares the values. - Filter condition:
filter([cast(t1.biz_day, DECIMAL(-1, -1)) >= cast(20200101, DECIMAL(20, 0))], [cast(t1.biz_day, DECIMAL(-1, -1)) <= cast(20200201, DECIMAL(20, 0))])indicates that thebiz_daycolumn is compared in a range after it is converted to the decimal type. In a range query, the equal value comparison rule does not convert the constant type.
- Query plan: When you set
Set
non_standard_comparison_leveltorangeand view the execution plan.obclient> EXPLAIN SELECT /*+opt_param('non_standard_comparison_level', 'range')*/ * FROM t1 WHERE biz_day BETWEEN 20200101 AND 20200201; +-------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query Plan | +-------------------------------------------------------------------------------------------------------------------------------------------------------+ | ====================================================== | | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | | ------------------------------------------------------ | | |0 |COLUMN TABLE FULL SCAN|t1 |1 |3 | | | ====================================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([t1.biz_day]), filter([t1.biz_day >= demote_cast(20200101, VARCHAR(10))], [t1.biz_day <= demote_cast(20200201, VARCHAR(10))]), rowset=16 | | access([t1.biz_day]), partitions(p0) | | is_index_back=false, is_global_index=false, filter_before_indexback[false,false], | | range_key([t1.__pk_increment]), range(MIN ; MAX)always true | +-------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows in set (0.004 sec)- Query plan: When you set
non_standard_comparison_leveltorange, the query optimizer converts the integer constant to the string type and then compares the values. - Filter condition:
filter([t1.biz_day >= demote_cast(20200101, VARCHAR(10))], [t1.biz_day <= demote_cast(20200201, VARCHAR(10))])indicates that the integer constant is converted to the string type and then compared in a range. After non-standard comparison, the values are compared in character order. For example,'202002010'belongs to the character sequence['20200101', '20200201'], but it does not belong to the numeric sequence[20200101, 20200201].
- Query plan: When you set
Other conversion rules
The way a string is converted to a floating-point value may differ from the way an integer is converted to a floating-point value. The result may depend on factors such as the compiler version. To prevent an implicit conversion of the value to a floating-point value, use
CAST().When a numeric or time value is implicitly converted to a string, the result has a character set and collation determined by the
character_set_connectionandcollation_connectionsystem variables, respectively.This means that the result is a nonbinary string (
CHAR,VARCHAR, orLONGTEXTvalue) unless the connection character set is binary. In that case, the result is a binary string (BINARY,VARBINARY, orLONGBLOBvalue).The type conversion rules for evaluating integer expressions are different. For example, if you use an integer expression as part of a CREATE TABLE statement, the new table will contain columns of the
INTorBIGINTtype depending on the result type of the expression:CREATE TABLE t SELECT integer_expr;If the maximum length of the expression does not fit into an
INT,BIGINTis used. However, you can use a sufficiently long expression to force the use ofBIGINTinstead ofINT:obclient> CREATE TABLE t SELECT 000000000000000000000; Query OK, 1 row affected
Implicit data type conversion rules
When a data type conversion makes sense, OceanBase Database automatically converts a value from one data type to another data type.
The following table shows the implicit conversion matrix for all data types. This table does not take conversion directions or contexts into account.
| Data type | BOOL | INT | SMALLINT | MEDIUMINT | BIGINT | SERIAL | DECIMAL | NUMERIC | FLOAT | DOUBLE | BIT | DATETIME | TIMESTAMP | DATE | TIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YEAR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| CHAR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| VARCHAR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| BINARY | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| VARBINARY | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TEXT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TINYBLOB | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| MEDIUMBLOB | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| LONGBLOB | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TINYTEXT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| MEDIUMTEXT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| LONGTEXT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| ENUM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| SET | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| BLOB | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Example of implicit data type conversion
obclient> SELECT CAST(BOOL_COLUMN AS YEAR) FROM YOUR_TABLE;
+---------------------------+
| CAST(BOOL_COLUMN AS YEAR) |
+---------------------------+
| 2001 |
+---------------------------+
2 rows in set (0.001 sec)