What are the application scope and limits of ETL?

2023-06-25 03:32:32  Updated

This topic describes the application scope and limits of extract, transform, and load (ETL) supported by OceanBase Migration Service (OMS).

Applicable versions

OMS V3.2.2 and later

Implementation principle

OMS uses the MySQL syntax as the input syntax for ETL. You can only filter data. You can simply enter the data filtering conditions to filter the full and incremental data.

The filter conditions entered are parsed into a syntax tree composed of operators. OMS uses the operators in the syntax tree to compute each record and then determines whether to filter the record.

OMS supports the following operators:

expr indicates an operator, and [|] indicates the selection of one in regex matching.
Unary operator:
    For integers
        +expr, -erpr, ~expr, !expr
    For the Boolean type
        expr NOT TRUE, expr IS TRUE
    For comparison against null values
        expr IS NULL, expr IS NOT NULL
Binary operators:
    For the Boolean type
        expr [AND | OR | XOR] expr
    For arithmetic operations of integers
        expr [ & | >> | << | + | - | * | / | % | ^] expr
        expr | expr
    For operations of value comparison
        expr [> | >= | < | <= | = | != | <>] expr
For judgement based on Boolean values:
    BETWEEN expr AND expr
    IN (expr, ...., expr)
For functions (custom functions):
    CONCAT (expr1, expr2)
        IF expr1 == NULL, RETURN expr2
        ELSE IF expr2 == null, RETURN expr1
        ELSE RETURN expr1 + expr2
        If both expr1 and expr2 are NULL, NULL is returned.
        Otherwise, the CONCAT value of one or both non-NULL parties is returned.
    SUBSTR (expr, exprS (,exprC)?)
        SUBSTR (expr, exprS (,exprC)?) is equivalent to the SUBSTRING function of a database.
        The index subscript of the string starts from 1. exprS is the position where the string copying starts, and exprC is the number of copied strings.
        If exprC <= 0 or exprS == 0, an empty string is returned.
        str = '1234567';
        If exprS > 0 and exprS <= str.length, the string returned is as follows:
            SUBSTRING(str, 1, 2) = '12'
            SUBSTRING(str, 1) = '1234567'
            SUBSTRING(str, 6, 3) = '67'
            SUBSTRING(str, 7, 2) = '7'
        If exprS > str.length, an empty string is returned.
            SUBSTRING(str, 8, 2) = ''
        If exprS < 0 and-exprS <= str.length, the start position is str.length + exprS.
            SUBSTRING(str, -1, 2) = '7'
            SUBSTRING(str, -4, 2) = '45'
            SUBSTRING(str, -4) = '4567'
        If exprS < 0 and -exprS > str.length, an empty string is returned.
    REPLACE(exprO, expr1, expr2)
        If exprO is null, null is returned.
        If expr1 is null, exprO is returned.
        If expr2 is null, it is equivalent to REPLACE(exprO, expr1, "") or REMOVE(exprO, expr1).

Usage notes

  • Unsupported types

    Calculation and comparison of DOUBLE/FLOAT numbers are not supported. Otherwise, errors may occur in the results.

  • Comparison of NULL values

    FALSE is returned for any comparison against NULL values, including =, >=, <=, ! =, >, and <, and FALSE is returned for NULL = NULL as well.

    If you want to compare NULL values, use IS NULL or IS NOT NULL.

  • Comparison of CHAR strings

    A trailing space is ignored in comparison, that is, 'abc' = 'abc '.

    A leading space is not ignored, that is, ' abc' != 'abc'.

    Case is not ignored, that is, 'abc' != 'ABC'.

  • Comparison of VARCHAR strings

    A trailing space is ignored for the comparison of MySQL and DB2-related VARCHAR strings, that is, 'abc' = 'abc '.

    A trailing space is ignored for the comparison of ORACLE VARCHAR strings, that is, 'abc' != 'abc '.

  • Precedence of AND and OR

    For AND and OR operators not enclosed in parentheses, AND takes precedence over OR. Use parentheses to specify the precedence.

  • Column name conversion

    ETL does not perform any case conversion on column names that you pass in. Use the escape character \to escape column names.

  • Implicit conversions

    ETL uses CAST(expr AS type) for explicit conversions. The following type conversions are supported:

    type = CHARACTER/CHAR to VARCHAR_STRING

    type = DEC/NUMERIC/DECIMAL to DECIMAL

    type = DOUBLE to DOUBLE

    type = SIGNED/UNSIGNED to INT64

    type = DATE to DATE

    type = TIME to TIME

    type = DATETIME to DATETIME

    The following implicit conversions are supported:

    INT64 and DECIMAL to DECIMAL for comparison

    INT64 and STRING to INT64 for comparison

    DECIMAL and STRING to DECIMAL for comparison

    You can use a Linux timestamp or 'YYYY-MM-DD HH:MM:SS ZoneID' for the TIMSTAMP type of MySQL. If ZoneID is not specified, +08:00 is used to convert the time zone.

    For ORACLE TIMESTAMP WITH LOCAL TIMEZONE, you can use 'YYYY-MM-DD HH:MM:SS ZoneID' for comparison. If ZoneID is not specified, +08:00 will be used to convert the time zone.

Sample code

Filter condition: 'id' > 100 AND 'ts' < '2022-07-11 00:00:00+08:00'.
INSERT/DELETE id = 1000, ts = '2022-07-10 00:00:00 +08:00' passes.
INSERT/DELETE id = 99, ts = '2022-07-10 00:00:00 +08:00' filtered out.

UPDATE Before update, id = 1000, ts = '2022-07-10 00:00:00 +08:00'.
       After update, id = 1000, ts = '2022-07-10 00:00:00 +08:00' passes.

UPDATE Before update, id = 99, ts = '2022-07-10 00:00:00 +08:00'.  
       After update, id = 99, ts = '2022-07-10 00:00:00 +08:00' filtered out.

UPDATE Before update, id = 99, ts = '2022-07-10 00:00:00 +08:00'.
       After update, id = 1000, ts = '2022-07-10 00:00:00 +08:00'.
       Split into INSERT  id = 1000, ts = '2022-07-10 00:00:00 +08:00'.


UPDATE Before update, id = 1000, ts = '2022-07-10 00:00:00 +08:00'.
       After update, id = 99, ts = '2022-07-10 00:00:00 +08:00'.
       Split into DELETE id = 1000, ts = '2022-07-10 00:00:00 +08:00'.

Contact Us