Error handling

2024-02-27 08:55:15  Updated

If the imported data contains bad record or discard record errors, OBLOADER V4.2.4 or later allows you to control the impact of such dirty data on the exit status of the process. You can manually fix such errors based on the generated error file.

This topic describes the mechanism for handling errors that occur when you use OBLOADER to import data. It also describes the optional parameters.

Error type

Valid errors

Data errors

Error type Description
Data type mismatch For example, VARCHAR data is inserted into an INT column.
Invalid NULL value A NOT NULL constraint is set.
Data type overflow A string or INT type value exceeds the limit.
Data preprocessing failure A call to a preprocessing function defined in the control file failed.
Column count mismatch The number of columns in the file does not match the number of columns in the database.
Duplicate primary key or unique key The primary key or unique key in the table is duplicated.

Note

  • When a data error occurs during data import, the import task cannot be retried.
  • You can view details of errors including data type mismatch, invalid NULL values, data type overflow, data preprocessing failure, and column count mismatch in the ob-loader-dumper.bad file in the {ob-loader-dumper}/logs directory, and details of errors that involve a duplicate primary or unique key in the ob-loader-dumper.discard file in the {ob-loader-dumper}/logs directory.

Environment errors

Error type Description
Network errors
  • The connection times out.
  • The connection is disconnected.
  • I/O times out.
Database system errors
  • The transaction is terminated.
  • The transaction is rolled back.
  • The tenant memory reaches the threshold.
  • The server times out.

Note

  • When an environment error occurs during data import, a single import task can be retried at most five times. The number of retries cannot be modified. After an import task reaches the retry threshold, the import task is marked as failed and the process is terminated.
  • If an environment error occurs when a table without a primary key or unique key is imported, the import task will not be retried and the current subtask is marked as failed to avoid reinserting the same batch of data.

Errors that cannot be handled

Error type Description
Original file inaccessible For example, the file is damaged or the disk access privilege is not granted.
Grammar errors in the original file For example, a CSV file has no delimiter or an SQL file has no line break.
Unable to connect to the database For example, the network between the client and server is disconnected or a connection parameter is incorrect.
OS errors For example, the JVM OOM error is returned or the process is unexpectedly terminated.

Note

When an error that cannot be handled occurs during data import, the import task cannot be retried and is marked as failed, and the process is terminated.

Best practices

In this example, the imported data has a known error. To fix this error, perform the following steps:

  1. Prepare data.

    Sample code:

    DROP TABLE IF EXISTS `example`;
    CREATE TABLE `example` (c1 TINYINT PRIMARY KEY, c2 VARCHAR(12) NOT NULL UNIQUE);
    
    INSERT INTO `example` (c1, c2) VALUES
     (0, NULL),              -- The column cannot be empty.
     (1, 'one'),
     (2, 'two'),
     (40, 'forty'),          
     (50, 'fifty-four'),     
     (77, 'seventy-seven'),  -- The string exceeds 12 characters in length.
     (600, 'six hundred'),   -- The number is beyond the range supported for the TINYINT data type.
     (40, 'forty'),         -- Conflicts with '40'
     (42, 'fifty-four');     -- Conflicts with `'fifty-four'`
    
  2. Use OBLOADER to import data and specify the --strict option to skip errors and continue.

    Sample code:

    $./obloader -h xx.x.x.x -P 2883 -u test -p ****** --sys-user **u*** --sys-password ****** -c cluster_a -t mysql -D USERA --csv --table 'example'  -f /output --strict
    

    The default value of the --strict option is true, which indicates that the tool exits at the failed state (System exit 1) when the imported data contains bad record or discard record errors. To prevent dirty data from affecting the exit state (System exit 0) of the tool, you must set the --strict option to false. The --strict option can be used in combination with the --max-discards or --max-errors option to specify that when the amount of duplicate data or the number of errors is within the specified range, the tool will skip the error and continue.

  3. Run OBLOADER, delete the error data, and skip four errors.

    Sample result:

    ...
    2023-07-11 06:25:22 [WARN] Bad records were found in file: "/home/admin/obloaderobdumper/output/data/USERA/TABLE/example.1.0.csv". Check "ob-loader-dumper.bad" for details
    2023-07-11 06:25:22 [WARN] Skipped 4 problematic records for table `USERA`.`example`. Check "logs" folder for details
    2023-07-11 06:25:24 [INFO] Drain and halt the worker group finished
    2023-07-11 06:25:24 [INFO] Close connections: 26 of the BIZ DataSource.
    2023-07-11 06:25:24 [INFO] Shutdown task context finished
    2023-07-11 06:25:24 [ERROR] Error: Skipped 4 problematic records for table `USERA`.`example`. Check "logs" folder for details
    2023-07-11 06:25:24 [INFO] ----------   Finished Tasks: 1       Running Tasks: 0        Progress: 100.00%       ----------
    2023-07-11 06:25:24 [INFO]
    
    All Load Tasks Finished:
    
    ----------------------------------------------------------
    |  No.#  | Type   | Name       | Count       | Status    |
    ----------------------------------------------------------    
    |  1     | TABLE  | example     | 9 -> 5     | FAILURE   |                
    ----------------------------------------------------------
    
    Total Count: 5          End Time: 2023-07-11 06:25:24
    
    2023-07-11 06:25:24 [INFO] Load record finished. Total Elapsed: 3.281 s
    2023-07-11 06:25:24 [ERROR] System exit 1
    ...
    

    Example of data to be imported:

    ...
    SELECT * FROM `example`;
    +----+------------+
    | c1 | c2         |
    +----+------------+
    |  0 | NULL       |
    |  1 | one        |
    |  2 | two        |
    | 40 | forty      |
    | 50 | fifty-four |
    +----+------------+
    5 rows in set (0.05 sec)
    ...
    
  4. View error details.

    Note

    • View details of data type mismatch and data type overflow errors in the {ob-loader-dumper}/logs/ob-loader-dumper.bad file.
    • View details of duplicate primary key or unique key errors in the {ob-loader-dumper}/logs/ob-loader-dumper.discard file.
    ...
    cd /output/logs
    cat ob-loader-dumper.bad
    INSERT INTO example (`c1`,`c2`) VALUES ('77','seventy-seven');
    Cause: Data too long for column 'c2' at row 1   
    
    INSERT INTO example (`c1`,`c2`) VALUES ('600','six hundred');
    Cause: Out of range value for column
    
    cat ob-loader-dumper.discard
    INSERT INTO example (`c1`,`c2`) VALUES ('40','forty');
    INSERT INTO example (`c1`,`c2`) VALUES ('42','fifty-four');
    ...
    
  5. Manually fix such errors based on the error messages.

Contact Us