You can define rules in a control file by which OBLOADER can preprocess data before importing it.
File template
Here is a template for defining a control file:
lang=java
(
Column name Offset position in bytes (optional) "Preprocessing function" (optional) Mapping definition (optional),
Column name Offset position in bytes (optional) "Preprocessing function" (optional) Mapping definition (optional),
Column name Offset position in bytes (optional) "Preprocessing function" (optional) Mapping definition (optional),
Column name Offset position in bytes (optional) "Preprocessing function" (optional) Mapping definition (optional)
);
Column name: the name of a column in the schema of a database table. In OBLOADER, column names are case-insensitive. To enable case sensitivity, enclose the column names in brackets ([]) or backticks (``). For example,[c1]indicates Column c1, and[C1]indicates Column C1.Offset position in bytes: Two declaration methods are supported: absolute offset and relative offset. Only data in the POS format supports defining offset positions in bytes.Absolute offset: It is defined in the format of
position(start, end), wherestartandendrespectively indicate the start position (byte) and end position (byte) of a column. To specify a column length to skip a segment of bytes when you import data to a database table, you can use the special keyword_FILLERto replace the actual column name and mark the corresponding data. This operation will create an anonymous column, which will be parsed by OBLOADER but will not be associated with any existing column in the table. This keyword is applicable when you want to ignore irrelevant data in a file, such as the padding bytes appended to lines. For example,_filler position(5:10)specifies to skip the fifth to the tenth bytes.Relative offset: It is defined in the format of
position(length), wherelengthspecifies the column length in bytes.
Preprocessing function: You can configure a preprocessing function for a specified column in the control file to preprocess data before the data is exported.Mapping definition: the column position of preprocessed data in the imported data file.Notice
- A control file is named in the format of <Table name>.ctrl.
- Before data is imported, OBLOADER retrieves the control file suffixed with
.ctrland preprocesses the data based on the function defined in the control file.- To import multiple tables by using multiple control files, OBLOADER will retrieve all files suffixed with
.ctrlin the path specified byctl-pathand bind the files to the tables based on the mappings between their names. In this case, make sure that the name of a control file is the same as that of the corresponding table. For example, the control file namedloader1.ctrlcorresponds to the table namedloader1, andloader2.ctrlcorresponds toloader2. - To import a single table by using a single control file, you can directly specify the control file name in the
--ctl-pathoption. OBLOADER will bind the specified control file with the table to be imported.
- To import multiple tables by using multiple control files, OBLOADER will retrieve all files suffixed with
Considerations
If a column name defined in the table contains database keywords, you must escape the column name by enclosing it with backticks (``). Here is an example:
lang=java ( `WHEN` "lower(`WHEN`)", c2 "nanvl(c2,'0')" );In OBLOADER V3.x, the column mapping feature of map files is integrated into control files. Map files are no longer supported. Here is an example:
lang=java( C1 "NONE" map(1), C2 "NONE" map(1), // Multi-column mapping C3 "CONSTANT('xxx')", // xxx is a specified constant. C4 "SEQUENCE(100, 1)" // 100 is the first sequence number, and 1 is the increment. );Here,
C3 "CONSTANT('xxx')"andC4 "SEQUENCE(100, 1)"are generated columns.
Note
- OBLOADER of a version earlier than V3.0.0 supports only single-column mapping and does not support generated columns. OBLOADER V3.0.0 and later allow you to configure multi-column mapping and generated columns in control files.
- A generated column is a column that does not exist in a data file. OBLOADER supports preprocessing generated columns by using the
CONSTANT,SEQUENCE, andDB_SEQUENCEfunctions, for example,C4 "SEQUENCE(100, 1)". For more information, see Preprocessing functions.
Examples
Import data in the CUT, CSV, or SQL format
Prepare data.
Here is a sample statement:
DROP TABLE IF EXISTS `student`; CREATE TABLE `mysql`.`loader` ( `c1` varchar(120) NULL, `c2` varchar(120) NULL, `c3` varchar(120) NULL, `c4` varchar(120) NULL, `c5` varchar(120) NULL, `c6` varchar(120) NULL, `c7` varchar(120) NULL, `c8` varchar(120) NULL, `c9` varchar(120) NULL, `c10` varchar(120) NULL, `c11` varchar(120) NULL, `c12` varchar(120) NULL, `c13` varchar(120) NULL, `c14` varchar(120) NULL, `c15` int(120) NULL, `c16` timestamp NULL, `c17` varchar(120) NULL, `c18` varchar(120) NULL, `c19` varchar(120) NULL, `c20` varchar(120) NULL, `c21` varchar(120) NULL, `c22` date NULL, `c23` varchar(120) NULL, `c24` varchar(120) NULL, `c25` varchar(120) NULL, `c26` varchar(120) NULL, `c27` varchar(120) NULL, `c28` varchar(120) NULL ) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_general_ci; INSERT INTO `loader`(`c1`,`c2`,`c3`,`c4`,`c5`,`c6`,`c7`,`c8`,`c9`,`c10`,`c11`,`c12`,`c13`,`c14`,`c15`,`c16`,`c17`,`c18`,`c19`,`c20`,`c21`,`c22`,`c23`,`c24`,`c25`,`c26`,`c27`,`c28`) VALUES ('XIAOHONG', 1111111,'222222 ',123456,123,'hhhhh',1234,'abc','null',123456,'1','1','china','123',456,'2023-07-11 20:56:36','123','1','1',11,2,'2023-07-11','Abc123','Ab3','Ab3','12345Ab3','Ab312345','123456789');Define a control file. Configure a preprocessing function for each imported column in the control file.
Here is a sample statement:
lang=java ( c1 "lower(c1)" map(1), -- Convert letters in values of Column c1 to lowercase. c2 "ltrim(c2)" map(2), -- Truncate leading spaces in values of Column c2. c3 "rtrim(c3)" map(3), -- Truncate trailing spaces in values of Column c3. c4 "substr(c4,0,5)" map(4), -- Extract a substring of five characters from values of Column c4. The extraction starts from the first byte of each value. c5 "trim(c5)" map(5), -- Truncate leading and trailing spaces in values of Column c5. c6 "upper(c6)" map(6), -- Convert letters in values of Column c6 to uppercase. c7 "nanvl(c7,'0')" map(7), -- Verify values of Column c7 and return 0 for non-numeric values. c8 "replace(c8,'a','A')" map(8), -- Replace letter 'a' in values of Column c8 with letter 'A'. c9 "nvl(c9,'null')" map(9), -- Verify whether values of Column c9 are null and return null strings if yes. c10 "length(c10)" map(10), -- Calculate the length of values of Column c10. c11 "lpad(c11,5,'x')" map(11), -- Append a string of five 'x' to the left of values of Column c11. c12 "rpad(c12,5,'x')" map(12), -- Append a string of five 'x' to the right of values of Column c12. c13 "convert(c13,'utf-8','gbk')" map(13), -- Convert the character set of values of Column c13 from GBK to UTF-8. c14 "concat(c14, '_suffix')" map(14), -- Concatenate values of Column c14 with a specific constant. c15 "none" map(15), -- Specify not to process values of Column c15 and directly return the values in the column. c16 "systimestamp" map(16), -- Specify not to process values of Column c16 and directly return the timestamp of the current cluster. c17 "constant('1')" map(17), -- Specify not to process values of Column c17 and return the constant 1. c18 "lpadb(c18,5,'x')" map(18), -- Append five single-byte 'x' to the left of values of Column c18. c19 "rpadb(c19,5,'x')" map(19), -- Append five single-byte 'x' to the right of values of Column c19. c20 "case when length(trim(c20))<18 then 'Y' else 'N' end" map(20), -- Check whether the values of Column c20 meet the specified condition. If yes, return the values in the column. c21 "case length(trim(c21)) when '1' then 'one' when '2' then 'two' else 'unknown' end" map(21), -- Check whether the values of Column c21 are equal to the specified value. If yes, return the values in the column. C22 "SYSDATE" map(22), -- Set values of Column c22 to the current date. C23 "MASK(C23)" map(23), -- Mask the values of Column c23 by replacing uppercase letters with A, lowercase letters with a, and numbers with n by default. C24 "MASK_FIRST_N(C24,'A','a','b',3)" map(24), -- Mask the first N characters in values of Column c24. By default, N is 0, and the counting starts from the first character. C25 "MASK_LAST_N(C25,'A','a','b',3)" map(25), -- Mask the last N characters in values of Column c25. By default, N is 0, and the counting starts from the last character. C26 "MASK_SHOW_FIRST_N(C26,'A','a','b',3)" map(26), -- Specify not to mask the first N characters in values of Column c26. By default, N is 0, and the counting starts from the first character. C27 "MASK_SHOW_LAST_N(C27,'A','a','b',3)" map(27), -- Specify not to mask the last N characters in values of Column c27. By default, N is 0, and the counting starts from the last character. C28 "REVERSE(C28)" map(28), -- Reverse the order of characters for values of Column c28. );Notice
- A control file is named in the format of <Table name>.ctrl.
- A control file corresponds to a table. To preprocess the data of multiple tables, you must create multiple control files in the path specified by
ctl-path. The name of each control file must be the same as that of the corresponding table to be imported.
After data preprocessing based on the control files in the absolute path specified by the
--ctl-pathoption, import data in the CSV format.Here is a sample statement:
$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** --sys-user **u*** --sys-password ****** -D USERA --csv --table '*' --ctl-path /home/controls/ -f /outputHere, the
--ctl-path /home/controls/option specifies the absolute path where control files are located.Here is sample return result:
... All Load Tasks Finished: ---------------------------------------------------------- | No.# | Type | Name | Count | Status | ---------------------------------------------------------- | 1 | TABLE | loader | 1 -> 1 | SUCCESS | ---------------------------------------------------------- Total Count: 1 End Time: 2023-07-04 07:13:01 ...Here is an example of imported data:
... SELECT * FROM `loader`; +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 | c16 | c17 | c18 | c19 | c20 | c21 | c22 | c23 | c24 | c25 | c26 | c27 | c28 | +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ | xiaohong | 1111111 | 222222 | 12345 | 123 | HHHHH | 1234 | Abc | null | 6 | xxxx1 | 1xxxx | china | 123_suffix | 456 | 2023-07-05 02:52:07 | 1 | xxxx1 | 1xxxx | Y | one | 2023-07-05 | Xxxnnn | Aab | Aab | 123bbAab | Aabbb345 | 987654321 | +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ 1 row in set (0.04 sec) ...
Import data in the POS format
Here is a sample statement for defining a control file:
lang=java
(
c1 position(1:5) "trim(c1)", -- Extract the first to fifth bytes of characters from the values of Column c1 and truncate the leading and trailing spaces of the result.
c2 position(7:25) "replace(trim(c2),'',' ')" -- Extract the seventh to twenty-fifth bytes of characters from the values of Column c2, truncate the leading and trailing spaces of the result, and replace the empty characters with spaces.
);
Notice
OBLOADER of a version later than 3.0.0 no longer supports the --null-replacer, --empty-replacer, --default-date, and --date-format options.