With a defined control file, OBLOADER can preprocess the imported data.
File template
The template for defining a control file is as follows:
lang=java
(
${Column name} ${Byte offset} (optional) "${Preprocessing function}" (optional) ${Mapping definition} (optional),
${Column name} ${Byte offset} (optional) "${Preprocessing function}" (optional) ${Mapping definition} (optional),
${Column name} ${Byte offset} (optional) "${Preprocessing function}" (optional) ${Mapping definition} (optional),
${Column name} ${Byte offset} (optional) "${Preprocessing function}" (optional) ${Mapping definition} (optional)
);
Column name: the name of a field in the database table structure. OBLOADER is not case-sensitive to column names. If you want to specify the case, enclose the column name in either square brackets [ ] or backticks ``. For example,
[c1]indicates the c1 column, and[C1]indicates the C1 column.Byte offset: the starting and ending positions of a field in bytes. Only data in the
--posformat supports the definition of byte offset.Absolute offset:
position(start, end), wherestartandendindicate the starting and ending positions of the field in bytes. If you want to specify the column length to ignore importing a specific segment of bytes into the database table, you can use the special keyword _FILLER instead of the actual column name to mark this part of data. This will create an anonymous column that is parsed by OBLOADER but not associated with any existing column in the table. This keyword is particularly useful for ignoring irrelevant data in the file, such as trailing fill bytes. Example:_filler position(5:10), which indicates to skip bytes 5 to 10.Relative offset:
position(length), wherelengthindicates the length of the field in bytes.
Preprocessing function: a preprocessing function configured for a specified column in the control file by using the preprocessing functions described in Preprocessing functions to preprocess the imported data.
Mapping definition: the position of the column that stores preprocessed data in the data file.
Notice
- The naming rule for a control file is <table name>.ctrl.
- Before data is imported, OBLOADER will retrieve the control file with a
.ctrlextension and preprocess the data to be imported based on the functions specified in the control file.- If multiple tables are imported with multiple control files, OBLOADER will retrieve all files with a
.ctrlextension in thectl-pathdirectory and bind them to the imported tables based on the correspondence between control file names and table names. In this case, ensure that the control file names match the names of the corresponding imported tables. For example, the control file namedloader1.ctrlcorresponds to the table namedloader1, and the control file namedloader2.ctrlcorresponds to the table namedloader2. - If only one table is imported with a single control file, the
--ctl-pathoption can directly specify the control file name. OBLOADER will then directly bind the specified control file to the imported table.
- If multiple tables are imported with multiple control files, OBLOADER will retrieve all files with a
Considerations
If a column name defined in the table contains database keywords, you must escape the column name by enclosing it with backticks (``). Here is an example:
lang=java ( `WHEN` "lower(`WHEN`)", c2 "nanvl(c2,'0')" );OBLOADER 3.x no longer supports .map files. Field mapping functionality is integrated into the .ctrl file. Here is an example:
lang=java( C1 "NONE" map(1), C2 "NONE" map(1), // mapping for multiple columns C3 "CONSTANT('xxx')", // xxx is a specified constant C4 "SEQUENCE(100, 1)" // 100 is the initial value, and 1 is the step size );Here,
C3 "CONSTANT('xxx')"andC4 "SEQUENCE(100, 1)"are generated columns.Note
- OBLOADER versions earlier than 3.0.0 only support single-column mapping and do not support generated columns. The control files of OBLOADER 3.0.0 and later support multi-column mapping and generated columns.
- Generated columns are columns that do not exist in the data file. OBLOADER supports preprocessing functions for generated columns including
CONSTANT,SEQUENCE, andDB_SEQUENCE. For example,C4 "SEQUENCE(100, 1)". For more information, see Preprocessing functions.
The control file cannot be used in conjunction with
--exclude-column-names. The functionality of--exclude-column-namesis already included in the control file.The field names in the control file must be the same as those in the destination database table. Otherwise, the field values in the data file cannot be correctly mapped to the fields.
Examples
Import data in the --cut, --csv, or --sql format
Prepare the data.
Here is a sample statement:
DROP TABLE IF EXISTS `student`; CREATE TABLE `mysql`.`loader` ( `c1` varchar(120) NULL, `c2` varchar(120) NULL, `c3` varchar(120) NULL, `c4` varchar(120) NULL, `c5` varchar(120) NULL, `c6` varchar(120) NULL, `c7` varchar(120) NULL, `c8` varchar(120) NULL, `c9` varchar(120) NULL, `c10` varchar(120) NULL, `c11` varchar(120) NULL, `c12` varchar(120) NULL, `c13` varchar(120) NULL, `c14` varchar(120) NULL, `c15` int(120) NULL, `c16` timestamp NULL, `c17` varchar(120) NULL, `c18` varchar(120) NULL, `c19` varchar(120) NULL, `c20` varchar(120) NULL, `c21` varchar(120) NULL, `c22` date NULL, `c23` varchar(120) NULL, `c24` varchar(120) NULL, `c25` varchar(120) NULL, `c26` varchar(120) NULL, `c27` varchar(120) NULL, `c28` varchar(120) NULL ) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_general_ci; INSERT INTO `loader`(`c1`,`c2`,`c3`,`c4`,`c5`,`c6`,`c7`,`c8`,`c9`,`c10`,`c11`,`c12`,`c13`,`c14`,`c15`,`c16`,`c17`,`c18`,`c19`,`c20`,`c21`,`c22`,`c23`,`c24`,`c25`,`c26`,`c27`,`c28`) VALUES ('XIAOHONG', 1111111,'222222 ',123456,123,'hhhhh',1234,'abc','null',123456,'1','1','china','123',456,'2023-07-11 20:56:36','123','1','1',11,2,'2023-07-11','Abc123','Ab3','Ab3','12345Ab3','Ab312345','123456789');Define a control file. In the control file, configure the corresponding preprocessing function for each column to be imported.
Here is a sample statement:
lang=java ( c1 "lower(c1)" map(1), -- Convert the letters in the c1 column to lowercase. c2 "ltrim(c2)" map(2), -- Truncate leading spaces from the values in the c2 column. c3 "rtrim(c3)" map(3), -- Truncate trailing spaces from the values in the c3 column. c4 "substr(c4,0,5)" map(4), -- Extract a string of 5 characters from the first position in the values of the c4 column. c5 "trim(c5)" map(5), -- Truncate leading and trailing spaces from the values in the c5 column. c6 "upper(c6)" map(6), -- Convert the letters in the c6 column to uppercase. c7 "nanvl(c7,'0')" map(7), -- Verify if the values in the c7 column are numeric. If not, return 0. c8 "replace(c8,'a','A')" map(8), -- Replace all occurrences of 'a' with 'A' in the values of the c8 column. c9 "nvl(c9,'null')" map(9), -- Check if the values in the c9 column are null. If so, return 'null'. c10 "length(c10)" map(10), -- Calculate the length of the values in the c10 column. c11 "lpad(c11,5,'x')" map(11), -- Add a 5-character string 'x' to the left of the values in the c11 column. c12 "rpad(c12,5,'x')" map(12), -- Add a 5-character string 'x' to the right of the values in the c12 column. c13 "convert(c13,'utf-8','gbk')" map(13), -- Convert the values in the c13 column from GBK encoding to UTF-8. c14 "concat(c14, '_suffix')" map(14), -- Concatenate the values in the c14 column with a constant. c15 "none" map(15), -- Return the values in the c15 column without any changes. c16 "systimestamp" map(16), -- Return the current timestamp of the cluster. c17 "constant('1')" map(17), -- Return the constant value 1 without any changes. c18 "lpadb(c18,5,'x')" map(18), -- Add five single-byte 'x' characters to the left of the values in the c18 column. c19 "rpadb(c19,5,'x')" map(19), -- Add five single-byte 'x' characters to the right of the values in the c19 column. c20 "case when length(trim(c20))<18 then 'Y' else 'N' end" map(20), -- Check if the values in the c20 column, after trimming, are less than 18 characters. If so, return 'Y'. Otherwise, return 'N'. c21 "case length(trim(c21)) when '1' then 'one' when '2' then 'two' else 'unknown' end" map(21), -- Check if the values in the c21 column, after trimming, are '1' or '2'. If so, return the corresponding value. Otherwise, return 'unknown'. C22 "SYSDATE" map(22), -- Return the current date. C23 "MASK(C23)" map(23), -- Desensitize the values in the c23 column. Only the column name is effective. The default desensitization characters replace uppercase and lowercase letters and digits in the column. (Default desensitization characters: uppercase X, lowercase x, and digit n) C24 "MASK_FIRST_N(C24,'A','a','b',3)" map(24), -- Specify the desensitization characters for uppercase and lowercase letters and digits in the c24 column. (The default value N is 0, indicating that the desensitization starts from the first character.) C25 "MASK_LAST_N(C25,'A','a','b',3)" map(25), -- Specify the desensitization characters for uppercase and lowercase letters and digits in the c25 column. (The default value N is 0, indicating that the desensitization starts from the last character.) C26 "MASK_SHOW_FIRST_N(C26,'A','a','b',3)" map(26), -- Specify the number of characters not to be desensitized in the c26 column. (The default value N is 0, indicating that the desensitization starts from the first character.) C27 "MASK_SHOW_LAST_N(C27,'A','a','b',3)" map(27), -- Specify the number of characters not to be desensitized in the c27 column. (The default value N is 0, indicating that the desensitization starts from the last character.) C28 "REVERSE(C28)" map(28), -- Reverse the characters in the c28 column. );Notice
- A control file is named in the format of
<Table name>.ctrl. - A control file corresponds to a table. To preprocess the data of multiple tables, you must create multiple control files in the path specified by
ctl-path. The name of each control file must be the same as that of the corresponding table to be imported.
- A control file is named in the format of
After data preprocessing based on the control files in the absolute path specified by the
--ctl-pathoption, import data in the CSV format.Here is a sample statement:
$./obloader -h xx.x.x.x -P 2883 -u test@mysql#cluster_a -p ****** --sys-user **u*** --sys-password ****** -D USERA --csv --table '*' --ctl-path /home/controls/ -f /outputHere, the
--ctl-path /home/controls/option specifies the absolute path where control files are located.Here is a sample return result:
... All Load Tasks Finished: ---------------------------------------------------------- | No.# | Type | Name | Count | Status | ---------------------------------------------------------- | 1 | TABLE | loader | 1 -> 1 | SUCCESS | ---------------------------------------------------------- Total Count: 1 End Time: 2023-07-04 07:13:01 ...Here is an example of imported data:
... SELECT * FROM `loader`; +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 | c10 | c11 | c12 | c13 | c14 | c15 | c16 | c17 | c18 | c19 | c20 | c21 | c22 | c23 | c24 | c25 | c26 | c27 | c28 | +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ | xiaohong | 1111111 | 222222 | 12345 | 123 | HHHHH | 1234 | Abc | null | 6 | xxxx1 | 1xxxx | china | 123_suffix | 456 | 2023-07-05 02:52:07 | 1 | xxxx1 | 1xxxx | Y | one | 2023-07-05 | Xxxnnn | Aab | Aab | 123bbAab | Aabbb345 | 987654321 | +----------+---------+--------+-------+------+-------+------+------+------+------+-------+-------+-------+------------+------+---------------------+------+-------+-------+------+------+------------+--------+------+------+----------+----------+-----------+ 1 row in set (0.04 sec) ...
Import data in the POS format
Here is a sample statement for defining a control file:
lang=java
(
c1 position(1:5) "trim(c1)", -- Extracts the first to fifth bytes from c1 and trims leading and trailing spaces from the result.
c2 position(7:25) "replace(trim(c2),'',' ')" -- Extracts the seventh to 25th bytes from c2, trims leading and trailing spaces from the result, and replaces empty characters with spaces.
);
Notice
OBLOADER V3.0.0 and later no longer support the --null-replacer, --empty-replacer, --default-date, and --date-format options.