This topic describes use cases of command-line options in a complex scenario.
--auto-column-mapping
By default, OBLOADER requires that the columns in the source file exactly match those of the target table. OBLOADER imports data to the target columns in order without identifying column names.
If the number of columns in the source file is different from that of the target table, you can specify the --auto-column-mapping option to define column name mappings, so that OBLOADER imports data based on the column names, and ignores columns that have no matching columns in the target table.
Scenarios
Assume that the CSV file containing the col1, col2, and col3 columns of the my_table table in the /home/admin/obloader directory needs to be imported into the target table containing the col2, col3, and col4 columns in the obloader database.
Data of the source table
$cat my_table.csv 'col1','col2','col3' 'data1',1,'desc1' 'data2',2,'desc2' 'data3',3,'desc3'Schema of the target table
obclient [obloader]> DESC my_table; +-------+--------------+------+------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+------+---------+-------+ | col2 | int(11) | YES | | NULL | | | col3 | varchar(255) | YES | | NULL | | | col4 | varchar(255) | YES | | NULL | | +-------+--------------+------+------+---------+-------+ 3 rows in set (0.035 sec)
Import data without specifying --auto-column-mapping
Run the following command to import data:
./bin/obloader -h xxx.xxx.xxx.xxx -P xxxx -u test@mysql -D obloader --csv --table 'my_table' -f /home/admin/obloaderWait for the execution result. An error is expected.
Notice
If the column names do not match but the column types do, OBLOADER may ignore the column names and import the data, leading to data inconsistency.
Query the data in the target table.
SELECT * FROM my_table;The query result is as follows:
obclient [obloader]> SELECT * FROM my_table; Empty set (0.035 sec)
Import data with --auto-column-mapping specified
Run the following command to import data:
./bin/obloader -h xxx.xxx.xxx.xxx -P xxxx -u test@mysql -D obloader --csv --table 'my_table' --auto-column-mapping -f /home/admin/obloaderWait for the execution result. The expected result is that data of the
col2andcol3columns of the source table is imported, and thecol4column is filled with default values.Query the data in the target table.
SELECT * FROM my_table;The query result is as follows:
obclient [obloader]> SELECT * FROM my_table; +------+---------+------+ | col2 | col3 | col4 | +------+---------+------+ | 1 | desc1 | NULL | | 2 | desc2 | NULL | | 3 | desc3 | NULL | +------+---------+------+ 3 rows in set (0.033 sec)
--source-type hive
By default, OBLOADER maps the source file to the target table and imports data to the target table based on the source file structure and column positions. If the source file stored in Hive has a different number of columns from the target table, you can specify the --source-type hive parameter to parse the names of directories in the source path as column values, and specify the --auto-column-mapping parameter to align the columns with the target table.
Scenarios
Import the Parquet file part-00439-07417647-caab-4f87-823d-7c6dfac92211.c000 that contains the id, name, age, and city columns and that is stored in Hive to the tbl1 table that contains the id, name, age, city, and dt columns in the obloader database.
Data structure of the source file in Hive
hive/ └── tbl1 └── dt=2025 └── part-00439-07417647-caab-4f87-823d-7c6dfac92211.c000Schema of the target table
CREATE TABLE tbl1 ( id INT NOT NULL, -- The ID of the user. name VARCHAR(50) NOT NULL, -- The name of the user. age INT NOT NULL, -- The age of the user. city VARCHAR(50) NOT NULL, -- The city of the user. dt YEAR NOT NULL -- The year. ) PARTITION BY RANGE (dt) ( PARTITION P0 VALUES LESS THAN (2024), PARTITION P1 VALUES LESS THAN (MAXVALUE) );
Procedure
In the import command, specify the --source-type hive parameter to identify the source directory dt=2025 as a column, and specify the --auto-column-mapping parameter to map the source file to the target table based on column names.
Run the following command to import data:
./bin/obloader --user root@sun2 -h xxx.xxx.xxx.xxx -P xxxx -D obloader --table 'tbl1' --truncate-table --par -f ${hdfs_path} -pxxxx --log-path /tmp/ --source-type hive --auto-column-mapping --file-regular-expression '.*'Wait for the execution result. It is expected that the directory name
dt=2025in the source file will be imported to thedtcolumn of the target table.Query the data in the target table.
obclient [obloader]> SELECT * FROM tbl1; +------+---------+------+-------------+------+ | id | name | age | city | dt | +------+---------+------+-------------+------+ | 1 | Alice | 25 | New York | 2025 | | 2 | Bob | 30 | Los Angeles | 2025 | | 3 | Charlie | 35 | Chicago | 2025 | | 4 | David | 40 | Houston | 2025 | | 5 | Eve | 45 | Phoenix | 2025 | +------+---------+------+-------------+------+