Purpose
You can use this statement to import data from an external file.
Notice
- Tables with triggers are not allowed to use the
LOAD DATAstatement. - To import data from external files, you need to have the
FILEprivilege and the following settings:- When loading server-side files, you need to set the system variable secure_file_priv in advance to configure the path that can be accessed for importing or exporting files.
- When loading client-side local files, you need to enable the ability to load data from the local file system by adding the
--local-infile[=1]option when starting the MySQL/OBClient client.
OceanBase Database allows you to use the LOAD DATA statement to load the following input files:
Server-side (OBServer node) files: These files are located on the OBServer node of OceanBase Database. You can use the
LOAD DATA INFILEstatement to load data from server-side files into the database table.Client-side (local) files: These files are located in the local file system of the client. You can use the
LOAD DATA LOCAL INFILEstatement to load data from the client's local files into the database table.OSS files: These files are located in the OSS file system. You can use the
LOAD DATA REMOTE_OSS INFILEstatement to load data from OSS files into the database table.
You can use the LOAD DATA statement to import a CSV text file in the following process:
Parse the file: OceanBase Database reads data from a file based on the file name that you enter and determines whether to perform parallel or serial parsing of data from the input file based on the specified degree of parallelism (DOP).
Distribute the data: OceanBase Database is a distributed database. Data of each partition may be distributed across different OBServer nodes. The
LOAD DATAstatement is used to process the parsed data and determine to which OBServer node the data is sent.Insert the data: After the destination OBServer node receives the data, it executes the
INSERTstatement to insert the data into the corresponding partition.
To import data from an external file, you must have the FILE privilege. You can use the GRANT FILE ON *.* TO $user_name; statement to grant the privilege, where $user_name is the user that executes the LOAD DATA statement.
Syntax
LOAD DATA
[ [/*+ PARALLEL(N) load_batch_size(M) APPEND */]
| [/*+ PARALLEL(N) direct(bool, int) */] ]
[REMOTE_OSS] INFILE 'file_name'
INTO TABLE table_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
Parameters
| Parameter | Description |
|---|---|
| parallel(N) | The DOP for loading data. The default value of N is 4. |
| load_batch_size(M) | The batch size of each insertion. The default value of M is 100. Recommended value range: [100, 1000]. |
| APPEND | A hint for enabling the bypass import feature. This feature allows you to allocate space and write data to data files. By default, this hint is equivalent to direct(false, 0) and can collect statistics online like the GATHER_OPTIMIZER_STATISTICS hint does. |
| direct | A hint for enabling the bypass import feature. The bool parameter in direct(bool, int) specifies whether the given CSV file needs to be ordered. The value true indicates that the file must be ordered. The int parameter specifies the maximum number of error rows allowed. |
| REMOTE_OSS | Specifies whether to read data from an OSS file system. This parameter is optional.
NoticeIf this parameter is specified, |
| file_name | The path and file name of the input file. file_name can be in one of the following formats:
NoteWhen you import a file from OSS, make sure that:
|
| table_name | The name of the table from which data is imported. Partitioned and non-partitioned tables are supported. |
| FIELDS | COLUMNS | The format of the field.
|
| LINES STARTING BY | The start character of the line. |
| LINES TERMINATED BY | The end character of the line. |
| IGNORE number { LINES | ROWS } | Specifies to ignore the first few lines. LINES indicates the first few lines of the file. ROWS indicates the first few rows of data specified by the field delimiter. By default, fields in the input file are mapped to columns in the destination table one by one. If the input file does not contain all the columns, the missing columns are filled based on the following mappings:
|
| column_name_var | The name of the imported column. |
Examples
Example 1: Import data from a file on an OBServer node
Set a global security path.
Notice
For security reasons, when you set the system variable
secure_file_priv, you can connect to the database only through a local socket to execute the SQL statement that modifies the global variable. For more information, see secure_file_priv.obclient> SET GLOBAL secure_file_priv = "/";Log out of the database.
Note
Because
secure_file_privis aGLOBALvariable, you need to run\qto exit for the settings to take effect.obclient> \qThe return result is as follows:
ByeConnect to the database again. Execute the
LOAD DATAstatement to import data.Perform normal import.
obclient> LOAD DATA INFILE '/home/admin/test.csv' INTO TABLE t1;Use the
APPENDhint to enable bypass import.LOAD DATA /*+ PARALLEL(4) APPEND */ INFILE '/home/admin/test.csv' INTO TABLE t1;
Example 2: Import data from an OSS file
Use the direct(bool,int) hint to enable bypass import. The import file can be an OSS file.
LOAD DATA /*+ direct(true,1024) parallel(16) */ REMOTE_OSS INFILE 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
References
- For more examples of the
LOAD DATAstatement, see Import data by using the LOAD DATA statement. - For more bypass import examples of the
LOAD DATAstatement, see Import data in bypass mode by using the LOAD DATA statement.