You can execute the CREATE EXTERNAL TABLE statement to create an external table. When you create an external table, you must specify the path to the data files and the format of the data files to read data from external files.
Privilege requirements
To create an external table, the current user must have the CREATE privilege. To view the privileges of the current user, see View user privileges.
Create an external table
The SQL statement for creating an external table is as follows:
CREATE EXTERNAL TABLE table_name
( col_name col_type [AS (metadata$filecol{N})]
[ , col_name col_type [AS (metadata$filecol{N})] ]
[ , ... ] )
LOCATION = '<string>'
FORMAT = (
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = { TRUE | FALSE }
TRIM_SPACE = { TRUE | FALSE }
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
[ PATTERN = '<regex_pattern>' ]
The parameters are described as follows:
col_name col_type [AS (metadata$filecol{N})]: specifies the columns. You can useAS (metadata$filecol{N})to manually define column mapping.The columns supported by external tables are the same as those supported by regular tables. For more information about the data types supported in MySQL mode of OceanBase Database, see Overview.
By default, the columns in an external file are automatically mapped to those defined in the external table in order. That is, the first column in the external table is mapped to the first column in the external file, and so on.
For example, in the following example, the
C1column in theext_t1external table is automatically mapped to the first column in the external file; and theC2column is automatically mapped to the second column in the external file.CREATE EXTERNAL TABLE ext_t1 ( C1 int, C2 int ) LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );If the columns in the external file are in a different order from those defined in the external table, you can use a pseudo-column in the format of
metadata$filecol{N}to specify that the Nth column in the external table is mapped to the Nth column in the external file. Note that the columns in the file are numbered starting from 1.For example, in the following example,
C1 int AS (metadata$filecol2)specifies that theC1column in theext_t2external table is mapped to the second column in the external file; andC2 int AS (metadata$filecol4)specifies that theC2column in theext_t2external table is mapped to the fourth column in the external file.CREATE EXTERNAL TABLE ext_t2 ( C1 int AS (metadata$filecol2), C2 int AS (metadata$filecol4) ) LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/tpch_1g_data/lineitem/' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = '|' );Notice
If you need to manually define column mapping, the automatic column mapping feature will fail and you need to manually define the mapping for all columns.
LOCATION = '<string>': specifies the path where the external file is stored. Generally, the data files of an external table are stored in a dedicated directory, which can contain subdirectories. When you create an external table, the table automatically collects all files in the specified directory.Two formats are supported:
Local location format:
LOCATION = '[file://] local_file_path'local_file_path: can be a relative path or an absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database. In an absolute path, the path can contain not only the drive letter but also the root directory.Notice
The
local_file_pathparameter must indicate a directory and cannot indicate a single file. If you want to specify a single file, you need to specify the parent directory of the file in theLOCATIONparameter and specify the file itself in thePATTERNparameter.For scenarios that use the local location format, when you configure the OceanBase database to have access to a path by using the system variable
secure_file_priv, the value ofsecure_file_privmust be the parent directory oflocal_file_path. In other words,local_file_pathmust be a subdirectory ofsecure_file_priv.The system variable
secure_file_privis used to specify the paths that OceanBase Database can access when you import data to or export data from a file. For more information about thesecure_file_privvariable, see secure_file_priv.
Remote location format:
LOCATION = '{oss|cos}://$ACCESS_ID:$ACCESS_KEY@$HOST/remote_file_path'$ACCESS_ID,$ACCESS_KEY, and$HOSTare the access information required to access Alibaba Cloud OSS or Tencent Cloud COS. These sensitive access information is stored in encrypted form in system tables of the database.Notice
When you use an object storage path, separate the parameters in the path with the
&symbol. Make sure that the values of the parameters you entered contain only uppercase and lowercase letters, digits, and the following symbols:/-_$+=and wildcard. If you enter any other characters, the setting may fail.
FORMAT: specifies the format of external files.TYPE: specifies the type of external files. Only CSV files are supported.LINE_DELIMITER: specifies the line delimiter in the file. If this parameter is not specified, the default valueLINE_DELIMITER='\n'is used.FIELD_DELIMITER: specifies the column delimiter in the file. If this parameter is not specified, the default valueFIELD_DELIMITER='\t'is used.ESCAPE: specifies the escape character in the file. For example,ESCAPE ='*'specifies to use an asterisk (*) as the escape character, instead of the default escape character (). If this parameter is not specified, the default valueESCAPE ='\'is used.FIELD_OPTIONALLY_ENCLOSED_BY: specifies the characters that enclose the field values in the file. For example,ESCAPE = '"'specifies to enclose the field values in double quotation marks. If this parameter is not specified, the default value is an empty string.ENCODING: specifies the character set encoding format of the file. For more information about the character sets supported in MySQL mode of OceanBase Database, see Character sets. If this parameter is not specified, the default value UTF8MB4 is used.NULL_IF: specifies the strings that should be treated asNULL. If this parameter is not specified, the default value is an empty string.SKIP_HEADER: specifies the number of lines to skip in the file header. If this parameter is not specified, the default value is 0, which means that the file header is not skipped.SKIP_BLANK_LINES: specifies whether to skip blank lines. If this parameter is not specified, the default value isFALSE.TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. If this parameter is not specified, the default value isFALSE.EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULL. If this parameter is not specified, the default value isFALSE.
PATTERN: specifies a regular pattern string to filter files in the directory specified byLOCATION. For each file in the directory specified byLOCATION, if the file matches the pattern string, the external table can access the file. Otherwise, the external table will skip the file. If this parameter is not specified, the external table can access all files in the directory specified byLOCATIONby default.
Assume that a file named data.csv exists in the /home/admin/oceanbase/ directory on the local server. The file contains the following data.
1,"lin",98
2,"hei",90
3,"ali",95
On the OBServer node, connect to the MySQL tenant of the cluster as the tenant administrator through the local Unix socket.
Here is an example of how to connect:
obclient -S /home/admin/oceanbase/run/sql.sock -uroot@sys -p********The specific procedure for connecting to OceanBase Database through the local Unix socket is described in secure_file_priv.
Specify the path
/home/admin/oceanbase/as the directory from which the database can access.SET GLOBAL secure_file_priv = "/home/admin/oceanbase/";After the command is executed, you need to restart the session for the modification to take effect.
Reconnect to the database and create an external table named
ext_t3.CREATE EXTERNAL TABLE ext_t3(id int, name char(10),score int) LOCATION = '/home/admin/oceanbase/' FORMAT = ( TYPE = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY ='"' ) PATTERN = 'data.csv';
After the external table is created, you can use the SHOW CREATE TABLE statement to view the table definition, just like you do with regular tables.
SHOW CREATE TABLE ext_t3;
The query result is as follows:
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ext_t3 | CREATE EXTERNAL TABLE `ext_t3` (
`id` int(11) GENERATED ALWAYS AS (metadata$filecol1),
`name` char(10) GENERATED ALWAYS AS (metadata$filecol2),
`score` int(11) GENERATED ALWAYS AS (metadata$filecol3)
)
LOCATION='file:///home/admin/oceanbase/'
PATTERN='data.csv'
FORMAT (
TYPE = 'CSV',
FIELD_DELIMITER = ',',
FIELD_OPTIONALLY_ENCLOSED_BY = '"',
ENCODING = 'utf8mb4'
)DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
You can also access the external table, just like you do with regular tables. To query an external table, the system reads the external file based on the driver of the external table and parses the file based on the file format. Then, the system converts the parsed data into internal data types of OceanBase Database and returns the data as rows. Here is an example of how to query the external table created earlier named lineitem.
SELECT * FROM ext_t3;
The query result is as follows:
+----+------+-------+
| id | name | score |
+----+------+-------+
| 1 | lin | 98 |
| 2 | hei | 90 |
| 3 | ali | 95 |
+----+------+-------+
3 rows in set
In addition, you can also combine an external table with regular tables for query operations. Assume that the current database contains a regular table named info, which has the following data:
+------+--------+------+
| name | sex | age |
+------+--------+------+
| lin | male | 8 |
| hei | male | 9 |
| li | female | 8 |
+------+--------+------+
3 rows in set
Here is an example of how to combine the external table named ext_t3 and the regular table named info for query operations.
SELECT info.* FROM info, ext_t3 WHERE info.name = ext_t3.name AND ext_t3.score > 90;
The query result is as follows:
+------+--------+------+
| name | sex | age |
+------+--------+------+
| lin | male | 8 |
| li | female | 8 |
+------+--------+------+
2 rows in set
For more information about query operations, see Read data.
Considerations
An external table can only be queried, and you cannot perform DML operations on it.
When you query an external table, if the external file accessed by the external table has been deleted, the system does not return an error, but returns an empty result set.
If the external storage system fails, an error is returned when you query the external table because the file managed by the external storage system does not exist.
Next steps
When you create an external table, the system saves the file list that matches the specified PATTERN in the LOCATION specified in the LOCATION clause to a system table of OceanBase Database. The system uses this file list to locate external files when the system scans the external table. If you add new files to the external directory, you must perform an operation to add the new files to the file list of the external table. For more information, see Manage external files of an external table.
You can use the DROP TABLE statement to drop an external table, just like you drop a regular table. For more information, see Drop a table.