Create an external table

2025-07-29 02:03:09  Updated

This topic describes how to create an external table by using SQL statements. It also describes the prerequisites, overview, and considerations for creating an external table, and provides examples.

Overview

An external table is a logical table object. Its data is stored in an external storage system instead of the database.

For more information about external tables, see Overview of external tables.

Prerequisites

Before you create an external table, make sure that:

  • You have deployed an OceanBase cluster and created a MySQL-compatible tenant. For more information about how to deploy an OceanBase cluster, see Deployment overview.

  • You have connected to a MySQL-compatible tenant of OceanBase Database. For more information about how to connect to the database, see Overview of connection methods.

  • You have created a database. For more information, see Create a database.

  • You have the CREATE privilege. To view the privileges of the current user, perform the relevant operation described in View user privileges. If you do not have the required privilege, contact the administrator to grant you the necessary privileges. For more information, see Directly grant privileges.

Considerations

  • External tables support only query operations, but not DML operations.

  • If the external file accessed by the external table is deleted, the system does not return an error but instead returns empty rows.

  • If the external storage system is unavailable, an error occurs when you query the external table.

  • The query performance may be affected by network factors and file system factors because the data of external tables is stored in external data sources. Therefore, when you create an external table, select an appropriate data source and optimize the query statement to improve the query efficiency.

Create an external table by using the CLI

You can execute the CREATE EXTERNAL TABLE statement to create an external table.

Define an external table name

When you create an external table, you must name the external table. To avoid confusion and ambiguity, we recommend that you use specific naming rules or prefixes to distinguish external tables from regular tables when naming external tables. For example, you can add a suffix such as _csv to the name of an external table.

Here is an example:

When you create an external table that stores student information, you can name it students_csv.

CREATE EXTERNAL TABLE students_csv external_options

Notice

Since no other attributes are specified for the external table in the preceding SQL statement, the statement cannot be executed.

Define columns

Unlike regular tables, external tables cannot have constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, or FOREIGN KEY.

The column types supported for external tables are the same as those for regular tables. For more information about the data types supported in the MySQL-compatible mode of OceanBase Database, see Overview of data types.

Define LOCATION

The LOCATION option specifies the path for external tables. Generally, the data files of an external table are stored in a separate directory, which can contain subdirectories. When you create an external table, the table automatically collects all files in the directory.

OceanBase Database supports the following two path formats:

  • Local path format: LOCATION = '[file://] local_file_path'

    Notice

    For paths in the local path format, you must set the system variable secure_file_priv to specify an accessible path. For more information, see secure_file_priv.

  • Remote path format:

    • LOCATION = '{oss|S3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'. In this format, $ACCESS_ID, $ACCESS_KEY, and $HOST are required for accessing Alibaba Cloud OSS, AWS S3, and object storage services that support the S3 protocol. s3_region indicates the region selected when you use S3. These sensitive access information is encrypted and stored in system tables of the database.
    • LOCATION = 'hdfs://$ {hdfs_namenode_address}:${port}/PATH.localhost'. In this format, port indicates the port number of HDFS, and PATH indicates the directory path in HDFS.
      • With Kerberos authentication: LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx'. Where:
        • principal: the authentication user.
        • keytab: the key file for user authentication.
        • krb5conf: the description file of the Kerberos environment for the user.
        • configs: the additional HDFS configuration. By default, it is empty, but in a Kerberos environment, this parameter is usually configured. For example, dfs.data.transfer.protection=authentication,privacy, which specifies the data transfer protection level as authentication and privacy.

Notice

When you use an object storage path, make sure that the values of the parameters in the object storage path are composed of uppercase and lowercase letters, numbers, and the following characters: /-_$+=. If the path contains other characters, the setting may fail.

Define FORMAT

  • FORMAT = ( TYPE = 'CSV'... ) specifies the CSV format for external files. The parameters are as follows:
    • TYPE: the type of the external file.

    • LINE_DELIMITER: the line delimiter for the CSV file. The default value is LINE_DELIMITER='\n'.

    • FIELD_DELIMITER: the field delimiter for the CSV file. The default value is FIELD_DELIMITER='\t'.

    • ESCAPE: the escape character for the CSV file, which can be only 1 byte in length. The default value is ESCAPE ='\'.

    • FIELD_OPTIONALLY_ENCLOSED_BY: the characters that enclose the field values in the CSV file. The default value is an empty string.

      Notice

      When the external data file contains NULL values (not the string NULL and not "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and its value cannot be empty.

    • ENCODING: the character set encoding format of the file. For more information about the character sets supported in MySQL-compatible mode, see Character set and collation. If this parameter is not specified, the default value UTF8MB4 takes effect.

    • NULL_IF: the strings that are to be treated as NULL values. The default value is an empty string.

    • SKIP_HEADER: the number of lines to be skipped at the beginning of the file.

    • SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE, which specifies not to skip blank lines.

    • TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, which specifies not to remove leading and trailing spaces from fields in the file.

    • EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL values. The default value is FALSE, which specifies not to treat empty strings as NULL values.

  • FORMAT = ( TYPE = 'PARQUET'... ) specifies the Parquet format for external files.
  • FORMAT = ( TYPE = 'ORC'... ) specifies the ORC format for external files.

(Optional) Define PATTERN

The PATTERN option is used to specify a regular expression pattern string for filtering files under the LOCATION directory. For each file path under the LOCATION directory, if it matches the pattern string, the external table will access this file; otherwise, it will skip the file. If this parameter is not specified, all files under the LOCATION directory are accessible by default. The external table will save the list of files that meet the PATTERN under the path specified by LOCATION in the database system table. During an external table scan, it will access external files based on this list.

(Optional) Define partitions of the external table

Automatically define partitions of the external table

The external table calculates and adds partitions based on the expression defined by the partition key. When querying, you can specify the value or range of the partition key. At this time, partition pruning will be performed, and the external table will only read the files under that partition.

Manually define partitions of the external table

When you need to manually add and delete partitions yourself, instead of letting the external table automatically manage partitions, you need to specify the PARTITION_TYPE = USER_SPECIFIED field.

Examples

Notice

IP addresses in sample commands are desensitized. Replace them with actual IP addresses during verification.

The following example describes how to create an external table in MySQL-compatible mode of OceanBase Database, where the external file is stored locally and remotely. The steps are as follows:

  1. Create an external file.

    Execute the following command to create a file named test_tbl1.csv in the /home/admin directory of the server where you want to log in to.

    [admin@xxx /home/admin]# vi test_tbl1.csv
    

    The content of the file is as follows:

    1,'Emma','2021-09-01'
    2,'William','2021-09-02'
    3,'Olivia','2021-09-03'
    
  2. Set the path of the imported file.

    Notice

    For security reasons, when you set the system variable secure_file_priv, you can connect to the database only by using a local socket to execute the SQL statement that changes the global variable. For more information, see secure_file_priv.

    1. Execute the following command to log in to the server where the OBServer node is located.

      ssh admin@10.10.10.1
      
    2. Execute the following command to connect to the mysql001 tenant by using a local Unix socket.

      obclient -S /home/admin/oceanbase/run/sql.sock -uroot@mysql001 -p******
      
    3. Execute the following SQL command to set the import path to /home/admin.

      SET GLOBAL secure_file_priv = "/home/admin";
      
  3. Reconnect to the mysql001 tenant.

    Here is an example:

    obclient -h10.10.10.1 -P2881 -uroot@mysql001 -p****** -A -Dtest
    
  4. Execute the following SQL command to create an external table named test_tbl1_csv.

    CREATE EXTERNAL TABLE test_tbl1_csv ( 
        id INT, 
        name VARCHAR(50), 
        c_date    DATE
        )
        LOCATION = '/home/admin'
        FORMAT = (
          TYPE = 'CSV'
          FIELD_DELIMITER = ','
          FIELD_OPTIONALLY_ENCLOSED_BY ='\''
          )
        PATTERN = 'test_tbl1.csv';
    
  5. Execute the following SQL command to view the data of the external table named test_tbl1_csv.

    SELECT * FROM test_tbl1_csv;
    

    The return result is as follows:

    +------+---------+------------+
    | id   | name    | c_date     |
    +------+---------+------------+
    |    1 | Emma    | 2021-09-01 |
    |    2 | William | 2021-09-02 |
    |    3 | Olivia  | 2021-09-03 |
    +------+---------+------------+
    3 rows in set
    

Example 2

To create an HDFS external table, make sure that OceanBase Database is used in an environment with JAVA SDK. For more information about deploying the JAVA SDK environment, see Deploy the Java SDK environment for OceanBase Database.

Below are examples of creating an HDFS external table:

With Kerberos:

CREATE EXTERNAL TABLE ext_data ( 
  id INT, 
  name VARCHAR(50), 
  c_date    DATE
)
LOCATION = 'hdfs://localhost:8020/user?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='"'
)
PATTERN = 'data.csv';

where:

  • principal: specifies the login authentication user.
  • keytab: specifies the key file for user authentication.
  • krb5conf: specifies the description file for the Kerberos environment.
  • configs: specifies the additional HDFS configurations. The default value is empty. However, in a Kerberos environment, this parameter is usually configured with a value. For example, dfs.data.transfer.protection=authentication,privacy specifies that the data transfer protection level is authentication and privacy.

Without Kerberos:

CREATE EXTERNAL TABLE ext_data ( 
  id INT, 
  name VARCHAR(50), 
  c_date    DATE
)
LOCATION = 'hdfs://localhost:8020/user'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY ='\''
  )
PATTERN = 'test_tbl1.csv';

References

  • You can delete an external table in the same way as you delete a regular table. For more information about how to delete a table, see Delete a table.
  • For more information about how to view and update the information about an external file, see Manage external files.

Contact Us