Purpose
The CREATE EXTERNAL TABLE statement creates a new external table in the database.
External tables are a key feature of database management systems. Regular tables store their data in the database's own storage space, whereas external tables keep their data in an external storage service.
When creating an external table, you need to define the file path and file format for the data. Once created, users can query the external table to read data from the external storage service. External tables are read-only and can be used in query statements, but they do not support DML operations. External tables also do not support defining constraints or creating indexes.
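For orientation, here is a minimal sketch of that workflow. The directory, file, table, and column names are hypothetical; complete, runnable examples appear in the Examples section below.

-- Define the columns, the file path, and the file format of the external data.
CREATE EXTERNAL TABLE events_ext (
  event_id   INT,
  event_name VARCHAR(100)
)
LOCATION = '/home/admin/ext_events'   -- directory that holds the data files
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
)
PATTERN = 'events.csv';

-- External tables are read-only: they can be queried, but DML is not supported.
SELECT * FROM events_ext;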
Privilege requirements
To execute the CREATE EXTERNAL TABLE statement, the current user must have the CREATE privilege. For more information about OceanBase Database privileges, see Privilege types in MySQL-compatible mode.

When you create an external table by referencing a Location object, you must also have the READ privilege on that object. Here is an example:

Grant the user001 user the READ privilege on the test_location_local object:

GRANT READ ON LOCATION test_location_local TO user001 WITH GRANT OPTION;
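For illustration, a minimal sketch of how user001 could then reference that location object in the LOCATION clause (the @location_name['/path'] form from the syntax below); the table name, columns, and subdirectory are hypothetical.

CREATE EXTERNAL TABLE t_loc_ext (
  c1 INT,
  c2 VARCHAR(50)
)
LOCATION = @test_location_local['/data']   -- requires the READ privilege granted above
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
)
PATTERN = '.*\.csv';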
Syntax
CREATE EXTERNAL TABLE table_name (column_definition_list)
LOCATION = {'file_path' | @location_name['/path']}
{FORMAT = (format_type_options)
| PROPERTIES = (properties_type_options)}
[PARTITION BY (column_name [, column_name ...])]
[PARTITION_TYPE = USER_SPECIFIED]
[PATTERN = 'regex_pattern']
[AUTO_REFRESH = 'xxx'];
column_definition_list:
column_definition [, column_definition ...]
column_definition:
column_name column_type [AS metadata$filecol{N}]
format_type_options:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = {TRUE | FALSE}
TRIM_SPACE = {TRUE | FALSE}
EMPTY_FIELD_AS_NULL = {TRUE | FALSE}
IGNORE_LAST_EMPTY_COLUMN = {TRUE | FALSE}
type_parquet_option:
TYPE = 'PARQUET'
type_orc_option:
TYPE = 'ORC'
properties_type_options:
type_odps_option
type_odps_option:
TYPE = 'ODPS'
ACCESSID = '<string>'
ACCESSKEY = '<string>'
ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>',
API_MODE = {"tunnel_api" | "storage_api"},
SPLIT = {"byte" | "row"}
Parameters
| Parameter | Description |
|---|---|
| table_name | The name of the external table to be created. |
| column_name | The name of a column in the external table. By default, the data columns in the file are mapped to the columns defined in the external table in sequence. |
| column_type | The data type of the external table's columns. Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined. |
| AS metadata$filecol{N} | Manually specifies the column mapping. When the order of columns in the file does not match the order of columns defined in the external table, you can use the pseudo column metadata$filecol{N} to map a column of the external table to the Nth column in the file. For example, c2 INT AS (metadata$filecol4) indicates that the c2 column of the external table corresponds to the 4th column in the file. Note that once manual column mapping is specified, automatic mapping is disabled and all columns must be mapped manually, as shown in the sketch after this table. |
| LOCATION | Specifies the path where the external table files are stored. Typically, the data files of an external table are stored in a separate directory, which may contain subdirectories. When the table is created, the external table automatically collects all files in this directory. For the supported path formats, see file_path below. |
| FORMAT = (format_type_options) | Specifies the properties of the external file format. Use TYPE to set the file format to CSV, PARQUET, or ORC. The TYPE parameter cannot be empty. For more information, see format_type_options. |
| PROPERTIES = (properties_type_options) | Specifies the properties of the external data source. Use TYPE to set the source type to ODPS. The TYPE parameter cannot be empty. For more information, see properties_type_options. |
| PATTERN | Specifies a regular expression pattern that filters files in the LOCATION directory. The external table accesses a file only if its path matches the pattern; otherwise, the file is skipped. If this parameter is not specified, the external table accesses all files in the LOCATION directory. When the LOCATION path is specified, the list of files that match PATTERN is saved in a database system table, and external table scans access the external files based on this list. |
| PARTITION_TYPE = USER_SPECIFIED | Specify this parameter if you want to add and delete partitions manually instead of letting the external table manage partitions automatically. |
| AUTO_REFRESH = 'xxx' | Specifies whether to enable automatic refresh for the external table. |
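As referenced in the AS metadata$filecol{N} row above, here is a minimal sketch of manual column mapping; the table, directory, and file names are hypothetical, and every column is mapped explicitly because specifying any manual mapping disables automatic mapping.

CREATE EXTERNAL TABLE orders_ext (
  order_id   INT           AS (metadata$filecol1),   -- 1st column in the file
  order_date DATE          AS (metadata$filecol2),   -- 2nd column in the file
  amount     DECIMAL(10,2) AS (metadata$filecol3)    -- 3rd column in the file
)
LOCATION = '/home/admin/ext_orders'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
)
PATTERN = 'orders.csv';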
file_path
The file path of an external table is specified in the following formats:
- If the file is stored locally, the LOCATION clause is specified in the LOCATION = '[file://] local_file_path' format. In this case, local_file_path can be a relative path or an absolute path. If you specify a relative path, the current directory must be the installation directory of OceanBase Database. The secure_file_priv parameter specifies the file path that the OBServer node can access, and local_file_path must be a subpath of the path specified by secure_file_priv.

- If the file is stored remotely, the LOCATION clause is specified in the following formats:

  Notice

  When you use an object storage path, the parameters of the object storage path are separated by the & symbol. Make sure that the parameter values you enter contain only uppercase and lowercase letters, digits, /-_$+=, and wildcards. If you enter other characters, the settings may fail.

  - If the file is stored on Alibaba Cloud OSS or AWS S3, the LOCATION clause is specified in the LOCATION = '{oss|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path' format. In this case, $ACCESS_ID, $ACCESS_KEY, and $HOST specify the access information required to access Alibaba Cloud OSS, AWS S3, or an object storage service that is compatible with the S3 protocol, and s3_region specifies the region information when you use S3. These sensitive access parameters are stored in the system tables of the database in an encrypted manner. A sketch that follows this format appears at the end of this section.

  - If the file is stored on HDFS, the LOCATION clause is specified in the following formats:

    - If you access the cluster by using the address of a single-node NameNode (NN), the LOCATION clause is specified in the LOCATION = hdfs://localhost:port/PATH format. In this case, localhost specifies the address of HDFS, port specifies the port number of HDFS, and PATH specifies the file path in HDFS.

    - If you access the cluster by using Kerberos authentication, the LOCATION clause is specified in the LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx' format. In this case:

      - principal: specifies the user for login authentication.
      - keytab: specifies the path of the key file for user authentication.
      - krb5conf: specifies the path of the description file for the Kerberos environment.
      - configs: specifies additional HDFS configuration items. By default, this parameter is empty. However, in a Kerberos environment, this parameter usually has a value and needs to be configured. For example, dfs.data.transfer.protection=authentication,privacy sets the data transfer protection level to authentication and privacy.

    - If you access the cluster by using Hadoop high availability (HA), the LOCATION clause is specified in the LOCATION = hdfs://nameserviceID/PATH format. In this case, nameserviceID specifies the ID of the logical naming service of the Hadoop HA cluster, and PATH specifies the file path.

      Note

      Make sure that the client configuration on the OBServer node contains the nameservice definition and failover strategy of the HA cluster.

    - If you access the HA cluster by using Kerberos authentication, the LOCATION clause is specified in the LOCATION = 'hdfs://nameserviceID/PATH?principal=xxx&keytab=xxx&krb5conf=xxx&configs=dfs.data.transfer.protection=${string}#dfs.nameservices=${nameservice id}#dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}#dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}#dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}#dfs.ha.automatic-failover.enabled.${nameservice id}=true#dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' format. In this case:

      - principal: specifies the user for login authentication. In this case, the principal of a non-primary NN must be specified.
      - keytab and krb5conf: the same as in the single-node NN case.
      - configs: specifies additional HDFS configuration items. When multiple configuration items are specified, they cover the HA and security configurations:
        - dfs.data.transfer.protection=${string}: specifies the dfs.data.transfer.protection parameter of the cluster.
        - dfs.nameservices=${nameservice id}: specifies the nameservice (alias) of the current HA cluster.
        - dfs.ha.namenodes.${nameservice id}=${namenode1},${namenode2}: specifies the list of NameNode IDs in the HA cluster.
        - dfs.namenode.rpc-address.${nameservice id}.${namenode1}=${namenode 1 address}: specifies the RPC address of namenode1 for client routing.
        - dfs.namenode.rpc-address.${nameservice id}.${namenode2}=${namenode 2 address}: specifies the RPC address of namenode2 for client routing.
        - dfs.ha.automatic-failover.enabled.${nameservice id}=true: specifies that the HA cluster automatically obtains a valid NameNode to respond to requests.
        - dfs.client.failover.proxy.provider.${nameservice id}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: specifies the logic class for switching between primary and standby nodes in the HA cluster. You can also customize and upload the logic required by the HA cluster.

      Notice

      Some HA-related configuration items are bound to the nameservice, such as mycluster in the HA example below. Make sure that the related configuration items are set correctly.
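To make the object storage format concrete, here is a minimal sketch of an external table over CSV files in Alibaba Cloud OSS, following the oss:// format described above; the access ID, access key, host, bucket path, table name, and columns are all placeholders, not working values.

CREATE EXTERNAL TABLE sales_oss_ext (
  id     INT,
  amount DECIMAL(10,2)
)
LOCATION = 'oss://$ACCESS_ID:$ACCESS_KEY@$HOST/sales_bucket/data'   -- replace with real access information
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
)
PATTERN = '.*\.csv';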
format_type_options
- TYPE = 'CSV': specifies the external file format as CSV. It also includes the following fields:

  - LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value is LINE_DELIMITER = '\n'.
  - FIELD_DELIMITER: specifies the column delimiter of the CSV file. The default value is FIELD_DELIMITER = '\t'.
  - ESCAPE: specifies the escape character of the CSV file. It must be a single byte. The default value is ESCAPE = '\'.
  - FIELD_OPTIONALLY_ENCLOSED_BY: specifies the character used to enclose field values in the CSV file. The default value is empty.

    Notice

    When the external table data file contains NULL values (an actual NULL, not the string "NULL"), you must explicitly configure the FIELD_OPTIONALLY_ENCLOSED_BY parameter, and its value cannot be empty.

  - ENCODING: specifies the character set encoding of the file. For the character sets supported in MySQL-compatible mode, see Character set. If this parameter is not specified, the default value is UTF8MB4.
  - NULL_IF: specifies the strings to be treated as NULL. The default value is empty.
  - SKIP_HEADER: specifies the number of header rows to skip.
  - SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE, indicating that blank lines are not skipped.
  - TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, indicating that leading and trailing spaces are not removed.
  - EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. The default value is FALSE, indicating that empty strings are not treated as NULL.
  - IGNORE_LAST_EMPTY_COLUMN: specifies whether to ignore the last empty field at the end of a line in the file. The default value is TRUE, indicating that the last empty field is ignored.

- TYPE = 'PARQUET': specifies the external file format as PARQUET.

- TYPE = 'ORC': specifies the external file format as ORC.
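For illustration, a minimal sketch that combines several of the CSV options above; the table, directory, and file names are hypothetical.

CREATE EXTERNAL TABLE staff_ext (
  id   INT,
  name VARCHAR(100)
)
LOCATION = '/home/admin/ext_staff'
FORMAT = (
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  ENCODING = 'utf8mb4'
  NULL_IF = ('NULL')            -- treat the literal string NULL as a NULL value
  SKIP_HEADER = 1               -- skip the header row
  SKIP_BLANK_LINES = TRUE
  TRIM_SPACE = TRUE
)
PATTERN = 'staff.csv';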
properties_type_options
- TYPE = 'ODPS': specifies the external data source type as ODPS. It also includes the following fields:

  - ACCESSID: specifies the AccessKey ID of the Alibaba Cloud account for identity authentication.
  - ACCESSKEY: specifies the AccessKey secret corresponding to the AccessKey ID for identity verification.
  - ENDPOINT: specifies the connection address of the ODPS service.
  - PROJECT_NAME: specifies the name of the target ODPS project to access.
  - SCHEMA_NAME: optional. Specifies the schema name in ODPS.
  - TABLE_NAME: specifies the name of the target table in ODPS.
  - QUOTA_NAME: optional. Specifies the quota to use.
  - COMPRESSION_CODE: optional. Specifies the compression format of the data source. Supported values are ZLIB, ZSTD, LZ4, and ODPS_LZ4. If this parameter is not specified, compression is not enabled.
  - API_MODE: specifies the API mode for calling ODPS. Valid values are as follows:

    - tunnel_api (default):

      - No special network configuration is required. This mode is applicable to all deployment scenarios; OceanBase Database does not need to be deployed in the same VPC as MaxCompute.
      - No additional MaxCompute Storage API permissions are required. Authentication can be completed by providing the AccessID and AccessKey.
      - Applicable scenarios:
        - OceanBase Database and MaxCompute are not deployed in the same VPC.
        - MaxCompute Storage API is not enabled.
        - Latency requirements for data transmission are not strict.

    - storage_api:

      - Network dependency: OceanBase Database and MaxCompute must be deployed in the same VPC to achieve low-latency, high-throughput data transmission.
      - Permission dependency: Storage API permissions must be enabled in MaxCompute, and the AccessKey must have the corresponding permissions.
      - Applicable scenarios:
        - OceanBase Database and MaxCompute are deployed in the same VPC.
        - MaxCompute Storage API is enabled.
        - The data volume is extremely large or real-time requirements are high.

  - SPLIT: specifies the unit for task splitting when storage_api is used. Valid values are byte and row. If the data size of each row in a table varies significantly, set SPLIT to byte. Otherwise, set it to row.
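For illustration, a minimal sketch of the PROPERTIES clause for an ODPS source, following the field list above; every value is a placeholder that must be replaced with your own MaxCompute access information, and the surrounding CREATE EXTERNAL TABLE statement is omitted.

PROPERTIES = (
  TYPE = 'ODPS',
  ACCESSID = '<access_id>',
  ACCESSKEY = '<access_key>',
  ENDPOINT = '<odps_endpoint>',
  PROJECT_NAME = '<project_name>',
  TABLE_NAME = '<table_name>',
  API_MODE = "tunnel_api"        -- default mode; no VPC or Storage API requirements
)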
Considerations
- When an external file is deleted, the external table ignores files that are in the file list but no longer exist.
- When an external file is modified, the external table accesses the latest data in the external file. If the external file is modified while the external table is being queried, unexpected results may occur. Therefore, avoid modifying external files while querying the external table.
- When a new file is added to the external directory, the external table only accesses the files already recorded in the file list. If you want the external table to see the new file, you need to update the external table's file list, as shown in the sketch below.
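A minimal sketch of such an update, assuming your OceanBase Database version supports the ALTER EXTERNAL TABLE ... REFRESH statement (see the ALTER EXTERNAL TABLE topic), using the contacts table from Example 1 below:

ALTER EXTERNAL TABLE contacts REFRESH;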
Examples
Example 1
Prepare the data. First, set the secure_file_priv path to /home/admin/ and place the CSV file extdata.csv that contains the external table data in the /home/admin/test directory of the current OBServer node.

Here is an example of setting the global secure file path:

obclient> SET GLOBAL secure_file_priv = "/home/admin/";
Query OK, 0 rows affected

obclient> \q
Bye

Note

Since secure_file_priv is a GLOBAL variable, you need to execute \q to exit the current session for the setting to take effect.

The content of the CSV file is as follows:

1,'Dave','Smith','dsmith@outlook.com','friend',32
2,'Xena','Johnson','xjonson@outlook.com','contact',45
3,'Fred','Jackon','fjackson@outlook.com','co-worker',19
4,'Alma','Tyler','atyler@outlook.com','friend',53

After logging in to the database as a user of a user tenant, create the external table contacts.

obclient> CREATE EXTERNAL TABLE contacts (
    id INT,
    firstname VARCHAR(100),
    lastname VARCHAR(100),
    email VARCHAR(255),
    category CHAR(30),
    age NUMBER
  )
  LOCATION = '/home/admin/test'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\''
  )
  PATTERN = 'extdata.csv';

Query the data in the external table contacts.

obclient> SELECT * FROM contacts;
+----+-----------+----------+----------------------+-----------+------+
| id | firstname | lastname | email                | category  | age  |
+----+-----------+----------+----------------------+-----------+------+
|  1 | Dave      | Smith    | dsmith@outlook.com   | friend    |   32 |
|  2 | Xena      | Johnson  | xjonson@outlook.com  | contact   |   45 |
|  3 | Fred      | Jackon   | fjackson@outlook.com | co-worker |   19 |
|  4 | Alma      | Tyler    | atyler@outlook.com   | friend    |   53 |
+----+-----------+----------+----------------------+-----------+------+
4 rows in set
Example 2
Assume that there is a CSV file under a certain file path on HDFS, and the content of the file is as follows:
$ hdfs dfs -cat /user/test_tbl1.csv
1,'Emma','2021-09-01'
2,'William','2021-09-02'
3,'Olivia','2021-09-03'

Create an HDFS external table.
If Kerberos authentication is not enabled in the target HDFS environment:
CREATE EXTERNAL TABLE test_tbl1_csv_mysql (
    id INT,
    name VARCHAR(50),
    c_date DATE
  )
  LOCATION = 'hdfs://${hadoop namenode hostname}:${hadoop namenode port}/user'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\''
  )
  PATTERN = 'test_tbl1.csv';

Notice

Here, hadoop namenode hostname and hadoop namenode port refer to the host name and port of the HDFS NameNode and must be replaced with actual values.

If Kerberos authentication is enabled in the target HDFS environment:

Environment preparation

Since the current HDFS integration uses the JNI framework, the environment must support deploying a Java environment. If OBServer is deployed across multiple nodes, each of those OBServer nodes must be configured with the following settings; it is not sufficient to configure only one node.
Step 1: Deploy and configure the Java environment
You can download the latest version of OpenJDK 8 from openjdk.
Step 2: Decompress the installation package
After decompression, you will see the following files:
$ ls $JAVA_HOME
ASSEMBLY_EXCEPTION  bin  include  jre  lib  LICENSE  man  NOTICE  release  sample  src.zip  THIRD_PARTY_README

Step 3: Deploy and configure the HDFS.so dynamic library
You can download the required dependencies from the following address: https://mirrors.aliyun.com/oceanbase/ After downloading, install it using the following command:
sudo rpm -Uvh devdeps-hdfs-sdk-3.3.6-xxxxx.xxx.xxx.rpm

Execute the following command to check whether the installation meets expectations. The installation should include the libhdfs.so and libhdfs.so.0.0.0 files, and the corresponding symbolic links should be valid.

$ ll /usr/local/oceanbase/deps/devel/lib
total 376
lrwxrwxrwx 1 root root     16 Dec 24 19:49 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 root root 384632 Dec 24 19:09 libhdfs.so.0.0.0

Step 4: Deploy and configure the path for the required JAR package:
Download the JAR package: Visit OceanBase Database Mirror to download the JAR package.
Notice
- For OceanBase Database V4.3.5 BP1 and earlier versions: Please download the 1.0.0 version of the devdeps-java-extensions RPM installation package.
- For OceanBase Database V4.3.5 BP2 and later versions: Please download the 1.0.1 version of the devdeps-java-extensions RPM installation package.
- For OceanBase Database V4.4.0: Please download the 1.0.1 version of the devdeps-java-extensions RPM installation package.
- For OceanBase Database V4.4.1: Please download the 1.0.2 version of the devdeps-java-extensions RPM installation package.
Install the JAR package: After downloading the required JAR package, execute the following command to decompress and install it:
sudo rpm -Uvh devdeps-java-extensions-x.x.x-xxxxxxxxxxxx.xxx.xxxxxx.rpm

Check whether the installation meets expectations:

$ ll /home/admin/oceanbase/jni_packages
total 52756
drwxr-sr-x 4 root root     4096 Dec 24 20:25 hadoop
drwxr-xr-x 3 root root     4096 Dec 24 20:25 lib
-rw-r--r-- 1 root root 54008720 Dec 24 19:52 oceanbase-odps-connector-jar-with-dependencies.jar
Start the OBServer
Before using the HDFS external table, you need to configure the corresponding OBServer node. The configuration steps are as follows:
Note
Since the current OBServer does not support flexible and immediate configuration changes for JNI-related settings, any changes to Java environment variables require a restart of the OBServer to take effect. All the following configurations are cluster-wide and need to be set only once, without needing to configure each node individually.
Execute the following command as the sys tenant to enable the Java environment supported by OBServer:
alter system set ob_enable_java_env=true;

Execute the following command to set the Java home path for OBServer:

alter system set ob_java_home="/home/user/openjdk/jdk8u422-b05";

Note
The path is derived from the OpenJDK Java installation path.
Set the relevant configuration items for Java environment startup:
Notice
Changes to this configuration item require a restart of the OBServer to take effect. Since the current HDFS integration copies data directly from the HDFS data stream to the C++ memory heap, you can appropriately reduce the -Xmx2048m and -Xms2048m settings.

alter system set ob_java_opts="-Djdk.lang.processReaperUseDefaultStackSize=true -XX:+HeapDumpOnOutOfMemoryError -Xmx2048m -Xms2048m -Xloggc:/home/user/jvmlogs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/user/jvmlogs/heapdumps/ -XX:+UseG1GC -XX:-CriticalJNINatives";

Note
Here, /home/user/jvmlogs and /home/user/jvmlogs/heapdumps are the paths where the corresponding log folders are generated, and they must be created manually.

Set the available LD_LIBRARY_PATH path:
Note
Since the JNI SDK supported by OBServer is an extended feature, it is dynamically loaded using dynamic libraries. Therefore, you need to configure the corresponding dynamic library path by setting LD_LIBRARY_PATH. This configuration must be applied to all OBServer nodes to be effective.
$ vim ~/.bashrc
export LD_LIBRARY_PATH=/home/user/openjdk/jdk8u422-b05/jre/lib/amd64/server:/usr/local/oceanbase/deps/devel/lib
Create the HDFS external table
Notice
If Kerberos authentication is configured, the corresponding OBServer node must be deployed with the keytab (user authentication key file) and krb5conf file.
CREATE EXTERNAL TABLE partsupp (
    PS_PARTKEY INTEGER,
    PS_SUPPKEY INTEGER,
    PS_AVAILQTY INTEGER,
    PS_SUPPLYCOST DECIMAL(15,2),
    PS_COMMENT VARCHAR(199)
  )
  LOCATION = 'hdfs://localhost:8020/tpch_csv?principal=principal_str&keytab=/path/to/keytab&krb5conf=/path/to/krb5conf_file&configs=xxx=xxx#xxx=xxx'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = '|'
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  )
  PATTERN = 'partsupp.tbl';
Query the external table data.
select * from test_tbl1_csv_mysql;

The returned result is as follows:

+----+----------+------------+
| id | name     | c_date     |
+----+----------+------------+
|  1 | Emma     | 2021-09-01 |
|  2 | William  | 2021-09-02 |
|  3 | Olivia   | 2021-09-03 |
+----+----------+------------+
3 rows in set
Example 3
Assume that there is a CSV file in a certain path on HDFS, and the file content is as follows:
$ hdfs dfs -cat /hadoop_ha_test/test_simple.csv
1,lili,19
2,alic,20
3,solvi,21

Enable the OBServer-related JNI configuration items.

For more information about the JNI configuration, see the environment preparation steps under "If Kerberos authentication is enabled in the target HDFS environment" in Example 2. For more information about how to deploy the OceanBase Database Java SDK environment, see Deploy the OceanBase Database Java SDK environment.
Create an HDFS external table.
obclient> CREATE EXTERNAL TABLE test_ha (
    id INT,
    r_name VARCHAR(100),
    age INT
  )
  LOCATION = 'hdfs://mycluster/hadoop_ha_test?principal=ha/xxx@xxx.com&keytab=/path/to/ha.keytab&krb5conf=/path/to/krb5conf_file&configs=dfs.data.transfer.protection=integrity#dfs.nameservices=mycluster#dfs.ha.namenodes.mycluster=nn1,nn2#dfs.namenode.rpc-address.mycluster.nn1=localhost1:port#dfs.namenode.rpc-address.mycluster.nn2=localhost2:port#dfs.ha.automatic-failover.enabled.mycluster=true#dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
  FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '\''
  )
  PATTERN = 'test_simple.csv';

Query data from the external table.

obclient> SELECT * FROM test_ha;

The returned result is as follows:

+----+--------+------+
| id | r_name | age  |
+----+--------+------+
|  1 | lili   |   19 |
|  2 | alic   |   20 |
|  3 | solvi  |   21 |
+----+--------+------+
3 rows in set