Add partitions for an external table

2025-11-27 02:38:06  Updated

Automatically add partitions for an external table

The syntax for creating a partitioned external table is as follows:

CREATE EXTERNAL TABLE
  <table_name>
     ( <part_col_name> <col_type> AS <part_expr> )
     [ , ... ]
  [ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
  ..

The parameters are described as follows:

  • PARTITION BY specifies one or more partitioning columns.

Example

Use a partitioned external table to manage files in the following file directory. Two formats of data files are supported for external tables: CSV and PARQUET. In this example, the data files are in the CSV format.

external_table_mock_log
├── 2023-06-01
│   ├── server_log1.csv
│   └── server_log2.csv
├── 2023-06-02
│   └── server_log1.csv
├── 2023-06-03
│   ├── server_log1.csv
│   ├── server_log2.csv
│   └── server_log3.csv
└── 2023-07-01
    └── server_log1.csv

The file content is as follows:

2023-06-01 14:42:37.568624, INTERNAL ERROR, -4007, Not supported
2023-06-01 14:42:38.861356, ITER END, -4008, traverse map failed
2023-06-01 14:42:39.931161, NEED WAIT, -4076, query and update last id fail
2023-06-01 14:42:39.931161, SUCCESS, 0, do flush cache success

You can create a partitioned external table to access the file directory:

obclient> create external table ex_t1 (
time date,
errstate varchar2(30),
errcode int,
errcontent varchar2(100),
date_key date as (substr(substr(metadata$fileurl, instr(metadata$fileurl, '%') + 1), 1, 10))
)
location='external_table_mock_log'
FORMAT (
    type = 'csv',
    field_delimiter = ',',
    SKIP_BLANK_LINES = TRUE
)
partition by (date_key)
;
  • The METADATA$FILEURL column records the path and file name. For example, the value of this parameter is ip:port%2023-06-01/server_log2.csv for a local file and is 2023-06-01/server_log2.csv for a file stored in Alibaba Cloud Object Storage Service (OSS).

In this example, the date_key column serves as the partitioning key. It specifies to partition files by date. You can specify the value or range of the partitioning key in the query statement. In this case, partitions are pruned, and the external table reads only the files in the specified partition.

obclient> select * from ex_t1 where date_key = '2023-06-01';

Manually add partitions for an external table

The syntax for creating an external table with partitions to be manually defined is as follows:

CREATE EXTERNAL TABLE
  <table_name>
     ( <part_col_name> <col_type> AS <part_expr> )
     [ , ... ]
  [ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
  PARTITION_TYPE = USER_SPECIFIED

If you want to manually add and drop partitions, you can set PARTITION_TYPE to USER_SPECIFIED so that the external table does not automatically manage partitions.

Examples

Create an external table with partitions to be manually defined to access the file directory in the preceding example.

Example 1

Two formats of data files are supported for external tables: CSV and PARQUET. You can execute the following statement to add partitions for an external table to read data from CSV files:

obclient> create external table ex_t1 (
time date,
errstate varchar2(30),
errcode int,
errcontent varchar2(100),
date_key date as (metadata$partition_list_col1)
)
location='external_table_mock_log'
PARTITION_TYPE = USER_SPECIFIED
FORMAT (
    type = 'csv',
    field_delimiter = ',',
    SKIP_BLANK_LINES = TRUE
)
partition by (date_key)
;
  • metadata$partition_list_col specifies the order of the column among partitioning key columns of the external table, which corresponds to the order of partitioning key columns specified by the partition by field.

Execute the following statement to add a partition and load data from files in the 2023-06-01 subdirectory to the partition. In this example, the partitioning key value is set to 2023-06-01. You can set the partitioning key to any other value as needed.

alter external table ex_t1 add partition 
(date_key = '2023-06-01' ) location '2023-06-01';

You can also execute the following statement to drop the added partition:

alter external table ex_t1 drop partition location '2023-06-01';

Example 2

Two formats of data files are supported for external tables: CSV and PARQUET. You can execute the following statement to add partitions for an external table to read data from PARQUET files:

obclient> create external table lineitem_part
(
  l_orderkey     int,
  l_partkey      int,
  l_suppkey      int,
  l_linenumber   int,
  l_quantity     decimal(15,2),
  l_quantity     decimal(15,2),S
  l_discount     decimal(15,2),
  l_tax          decimal(15,2),
  l_returnflag   char(1),
  l_linestatus   char(1),
  l_shipdate     date,
  l_commitdate   date,
  l_receiptdate  date,
  l_shipinstruct char(25),
  l_shipmode     char(10),
  l_comment      varchar(44),
  l_shipyear     varchar(4) as (substr(metadata$fileurl, instr(metadata$fileurl, '%') + 1, 4))
) location 'file:///data/external_data/lineitem_part' 
format (
type = 'parquet'
)
partition by(l_shipyear);
Query OK, 0 rows affected (0.126 sec)

References

Contact Us