Automatically create partitions for an external table
The syntax for creating a partitioned external table is as follows:
CREATE EXTERNAL TABLE
<table_name>
( <part_col_name> <col_type> AS <part_expr> )
[ , ... ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
..
In syntax above, PARTITION BY specifies the partitioning columns. You can specify one or more partitioning columns.
Examples
Use a partitioned external table to manage files in the following file directory:
external_table_mock_log
├── 2023-06-01
│ ├── server_log1.csv
│ └── server_log2.csv
├── 2023-06-02
│ └── server_log1.csv
├── 2023-06-03
│ ├── server_log1.csv
│ ├── server_log2.csv
│ └── server_log3.csv
└── 2023-07-01
└── server_log1.csv
The file content is as follows:
2023-06-01 14:42:37.568624, INTERNAL ERROR, -4007, Not supported
2023-06-01 14:42:38.861356, ITER END, -4008, traverse map failed
2023-06-01 14:42:39.931161, NEED WAIT, -4076, query and update last id fail
2023-06-01 14:42:39.931161, SUCCESS, 0, do flush cache success
You can create a partitioned external table to access the file directory:
obclient> create external table ex_t1 (
time datetime,
errstate varchar(30),
errcode int,
errcontent varchar(100),
date_key datetime as (substr(substr(metadata$fileurl, instr(metadata$fileurl, '%') + 1), 1, 10))
)
location='external_table_mock_log'
FORMAT (
type = 'csv',
field_delimiter = ',',
SKIP_BLANK_LINES = TRUE
)
partition by (date_key)
;
- The
METADATA$FILEURLcolumn records the path and file name. For example, the value of this parameter isip:port%2023-06-01/server_log2.csvfor a local file and is2023-06-01/server_log2.csvfor a file stored in Alibaba Cloud Object Storage Service (OSS).
In this example, the date_key column serves as the partitioning key. It specifies to partition files by date. You can specify the value or range of the partitioning key in the query statement. In this case, partitions are pruned, and the external table reads only the files in the specified partition.
obclient> select * from ex_t1 where date_key = '2023-06-01';
Manually create partitions for an external table
The syntax for creating an external table with partitions to be manually defined is as follows:
CREATE EXTERNAL TABLE
<table_name>
( <part_col_name> <col_type> AS <part_expr> )
[ , ... ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
PARTITION_TYPE = USER_SPECIFIED
If you want to manually add and delete partitions, you can set PARTITION_TYPE to USER_SPECIFIED, so that the external table does not automatically manage partitions.
Examples
Create an external table with partitions to be manually defined to access the file directory in the preceding example.
obclient> create external table ex_t1 (
time datetime,
errstate varchar(30),
errcode int,
errcontent varchar(100),
date_key datetime as (metadata$partition_list_col1)
)
location='external_table_mock_log'
PARTITION_TYPE = USER_SPECIFIED
FORMAT (
type = 'csv',
field_delimiter = ',',
SKIP_BLANK_LINES = TRUE
)
partition by (date_key)
;
metadata$partition_list_colspecifies the order of the column among partitioning columns of the external table, which corresponds to the order of partitioning columns specified by thepartition byfield.
Execute the following statement to create a partition and add files in the 2023-06-01 subdirectory to the partition. In this example, the partitioning key is set to 2023-06-01. You can set the partitioning key to any other value as needed.
obclient> alter external table ex_t1 add partition
(date_key = '2023-06-01' ) location '2023-06-01';
You can also execute the following statement to drop the created partition.
obclient> alter external table ex_t1 drop partition location '2023-06-01';