Automatically add partitions for an external table
The syntax for creating a partitioned external table is as follows:
CREATE EXTERNAL TABLE
<table_name>
( <part_col_name> <col_type> AS <part_expr> )
[ , ... ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
..
The parameters are described as follows:
PARTITION BYspecifies one or more partitioning columns.
Example
Use a partitioned external table to manage files in the following file directory. Two formats of data files are supported for external tables: CSV and PARQUET. In this example, the data files are in the CSV format.
external_table_mock_log
├── 2023-06-01
│ ├── server_log1.csv
│ └── server_log2.csv
├── 2023-06-02
│ └── server_log1.csv
├── 2023-06-03
│ ├── server_log1.csv
│ ├── server_log2.csv
│ └── server_log3.csv
└── 2023-07-01
└── server_log1.csv
The file content is as follows:
2023-06-01 14:42:37.568624, INTERNAL ERROR, -4007, Not supported
2023-06-01 14:42:38.861356, ITER END, -4008, traverse map failed
2023-06-01 14:42:39.931161, NEED WAIT, -4076, query and update last id fail
2023-06-01 14:42:39.931161, SUCCESS, 0, do flush cache success
You can create a partitioned external table to access the file directory:
obclient> create external table ex_t1 (
time datetime,
errstate varchar(30),
errcode int,
errcontent varchar(100),
date_key datetime as (substr(substr(metadata$fileurl, instr(metadata$fileurl, '%') + 1), 1, 10))
)
location='external_table_mock_log'
FORMAT (
type = 'csv',
field_delimiter = ',',
SKIP_BLANK_LINES = TRUE
)
partition by (date_key)
;
- The
METADATA$FILEURLcolumn records the path and file name. For example, the value of this parameter isip:port%2023-06-01/server_log2.csvfor a local file and is2023-06-01/server_log2.csvfor a file stored in Alibaba Cloud Object Storage Service (OSS).
In this example, the date_key column serves as the partitioning key. It specifies to partition files by date. You can specify the value or range of the partitioning key in the query statement. In this case, partitions are pruned, and the external table reads only the files in the specified partition.
obclient> select * from ex_t1 where date_key = '2023-06-01';
Manually add partitions for an external table
The syntax for creating an external table with partitions to be manually defined is as follows:
CREATE EXTERNAL TABLE
<table_name>
( <part_col_name> <col_type> AS <part_expr> )
[ , ... ]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
PARTITION_TYPE = USER_SPECIFIED
If you want to manually add and drop partitions, you can set PARTITION_TYPE to USER_SPECIFIED so that the external table does not automatically manage partitions.
Examples
Create an external table with partitions to be manually defined to access the file directory in the preceding example.
Example 1
Two formats of data files are supported for external tables: CSV and PARQUET. You can execute the following statement to add partitions for an external table to read data from CSV files:
obclient> create external table ex_t1 (
time datetime,
errstate varchar(30),
errcode int,
errcontent varchar(100),
date_key datetime as (metadata$partition_list_col1)
)
location='external_table_mock_log'
PARTITION_TYPE = USER_SPECIFIED
FORMAT (
type = 'csv',
field_delimiter = ',',
SKIP_BLANK_LINES = TRUE
)
partition by (date_key)
;
metadata$partition_list_colspecifies the order of the column among partitioning key columns of the external table, which corresponds to the order of partitioning key columns specified by thepartition byfield.
Execute the following statement to add a partition and load data from files in the 2023-06-01 subdirectory to the partition. In this example, the partitioning key value is set to 2023-06-01. You can set the partitioning key to any other value as needed.
obclient> alter external table ex_t1 add partition
(date_key = '2023-06-01' ) location '2023-06-01';
You can also execute the following statement to drop the added partition:
obclient> alter external table ex_t1 drop partition location '2023-06-01';
Example 2
Two formats of data files are supported for external tables: CSV and PARQUET. You can execute the following statement to add partitions for an external table to read data from PARQUET files:
obclient> create external table lineitem_part
(
l_orderkey int,
l_partkey int,
l_suppkey int,
l_linenumber int,
l_quantity decimal(15,2),
l_quantity decimal(15,2),S
l_discount decimal(15,2),
l_tax decimal(15,2),
l_returnflag char(1),
l_linestatus char(1),
l_shipdate date,
l_commitdate date,
l_receiptdate date,
l_shipinstruct char(25),
l_shipmode char(10),
l_comment varchar(44),
l_shipyear varchar(4) as (substr(metadata$fileurl, instr(metadata$fileurl, '%') + 1, 4))
) location 'file:///data/external_data/lineitem_part'
format (
type = 'parquet'
)
partition by(l_shipyear);
Query OK, 0 rows affected (0.126 sec)