Purpose
FILES(...) maps external file locations to queryable rowsets, allowing you to directly read external data in a SELECT ... FROM statement.
Syntax
FILES(location_expr, format_expr [, pattern_expr])
A common usage is as follows:
SELECT * FROM
FILES(
location = '/data/',
format (TYPE = 'csv', field_delimiter = ',', parse_header = true),
pattern = 'datafiles$'
);
Parameters
location_expr: the location of external data.format_expr: The file format clause, written asFORMAT (...)orformat (...)(case-insensitive for keywords). The parentheses enclose a comma-separated list of format options. Mandatory is theTYPE = '<file type>'item, where<file type>is a string literal that can be'CSV','PARQUET', or'ORC'. The semantics and default values of each option are consistent with those in theformat_type_optionsclause of CREATE EXTERNAL TABLE; inFILES, you can use either the uppercase form (e.g.,FIELD_DELIMITER) or the lowercase underscore form (e.g.,field_delimiter) from the documentation external table syntax, as they are equivalent.pattern_expr: Optional. Filters files by file name pattern.
Common options in format(...)
After specifying TYPE, you can add the following options as needed (only the commonly used ones for reading plain files with FILES are listed here; for a more complete syntax tree, see the external table section in the link above).
When TYPE = 'CSV', you can also specify, for example:
Options |
Summary |
|---|---|
LINE_DELIMITER / line_delimiter |
Line separator. Default:'\n'. |
FIELD_DELIMITER / field_delimiter |
Column separator, default:'\t'. |
ESCAPE / escape |
Escape character, 1 byte in length. The default value is a backslash.\. |
FIELD_OPTIONALLY_ENCLOSED_BY / field_optionally_enclosed_by |
The character that wraps the field value, which can be empty by default. |
ENCODING / encoding |
File character set. If not specified, the default isUTF8MB4. |
NULL_IF / null_if |
A list of strings to be treated as NULL, in the syntax null_if = ('string1', 'string2', ...). |
SKIP_HEADER / skip_header |
Number of lines to skip at the beginning of the file. The value must be a non-negative integer. |
PARSE_HEADER / parse_header |
Whether to parse the first row as column names. Valid values:TRUE/FALSE(Ortrue/false). |
SKIP_BLANK_LINES / skip_blank_lines |
Whether to skip blank lines. Default:FALSE. |
TRIM_SPACE / trim_space |
Whether to remove leading and trailing spaces from the field. Default:FALSE. |
EMPTY_FIELD_AS_NULL / empty_field_as_null |
Whether to treat empty fields asNULLDefault:FALSE. |
IGNORE_LAST_EMPTY_COLUMN / ignore_last_empty_column |
Whether to ignore extra empty fields at the end of a row. Default:TRUE. |
COMPRESSION / compression |
Compression format:GZIP、ZSTD、DEFLATE、NONE、SNAPPY_BLOCKetc. |
PARALLEL_PARSE_ON_SINGLE_FILE / parallel_parse_on_single_file |
Whether to perform parallel parsing of single files. Default value:TRUE. |
PARALLEL_PARSE_FILE_SIZE_THRESHOLD / parallel_parse_file_size_threshold |
The file size threshold (in bytes) that triggers parallel parsing. The default value is 256 MB. |
MAX_ROW_LENGTH / max_row_length |
Maximum length of a single row (bytes). The default value is 2 MB. This parameter specifies the maximum length of a row that can be parsed in parallel. |
Notice
Do not use SKIP_HEADER/skip_header and PARSE_HEADER/parse_header at the same time. They have conflicting semantics and will cause an error.
Notice
When a data file contains actual NULL values (not the literal string "NULL"), you must explicitly set FIELD_OPTIONALLY_ENCLOSED_BY/field_optionally_enclosed_by, and its value cannot be empty. For details, see the corresponding description in the external table documentation.
When TYPE = 'PARQUET' or TYPE = 'ORC', you generally only need to specify TYPE for common query scenarios. If you also need to add other read options, their names and meanings are consistent with those of the FORMAT = (...) clause in an external table. For details, see the syntax definition and parameter description in CREATE EXTERNAL TABLE.
Usage instructions
FILES(...)can be used directly in theFROMclause of aSELECTstatement.- In
LOAD DATAscenarios, you can also specify the data source usingFROM FILES(...).
