Description
FILES(...) maps the location of external files to a queryable rowset, allowing you to directly read external data in SELECT ... FROM.
Syntax
FILES(location_expr, format_expr [, pattern_expr])
A common usage is as follows:
SELECT * FROM
FILES(
location = '/data/',
format (TYPE = 'csv', field_delimiter = ',', parse_header = true),
pattern = 'datafiles$'
);
Parameters
location_expr: the location of external data.format_expr: The file format clause, written asFORMAT (...)orformat (...)(case-insensitive for keywords). The parentheses enclose a comma-separated list of format options. Mandatory is theTYPE = '<file type>'item, where<file type>is a string literal that can be'CSV','PARQUET', or'ORC'. The semantics and default values of each option are consistent with those in theformat_type_optionsclause of CREATE EXTERNAL TABLE. InFILES, you can use either the uppercase form (e.g.,FIELD_DELIMITER) or the lowercase underscore form (e.g.,field_delimiter) from the documentation external table syntax; both are equivalent.pattern_expr: Optional. Filters files by file name pattern.
Common options for format(...)
After specifying TYPE, you can add the following options as needed (only those commonly used for reading plain files with FILES are listed here; for a more complete syntax tree, see the external table section in the link above).
When TYPE = 'CSV', you can also specify, for example:
Options |
Summary |
|---|---|
LINE_DELIMITER / line_delimiter |
Line separator. Default:'\n'. |
FIELD_DELIMITER / field_delimiter |
Column delimiter. Default:'\t'. |
ESCAPE / escape |
Escape character, 1 byte in length. The default value is the backslash.\. |
FIELD_OPTIONALLY_ENCLOSED_BY / field_optionally_enclosed_by |
The character that wraps the field value, which can be empty by default. |
ENCODING / encoding |
File character set. The default value isUTF8MB4. |
NULL_IF / null_if |
A list of strings to be treated as NULL, in the syntax null_if = ('string1', 'string2', ...). |
SKIP_HEADER / skip_header |
The number of lines to skip at the beginning of the file, which must be a non-negative integer. |
PARSE_HEADER / parse_header |
Whether to parse the first row as column names. Valid values:TRUE/FALSE(Ortrue/false). |
SKIP_BLANK_LINES / skip_blank_lines |
Whether to skip blank lines. Default:FALSE. |
TRIM_SPACE / trim_space |
Whether to remove leading and trailing spaces from the field. Default:FALSE. |
EMPTY_FIELD_AS_NULL / empty_field_as_null |
Whether to treat empty fields asNULLDefault:FALSE. |
IGNORE_LAST_EMPTY_COLUMN / ignore_last_empty_column |
Whether to ignore extra empty fields at the end of a row. Default:TRUE. |
COMPRESSION / compression |
Compression format:GZIP、ZSTD、DEFLATE、NONE、SNAPPY_BLOCKand so on. |
PARALLEL_PARSE_ON_SINGLE_FILE / parallel_parse_on_single_file |
Whether to perform parallel parsing on single files. Default:TRUE. |
PARALLEL_PARSE_FILE_SIZE_THRESHOLD / parallel_parse_file_size_threshold |
The file size threshold (in bytes) that triggers parallel parsing. The default value is 256 MB. |
MAX_ROW_LENGTH / max_row_length |
Maximum length of a single row (bytes), which is 2 MB by default. This parameter specifies the maximum length of a row that can be parsed in parallel. |
Notice
Do not use SKIP_HEADER/skip_header and PARSE_HEADER/parse_header together. They have conflicting semantics and will cause an error.
Notice
When the data file contains actual NULL values (not the literal string "NULL"), you must explicitly set FIELD_OPTIONALLY_ENCLOSED_BY/field_optionally_enclosed_by, and the value cannot be empty. For details, see the corresponding description in the external table documentation.
When TYPE = 'PARQUET' or TYPE = 'ORC', you generally only need to specify TYPE for common query scenarios. If you also need to add other read options, their names and meanings are the same as those in the FORMAT = (...) clause of an external table. For more information, see the syntax definition and parameter description in CREATE EXTERNAL TABLE.
Usage instructions
FILES(...)can be used directly in theFROMclause of aSELECTstatement.- In
LOAD DATAscenarios, you can also specify the data source usingFROM FILES(...).
