Description
FILES(...) maps external file locations to queryable rowsets, allowing you to directly read external data in a SELECT ... FROM statement.
Syntax
FILES(location_expr, format_expr [, pattern_expr])
A common usage is as follows:
SELECT * FROM
FILES(
location = '/data/',
format (TYPE = 'csv', field_delimiter = ',', parse_header = true),
pattern = 'datafiles$'
);
Parameters
location_expr: The location of the external data.format_expr: the file format clause, written asFORMAT (...)orformat (...)(case-insensitive for keywords). The parentheses enclose a comma-separated list of format options. Mandatory is theTYPE = '<file type>'item, where<file type>is a string literal that can be'CSV','PARQUET', or'ORC'. The semantics and default values of each option are consistent with those in theformat_type_optionsclause of CREATE EXTERNAL TABLE. InFILES, you can use either the uppercase form (e.g.,FIELD_DELIMITER) or the lowercase underscore form (e.g.,field_delimiter) from the documentation external table syntax; both are equivalent.pattern_expr: Optional. Filters files by file name pattern.
Common options in format(...)
After specifying TYPE, you can append the following options as needed (only those commonly used when FILES reads flat files are listed; for a more complete syntax tree, see the external table section in the link above).
When TYPE = 'CSV', you can also specify, for example:
Options |
Summary |
|---|---|
LINE_DELIMITER / line_delimiter |
Line separator. Default:'\n'. |
FIELD_DELIMITER / field_delimiter |
Column separator, default:'\t'. |
ESCAPE / escape |
Escape character, 1 byte in length. The default value is a backslash.\. |
FIELD_OPTIONALLY_ENCLOSED_BY / field_optionally_enclosed_by |
The character that wraps the field value, which can be empty by default. |
ENCODING / encoding |
File character set. If not specified, the default isUTF8MB4. |
NULL_IF / null_if |
A list of strings to be treated as NULL, in the syntax null_if = ('string1', 'string2', ...). |
SKIP_HEADER / skip_header |
Number of lines to skip at the beginning of the file. The value must be a non-negative integer. |
PARSE_HEADER / parse_header |
Whether to parse the first line as column names. Valid values:TRUE/FALSE(Ortrue/false). |
SKIP_BLANK_LINES / skip_blank_lines |
Whether to skip blank lines. Default:FALSE. |
TRIM_SPACE / trim_space |
Whether to remove leading and trailing spaces from the field. Default:FALSE. |
EMPTY_FIELD_AS_NULL / empty_field_as_null |
Whether to treat empty fields asNULLDefault:FALSE. |
IGNORE_LAST_EMPTY_COLUMN / ignore_last_empty_column |
Whether to ignore extra empty fields at the end of a row. Default:TRUE. |
COMPRESSION / compression |
Compression format:GZIP、ZSTD、DEFLATE、NONE、SNAPPY_BLOCKand so on. |
PARALLEL_PARSE_ON_SINGLE_FILE / parallel_parse_on_single_file |
Whether to perform parallel parsing on single files. Default value:TRUE. |
PARALLEL_PARSE_FILE_SIZE_THRESHOLD / parallel_parse_file_size_threshold |
The file size threshold (in bytes) that triggers parallel parsing. The default value is 256 MB. |
MAX_ROW_LENGTH / max_row_length |
Maximum length of a single row (bytes). The default value is 2 MB. This parameter specifies the maximum length of a row that can be parsed in parallel. |
Notice
Do not use SKIP_HEADER/skip_header and PARSE_HEADER/parse_header together. They have conflicting semantics and will cause an error.
Notice
When the data file contains actual NULL values (not the literal string "NULL"), you must explicitly set FIELD_OPTIONALLY_ENCLOSED_BY/field_optionally_enclosed_by, and the value cannot be empty. For details, see the corresponding description in the external table documentation.
When TYPE = 'PARQUET' or TYPE = 'ORC', you generally only need to specify TYPE for common query scenarios. If you also need to add other read options, their names and meanings are the same as those in the FORMAT = (...) clause of an external table. For more information, see the syntax definition and parameter description in CREATE EXTERNAL TABLE.
Usage instructions
FILES(...)can be used directly in theFROMclause of aSELECTstatement.- In a
LOAD DATAscenario, you can also specify the data source usingFROM FILES(...).
