A file is used in File System Persistence Mappings to define the location and format of data on a filesystem. There are many types of files, and each type may have different required fields, based on the format field.
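Every file type described on this page marks format, path, and fileName as Required. As a minimal sketch of checking a specification before submitting it, assuming a hypothetical helper named missing_required_fields that is not part of Magpie:

```python
# Hypothetical helper, not part of Magpie. It checks only the three fields
# that every file type on this page marks as Required.
REQUIRED_FIELDS = ("format", "path", "fileName")


def missing_required_fields(file_spec):
    """Return the required field names that are absent from a file specification."""
    # Test for presence rather than truthiness: path may be an empty string.
    return [name for name in REQUIRED_FIELDS if name not in file_spec]


spec = {"path": "/my_data", "fileName": "transactions.txt", "format": "text"}
print(missing_required_fields(spec))                   # []
print(missing_required_fields({"format": "parquet"}))  # ['path', 'fileName']
```

The check deliberately accepts an empty path, since the tables below state that path can be an empty string.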
Text File
A text file has a single field, value, for each row in the file.
Specification
{ "format": "text", "path": "<string>", "fileName": "<string>", "compression": "<string>" }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards (e.g., | Required | |
compression | String | Compression type for source file. Supported compression types: Note: this compression is only used when writing files. Magpie will detect the compression of files based on the file extension when reading. | Optional | |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "transactions.txt", "format": "text" }
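To illustrate the row shape described above, the following sketch reads lines and wraps each one in a record with a single value field. It is an analogy in plain Python, not Magpie's reader; the function name read_text_rows and the sample contents are assumptions.

```python
import io


def read_text_rows(stream):
    """Yield one record per line, each with a single field named value."""
    for line in stream:
        # Strip only the line terminator; the rest of the line is kept verbatim.
        yield {"value": line.rstrip("\r\n")}


# Made-up sample contents standing in for transactions.txt.
sample = io.StringIO("2021-01-01,42.50\n2021-01-02,13.00\n")
rows = list(read_text_rows(sample))
print(rows)
# [{'value': '2021-01-01,42.50'}, {'value': '2021-01-02,13.00'}]
```

Note that the commas are not interpreted: a text file does not split a line into fields.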
Delimited Text File (e.g., csv, tsv)
A text file with fields delimited by a specified separator.
Specification
{ "format": "DelimitedText", "path": "<string>", "fileName": "<string>", "compression": "<string>", "encoding": "<string>", "delimiter": "<string>", "header": <boolean>, "multiLine": <boolean>, "ignoreLeadingWhiteSpace": <boolean>, "ignoreTrailingWhiteSpace": <boolean>, "quoteCharacter": "<string>", "escapeCharacter": "<string>", "dateFormat": "<string>", "timestampFormat": "<string>" }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards (e.g., | Required | |
delimiter | String | The separator used to partition records into fields. | Optional | |
compression | String | Compression type for source file. Supported compression types: Note: this compression is only used when writing files. Magpie will detect the compression of files based on the file extension when reading. | Optional | |
encoding | String | Encoding of the file. | Optional | |
header | Boolean | If true, the first line of each file will be used as field names for the resulting tables. | Optional | |
multiLine | Boolean | If true, multiple lines of the file will be parsed as one record, with new records starting based on field counts. | Optional | |
ignoreLeadingWhiteSpace | Boolean | If true, leading white space will be trimmed from each field. | Optional | |
ignoreTrailingWhiteSpace | Boolean | If true, trailing white space will be trimmed from each field. | Optional | |
quoteCharacter | String | The character optionally used to enclose fields within the files. | Optional | |
escapeCharacter | String | The character optionally used to escape quotations within a quoted field. | Optional | |
nullValue | String | The value to treat as null when reading files and the value to use as null when writing files. | Optional | <empty string> |
dateFormat | String | The Java date format used to identify fields as dates within the files. | Optional | |
timestampFormat | String | The Java date time format used to identify fields as timestamps within the files. | Optional | |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "transactions.tsv.gz", "delimiter": "\t", "compression": "Gzip", "encoding": "UTF-8", "header": true, "multiLine": false, "ignoreLeadingWhiteSpace": false, "ignoreTrailingWhiteSpace": false, "quoteCharacter": "\"", "escapeCharacter": "\"", "dateFormat": "yyyy-MM-dd", "timestampFormat": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "format": "DelimitedText" }
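The effect of delimiter, header, quoteCharacter, and escapeCharacter can be approximated with Python's standard csv module. This is only an analogy under the assumption that an escape character equal to the quote character means doubled quotes; Magpie's own parser may differ in edge cases, and the sample data is made up.

```python
import csv
import io

# A subset of the example specification above.
spec = {"delimiter": "\t", "header": True, "quoteCharacter": "\"", "escapeCharacter": "\""}

# Made-up sample: a tab-separated file whose second field contains quotes.
sample = 'id\tnote\n1\t"said ""hi"""\n2\tplain\n'

# When the escape character equals the quote character, quotes inside a
# quoted field are written twice, which csv calls doublequote.
same = spec["escapeCharacter"] == spec["quoteCharacter"]
reader = csv.reader(
    io.StringIO(sample),
    delimiter=spec["delimiter"],
    quotechar=spec["quoteCharacter"],
    doublequote=same,
    escapechar=None if same else spec["escapeCharacter"],
)
lines = list(reader)

# With header set to true, the first line supplies the field names.
names, data = lines[0], lines[1:]
records = [dict(zip(names, row)) for row in data]
print(records)
# [{'id': '1', 'note': 'said "hi"'}, {'id': '2', 'note': 'plain'}]
```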
Parquet File
A columnar storage format, and the default storage format in Magpie.
Specification
{ "format": "parquet", "path": "<string>", "fileName": "<string>", "compression": "<string>", "mergeSchema": <boolean> }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards. | Required | |
compression | String | Compression type for source file. Supported compression types: Note: this compression is only used when writing files. Magpie will detect the compression of files based on the file extension when reading. | Optional | |
mergeSchema | Boolean | If true and this file points to a folder of parquet files, Magpie will merge the schema for each file to create the final schema for the table. If false, Magpie will only use the schema of the first file for the table. | Optional | |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "transactions", "mergeSchema": true, "format": "parquet" }
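The difference between the two mergeSchema settings can be sketched as follows. This is a simplified model, not Magpie's implementation: schemas are represented as plain dictionaries of field name to type name, the function name table_schema is hypothetical, and keeping the first type seen for a repeated field is an assumption, since this page does not describe how type conflicts are resolved.

```python
def table_schema(file_schemas, merge_schema):
    """Combine per-file schemas, given in read order, into one table schema."""
    if not merge_schema:
        # mergeSchema false: only the first file's schema is used.
        return dict(file_schemas[0])
    merged = {}
    for schema in file_schemas:
        for name, type_name in schema.items():
            # Assumption: the first type seen for a field name wins.
            merged.setdefault(name, type_name)
    return merged


# Made-up schemas for two files in the same folder.
schemas = [
    {"id": "long", "amount": "double"},
    {"id": "long", "currency": "string"},
]
print(table_schema(schemas, merge_schema=True))
# {'id': 'long', 'amount': 'double', 'currency': 'string'}
print(table_schema(schemas, merge_schema=False))
# {'id': 'long', 'amount': 'double'}
```

With merging disabled, the currency field from the second file does not appear in the table schema.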
JSON File
A JSON or newline-delimited JSON file (NDJSON).
Specification
{ "format": "json", "path": "<string>", "fileName": "<string>", "compression": "<string>", "encoding": "<string>", "multiLine": <boolean>, "dateFormat": "<string>", "timestampFormat": "<string>" }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards (e.g., | Required | |
compression | String | Compression type for source file. Supported compression types: Note: this compression is only used when writing files. Magpie will detect the compression of files based on the file extension when reading. | Optional | |
encoding | String | The file’s encoding. If not specified, encoding is auto-detected. | Optional | |
multiLine | Boolean | If true, each file will be parsed as a single record (each file is a single JSON record). If false, files are parsed as newline-delimited JSON, with each new line starting a separate JSON record. | Optional | |
dateFormat | String | The Java date format used to identify fields as dates within the files. | Optional | |
timestampFormat | String | The Java datetime format used to identify fields as timestamps within the files. | Optional | |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "clicks/*.json", "multiLine": true, "dateFormat": "yyyy-MM-dd", "timestampFormat": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "format": "json" }
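The two multiLine modes can be sketched with Python's standard json module. The function name parse_json_file and the sample contents are assumptions for illustration; this mirrors the behavior described in the table rather than Magpie's actual reader.

```python
import json


def parse_json_file(text, multi_line):
    """Return the records contained in one file's text."""
    if multi_line:
        # multiLine true: the whole file is a single JSON record.
        return [json.loads(text)]
    # multiLine false: newline-delimited JSON, one record per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Made-up file contents.
whole_file = '{\n  "user": "a",\n  "clicks": 3\n}'
ndjson = '{"user": "a"}\n{"user": "b"}\n'

print(parse_json_file(whole_file, multi_line=True))   # [{'user': 'a', 'clicks': 3}]
print(parse_json_file(ndjson, multi_line=False))      # [{'user': 'a'}, {'user': 'b'}]
```

Parsing the pretty-printed file with multi_line set to false would fail, because its individual lines are not complete JSON values.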
ORC File
A columnar storage format.
Specification
{ "format": "orc", "path": "<string>", "fileName": "<string>", "compression": "<string>" }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards. | Required | |
compression | String | Compression type for source file. Supported compression types: Note: this compression is only used when writing files. Magpie will detect the compression of files based on the file extension when reading. | Optional | |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "transactions", "format": "orc" }
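As noted in the table, fileName can be a glob using wildcards. As a rough illustration of wildcard matching, the sketch below uses Python's fnmatch module; its rules may not match Magpie's glob syntax exactly, and the file names and the pattern part-*.orc are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_files(file_names, pattern):
    """Return the names that match a wildcard pattern, preserving order."""
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical folder listing.
candidates = ["part-0000.orc", "part-0001.orc", "_SUCCESS"]
print(matching_files(candidates, "part-*.orc"))
# ['part-0000.orc', 'part-0001.orc']
```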
Avro File
A row-based storage format widely used as a serialization platform.
Specification
{ "format": "avro", "path": "<string>", "fileName": "<string>", "compression": "<string>", "ignoreExtension": <boolean>, "recordName": "<string>", "recordNamespace": "<string>" }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards. | Required | |
compression | String | Compression type for source file. Supported compression types: Note: this compression is only used when writing files. Magpie will detect the compression of files based on the file extension when reading. | Optional | |
ignoreExtension | Boolean | If false, read only | Optional | false |
recordName | String | When writing, the name of the top level record to write. | Optional | |
recordNamespace | String | When writing, the namespace of the record to write. | Optional | <empty string> |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "transactions", "format": "avro" }
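In an Avro schema, a record carries a name and an optional namespace, which is where recordName and recordNamespace would appear when writing. The sketch below builds such a schema as a Python dictionary; the helper avro_record_schema, the record name Transaction, the namespace com.example.sales, and the fields are all hypothetical, and how Magpie derives the field list is not described here.

```python
import json


def avro_record_schema(record_name, record_namespace, fields):
    """Build an Avro record schema as a plain dictionary."""
    schema = {"type": "record", "name": record_name, "fields": fields}
    if record_namespace:
        # An empty namespace is simply left out of the schema.
        schema["namespace"] = record_namespace
    return schema


# Hypothetical values for recordName, recordNamespace, and the table's fields.
schema = avro_record_schema(
    "Transaction",
    "com.example.sales",
    [{"name": "id", "type": "long"}, {"name": "amount", "type": "double"}],
)
print(json.dumps(schema, indent=2))
```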
Delta File
A Delta Lake file.
Specification
{ "format": "delta", "path": "<string>", "fileName": "<string>", "mergeSchema": <boolean> }
Structure Values
Field Name | Type | Description | Required | Default |
---|---|---|---|---|
path | String | Parent path of the file. Can be an empty string. | Required | |
fileName | String | Name of the file or folder. Can be a glob using wildcards. | Required | |
mergeSchema | Boolean | If true and this file points to a folder of delta files, Magpie will merge the schema for each file to create the final schema for the table. If false, Magpie will only use the schema of the first file for the table. | Optional | |
format | String | File's storage format, | Required |
Example
{ "path": "/my_data", "fileName": "transactions", "format": "delta" }