Create Table From File

August 08, 2018 13:32

Creates a Magpie table that references a file on an HDFS file system data source.

Syntax

create [temp | temporary] table { <table spec> | <table name> } [in schema <schema reference>]
  from data source <data source reference> file "<file name>"
  [at path "<path>"]
  [with infer schema]
  [with format <file format>]
  [with quote char "<quote character>"]
  [with escape char "<escape character>"]
  [with null value "<null value>"]
  [with delimiter "<delimiter>"]
  [with encoding "<encoding>"]
  [with date format "<date format>"]
  [with timestamp format "<timestamp format>"]
  [with multiline]
  [with header]
  [with ignore leading white space]
  [with ignore trailing white space]
  [with ignore extension]
  [with merge schema]
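
As a minimal sketch of the syntax above, the following command creates a table from a CSV file with a header row. The table name `orders`, schema `sales`, data source `my_hdfs`, file name, and path are hypothetical placeholders, and the options are kept in the same order as the syntax listing.

```
create table orders in schema sales
  from data source my_hdfs file "orders.csv"
  at path "raw/2018"
  with infer schema
  with format csv
  with header
```

Because the default format is parquet, a CSV file presumably needs `with format csv` to be stated explicitly.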

Parameters

`table spec`

JSON. A specification for saving the table. The persistence mapping and fields in the specification are not used when creating a table from a file.

`table name`

String. The name under which to save the table.

`schema reference`

String. The name of the schema to create the table in. Defaults to the current schema.

`data source reference`

String. The name of the data source that the file is located in.

`file name`

String. The name of the file on the file system.

`path`

String. The parent path of the file on the file system. Defaults to an empty string.

`infer schema`

None. If present, Magpie infers the schema by attempting to identify the data type of each field in the file. Only used for csv files.

`file format`

String. The format of the source file. Supported formats: text, parquet, csv, json, orc, avro, or delta. Default is parquet.

`quote character`

String. The character optionally used to enclose fields within the file. Only used for csv files. Default is `"` (double quote).

`escape character`

String. The character optionally used to escape quotations within a quoted field. Only used for csv files. Default is `"` (double quote).

`null value`

String. Fields with this value are converted to null. Only used for csv files. By default, empty fields are converted to null.

`delimiter`

String. The character used to separate fields within the file. Only used for csv files. Default is `,` (comma).
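
The csv options described above can be combined in one command. This is a sketch only: the table, data source, and file names are hypothetical, and the file is assumed to be pipe-delimited, single-quoted, with `NA` marking missing values.

```
create table customers
  from data source my_hdfs file "customers.txt"
  with format csv
  with quote char "'"
  with null value "NA"
  with delimiter "|"
```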

`encoding`

String. The encoding of the file. Only used for csv and json files. Default is UTF-8 for csv and newline-delimited json; the encoding of multi-line json is auto-detected.

`date format`

String. The Java date format used to identify fields as dates within the file. Only used for csv and json files. Default is `yyyy-MM-dd`.

`timestamp format`

String. The Java datetime format used to identify fields as timestamps within the file. Only used for csv and json files. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` for json and `yyyy-MM-dd HH:mm:ss` for csv.
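
For files that do not use the default patterns, the date and timestamp formats can be given as Java pattern strings. In this sketch the names are hypothetical and the file is assumed to hold US-style values such as `08/08/2018` and `08/08/2018 13:32:00`.

```
create table events
  from data source my_hdfs file "events.csv"
  with infer schema
  with format csv
  with date format "MM/dd/yyyy"
  with timestamp format "MM/dd/yyyy HH:mm:ss"
  with header
```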

`multiline`

None. If present, this option enables parsing multiple lines as one record. Only used for csv and json files.

`header`

None. If present, the first line of the file will be used as field names for the resulting table. Only used for csv files.

`ignore leading white space`

None. If present, leading white space will be trimmed from each field. Only used for csv files.

`ignore trailing white space`

None. If present, trailing white space will be trimmed from each field. Only used for csv files.

`ignore extension`

None. If present, all files at the specified path and file name are read, regardless of whether they end in .avro. By default, only .avro files are read and others are ignored. Only used for avro files.

`merge schema`

None. If present, the schema of the new table is the union of the fields present in all of the parquet files at the given path and file name. By default, only the fields present in the first file are used. Only used for parquet or delta files.
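
As an illustration of `merge schema`, the sketch below assumes that `clicks` is a directory of parquet files whose fields differ from file to file. The table, data source, file, and path names are hypothetical.

```
create table clicks
  from data source my_hdfs file "clicks"
  at path "warehouse/parquet"
  with format parquet
  with merge schema
```

Without `with merge schema`, a field that is missing from the first file would not appear in the resulting table.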