Create Schema From Path

Creates a Magpie schema that references a set of files on a file system data source. This command lists all files at a path in the file system and creates a table that references each file. All files must be in the same format.

Syntax

create schema <schema reference> from data source <data source reference> path "<path>"
  [with ignore failed tables]
  [with infer schema]
  [with format <file format>]
  [with quote char "<quote character>"]
  [with escape char "<escape character>"]
  [with null value "<null value>"]
  [with delimiter "<delimiter>"]
  [with encoding "<encoding>"]
  [with date format "<date format>"]
  [with timestamp format "<timestamp format>"]
  [with multiline]
  [with header]
  [with ignore leading white space]
  [with ignore trailing white space]
  [with ignore extension]
  [with merge schema]
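
As an illustration, the sketch below creates a schema from a directory of pipe-delimited csv files that have header rows. The schema name sales_raw, the data source name my_file_source, and the path are hypothetical placeholders. The options are written in the same order as the syntax listing as a precaution, since this page does not state whether order matters.

```
create schema sales_raw from data source my_file_source path "/landing/sales"
  with infer schema
  with format csv
  with delimiter "|"
  with header
```

Because the default format is parquet, csv files most likely need the format option to be given explicitly, as here.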

Parameters

ignore failed tables

None. If present, continue creating tables from the path after a failure on any table. By default, the command halts when a table fails.

schema reference

String. The name of the schema to create.

data source reference

String. The name of the data source that the path is located in.

path

String. The path on the file system in which to look for files.

infer schema

None. If present, this option causes schema inference to occur, attempting to identify the data types of each field in each file. Only used for csv files.

file format

String. The format of the source files. Supported formats: text, parquet, csv, json, orc, avro, or delta. Default is parquet.

quote character

String. The character optionally used to enclose fields within the files. Only used for csv files. Default is ".

escape character

String. The character optionally used to escape quotations within a quoted field. Only used for csv files. Default is ".

null value

String. Fields of this value will be converted to null. Only used for csv files. Default is that empty fields are converted to null.

delimiter

String. The character used to separate fields within the files. Only used for csv files. Default is ,.

encoding

String. The encoding of the files. Only used for csv and json files. Default is UTF-8 for csv and newline-delimited json and auto-detected for multi-line json.

date format

String. The Java date format used to identify fields as dates within the files. Only used for csv and json files. Default is yyyy-MM-dd.

timestamp format

String. The Java datetime format used to identify fields as timestamps within the files. Only used for csv and json files. Default is yyyy-MM-dd'T'HH:mm:ss.SSSXXX.
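
For example, csv files that store dates as 12/31/2023, timestamps as 2023-12-31 23:59:59, and missing values as NA might be described with a sketch like the one below. The schema name, data source name, and path are hypothetical, and the patterns are ordinary Java date pattern strings chosen only for illustration.

```
create schema orders_raw from data source my_file_source path "/landing/orders"
  with format csv
  with null value "NA"
  with date format "MM/dd/yyyy"
  with timestamp format "yyyy-MM-dd HH:mm:ss"
  with header
```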

multiline

None. If present, this option enables parsing multiple lines as one record. Only used for csv and json files.

header

None. If present, the first line of each file will be used as field names for the resulting tables. Only used for csv files.

ignore leading white space

None. If present, leading white space will be trimmed from each field. Only used for csv files.

ignore trailing white space

None. If present, trailing white space will be trimmed from each field. Only used for csv files.

ignore extension

None. If present, read all files at the specified path and file name regardless of whether they end in .avro. By default, only .avro files are read and others are ignored. Only used for avro files.
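
A short sketch for avro files that were written without the .avro extension follows. The schema name events_avro, the data source name, and the path are hypothetical.

```
create schema events_avro from data source my_file_source path "/exports/events"
  with format avro
  with ignore extension
```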

merge schema

None. If present, union the fields present in all of the parquet files at the given path and file name to determine the schema of the new table. By default, only the fields present in the first file are used. Only used for parquet or delta files.
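
As a sketch, the command below relies on the default parquet format, merges fields across files, and keeps going if an individual table fails. The schema name metrics, the data source name, and the path are hypothetical.

```
create schema metrics from data source my_file_source path "/warehouse/metrics"
  with ignore failed tables
  with merge schema
```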

