Persistence Mapping

A persistence mapping defines the underlying data for a Magpie table. Usually, this is a location on a Magpie data source (e.g., a file or files in a filesystem or a table in a database), but sometimes this can be a virtual mapping (such as a SQL query executed by Magpie). There are many types of persistence mappings, and each has different required fields based on the mappingType field.

Distributed File System Mapping (e.g., Amazon S3, Google Cloud Storage, Azure Storage)

Specification

{
  "sourceName": "<string>",
  "file": <file spec>,
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "isWritable": boolean,
  "mappingType": "FileSystem"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie HDFS data source containing the data.

Required


file

File

An object which describes the details of the underlying file or folder on the filesystem.

Required


mapping

Array of mapping objects

List of mappings from source field name to target field name. Default is to use the source field names in Magpie.

Optional

isWritable

Boolean

Whether Magpie is allowed to write to or delete the underlying filesystem files.

Optional

true

mappingType

String

Type of persistence mapping, FileSystem for file system sources.

Required


Delimited Text File Example

{
  "sourceName": "my_s3_source",
  "file": {
    "path": "/my_data",
    "fileName": "transactions",
    "delimiter": "\t",
    "compression": "Gzip",
    "header": true,
    "multiLine": false,
    "ignoreLeadingWhiteSpace": false,
    "ignoreTrailingWhiteSpace": false,
    "quoteCharacter": "\"",
    "escapeCharacter": "\"",
    "dateFormat": "yyyy-MM-dd",
    "timestampFormat": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX",
    "format": "DelimitedText"
  },
  "mapping": [
    {
      "sourceField": "uid",
      "targetField": "uid"
    },
    {
      "sourceField": "price",
      "targetField": "invoice_val"
    },
  "isWritable": false,
  "mappingType": "FileSystem"
}


JDBC Mapping (e.g., most relational databases)

Specification

{
  "sourceName": "<string>",
  "expression": <database expression spec>,
  "fetchSize": <integer>,
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "isWritable": <boolean>,
  "mappingType": "jdbc"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie database data source containing the data.

Required


expression

Database expression

Defines the location of the data in the database, either a table or a SQL query.

Required


fetchSize

Integer

The number of rows to fetch at once from a JDBC connection when reading from the table.

Optional


mapping

Array of mapping objects

List of mappings from source to target fields. Default is to use the source field names in Magpie.

Optional

isWritable

Boolean

Whether Magpie is allowed to write to or delete the underlying table. Only valid for mappings with table expressions.

Optional

true

mappingType

String

Type of persistence mapping, jdbc for JDBC mappings.

Required


Example

{
  "sourceName": "my_rdbms",
  "expression": {
    "schemaName": "dbo",
    "tableName": "transactions"
  },
  "fetchSize": 50000,
  "isWritable": false,
  "mappingType": "Jdbc"
}



PostgreSQL Bulk Mapping

Specification

{
  "sourceName": "<string>",
  "expression": <database expression spec>,
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "isWritable": <boolean>,
  "mappingType": "PostgresBulk"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie Postgres data source containing the data.

Required


expression

Database expression

Defines the location of the data in the database, either a table or a SQL query.

Required


mapping

Array of mapping objects

List of mappings from source to target fields. Default is to use the source field names in Magpie.

Optional

isWritable

Boolean

Whether Magpie is allowed to write to or delete the underlying table. Only valid for mappings with table expressions.

Optional

true

mappingType

String

Type of persistence mapping, PostgresBulk for PostgreSQL mappings.

Required


Example

{
  "sourceName": "my_rdbms",
  "expression": {
    "schemaName": "public",
    "tableName": "transactions"
  },
  "isWritable": false,
  "mappingType": "PostgresBulk"
}



Amazon Redshift Bulk Access Mapping

Specification

{
  "sourceName": "<string>",
  "expression": <database expression spec>,
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "stagingFile": <staging file spec>,
  "isWritable": <boolean>,
  "mappingType": "RedshiftBulk"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie Redshift data source containing the data.

Required


expression

Database expression

Defines the location of the data in the database, either a table or a SQL query.

Required


mapping

Array of mapping objects

List of mappings from source to target fields. Default is to use the source field names in Magpie.

Optional

stagingFile

Staging file

A specification of a location to use to stage reads from and writes to Redshift.

Optional


isWritable

Boolean

Whether Magpie is allowed to write to or delete the underlying table. Only valid for mappings with table expressions.

Optional

true

mappingType

String

Type of persistence mapping, RedshiftBulk for Redshift mappings.

Required


Example

{
  "sourceName": "my_redshift_db",
  "expression": {
    "schemaName": "bi",
    "tableName": "transactions"
  },
  "stagingFile": {
    "sourceName": "my_temporary_s3_dump",
    "path": "/staging",
    "fileName": "/transactions_temp"
  },
  "mappingType": "RedshiftBulk"
}


Snowflake Bulk Mapping

Specification

{
  "sourceName": "<string>",
  "expression": <database expression spec>,
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "isWritable": <boolean>,
  "mappingType": "SnowflakeBulk"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie Snowflake data source containing the data.

Required


expression

Database expression

Defines the location of the data in the database, either a table or a SQL query.

Required


mapping

Array of mapping objects

List of mappings from source to target fields. Default is to use the source field names in Magpie.

Optional

isWritable

Boolean

Whether Magpie is allowed to write to or delete the underlying table. Only valid for mappings with table expressions.

Optional

true

mappingType

String

Type of persistence mapping, SnowflakeBulk for Snowflake mappings.

Required


Example

{
  "sourceName": "my_snowflake_source",
  "expression": {
    "schemaName": "public",
    "tableName": "transactions"
  },
  "isWritable": false,
  "mappingType": "SnowflakeBulk"
}



Google BigQuery Mapping

Specification

{
  "sourceName": "<string>",
  "expression": <bigquery expression spec>,
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "stagingFile": <staging file spec>,
  “isWritable”: <boolean>,
  "mappingType": "BigQuery"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie Redshift data source containing the data.

Required


expression

BigQuery expression

Defines the location of the data in the bigquery project via a dataset ID and a table ID.

Required


mapping

Array of mapping objects

List of mappings from source to target fields. Default is to use the source field names in Magpie.

Optional

stagingFile

Staging file

A specification of a location to use to stage writes to BigQuery.

Optional


isWritable

Boolean

Whether Magpie is allowed to write to or delete the underlying table.

Optional

true

mappingType

String

Type of persistence mapping, BigQuery for BigQuery mappings.

Required


Example

{
  "sourceName": "my_bigquery_source",
  "expression": {
    "datasetId": "sample-bq-dataset",
    "tableId": "sample-bq-table"
  },
  "stagingFile": {
    "sourceName": "my_temporary_gs_dump",
    "path": "staging",
    "fileName": "temp_file"
  },
  “isWritable”: true,
  "mappingType": "BigQuery"
}



Kafka Subscription Mapping

Specification

{
  "sourceName": String,
  "subscription": <subscription spec>,
  "keyType": "<string>",
  "valueType": "<string>",
  "startingOffsets": <offsets spec>,
  "endingOffsets": <offsets spec>,
  "isWritable": <boolean>,
  "mappingType": "Kafka"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of the Magpie Kafka data source containing the data.

Required


subscription

Kafka Subscription

Defines which data Magpie should include in the Kafka query.

Required


keyType

String

The data type of keys in the loaded topic or topics. Supported value types:

  • binary

  • string

  • json

Optional

binary

valueType

String

The data type of message values in the loaded topic or topics. Supported value types:

  • binary

  • string

  • json

Optional

binary

startingOffsets

Kafka Offsets

The start point for loading messages from the configured topic or topics. Must be either earliest or a specification or which offsets to use by topic partition.

Optional

earliest

endingOffsets

Kafka Offsets

The end point for loading messages from the configured topic or topics. Must be either latest or a specification or which offsets to use by topic partition.

Optional

latest

isWritable

Boolean

Whether Magpie is allowed to write to the underlying topics.

Optional

true

mappingType

String

Type of persistence mapping, Kafka for Kafka subscriptions.

Required


Example

{
  "sourceName": "kafka_source",
  "subscription": {
    "subscriptionType": "Topic",
    "topic": "topicA"
  },
  "keyType": "String",
  "valueType": "String",
  "startingOffsets": "Earliest",
  "endingOffsets": "Latest",
  "isWritable": true,
  "mappingType": "Kafka"
}


Generic Spark Mapping

Specification

{
  "sourceName": "<string>",
  "path": "<string>",
  "mapping": [
    {
      "sourceField": "<string>",
      "targetField": "<string>"
    }
  ],
  "options": <options spec>,
  "isWritable": <boolean>,
  "mappingType": "Generic"
}

Structure Values

Field Name

Type

Description

Required

Default

sourceName

String

Name of data source containing table. Data source object must exist.

Required


path

String

The path option passed to Spark on .load() or .save().

Required


mapping

Array of mapping objects

List of mappings from source to target fields.

Optional

None

options

Object

An object of string key-value pairs to supply to Spark as data frame read or write options. This will be merged with the data source options set, with values here overriding duplicate values specified on the data source.

Required


isWritable

Boolean

Whether Magpie is allowed to write to the underlying source.

Optional

true

mappingType

String

Type of data source, Generic for generic sources.

Required


Example

{
  "sourceName": "my_elastic_source",
  "path": "media/movies",
  "options": {
    "es.read.field.as.array.include": "actor, genre"
  },
  "mappingType": "Generic"
}



SQL Query

Specification

{
  "sql": "<string>",
  "mappingType": "Sql"
}

Structure Values

Field Name

Type

Description

Required

Default

sql

String

SQL query string to execute in Magpie.

Required


mappingType

String

Type of persistence mapping, Sql for SQL queries.

Required


Example

{
  "sql": "SELECT product, sum(revenue) AS sales FROM transactions",
  "mappingType": "Sql"
}



Was this article helpful?
0 out of 0 found this helpful