Magpie can access any type of data source supported by its underlying Spark environment through a generic data source type. This makes it possible to integrate data from sources that Magpie does not yet fully support, but configuring such a source may require a more detailed understanding of how sources of that type are configured.
At present, Magpie includes the libraries for connecting to Snowflake and Elasticsearch. Please reach out to Support if you would like to connect to another system.
Generic sources have a few key attributes that must be configured when the source is created. Please see the Data Source JSON specification for details. These attributes are set with a JSON specification, either when the source is first created or later through an ALTER statement. An example of a create command is shown below.
create data source {
  "name": "elastic_test",
  "sourceType": "org.elasticsearch.spark.sql",
  "supportsTables": true,
  "options": {
    "es.nodes" : "vpc-magpie-test-afdslk5l23klj2323.us-east-1.es.amazonaws.com:80",
    "es.nodes.wan.only" : "true"
  }
};
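Because the specification is plain JSON, it can also be generated programmatically. The Python sketch below renders the example above from a dictionary. `create_data_source_command` is a hypothetical helper, not part of Magpie, and the sketch assumes Magpie accepts the single-line JSON that `json.dumps` produces, as in the one-line form of the example.

```python
import json


def create_data_source_command(spec: dict) -> str:
    """Render a generic data source spec as a Magpie command string.

    Hypothetical helper for illustration; it is not a Magpie API.
    """
    # The specification is plain JSON, so json.dumps yields the command body.
    return "create data source " + json.dumps(spec) + ";"


spec = {
    "name": "elastic_test",
    "sourceType": "org.elasticsearch.spark.sql",  # Spark data source format name
    "supportsTables": True,  # serialized as JSON true
    "options": {
        "es.nodes": "vpc-magpie-test-afdslk5l23klj2323.us-east-1.es.amazonaws.com:80",
        "es.nodes.wan.only": "true",  # quoted as a string, as in the example
    },
}

if __name__ == "__main__":
    print(create_data_source_command(spec))
```

The resulting string could then be submitted to Magpie like any other command.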
Once a generic source has been created, the tables within it can be used as the underlying storage for Magpie tables. Note that any options specified when creating a data source are combined with the options in the Create Table command, so settings shared by every table in the source, such as connection details, need to be specified only once. If the same option key appears in both places, the value in the Create Table command overrides the value in the data source specification. Also note that not all generic sources support tables; some support only streams.
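The combination rule amounts to a dictionary merge in which table options take precedence. The Python sketch below illustrates that rule only; `effective_options` is a hypothetical name, not a Magpie function, and the `"false"` override of `es.nodes.wan.only` is invented purely to show a collision.

```python
from typing import Dict


def effective_options(source_options: Dict[str, str],
                      table_options: Dict[str, str]) -> Dict[str, str]:
    """Combine data source options with Create Table options.

    Hypothetical helper, not a Magpie API. Keys from table_options win
    when the same key appears in both dictionaries.
    """
    # In a dict display, later entries overwrite earlier ones with the same key.
    return {**source_options, **table_options}


# Options from the elastic_test data source.
source_options = {
    "es.nodes": "vpc-magpie-test-afdslk5l23klj2323.us-east-1.es.amazonaws.com:80",
    "es.nodes.wan.only": "true",
}

# Table options; the "false" override is invented for illustration only.
table_options = {
    "es.read.field.as.array.include": "actor, genre",
    "es.nodes.wan.only": "false",
}

if __name__ == "__main__":
    for key, value in sorted(effective_options(source_options, table_options).items()):
        print(f"{key} = {value}")
```

Building a new dictionary leaves the data source options untouched, so several tables can layer their own options over the same source.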
The following is an example of creating a Magpie table that references an underlying generic source table:
create table movies in schema elasticsearch
from data source elastic_test
at path "media/movies"
with options {
  "es.read.field.as.array.include": "actor, genre"
  // "es.nodes" : "vpc-magpie-test-afdslk5l23klj2323.us-east-1.es.amazonaws.com:80", // inherited from data source
  // "es.nodes.wan.only" : "true" // inherited from data source
};
This creates a table in the Magpie Context that references the source table in the generic source. When the table is queried, Magpie "reaches" into the source and pulls the data before combining it with other local data.
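Since generic sources rely on Spark data source types, a source table roughly corresponds to a plain Spark read that uses `sourceType` as the format, the merged options, and the table path. The PySpark sketch below shows that assumed correspondence; it is not Magpie's implementation, and `spark_read_args` and `read_table` are hypothetical helpers. Only `DataFrameReader.format`, `options`, and `load` are standard PySpark calls.

```python
from typing import Any, Dict, Tuple


def spark_read_args(source: Dict[str, Any], path: str,
                    table_options: Dict[str, str]) -> Tuple[str, Dict[str, str], str]:
    """Return (format, options, path) for a plain Spark read of a source table.

    Hypothetical helper: it models what a generic source table plausibly
    maps to in Spark, not how Magpie actually performs the read.
    """
    # sourceType names the Spark data source implementation to use.
    fmt = source["sourceType"]
    # Table options are layered over the data source options.
    options = {**source.get("options", {}), **table_options}
    return fmt, options, path


def read_table(spark, source: Dict[str, Any], path: str,
               table_options: Dict[str, str]):
    """Load the table using `spark`, a pyspark.sql.SparkSession."""
    fmt, options, path = spark_read_args(source, path, table_options)
    # Standard PySpark DataFrameReader chain: format -> options -> load.
    return spark.read.format(fmt).options(**options).load(path)


elastic_test = {
    "name": "elastic_test",
    "sourceType": "org.elasticsearch.spark.sql",
    "supportsTables": True,
    "options": {
        "es.nodes": "vpc-magpie-test-afdslk5l23klj2323.us-east-1.es.amazonaws.com:80",
        "es.nodes.wan.only": "true",
    },
}
```

With a SparkSession that has the elasticsearch-hadoop connector on its classpath, `read_table(spark, elastic_test, "media/movies", {"es.read.field.as.array.include": "actor, genre"})` would be expected to return a DataFrame over the same data the `movies` table references.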
Security
To access this type of data source, you may need to adjust the security configuration of your cloud environment. Please reach out to a member of the Silectis team with any questions or support requests.