Accessing Data from Redshift

February 04, 2020 17:05

Cloud-based data warehouses such as Amazon Redshift are frequently used in analytics. Magpie can access data from Redshift using specialized connectors.

Interacting with Redshift is similar to interacting with relational databases. By creating a Redshift data source, you can use Magpie to access data within the warehouse cluster and either import data into Magpie for further analysis, or execute queries directly against the live warehouse. Additionally, we’ve layered on capabilities to make the Redshift connection more performant. We've provided a Bulk Access connector, which yields much faster performance for large scale reads and writes but may be slower for ad hoc interactive queries.

Amazon Redshift sources have a few key attributes that need to be configured as part of their creation. Please view the Data Source JSON specification for more details. An example of such a command for Redshift is shown below.

create data source {
  "name": "my_redshift_source",
  "sourceType": "redshift",
  "endpoint": "magpie-test.fdas53lkj234y.us-east-1.redshift.amazonaws.com:5439",
  "database": "analytics",
  "user": "my_user_name",
  "password": "**********",
  "defaultStagingSourceName": "default_magpie_s3_source" // if using bulk access
}

Once a Redshift source has been created, the tables within it can be used as the underlying storage for tables within Magpie. The commands present in the "Accessing Data from JDBC Sources" page are broadly applicable to PostgreSQL, Redshift, and BigQuery data sources as well.

Security

To access this type of data source, you may need to adjust the security configuration of your cloud environment. Please reach out to a member of the Silectis team with any questions or support requests.

If you plan to use the Bulk Access connector, you will need to specify defaultStagingSourceName in the data source definition. This is the the default S3 data source from which COPY and UNLOAD commands operate. The iamRole parameter can also be used to specify an AWS IAM role that the cluster should use while using the Bulk Access connector. This IAM role should have access to read and write temporary files in the S3 staging bucket. By default, Magpie forwards its IAM credentials to redshift for the COPY and UNLOAD operations.