Cloud-based data warehouses such as Google BigQuery are frequently used in analytics. Magpie can access data from BigQuery using specialized connectors.
Interacting with BigQuery is similar to interacting with relational databases. By creating a BigQuery data source, you can use Magpie to access data within the warehouse cluster and either import data into Magpie for further analysis, or execute queries directly against the live warehouse.
Google BigQuery sources have a few key attributes that need to be configured as part of their creation. Please view the Data Source JSON specification for more details. An example of such a command for BigQuery is shown below.
create data source { "name": "my_bigquery_source", "sourceType": "BigQuery", "projectId": "sample-bq-project", "billedProjectId": "sample-bq-project" };
Once a BigQuery source has been created, the tables within it can be used as the underlying storage for tables within Magpie. The commands present in the "Accessing Data From JDBC Sources" section are broadly applicable to BigQuery data sources as well.
Note that BigQuery is case sensitive when referencing datasets and tables whereas Magpie is case insensitive, and converts all references to lowercase before processing. If you need to access BigQuery tables with capital letters or special characters in the name, quote the reference using backticks. For example:
list tables in data source my_bigquery_source schema `MyCaseSensitiveSchema`
Magpie requires a Google Cloud Storage bucket to use as a staging data source. We recommend provisioning a dedicated bucket for this purposed and creating a dedicated service account to access this bucket.
Add the staging bucket to Magpie as a data source
create data source { "name": "bigquery_staging", "sourceType": "hdfs", "hdfsType": "gs", "host": "sample-staging-bucket" }
Then reference the data source in your Google BigQuery datasource:
alter data source my_bigquery_source { "name": "my_bigquery_source", "sourceType": "BigQuery", "projectId": "sample-bq-project", "billedProjectId": "sample-bq-project", "defaultStagingSourceName": "bigquery_staging" // Reference the staging data source };
When using Magpie to read views that are part of a BigQuery linked-dataset, ensure you include the data source option materializationDataset
to point to where the BigQuery connector can materialize the view. This dataset must be in the same region as the linked-dataset. Without this, the BigQuery connector will attempt to materialize the view within the linked-dataset, which is read only, and will cause an exception.
Security
To access this type of data source, you may need to adjust the security configuration of your cloud environment. Please reach out to a member of the Silectis team with any questions or support requests.
To utilize BigQuery, you need to enable the BigQuery API (bigquery.googleapis.com
) and the BigQuery Storage API (bigquerystorage.googleapis.com
) from the Google Cloud Console. See Google’s page on Enabling and disabling APIs for more info. Furthermore, you will need to grant several roles to the Magpie cluster service account to allow access.
The table below summaries the role requirements:
Role |
Reading from Tables |
Reading from Views |
Writing to Tables |
Executing SQL |
---|---|---|---|---|
|
X |
X |
X |
|
|
X |
X |
||
|
X |
X |
||
|
X |
X |
X |
|
|
X |
|||
|
X |