Specification
{
  "name": "<string>",
  "description": "<string>",
  "driverType": "<string>",
  "workerCount": <integer>,
  "workerType": "<string>",
  "executorCores": <integer>,
  "storageVolumeSize": "<string>",
  "privateKeys": "array<string>",
  "bootstrapScript": "<string>",
  "pythonLibraries": "array<string>"
}
Structure Values
| Field Name | Type | Description | Required | Default |
|---|---|---|---|---|
| name | String | Cluster name. | Required | |
| description | String | Expanded cluster information. | Optional | None |
| driverType | String | The type of instance to use for the cluster driver. Supported values can be found by running the command | Required | |
| workerCount | Integer | How many worker nodes to provision for the cluster. Omit or set to zero for a single-node (driver only) cluster. | Optional | 0 |
| workerType | String | The type of instance to use for the cluster workers. See | Optional | |
| executorCores | Integer | How many cores to use per Spark executor. | Optional | 4 |
| storageVolumeSize | String | How much local storage to provision for each cluster node. Takes the form of a string with units, such as "500 GiB". | Optional | 100 GiB |
| privateKeys | Array<String> | A list of private keys that should be attached to the cluster. Private Keys are created using the | Optional | [] |
| bootstrapScript | String | Bash script lines to execute at cluster startup. | Optional | None |
| pythonLibraries | Array<String> | A list of libraries installable from a requirements.txt file. | Optional | [] |
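
The required and default columns above can be applied client-side before a specification is submitted. The following is a minimal sketch, not part of the documented API: `fill_defaults`, `DEFAULTS`, and `REQUIRED` are hypothetical names, and the behavior (rejecting a specification that lacks a required field, leaving `workerType` unset because the table lists no default for it) is an assumption about how a client might use the table.

```python
import copy

# Defaults copied from the table above. Fields with an empty Default cell
# (name, driverType, workerType) are deliberately left out.
DEFAULTS = {
    "description": None,
    "workerCount": 0,
    "executorCores": 4,
    "storageVolumeSize": "100 GiB",
    "privateKeys": [],
    "bootstrapScript": None,
    "pythonLibraries": [],
}

# Fields marked "Required" in the table.
REQUIRED = ("name", "driverType")


def fill_defaults(spec: dict) -> dict:
    """Return a copy of `spec` with the documented defaults applied.

    Hypothetical helper: raises ValueError if a required field is missing.
    """
    missing = [field for field in REQUIRED if field not in spec]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    # Deep-copy so the list defaults are never shared between results.
    full = copy.deepcopy(DEFAULTS)
    full.update(spec)
    return full


if __name__ == "__main__":
    # A driver-only cluster: workerCount falls back to 0.
    print(fill_defaults({"name": "etl_cluster", "driverType": "m5.2xlarge"}))
```

With only `name` and `driverType` supplied, the result describes a single-node cluster, matching the table's note that omitting `workerCount` yields a driver-only cluster.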
Example
{
  "name": "etl_cluster",
  "description": "Cluster to use for nightly ETL runs",
  "driverType": "m5.2xlarge",
  "workerCount": 2,
  "workerType": "m5.4xlarge",
  "executorCores": 4,
  "storageVolumeSize": "500 GiB",
  "privateKeys": ["github_ssh", "gcp_prod_key"],
  "bootstrapScript": "pip install some-package",
  "pythonLibraries": ["pandas", "matplotlib"]
}
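
The `storageVolumeSize` value in the example is a string with units ("500 GiB"). A client that needs the size as a number could split it as sketched below. This is an illustration only: `parse_storage_size` and `UNIT_BYTES` are hypothetical names, and the set of accepted units (MiB, GiB, TiB) is an assumption, since the specification only shows GiB.

```python
import re

# Assumed unit table: only these binary units are handled here.
# The service itself may accept a different set.
UNIT_BYTES = {"MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_storage_size(text: str) -> int:
    """Convert a size string such as "500 GiB" to a byte count (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*(MiB|GiB|TiB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized storage size: {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_BYTES[unit]


if __name__ == "__main__":
    # Size taken from the example specification above.
    print(parse_storage_size("500 GiB"))
```

Rejecting strings without a recognized unit keeps a bare number such as "500" from being silently interpreted in the wrong unit.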