Cluster

May 16, 2021 19:07

Specification

{
  "name": "<string>",
  "description": "<string>",
  "driverType": "<string>",
  "workerCount": <integer>,
  "workerType": "<string>",
  "executorCores": <integer>,
  "storageVolumeSize": "<string>",
  "privateKeys": ["<string>"],
  "bootstrapScript": "<string>",
  "pythonLibraries": ["<string>"]
}

Structure Values

Field Name	Type	Description	Required	Default
name	String	Cluster name.	Required
description	String	Expanded cluster information.	Optional	None
driverType	String	The instance type to use for the cluster driver. Run the `list instance types` command in Magpie to see the supported values. Example values are `m5.4xlarge`, `n2-standard-16`, and `Standard_D16s_v4`.	Required
workerCount	Integer	How many worker nodes to provision for the cluster. Omit or set to zero for a single-node (driver only) cluster.	Optional	0
workerType	String	The instance type to use for the cluster workers. Supported values are the same as for `driverType`.	Optional	Same as `driverType`
executorCores	Integer	How many cores to use per Spark executor.	Optional	4
storageVolumeSize	String	Amount of local storage to provision for each cluster node, given as a string with units, such as "500 GiB".	Optional	100 GiB
privateKeys	Array<String>	A list of private keys, referenced by name, to attach to the cluster. Private keys are created with the `create private key` command.	Optional	[]
bootstrapScript	String	Bash script lines to execute at cluster startup.	Optional	None
pythonLibraries	Array<String>	A list of Python libraries to install on the cluster. Each entry uses the same syntax as a line in a requirements.txt file.	Optional	[]
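To make the Required and Default columns concrete, here is a minimal Python sketch of how a client could fill in omitted fields before submitting a spec. The `apply_defaults` helper is hypothetical and not part of Magpie; it only mirrors the table above, including the rule that `workerType` falls back to `driverType`.

```python
import copy

# Defaults from the Structure Values table. workerType is not listed here
# because its default depends on the spec's own driverType.
DEFAULTS = {
    "description": None,
    "workerCount": 0,
    "executorCores": 4,
    "storageVolumeSize": "100 GiB",
    "privateKeys": [],
    "bootstrapScript": None,
    "pythonLibraries": [],
}

# Fields the table marks as Required.
REQUIRED = ("name", "driverType")


def apply_defaults(spec: dict) -> dict:
    """Return a new cluster spec with omitted optional fields filled in.

    Hypothetical helper for illustration only; not a Magpie API.
    """
    missing = [field for field in REQUIRED if field not in spec]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    # Deep-copy so the default lists are never shared between specs.
    resolved = copy.deepcopy(DEFAULTS)
    resolved.update(spec)
    # workerType defaults to whatever driverType is set to.
    resolved.setdefault("workerType", resolved["driverType"])
    return resolved


cluster = apply_defaults({"name": "scratch", "driverType": "m5.4xlarge"})
print(cluster["workerCount"], cluster["workerType"], cluster["storageVolumeSize"])
# 0 m5.4xlarge 100 GiB
```

With only the two required fields supplied, the result describes a single-node (driver only) cluster, since `workerCount` defaults to 0.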

Example

{
  "name": "etl_cluster",
  "description": "Cluster to use for nightly ETL runs",
  "driverType": "m5.2xlarge",
  "workerCount": 2,
  "workerType": "m5.4xlarge",
  "executorCores": 4,
  "storageVolumeSize": "500 GiB",
  "privateKeys": ["github_ssh", "gcp_prod_key"],
  "bootstrapScript": "pip install some-package",
  "pythonLibraries": ["pandas", "matplotlib"]
}
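Because `storageVolumeSize` applies to each node, the example provisions (1 driver + 2 workers) × 500 GiB = 1500 GiB of local storage in total, assuming the driver counts as a node, as the single-node (driver only) description of `workerCount` suggests. The Python sketch below performs the same arithmetic for any spec. The helpers `parse_size_gib` and `total_storage_gib` are hypothetical, and the parser only understands the "<number> GiB" form shown in these docs.

```python
import json

# The sizing-related fields of the example spec above.
EXAMPLE = """
{
  "name": "etl_cluster",
  "driverType": "m5.2xlarge",
  "workerCount": 2,
  "workerType": "m5.4xlarge",
  "storageVolumeSize": "500 GiB"
}
"""


def parse_size_gib(size: str) -> float:
    # Assumption: values look like "<number> GiB", the only form shown in the docs.
    value, unit = size.split()
    if unit != "GiB":
        raise ValueError(f"unsupported unit: {unit!r}")
    return float(value)


def total_storage_gib(spec: dict) -> float:
    # One driver plus workerCount workers (workerCount defaults to 0).
    nodes = 1 + spec.get("workerCount", 0)
    # storageVolumeSize is provisioned per node and defaults to "100 GiB".
    per_node = parse_size_gib(spec.get("storageVolumeSize", "100 GiB"))
    return nodes * per_node


print(total_storage_gib(json.loads(EXAMPLE)))  # 1500.0
```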