Overview
Magpie supports the Scala programming language. Scala is a general-purpose language that supports functional programming and has a strong static type system. It is also the language in which Apache Spark is written, so it is a natural way to access the underlying Spark APIs for Magpie tables.
Scala Libraries
Many popular Scala and Java libraries are already installed in your Magpie environment. You can load packages as you normally would in Scala via an import statement, and can list the currently available libraries by running the snippet below.
%scala
// List all entries on the Java class path
sys.props("java.class.path")
  .split(":")
  .sorted
  .foreach(println)
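The full class path listing can be long, so it may help to filter it when checking for one particular library. The sketch below is one way to do that using only the Scala and Java standard libraries. The object name ClasspathSearch, the method findEntries, and the "jackson" keyword are illustrative assumptions, not part of Magpie. In a notebook, the cell would start with the %scala header.

```scala
import java.io.File

// Hypothetical helper, not part of Magpie: search a class path string
// for entries whose file name contains a keyword.
object ClasspathSearch {
  // Split the class path on the platform's separator and keep the
  // entries whose file name contains the keyword (case-insensitive).
  def findEntries(classPath: String, keyword: String): Seq[String] =
    classPath
      .split(File.pathSeparator)
      .toSeq
      .filter(entry => new File(entry).getName.toLowerCase.contains(keyword.toLowerCase))
      .sorted
}

// Example call in a notebook cell (the keyword is only an illustration):
// ClasspathSearch.findEntries(sys.props("java.class.path"), "jackson").foreach(println)
```

An empty result only means that no class path entry has a matching file name. The library could still be packaged inside another jar.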
Magpie Context
You will often interact with data in Magpie using the MagpieContext (available as mc). The MagpieContext provides a handle to the Magpie environment and allows you to easily retrieve tables from Magpie as DataFrames, execute commands in the Magpie layer, and access results. Read our documentation for the magpie Scala library to learn more.
Note that the underlying SparkContext and SparkSession are available within Magpie as well. There is no need to instantiate these objects separately. They are exposed within Magpie as sc for the SparkContext and spark for the SparkSession.
Examples
Scala scripting blocks in the notebook are denoted by the %scala header. The following example shows how to create views in Magpie that pivot a dataset by one of its columns. In this case, we pivot sample_table by its integer year column. After using Spark to collect the distinct values in the column, we can use the MagpieContext to run SQL queries on Magpie tables and execute Magpie commands iteratively within standard Scala code.
%scala
import spark.implicits._

// Get every year from sample_table
val years = mc.sql("select distinct year from sample_table")
  .as[Int]
  .collect()

// Create a view for each year
years
  .map({ y =>
    // calculate a table name for each year
    y -> s"sample_view_$y"
  })
  .foreach({ case (year, table) =>
    // create the table
    mc.exec(s"""
      create temp table $table from sql(select * from sample_table where year=$year)
    """)
  })
This type of process, in which we access Magpie data or metadata and iterate through the results to execute a parameterized set of Magpie commands, can be a powerful way to automate repeated tasks in Magpie.
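When automating commands this way, it can be useful to build the command strings separately from executing them, so they can be printed and reviewed first. The sketch below shows one possible arrangement for the year example above. The object YearViews and its methods are hypothetical names introduced for illustration. The command text and the mc.exec call are taken from the example above.

```scala
// Hypothetical helper, not part of Magpie: build the per-year commands
// as plain strings so they can be inspected before anything is executed.
object YearViews {
  // Derive the view name for a given year (same naming as the example above).
  def viewName(year: Int): String = s"sample_view_$year"

  // Build one Magpie command per distinct year, in ascending order.
  def commands(years: Seq[Int]): Seq[String] =
    years.distinct.sorted.map { year =>
      s"create temp table ${viewName(year)} from sql(select * from sample_table where year=$year)"
    }
}

// Dry run in a notebook cell: print the commands without executing them.
// YearViews.commands(years.toSeq).foreach(println)

// Execute once the output looks right (mc is the MagpieContext):
// YearViews.commands(years.toSeq).foreach(cmd => mc.exec(cmd))
```

Keeping the string building free of any Magpie calls means the generated commands can be checked outside the cluster.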
For more in-depth examples, visit “Using Scala in Magpie” in the “Magpie Tutorials” notebook on your Magpie cluster.