Using Scala in Magpie

Overview

Magpie supports the Scala programming language. Scala is a general-purpose programming language that supports functional programming and has a strong static type system. It is also the language in which Apache Spark is written, so it is a natural way to access the underlying Spark APIs for Magpie tables.

Scala Libraries

Many popular Scala and Java libraries are already installed in your Magpie environment. You can load packages as you normally would in Scala via an import statement, and can list the currently available libraries by running the snippet below.

%scala
// List all entries on the Java class path
sys.props("java.class.path")
  .split(":")
  .sorted
  .foreach(println)
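
The full class path can be long, so it may help to narrow the listing to entries whose file name contains a search term. The following is a minimal sketch of that idea using only the Scala and Java standard libraries. The `keyword` value and the `matchingJars` name are illustrative assumptions, not part of Magpie, and `File.pathSeparator` is used in place of a hard-coded ":" so the split follows the platform's convention. In a Magpie notebook, it would run inside a %scala block.

```scala
import java.io.File

// Hypothetical search term; replace it with the library name you are looking for.
val keyword = "spark"

// Split the class path, keep only the file names, and filter by the keyword.
val matchingJars = sys.props("java.class.path")
  .split(File.pathSeparator)
  .map(path => new File(path).getName)
  .filter(_.toLowerCase.contains(keyword))
  .distinct
  .sorted

matchingJars.foreach(println)
```

If nothing is printed, no class path entry matched the keyword. This does not necessarily mean the library is unavailable, since it could be bundled inside another archive.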

Magpie Context

You will often interact with data in Magpie using the MagpieContext (available as mc). The MagpieContext provides a handle to the Magpie environment and allows you to easily retrieve tables from Magpie as DataFrames, execute commands in the Magpie layer, and access results. Read our documentation for the magpie Scala library to learn more.

Note that the underlying SparkContext and SparkSession are available within Magpie as well. There is no need to instantiate these objects separately. They are exposed within Magpie as sc for the SparkContext and spark for the SparkSession.

Examples

Scala scripting blocks in the notebook are denoted by the %scala header. The following example shows how to create views in Magpie that pivot a dataset by one of its columns. In this case, we pivot sample_table by its integer year column. After using Spark to collect the distinct values in the column, we use the MagpieContext to run SQL queries on Magpie tables and execute Magpie commands iteratively within standard Scala code.

%scala

import spark.implicits._

// Get every distinct year from sample_table
val years = mc.sql("select distinct year from sample_table")
  .as[Int]
  .collect()

// create views for each year
years
  .map({ y =>
    // calculate a table name for each year
    y -> s"sample_view_$y"
  })
  .foreach({ case (year, table) =>
    // create the table
    mc.exec(s"""
    create temp table $table from sql(select * from sample_table where year=$year)
    """)
  })

This type of process, in which we access Magpie data or metadata and iterate through the results to execute a parameterized set of Magpie commands, can be a powerful way to automate repeated tasks in Magpie.
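
One way to make this pattern easier to check is to build the command strings first and execute them in a separate step. The sketch below is an assumption about how that could be organized rather than a Magpie feature: `viewCommands` is a hypothetical helper, the year values are made up for illustration, and the command text simply mirrors the example above. It uses plain Scala only, so the commands can be previewed without touching the Magpie environment.

```scala
// Hypothetical helper: builds one Magpie command per year without executing anything.
def viewCommands(sourceTable: String, years: Seq[Int]): Seq[String] =
  years.sorted.map { year =>
    s"create temp table sample_view_$year from sql(select * from $sourceTable where year=$year)"
  }

// Illustrative years; in practice these would come from a query such as the one above.
val commands = viewCommands("sample_table", Seq(2021, 2019, 2020))

// Preview the generated commands before running them.
commands.foreach(println)

// In a Magpie notebook, each command could then be passed to mc.exec:
// commands.foreach(cmd => mc.exec(cmd))
```

Separating generation from execution lets you review exactly what will run, which is useful when the commands create or modify many objects at once.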

For more in-depth examples, visit “Using Scala in Magpie” in the “Magpie Tutorials” notebook on your Magpie cluster.
