Introduction to the Spark Web UI

February 02, 2020 18:58

Magpie surfaces monitoring and instrumentation data from its underlying Spark compute layer through the Spark Web UI. As a cluster admin, you can use this interface to monitor the status and resource consumption of your Magpie cluster. Note that Magpie users without admin privileges cannot access this interface.

The interface can be accessed at the /spark/ path for your cluster: https://<orgname>.silect.is/clusters/<clustername>/spark/
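If you manage several clusters, it can help to build this URL in a script rather than by hand. The following minimal Python sketch only follows the URL pattern above. The function name, the optional subpath parameter, and the example names "acme" and "analytics" are hypothetical.

```python
from urllib.parse import quote


def spark_ui_url(orgname: str, clustername: str, subpath: str = "") -> str:
    """Return the Spark Web UI URL for a Magpie cluster (hypothetical helper).

    Follows the pattern https://<orgname>.silect.is/clusters/<clustername>/spark/.
    `subpath` is an optional page path appended after /spark/.
    """
    if not orgname or not clustername:
        raise ValueError("orgname and clustername are required")
    # Percent-encode the cluster name in case it contains characters such as spaces.
    base = f"https://{orgname}.silect.is/clusters/{quote(clustername, safe='')}/spark/"
    return base + subpath.lstrip("/")


# Hypothetical organization and cluster names.
print(spark_ui_url("acme", "analytics"))
```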

It can also be reached from the Magpie notebook, through the menu in the top right corner of an individual paragraph. The link appears only after the paragraph runs an action that uses Spark resources (for example, Markdown paragraphs do not display the link).

Spark Web UI Home Page

The home page (at /spark/) has a few components that are useful to cluster admins.

Workers
- This section displays the status of worker nodes. For example, a small cluster might show a single worker node with 16 cores.
Running Applications
- This section typically displays one application, which is the application responsible for executing tasks from the Magpie Notebook, Magpie CLI, and BI tools.
Completed Applications
- This section contains all completed applications. If the Magpie application restarted due to an error, you may see previous runs of the application appear here.

Click on the application name to view active and completed jobs.
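The Workers, Running Applications, and Completed Applications sections mirror the home page of a Spark standalone master, which in upstream Spark can also render the same state as JSON. Whether Magpie exposes that JSON view under /spark/ is an assumption we have not confirmed. Given such a document, the Python sketch below condenses it into a few headline numbers. The function name is hypothetical, and the key names are assumed to match upstream Spark's master JSON output.

```python
from typing import Any


def summarize_master_state(state: dict[str, Any]) -> dict[str, Any]:
    """Condense a standalone-master state document into headline numbers.

    Key names ("workers", "state", "cores", "coresused", "activeapps",
    "completedapps") are assumed to match upstream Spark's master JSON view.
    Missing keys are treated as empty.
    """
    workers = state.get("workers", [])
    # Only ALIVE workers contribute usable cores; DEAD or DECOMMISSIONED ones do not.
    alive = [w for w in workers if w.get("state") == "ALIVE"]
    return {
        "workers_alive": len(alive),
        "workers_total": len(workers),
        "cores_total": sum(w.get("cores", 0) for w in alive),
        "cores_used": sum(w.get("coresused", 0) for w in alive),
        # Fall back to the application ID when no name is present.
        "running_apps": [a.get("name", a.get("id")) for a in state.get("activeapps", [])],
        "completed_apps": len(state.get("completedapps", [])),
    }


# Hypothetical state resembling a single 16-core worker running one application.
example = {
    "workers": [{"id": "worker-1", "state": "ALIVE", "cores": 16, "coresused": 16}],
    "activeapps": [{"id": "app-1", "name": "example-app"}],
    "completedapps": [],
}
print(summarize_master_state(example))
```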

Job View

Active Jobs
- At a glance, this section provides information about active jobs. Look here first when debugging whether a job is stalled or making progress.
Completed Jobs
- Magpie logs an extensive record of completed Spark jobs. This section is particularly useful for understanding how long jobs take to execute and how many stages particular jobs generate.
Failed Jobs
- At the bottom of the page is a log of failed jobs, which may provide additional detail about why a particular action did not succeed.
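Upstream Spark also exposes this job list through its monitoring REST API (a path of the form /api/v1/applications/<app-id>/jobs on the application UI). Assuming your cluster proxies that API, which this guide does not confirm, the following Python sketch turns the returned job records into the same at-a-glance view: task progress for active jobs, and duration and stage count for completed and failed ones. The helper names are hypothetical; the field names (jobId, status, stageIds, numTasks, numCompletedTasks, submissionTime, completionTime) follow upstream Spark's job records.

```python
from datetime import datetime

# Upstream Spark's REST API formats timestamps like "2020-02-02T18:58:00.000GMT".
_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def _parse_ts(ts: str) -> datetime:
    # Every timestamp carries the same "GMT" label, so dropping it and
    # comparing naive datetimes still yields correct durations.
    if ts.endswith("GMT"):
        ts = ts[:-3]
    return datetime.strptime(ts, _TS_FORMAT)


def summarize_jobs(jobs: list[dict]) -> dict[str, list[str]]:
    """Group job records by status, with a one-line summary for each job.

    Running jobs report task progress; finished jobs report wall-clock duration.
    Every summary includes the number of stages the job generated.
    """
    report: dict[str, list[str]] = {"RUNNING": [], "SUCCEEDED": [], "FAILED": []}
    for job in jobs:
        status = job.get("status", "UNKNOWN")
        line = f"job {job['jobId']}: {len(job.get('stageIds', []))} stage(s)"
        if status == "RUNNING":
            line += f", {job.get('numCompletedTasks', 0)}/{job.get('numTasks', 0)} tasks done"
        elif "submissionTime" in job and "completionTime" in job:
            elapsed = _parse_ts(job["completionTime"]) - _parse_ts(job["submissionTime"])
            line += f", {elapsed.total_seconds():.1f}s"
        report.setdefault(status, []).append(line)
    return report
```

To check whether a running job is stalled, you could call summarize_jobs on two snapshots taken a minute or so apart and compare the completed-task counts; a count that never moves is worth investigating.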

Job Detail View

Within the Job View, click the link in a job's Description field to navigate to the Job Detail View.

Review this interface to see details for each Active, Completed, or Failed job.

By default, you are presented with metrics such as Duration, Number of Tasks, the amount of shuffle data read (Shuffle Read), and the amount of shuffle data written (Shuffle Write) during the job's execution.
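Upstream Spark's REST API reports the same shuffle metrics per stage (stage records carry shuffleReadBytes and shuffleWriteBytes). Assuming you have fetched a job record and the application's stage records, for example from the /api/v1 jobs and stages endpoints (whether Magpie exposes them is an assumption), this Python sketch totals shuffle traffic for one job and formats it in binary units. Both helper names are hypothetical.

```python
def format_bytes(n: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def job_shuffle_totals(job: dict, stages: list[dict]) -> dict[str, int]:
    """Sum shuffle read/write bytes over the stage records that belong to `job`.

    Retried stages appear as separate attempt records and are all counted,
    so totals can exceed what a single clean run would move.
    """
    ids = set(job.get("stageIds", []))
    mine = [s for s in stages if s.get("stageId") in ids]
    return {
        "shuffleReadBytes": sum(s.get("shuffleReadBytes", 0) for s in mine),
        "shuffleWriteBytes": sum(s.get("shuffleWriteBytes", 0) for s in mine),
    }
```

A call such as {k: format_bytes(v) for k, v in job_shuffle_totals(job, stages).items()} gives human-readable totals comparable to the Shuffle Read and Shuffle Write columns.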

Clicking the +details element in the Description field expands it to show additional execution details. For a successful job these details are usually unremarkable, but for failed jobs they may surface information relevant to your debugging and troubleshooting efforts (e.g., memory or connection issues).
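When a job fails, upstream Spark records a failureReason on each failed stage, and the full text is often a long stack trace. Under the same assumption that stage records are available from the REST API, this hypothetical Python helper pulls out the first line of each reason for a given job, which often names the underlying exception.

```python
def failure_headlines(job: dict, stages: list[dict], max_len: int = 200) -> list[str]:
    """Return the first line of each failure reason for the job's failed stages.

    Stages not belonging to `job`, or not in FAILED status, are skipped.
    Each headline is truncated to `max_len` characters.
    """
    ids = set(job.get("stageIds", []))
    headlines = []
    for s in stages:
        if s.get("stageId") not in ids or s.get("status") != "FAILED":
            continue
        reason = (s.get("failureReason") or "").strip()
        first = reason.splitlines()[0] if reason else "(no failure reason recorded)"
        headlines.append(f"stage {s['stageId']}: {first[:max_len]}")
    return headlines
```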

Navigation Bar

A detailed explanation of each tab is beyond the scope of this introduction, but advanced Spark users can use them for troubleshooting, debugging, and optimization.

Magpie Logo - Navigates back to the landing page of your Magpie Notebook.
Jobs - The Job View described above: a list of jobs that have been scheduled or run, along with job and task progress.
Stages - Details for each stage, including read/write metrics and links to Job Detail pages.
Storage - Details of RDD persistence. Not relevant for most Magpie implementations.
Environment - Runtime information, Spark properties, and system properties.
Executors - Summary and detailed metrics for Spark executors.
SQL - A list of all Running, Completed, and Failed queries, with links to the relevant Job ID(s) and each query's logical plan.
JDBC/ODBC Server - Shows all sessions opened by BI tools over JDBC/ODBC.