03-18, 08:30–10:00 (US/Eastern), Tsai Auditorium (CGIS S010)
This tutorial demonstrates how to display XDMoD job statistics graphs on the OnDemand dashboard and how to configure XDMoD to aggregate OnDemand usage logs.
Building on the highly successful HPC Toolset Tutorial (https://github.com/ubccr/hpc-toolset-tutorial) developed by the teams at OSC and University at Buffalo CCR, this 90-minute tutorial takes users through the steps needed to integrate Open XDMoD (https://open.xdmod.org/) with OnDemand. Open XDMoD is an application for monitoring cyberinfrastructure resources such as HPC clusters, storage, cloud, and OnDemand instances. It provides standard metrics such as utilization, as well as quality-of-service metrics designed to proactively identify underperforming system hardware and software, and it can report job-level performance data for every job running on the HPC system.
There are two types of OnDemand-XDMoD integration. The first displays user-specific XDMoD job usage graphs on the OnDemand dashboard, presenting users with information such as the most recent month's job efficiency report, core hours efficiency report, and recently completed jobs. The second, covered in the second part of the tutorial, uses the xdmod-ondemand module, an optional add-on for Open XDMoD that allows for the display and analysis of Open OnDemand usage. It is intended for HPC center staff who want to analyze who uses OnDemand at their center, how frequently it is used, and which applications within OnDemand are used.
This tutorial uses the Docker container setup created for the HPC Toolset Tutorial and steps participants through the configuration required for each of the two integration options. It is fast-paced and intended for experienced HPC system administrators who are reasonably familiar with configuring both Open XDMoD and Open OnDemand. Attendees can either participate in the step-by-step configuration by running the HPC Toolset Tutorial containers locally with Docker on their laptops, or simply follow along as the speaker demonstrates.
Tutorial setup:
The container environment requires approximately 25 GB of disk space and should be downloaded before the tutorial from https://github.com/ubccr/hpc-toolset-tutorial
Before the tutorial starts, run the first step under the "XDMoD Setup & Job Ingestion" section (https://github.com/ubccr/hpc-toolset-tutorial/blob/master/xdmod/README.md#xdmod-ondemand-integration-tutorial), as this process can take 25-30 minutes:
- Submit jobs to the cluster
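Because the container environment needs roughly 25 GB, it may be worth confirming free disk space before downloading. The following is a minimal sketch using only the Python standard library. The helper names (`free_gb`, `has_room`) and the choice to measure GB as 10**9 bytes are assumptions made for illustration and are not part of the HPC Toolset Tutorial.

```python
import shutil

# Approximate requirement quoted in the setup notes (assumed to mean 10**9-byte GB).
REQUIRED_GB = 25


def free_gb(path="."):
    """Return the free disk space, in GB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 10**9


def has_room(available_gb, required_gb=REQUIRED_GB):
    """Return True when the available space meets the stated requirement."""
    return available_gb >= required_gb


if __name__ == "__main__":
    available = free_gb()
    if has_room(available):
        print(f"OK: {available:.1f} GB free, about {REQUIRED_GB} GB needed")
    else:
        print(f"Low on space: {available:.1f} GB free, about {REQUIRED_GB} GB needed")
```

Run it from the directory where the tutorial repository will be cloned, so the check applies to the filesystem that will hold the containers. Docker may store images on a different filesystem, in which case pass that location to `free_gb` instead.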
Dori is a senior systems administrator at the University at Buffalo's Center for Computational Research. She manages the help desk team, is responsible for the Open OnDemand service, and is the project manager for the ColdFront resource and allocations management portal (https://github.com/ubccr/coldfront). You may have heard her at HPC conferences and other community events evangelizing the trifecta of cyberinfrastructure tools: ColdFront, Open OnDemand, and Open XDMoD.