03-19, 11:30–11:55 (US/Eastern), Tsai Auditorium (CGIS S010)
Research computing organizations facilitate scientific investigations by providing access to computational resources, advanced networking, and ample storage to support the demands of scientific workflows. Traditional HPC systems run as self-contained environments with a head node that defines access to all resources and orchestrates operation of the cluster. Managing these services over the various lifetimes of hardware, software, clusters, and facilities makes it challenging to maintain user access to different systems as they evolve. At UAB, we are building a software-defined HPC environment to manage the evolution of our systems by implementing an A/B testing framework that leverages Open OnDemand as the web interface to different generations of hardware.
This talk will cover the use case of building an A/B testing environment that enables us to route our cluster users to existing and new cluster resources based on their membership in specific groups. We describe the general motivation for building this capability and provide details on how we used this new framework to migrate our user community from GPFS4 to GPFS5 and our cluster systems from an on-campus data center to a commercial colocation facility.
John-Paul Robinson is an HPC Architect with the Research Computing organization at the University of Alabama at Birmingham. He has multiple decades of experience building distributed systems to support science.