03-18, 16:00–16:25 (US/Eastern), Tsai Auditorium (CGIS S010)
Some organizations, such as CSC - IT Center for Science, give their users access to multiple supercomputers, each of which may have a completely separate instance of Open OnDemand. This fragments the user experience, because users must log in to a different instance for each supercomputer, and it increases the time spent maintaining multiple instances. This talk is aimed at system administrators, service owners, and others responsible for maintaining and developing Open OnDemand instances. It discusses the benefits and challenges of providing a single instance of Open OnDemand that is connected to all of the organization's supercomputers, and potentially to those of partner organizations.
Open OnDemand (OOD) is typically deployed in or close to the supercomputer, usually on login or utility nodes, where the supercomputer file system can be mounted, the scheduler can be accessed directly, and the compute nodes can be reached. However, when the supercomputers are geographically distributed or managed by external organizations, these prerequisites may not be met.
This talk explores the possibilities and challenges of deploying OOD in an environment separate from the supercomputer itself, which would make it possible to provide access to multiple remote supercomputers through a single centralized instance of OOD.
As part of this exploration, CSC developed two prototypes of a centralized OOD instance to uncover challenges and test potential solutions. The prototypes showed that the main technical challenges are identity and access management, file system access, compute node access, scheduler access, and potential performance concerns.
The first prototype was based on direct SSH access to the national supercomputers, with quick-and-dirty SSHFS mounts standing in for file system access so that the other challenges of federated OOD could be explored. The second prototype used FirecREST for accessing the supercomputer and included an additional abstraction layer to address the file system challenges. Future work in this area consists of building a production-ready implementation of a federated OOD instance.
The goal of the talk is to provide a foundation for future discussion of the topic, to gather ideas for technical solutions and future development, and to offer insight to other organizations interested in a single, centralized OOD instance that provides access to multiple supercomputers.
I work at CSC - IT Center for Science in Finland, configuring and developing our instances of Open OnDemand for our national supercomputers Puhti and Mahti and for the European supercomputer LUMI, which is hosted by CSC.