Brownbag Test Run

Nov 20, 2024
Michael Ott
Abstract
Operational data analytics (ODA) provides unique opportunities to analyze, understand, and optimize the operation of HPC systems. Readily available open-source frameworks make it increasingly easy to collect monitoring data from different domains of an HPC system. However, making the data work for HPC operations is not straightforward, and HPC sites are duplicating efforts to develop methods and tools to analyze and leverage the data. AI-based analysis methods are appealing, but certainly not the only option. This BoF aims to bring together practitioners in HPC operations to share use cases for ODA, discuss problems, and provide feedback. To support this BoF, I discuss the data journey of OLCF over the past two generations of systems and share lessons learned in an interactive way.

Discussion

The interesting details of the discussion.

Authors
Michael Ott
ODA Team Leader
Michael Ott is a senior research engineer in the Future Computing group at the Leibniz Supercomputing Centre (LRZ).