Operational Data Analytics

May 23, 2023 · Michael Ott, Kadidia Konaté, Melissa Romanus, Rachel Palumbo, Woong Shin, Torsten Wilde
Abstract
Over the last couple of years, many HPC sites around the globe have significantly improved their monitoring capabilities by leveraging established software tools from the big data domain to collect, stream, and store operational data at unprecedented granularity and detail. Vast amounts of data from the different domains of HPC operations (infrastructure, system hardware, software, applications) are now readily available for improving HPC operations. Consequently, the focus is now shifting towards analyzing the data to obtain actionable knowledge for daily operations. While dashboards remain a valuable tool for this task, there is clearly a trend towards AI-based methods to help harvest the humongous amounts of data. As data analytics is not necessarily the core expertise of the data centers that operate the HPC systems, there is a unique opportunity for collaboration with the data analytics community to identify methods and develop tools to monitor and make use of this data treasure.
Location

Congress Center Hamburg, Hamburg

Session Overview

At ISC 2023 in Hamburg, the ODA BoF returned to pick up where SC22 left off: with sites now collecting operational data at unprecedented scale, the community’s focus had shifted to analyzing it. The session combined two 10-minute presentations on the state of the art with an extended open discussion, with content pitched at beginner, intermediate, and advanced levels in roughly equal proportions.

Key Discussion Themes

  • From dashboards to AI. Dashboards remained valuable, but AI-based methods were emerging as the scalable response to the “humongous amounts of data” operators now held.
  • Collaboration across communities. Data analytics was not typically the core expertise of HPC data centers, creating a clear opening for partnership with the broader data-analytics research community.
  • Operators, monitoring specialists, researchers. The session targeted HPC system operators, monitoring specialists, and data-analytics researchers together, reflecting that ODA only works when these three groups share practice.

Outcome

The BoF’s discussion of open data and standardization continued in the ODA team’s monthly meetings under the Energy Efficient HPC Working Group (EEHPCWG) and set up the data-standardization focus of the SC23 ODA BoF six months later.