May 22, 2025 Minutes

May 22, 2025 · Natalie Bates

We started with a discussion about whether the “brown bag” presentations could be useful for a much larger, public audience (beyond the ODA Team), possibly through a website. Publishing them would also give presenters an “invited publishing credit,” serving as an incentive. Michael Ott proposed reusing recent brown bag presentations, specifically mentioning the recent ones on LLView and Cluster Cockpit. Filipe Guimaraes confirmed he was comfortable with his presentation being shared and asked whether a more detailed discussion was needed to determine if any editing or specific format was required.

The discussion then turned to scheduling the brown bag presentations, specifically whether to hold them bi-weekly and whether they should coincide with the ODA (Operational Data Analytics) team meetings. While a bi-weekly schedule and a consistent time slot (like the current one, favorable for Europe and North America) were generally preferred, it was decided not to use the ODA team meeting exclusively for these presentations. However, that meeting could be used on occasion if no other ODA business is planned. The participants agreed that a minimum of two weeks’ notice should be given for scheduled presentations. Ultimately, it was decided that a separate sub-team meeting would be scheduled to finalize the detailed plan for these presentations and to secure speakers from HPC sites and ODA software developers.

A historical overview of the ODA SC25 Birds of a Feather topics suggests that 2019/2021 focused on building the ODA software stack and framework. In 2022, the focus shifted to managing the abundance of collected data (“drowning in data”). Standardization and collaboration to share tools and methods across sites were key themes in 2023. Last year (2024) addressed the “data journey” over time and the role of AI in data analysis. For the current year, discussions are centered on “bridging the gap between practice and research.” This means addressing a mismatch: researchers often need data but don’t have access to it, while HPC sites possess data but may lack analysis tools and face difficulties sharing it due to confidentiality concerns (like GDPR in Europe). A related discussion involved making monitoring data more accessible to HPC system users to give them deeper insight into their jobs.

Michael Ott added that the upcoming EuroHPC-funded “Synergies project” will mandate 13 large HPC sites in Europe to make their data available, including developing anonymization methods and a data submission platform.

Woong Shin explained that existing DOE-level scientific data sharing frameworks, designed for scientific, observation, or simulation data, do not effectively apply to Operational Data Analytics (ODA) data. He emphasized that a specific approach is needed for ODA data sharing, as current methods “break down” and are not particularly useful. A key challenge he highlighted is the debate within their approval process (like the IRB) regarding whether ODA data, such as job yielding behavior, could inadvertently reveal information about individual researchers, underscoring the need to thoroughly address these identifiability concerns.

Jeff Hanson highlighted that new concepts like “ExaDigiT” and “digital twin” are gaining traction. He specifically referenced Terry Jones from Oak Ridge, who is working on “application fingerprinting” and plans to submit a BoF (Birds of a Feather). Jones’s work reveals that existing data sets often lack sufficient fidelity, making them “damaged” or unusable for application fingerprinting. This problem underscores the critical need to define effective data sharing mechanisms and to revisit the standards discussed two years prior. Hanson concluded that data sharing, and how it looks, is a “top of mind” concern not only for operators but for users as well.

A security concern regarding operational data is the possibility that it could reveal what types of codes are running at different locations. Within certain security domains, that possibility alone is deemed unacceptable, so operational data is restricted in the same way as scientific data.

Data sensitivity stems from multiple concerns. On the research side, even in open science facilities, there’s a fear of research being “scooped” or reverse-engineered. From an operational standpoint, one participant recalled an instance where a data breach involving operational data allowed someone to reverse-engineer details about a facility’s cooling system. Additionally, there are general concerns about how data might be interpreted, what conclusions could be drawn, and the potential negative impact on vendor relationships or on public reporting about the facility.

The discussed topics would be excellent for the Birds of a Feather (BoF) session, as they would encourage input from various parties. These topics could attract additional participants, particularly application developers interested in gaining more insight into code behavior on machines.

The white paper outline will be regularly revisited in ODA team meetings to keep it on the agenda and track progress. The goal is for it to serve as a comprehensive resource, covering introductions to ODA, pipeline components, data organization (e.g., naming, metadata, sampling), visualization tools (e.g., Grafana, LLView, Cluster Cockpit), data analysis techniques (including AI/ML), use cases, best practices, and the “buy or make” decision for solutions.

Due to bandwidth concerns, the detailed white paper discussion and writing may be sequenced after securing speakers for the brown bag series and finalizing the Birds of a Feather (BoF) submission, which has a mid-July deadline. The BoF submission is prioritized given its impending deadline.

Authors
Natalie Bates
EE HPC WG Technical and Executive Lead
Natalie has been the technical and executive leader of the EE HPC WG since its inception in 2010. The group disseminates best practices, shares information (peer-to-peer exchange), and takes collective action.