November 6, 2025 Minutes
The meeting was dedicated to a dry run for the SC25 Birds of a Feather (BoF) session, “Operational Data Analytics: Mind the Gap,” led by Tim Osborne and featuring presentations from Wolfgang Frings and Terry Jones.
Wolfgang Frings presented LLView, an open-source HPC monitoring tool developed at the Jülich Supercomputing Centre (JSC). LLView collects job-relevant data from various sources (such as Slurm, I/O, and GPU information) and presents it through a web portal with role-based access and near real-time job reports. It is used on the JUPITER exascale system and helps users self-diagnose issues, for example memory problems indicated by memory usage that keeps increasing over time, while also improving the communication flow between users, support staff, and administrators.
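As an illustration of the memory example above, steadily growing memory usage can be flagged by fitting a straight line to a job's memory samples and checking the slope. The sketch below is not LLView's implementation; the function names, the sample format (seconds, MiB), and the threshold are assumptions made for this note.

```python
def memory_growth_slope(samples):
    """Return the least-squares slope (MiB per second) of (time_s, mem_mib) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_like_memory_growth(samples, threshold_mib_per_s=0.01):
    """Flag a job whose memory trend exceeds a (hypothetical) threshold."""
    return memory_growth_slope(samples) > threshold_mib_per_s


if __name__ == "__main__":
    # Hypothetical samples: memory rising by 60 MiB every minute.
    growing = [(0, 100.0), (60, 160.0), (120, 220.0)]
    print(looks_like_memory_growth(growing))
```

A real check would likely need a minimum observation window and some tolerance for applications that legitimately allocate memory in phases.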
Terry Jones then presented the researcher perspective of Operational Data Analytics, illustrating it through the ExaDIGIT project, an international collaboration developing a modular digital twin framework for supercomputers that combines workload, power, and cooling data. A core component of this work is Application Fingerprinting, which characterizes an application’s telemetry to predict its resource needs (such as I/O or network bandwidth) and enables intelligent scheduling that prevents resource contention and speeds up scientific discovery.
Following the talks, Tim moderated an interactive Mentimeter session that gauged audience demographics and identified key community gaps in ODA. Participants highlighted the need for standardization, better anonymization techniques, and more publicly available datasets, with specific calls for networking data and higher-fidelity metrics.
After the BoF session at SC25, the organizers plan to hold a continuing discussion from 1:30 PM to 2:25 PM in the Landmark 3 meeting room at the Marriott St. Louis Grand; the room is provided by HPE. The next monthly ODA team meeting will serve as a debrief for the SC25 BoF session, where the team will review the Mentimeter results and other details.
On scheduling, the team meeting that would have fallen on January 1st (New Year’s Day) is cancelled, and the January 29th meeting is also cancelled because it conflicts with HPC Asia. The first meeting of the new year will instead be held on January 15th, which will also mark the start of the new talk series. Meetings will then resume the typical four-week cadence.
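For reference, the dates implied by a four-week cadence starting on January 15th can be computed as follows. This is only a sketch: the year 2026 is inferred from the date of these minutes, the helper name is made up for this note, and later exceptions (holidays, conferences) are not accounted for.

```python
from datetime import date, timedelta


def meeting_dates(first, count, interval_weeks=4):
    """Return `count` meeting dates spaced `interval_weeks` apart, starting at `first`."""
    return [first + timedelta(weeks=interval_weeks * i) for i in range(count)]


if __name__ == "__main__":
    # Assumed start: January 15, 2026 (year inferred, not stated in the minutes).
    for d in meeting_dates(date(2026, 1, 15), 4):
        print(d.isoformat())
```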