Operational Data Analytics: Drowning in Data
Kay Bailey Hutchison Convention Center
Dallas, Texas
Session Overview
SC22 in Dallas marked a shift in the ODA BoF series. Where earlier sessions focused on building the monitoring stack, “Drowning in Data” turned to the problem that came after: sites had successfully collected vast amounts of telemetry but now struggled to visualize it at useful granularity or to analyze it for actionable knowledge.
The session paired presentations from US, European, and Asian HPC facilities, each covering specific use cases and lessons learned, with open audience discussion. Roughly half the session was left for interaction.
Why This Matters
Most HPC sites were engaged in some form of ODA, whether they called it that or not. Many were overwhelmed by the volume of data they collected and found it difficult either to visualize it in enough detail or to find the right tool for extracting actionable knowledge. The big-data world offered many methods, but picking the right one required expertise in both data analytics and the HPC domain.
Threshold-based alarms frequently produced nuisance alerts that overwhelmed operators. Anomaly-based methods held promise for making alarms more relevant. The BoF brought ODA researchers, HPC operators, and data-analytics experts into the same room to share experiences and lessons learned.
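The contrast between threshold-based and anomaly-based alarms can be illustrated with a minimal sketch. It is not taken from any presentation at the session: the function names, the readings, and the choice of a trailing-window z-score as the anomaly method are all assumptions made for illustration. The invented readings model a node that runs stably just above a fixed limit and then spikes once.

```python
from statistics import mean, stdev

def threshold_alarms(readings, limit):
    """Return indices of readings that exceed a fixed limit."""
    return [i for i, r in enumerate(readings) if r > limit]

def anomaly_alarms(readings, window=5, z_limit=3.0):
    """Return indices of readings that deviate strongly from the trailing window.

    Uses a simple z-score against the previous `window` readings; this is one
    of many possible anomaly methods, chosen here only for brevity.
    """
    alarms = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip this point
        if abs(readings[i] - mu) / sigma > z_limit:
            alarms.append(i)
    return alarms

# Hypothetical temperature readings (illustrative data, not measurements):
# stable around 81-82 with a single spike to 95.
readings = [81, 82, 81, 82, 81, 82, 81, 95, 82, 81]

print(threshold_alarms(readings, limit=80))  # fires on every reading
print(anomaly_alarms(readings))              # fires only on the spike
```

With a limit of 80, the fixed threshold raises an alarm on all ten readings, while the z-score check flags only the spike at index 7. A sketch this simple also shows the trade-off: the anomaly check stays silent about a node that is consistently too hot, so the two approaches complement rather than replace each other.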
Historical Context
“Drowning in Data” built directly on the SC18 and SC19 BoFs on ODA infrastructure and on the SC21 BoF that introduced the 4x4 ODA conceptual framework. It set up the standardization and interoperability themes that would dominate the following years’ sessions.