Enabling High Performance Computing Insights via Monitoring and Analysis
Guest speaker Omar Aaziz will present "Enabling HPC Performance Insights via Monitoring and Analysis."
Description: During high performance computing (HPC) system acquisition, significant consideration is given to the desired performance, which drives the selection of processing components, memory, interconnects, file systems, and more. The achieved performance, however, depends heavily on operational conditions, applications, and workloads, so discovering and assessing these factors is critical in practice.
As such, HPC system analysis has been a long-standing need for administrators, who must observe the health of their systems, detect abnormal conditions, and take informed actions when restoring system health. Moreover, users strive to understand how well their jobs run and which architectural limits restrict the performance of their jobs.
In my talk, I will present two research projects. First, I will introduce a methodology that uses a neural network to model the expected runtime of an HPC job based on historical application data and data from the job itself. This estimation model is useful to both HPC users and administrators as a metric for assessing the job's performance.
I will then present a data-driven methodology to characterize the relationship between parent and proxy applications by collecting runtime data from both and then using data analytics to find where they correspond or diverge. I will explain how to measure the dynamic hardware behaviors and relationships between several parent and proxy application pairs, and I will evaluate several metrics over the time-varying behavior that are potentially useful for comparing real applications to their proxy counterparts.
Location
Room 202
Olin Center for Educational Technology
733 35th St.
Rock Island, IL 61201
United States
Tickets
Free