Paper: |
Massive Scientific Workloads - Lessons Learned From Petaflop-Scale Weather Simulations |
Volume: |
521, Astronomical Data Analysis Software and Systems XXVI |
Page: |
577 |
Authors: |
Pierfederici, F. |
Abstract: |
Weather forecasts run at the European Centre for Medium-Range Weather Forecasts
(ECMWF) are complex workloads that use tens of thousands of CPU cores on two
of the most powerful supercomputers in the world (top twenty of the TOP500
list). They can run for weeks on end and process hundreds of millions
of observation datasets.
Each of these forecast simulations is a heterogeneous mix of hybrid MPI-OpenMP
Fortran/C/C++ numerical code surrounded by a host of Python and Shell scripts
staging data in and out of databases, creating high-level products, performing
sanity checks on inputs and outputs, etc. When running on an HPC cluster, they
each spawn tens of thousands of jobs in a very deep dependency graph.
Monitoring, profiling, and debugging these complex workloads and their dependency
rules is a herculean task, made more difficult by the fact that the tools
one can use to analyze compiled executables (e.g. Darshan and Allinea MAP)
lose much of their power or are completely unusable when dealing with
scripts. Important issues of machine over-subscription and CPU power
management are also left unaddressed.
Lessons learned at ECMWF in whole-workload profiling of weather simulations
will be presented, and their applicability to present and future astronomy
processing needs will be investigated. |