SPPEXA Projects - Phase 2 (2016 - 2018)
The findings of all projects of the second phase of SPPEXA are summarized in the book "Software for Exascale Computing - SPPEXA 2016-2019", published in the Springer series Lecture Notes in Computational Science and Engineering as volume 136.
EXA-DUNE - Flexible PDE Solvers, Numerical Methods, and Applications
Principal Investigators | Peter Bastian (University of Heidelberg), Olaf Ippisch (TU Clausthal), Mario Ohlberger (University of Münster), Christian Engwer (University of Münster), Stefan Turek (TU Dortmund), Dominik Göddeke (University of Stuttgart), Oleg Iliev (Fraunhofer ITWM and TU Kaiserslautern)
Contact | Peter Bastian
In this interdisciplinary project, computer scientists, mathematicians and domain experts from the open-source projects DUNE and FEAST develop, analyse, implement and optimise new numerical algorithms and software for the scalable solution of partial differential equations (PDEs) on future exascale systems with heterogeneous, massively parallel architectures. The DUNE software framework combines flexibility and generality with high efficiency through state-of-the-art programming techniques and interchangeable components conforming to a common interface. Incorporating the hardware-oriented numerical techniques of the FEAST project into these components already allowed us, during the first funding phase, to exploit the performance of heterogeneous architectures with their three levels of parallelism (SIMD vectorisation, multithreading, message passing) while supporting a variety of applications from the steadily growing DUNE user community. To cope with the increased probability of hardware failures, a central aim of the second funding period is to add flexible, application-oriented resilience capabilities to the framework: based on a common infrastructure, these include ready-to-use self-stabilising iterative solvers on the one hand and global and local checkpoint-restart techniques on the other. Continuous improvement of the underlying hardware-oriented numerical methods is achieved by combining matrix-free, sum-factorisation-based high-order discontinuous Galerkin discretisations with matrix-based algebraic multigrid for low-order subspace corrections, resulting in solvers that are both robust and performant. On top of that, extreme scalability is facilitated by exploiting the massive coarse-grained parallelism offered by multiscale and uncertainty quantification methods, where we now focus on the adaptive choice of the coarse/fine scales and the overlap region as well as on combining local reduced-basis multiscale methods with the multilevel Monte Carlo algorithm. As an integral part of the project we propose to bring together our scalable PDE solver components in a next-generation land-surface model including subsurface flow, vegetation, evaporation and surface runoff. This development is carried out in close cooperation with the Helmholtz Centre for Environmental Research (UFZ) in Halle, which provides additional modelling expertise as well as measurement data from multiple sources (experimental sites, geophysical data, remote sensing, …). Together we set out to provide the environmental research community with an open-source tool that contributes to the solution of problems with high social relevance.
Article about EXA-DUNE on InSide
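To make the matrix-free sum-factorisation idea concrete, the following small C++ sketch (an illustration under our own naming, not DUNE/PDELab code) applies a 1D basis matrix along each direction of a 2D tensor-product element; compared with multiplying by an assembled dense element matrix, the work per element drops from O(n^4) to O(n^3) and the memory traffic drops accordingly.

```cpp
#include <cstddef>
#include <vector>

// Minimal illustration of sum factorisation on one 2D tensor-product element.
// B is the (n x n) 1D basis/derivative matrix, u the (n x n) nodal values.
// Instead of applying an assembled (n^2 x n^2) element matrix (O(n^4) work),
// we apply B along each coordinate direction separately (O(n^3) work).
std::vector<double> apply_sum_factorised(const std::vector<double>& B,
                                         const std::vector<double>& u,
                                         std::size_t n) {
  std::vector<double> tmp(n * n, 0.0), out(n * n, 0.0);
  // Sweep 1: contract over the first tensor index.
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        tmp[i * n + k] += B[i * n + j] * u[j * n + k];
  // Sweep 2: contract over the second tensor index.
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t l = 0; l < n; ++l)
      for (std::size_t k = 0; k < n; ++k)
        out[i * n + l] += B[l * n + k] * tmp[i * n + k];
  return out;
}
```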
Smart-DASH - Smart Data Structures and Algorithms with Support for Hierarchical Locality
Principal Investigators |
Contact | Karl Fürlinger
Link | http://www.dash-project.org/
Smart-DASH builds upon previous work on DASH, a data-structure-oriented C++ template library that can exploit the hierarchical organization afforded by machines and algorithms. Smart-DASH extends this work by addressing some of the most pressing challenges on the way to exascale. The runtime and C++ template library will be extended to support a task-based execution model, and with the notion of memory spaces, Smart-DASH will be able to exploit the complex memory architectures of upcoming machines. Several case studies will explore the utility of these new features in the context of important scientific problem classes. As part of the project we will develop smart data structures that capture frequently encountered application scenarios, such as N-dimensional arrays with built-in support for halo regions, to enable a productive transition onto new hardware platforms and assist in code modernization efforts. To address fault tolerance and reliability, we will explore concepts for the redundant storage of data items, and with the DASH data dock we will explore the use of the PGAS approach in general and NVRAM in particular for the runtime coupling of applications.
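As a flavour of the programming model, here is a minimal sketch following publicly documented DASH examples (exact signatures may differ between releases, so treat it as an assumption-laden illustration rather than reference code): it allocates a block-distributed array in the PGAS global address space, fills the locally owned part on each unit, and performs a transparent remote read.

```cpp
#include <libdash.h>
#include <iostream>

int main(int argc, char* argv[]) {
  dash::init(&argc, &argv);              // set up the DASH runtime

  dash::Array<int> arr(1000);            // global array, distributed over all units
  for (auto it = arr.lbegin(); it != arr.lend(); ++it)
    *it = static_cast<int>(dash::myid()); // each unit writes only its local portion

  dash::barrier();                       // global synchronisation
  if (dash::myid() == 0)                 // transparent read of a (possibly remote) element
    std::cout << "first element: " << static_cast<int>(arr[0]) << std::endl;

  dash::finalize();
  return 0;
}
```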
Terra-Neo - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework
Principal Investigators |
Contact | Ulrich Rüde
Link | http://terraneo.fau.de/
Geophysics research depends increasingly on Earth mantle modeling software. However, simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. While the first funding phase was devoted to exploring the mathematical, algorithmic, and physical foundations of Earth mantle modeling, we will shift our focus in the second phase to the design of a new software architecture for Earth mantle dynamics simulations.
EXASTEEL-2 - Dual Phase Steels - From Micro to Macro Properties
Principal Investigators |
Contact | Axel Klawonn
Link | http://www.numerik.uni-koeln.de/14079.html
In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science. There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, which require sufficiently accurate, micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale bridging approach (FE^2), intertwined with continuous and rigorous performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme-scale computing. We envisage a steadily increasing transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale bridging approach, the polycrystalline nature of dual-phase steels, including grain boundary effects at the microscale. Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods, we plan to leverage the scalability of both implicit solver approaches for nonlinear problems. Although our application is specific, the algorithms and optimized software will have an impact well beyond this particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project. The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (performance engineering, hybrid programming, accelerators).
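As a schematic of the FE^2 scale bridging mentioned above: every macroscopic quadrature point triggers an independent microscopic RVE (representative volume element) solve whose averaged stress feeds back into the macroscopic residual. The C++ sketch below is purely illustrative; all names are placeholders and the micro solve is replaced by a dummy linear response, not the project's FEAP/PETSc machinery.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Strain = std::array<double, 6>;   // Voigt notation
using Stress = std::array<double, 6>;

// Placeholder for the microscopic RVE solve: in the real scheme this is a full
// crystal-plasticity finite element problem; here a dummy linear response stands in.
Stress solve_rve(const Strain& macroStrain) {
  Stress s{};
  for (std::size_t i = 0; i < 6; ++i)
    s[i] = 210.0e3 * macroStrain[i];    // dummy stiffness, illustration only
  return s;
}

// One evaluation of the macroscopic constitutive response in an FE^2 scheme:
// every macroscopic quadrature point triggers an independent micro solve,
// which is the source of the massive, naturally parallel workload.
void macro_constitutive_update(const std::vector<Strain>& strainAtGaussPoints,
                               std::vector<Stress>& stressAtGaussPoints) {
  #pragma omp parallel for              // micro problems are independent
  for (std::size_t q = 0; q < strainAtGaussPoints.size(); ++q)
    stressAtGaussPoints[q] = solve_rve(strainAtGaussPoints[q]);
  // ...the macroscopic Newton (or nonlinear DD) solver then assembles its
  //    residual and tangent from these averaged stresses...
}
```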
GROMEX - Unified Long-range Electrostatics and Dynamic Protonation for Realistic Biomolecular Simulations on the Exascale
Principal Investigators |
Contact | Carsten Kutzner
Link | http://www.mpibpc.mpg.de/grubmueller/sppexa
Computer simulations on the basis of theoretical physics and chemistry have grown to be invaluable tools of scientific research on molecular function and structure. Such simulations share common challenges with simulations of other strongly interacting systems, e.g. in astrophysics. Advancing the field of molecular simulation to exascale computing is thus highly beneficial to science and the public. The most costly part of molecular simulations is the computation of electrostatic long-range interactions. Thus, the efficiency of calculating these interactions is decisive for the whole simulation. Molecular electrostatics is complicated by the fact that molecules can contain titratable sites whose charge distribution varies over time. This variability originates, e.g., from protonation or redox reactions, or from the binding of different drug molecules. These reactions are intricately coupled to electrostatics and are crucial for the function and interaction properties of many (bio)molecules. Thus, a realistic treatment of electrostatics in biomolecular simulation has to account for the different forms of titratable sites. The particle mesh Ewald method (PME, currently the state of the art in molecular simulation) does not scale to large core counts, as it suffers from a communication bottleneck, and does not treat titratable sites efficiently. In this project, we combine a fast multipole method (FMM) with a λ-dynamics method to both alleviate the PME bottleneck and, for the first time, enable realistic chemical variability of titratable sites in molecular simulations. The FMM will enable an efficient calculation of long-range interactions on massively parallel exascale computers, including alternative charge distributions representing the various forms of titratable sites. λ-dynamics allows for a smooth interconversion between site forms during the simulation, which is indispensable for efficient, fully atomistic molecular simulations. In the second funding period, we aim to open up a whole new application range for molecular simulation, both in terms of the hardware that can be utilized at optimum performance and in terms of the type of scientific problems that can be addressed. In detail we will: (1) extend the current code to allow for multiple local topologies of each site, which will let simulations account for the whole range of variability of titratable sites instead of just protonation; (2) enable our solver to take full advantage of future exascale hardware, including many-core CPUs and accelerators like GPUs or Xeon Phi coprocessors; (3) for optimum scaling on this heterogeneous hardware, design and implement a graph-based partitioning scheme to choose among algorithmic alternatives according to the latency and throughput of the available hardware devices. Example applications include computational drug design and simulations of the function of nanomachines.
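The λ-dynamics coupling can be illustrated in a few lines of code: the protonation state of a titratable site is represented by a continuous variable λ ∈ [0, 1] that linearly interpolates the site's charges between its end forms, and the derivative of the potential energy with respect to λ drives transitions between the forms. The sketch below is a generic illustration with hypothetical names, not GROMACS/GROMEX code, and assumes simple linear coupling.

```cpp
#include <cstddef>
#include <vector>

// Linear lambda-dynamics interpolation between two end states A (e.g. protonated)
// and B (e.g. deprotonated) of a titratable site.
struct SiteState {
  std::vector<double> chargesA;  // partial charges in form A
  std::vector<double> chargesB;  // partial charges in form B
};

// Effective charges entering the long-range electrostatics at the current lambda:
// q_i(lambda) = (1 - lambda) * q_i^A + lambda * q_i^B
std::vector<double> interpolated_charges(const SiteState& s, double lambda) {
  std::vector<double> q(s.chargesA.size());
  for (std::size_t i = 0; i < q.size(); ++i)
    q[i] = (1.0 - lambda) * s.chargesA[i] + lambda * s.chargesB[i];
  return q;
}

// Generalized force that propagates lambda itself: for linear coupling,
// dU/dlambda = U_B - U_A, so lambda drifts toward the energetically
// preferred form of the site during the simulation.
double dU_dlambda(double energyA, double energyB) {
  return energyB - energyA;
}
```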
ExaStencils - Advanced Stencil-Code Engineering
Principal Investigators |
Contact | Christian Lengauer
Link | http://www.exastencils.org/
Project ExaStencils pursues a domain-specific design approach for the domain of stencil codes. ExaStencils does not aim at the acceleration of legacy code but rather at a completely new software technology that facilitates the fully automatic, domain- and target-platform-specific optimization of application code that can be programmed easily by non-computer scientists. Stencils play a central role in high-performance simulation. They are regular access patterns on (usually multidimensional) data grids. Multigrid methods involve a hierarchy of very fine to successively coarser grids. ExaStencils addresses the challenge of achieving exascale performance and provides programmability with a multi-layered domain-specific language and a code generator that enriches user-supplied programs with domain knowledge and uses it for optimization. ExaStencils' sophisticated tool support rests on a generator-based product-line technology that automatically produces an implementation custom-optimized for the specific problem and platform at hand. At the end of the first funding period, ExaStencils was able to showcase a full compilation flow for the investigated class of elliptic partial differential equations, including the automatic generation and optimization of functionally correct parallel code for real, still homogeneous supercomputers such as JUQUEEN. The second funding period has two major goals. First, in order to prove massive productivity gains with the ExaStencils approach for a broad range of differential equations and solvers, the spectrum of input specifications shall be extended to a considerably broader class of equations. In particular, linear elasticity equations and Stokes equations will be investigated. Second, in order to combat the power wall of exascale computing, automatic code optimization must not only target performance but also consider possible performance/energy tradeoffs. This important additional objective of energy consumption shall be analyzed not only for homogeneous but also for heterogeneous supercomputers, including accelerator technologies such as Xeon Phis and GPUs. The resulting, much wider design space is being managed with a machine-learning and product-line-sampling infrastructure that will be able to handle rich kinds of domain knowledge. A proof of flexibility and productivity involves the investigation of mathematical models for performance/power tradeoffs at each language layer, with domain-specific knowledge governing the identification of suitable choices, e.g., the choice of smoother, the number of grid levels, or the data and communication structure, as well as new techniques for energy-aware code partitioning (CPU/accelerator), communication-computation scheduling and, finally, the generation of hybrid target code. Experimental evaluation shall be carried out on both homogeneous and heterogeneous systems, including JUQUEEN, TSUBAME, and SuperMUC.
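For orientation, the kind of kernel such a generator ultimately has to emit is a plain stencil sweep. The hand-written C++ below shows a Jacobi smoothing step for the 2D 5-point Laplace stencil, i.e. the target of the domain-specific optimisations described above (an illustrative stand-in, not actual generated ExaStencils output).

```cpp
#include <cstddef>
#include <vector>

// One Jacobi smoothing sweep for -Laplace(u) = f on an n x n regular grid with
// spacing h, using the classic 5-point stencil. Kernels like this, for many
// stencil shapes and target platforms, are what a stencil-code generator is
// meant to produce automatically from a high-level specification.
void jacobi_sweep(std::vector<double>& u, const std::vector<double>& u_old,
                  const std::vector<double>& f, std::size_t n, double h) {
  for (std::size_t i = 1; i < n - 1; ++i)
    for (std::size_t j = 1; j < n - 1; ++j)
      u[i * n + j] = 0.25 * (u_old[(i - 1) * n + j] + u_old[(i + 1) * n + j] +
                             u_old[i * n + j - 1] + u_old[i * n + j + 1] +
                             h * h * f[i * n + j]);
}
```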
ExaFSA - Exascale Simulation of Fluid-Structure-Acoustics Interactions
Principal Investigators |
Contact | Miriam Mehl
Link | http://ipvs.informatik.uni-stuttgart.de/SGS/EXAFSA/
The ExaFSA project is the continuation of a project with the same title in the first funding period of the priority program. The task was and is to provide methods enabling efficient massively parallel simulations of fluid-structure-acoustics interactions. As our approach is modular, the developed methods will also be usable for other coupled multi-physics simulations. In the first period, the final goal was split into six subtasks. These subtasks were mainly functionality-oriented, with a focus on how to realize the bi-coupling between two fields and the overall coupling in an efficient way (T1-T3), how to minimize communication costs (T4), how to balance load (T5), and how to develop and integrate visualization (T6). In the second period, the focus shifts slightly from solving problems for the different forms of coupling involved to the overall picture of the fluid-structure-acoustics simulation environment. Since many questions concerning numerical coupling schemes suitable for parallel simulations were answered in the first funding period, this means in particular that the efforts in further algorithmic and technical optimization are intensified. In addition, considering the whole picture of fluid-structure-acoustics interaction also means that we validate our methods for realistic application scenarios. Subtasks of the project are 1) the optimization of the point-to-point communication between partitions of all involved solvers, 2) enhancing the load balancing from static to dynamic approaches, 3) the development of efficient parallel algorithms for the coupling interface numerics, 4) validation and testing of our simulation environment for an enlarged number of scenarios, 5) the development of numerical methods and parallelization schemes for the coupling in time, and 6) the development of techniques ensuring performance portability across various high-performance computing platforms.
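Several of these subtasks revolve around a partitioned coupling loop in which the fluid and structure solvers advance alternately and exchange interface data (forces and displacements) until the coupled time step converges. The sketch below outlines such a loop with dummy solver stubs; all names are placeholders and the convergence check and acceleration are omitted, so it only illustrates the control flow, not the project's coupling software.

```cpp
#include <vector>

// Placeholder interface data on the wetted surface (forces or displacements).
struct InterfaceData { std::vector<double> values; };

// Stand-ins for the real solvers; each would run on its own parallel partition.
InterfaceData advance_fluid(double /*dt*/, const InterfaceData& displacements) {
  return displacements;   // dummy: real code returns interface forces
}
InterfaceData advance_structure(double /*dt*/, const InterfaceData& forces) {
  return forces;          // dummy: real code returns interface displacements
}

// One implicitly coupled time step with sub-iterations until the interface
// data stops changing (fixed-point/quasi-Newton acceleration omitted).
void coupled_time_step(double dt, InterfaceData& displacements, int maxSubIter) {
  for (int k = 0; k < maxSubIter; ++k) {
    InterfaceData forces  = advance_fluid(dt, displacements);   // fluid partition
    InterfaceData newDisp = advance_structure(dt, forces);      // structure partition
    // ...check convergence of newDisp against displacements, apply relaxation...
    displacements = newDisp;
  }
  // The acoustic far field is then driven by the converged near-field solution.
}
```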
EXAHD - An Exa-Scalable Two-Level Sparse Grid Approach for Higher-Dimensional Problems in Plasma Physics and Beyond
Principal Investigators |
Contact | Dirk Pflüger
Link | http://ipvs.informatik.uni-stuttgart.de/SGS/EXAHD/
Higher-dimensional problems are among the most compute-hungry problems, with an inherent need for future exascale resources. Because the number of degrees of freedom of classical discretization schemes depends exponentially on the problem's dimensionality, the solution of higher-dimensional problems (beyond the classical four dimensions of continuum mechanics) is a challenging task. The sparse grid combination technique, introduced to HPC environments by our project, makes it possible to overcome this "curse of dimensionality" to a large extent. It can be employed in a wide range of applications (simulation, optimization, inverse problems, ...) in domains as diverse as medicine, finance or astrophysics. The combination technique furthermore provides intriguing approaches to deal with the challenges posed by future exascale systems. It introduces a second, numerically decoupled level of parallelism that ensures high scalability beyond domain decomposition. It is based on a superposition of solutions obtained on significantly coarser and anisotropic full grids, an approach that can also be exploited to deal with faults in an algorithm-based way without the need for expensive checkpoint-restart. In our project, we have demonstrated the feasibility of our approach by applying it to a highly visible and relevant application: turbulence simulations of hot fusion plasmas. We have shown algorithm-based fault tolerance, its scalability properties, and advances in the underlying numerics. Building on the foundations of the first funding period, we will advance the state of the art in three of SPPEXA's research topics towards exascale computing. First, we introduce new algorithmic approaches to the exascale challenges. We will extend fault tolerance for higher-dimensional problems to all levels of parallelization and to the detection and treatment even of silent failures due to data corruption. We propose a third layer in our approach to scale beyond the boundaries of a single HPC system, even for the solution of time-dependent PDEs. This will be accompanied by hierarchical communication schemes to reduce communication even further. Furthermore, we will advance the frontiers of the numerics of higher-dimensional problems. Second, our software framework will provide a general tool for the solution of higher-dimensional problems, with efficient adaptive and dynamic load balancing. And third, these developments will drive our exemplary application code to scenarios that are far beyond what is currently feasible with conventional parallelizations even on the fastest systems.
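For context, the classical combination technique approximates the level-n sparse grid solution in d dimensions by a signed superposition of solutions u_ℓ computed on coarse, anisotropic full grids (a standard formula from the sparse grid literature; level-index conventions vary between presentations):

```latex
\[
  u_n^{(c)} \;=\; \sum_{q=0}^{d-1} (-1)^{q} \binom{d-1}{q}
  \sum_{|\boldsymbol{\ell}|_{1} \,=\, n-q} u_{\boldsymbol{\ell}} .
\]
```

Because each partial solution u_ℓ lives on a comparatively small anisotropic full grid, the partial solutions can be computed independently of each other (the second, numerically decoupled level of parallelism mentioned above), and a solution lost to a fault can be compensated from the remaining ones in an algorithm-based way.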
EXAMAG - Exascale Simulations of the Magnetic Universe
Principal Investigators |
Contact | Volker Springel
Link | http://www.mathematik.uni-wuerzburg.de/~klingen/EXAMAG.html
Simulations of cosmic structure formation address multi-scale, multi-physics problems of vast proportions. These calculations are presently at the forefront of today's use of supercomputers and are important scientific drivers for the future use of exaflop computing platforms. However, continued success in this field requires the development of new numerical methods that excel in accuracy, robustness, parallel scalability, and physical fidelity to the processes relevant in galaxy and star formation. In an interdisciplinary and international effort of astrophysicists and applied mathematicians, the EXAMAG project aims to substantially improve the astrophysical moving-mesh code AREPO and extend its range of applicability, with the goal of producing an internationally leading application code for the upcoming large computing platforms. We work on new, powerful high-order discontinuous Galerkin schemes, on more efficient solvers for gravity and for the anisotropic transport of heat and relativistic particles, and on improving the accuracy of the treatment of ideal magnetohydrodynamics. We aim to drastically enhance the raw performance and scalability of the code by employing sophisticated hybrid parallelisation techniques combined with low-level optimizations that make full use of vector instructions and device accelerators. The project also applies the code on current state-of-the-art supercomputers to carry out transformative magnetohydrodynamic simulations of galaxy and primordial star formation, stretching the envelope of what is possible today and in the years to come.
FFMK - A fast and fault tolerant microkernel-based system for exascale computing
Principal Investigators |
Contact | Hermann Härtig
Link | www.zib.de/projects/ffmk-fast-and-fault-tolerant-microkernel-based-system-exascale-computing
The FFMK project designs, builds and evaluates a software system architecture to address the challenges expected in an exascale system. In particular, these include performance losses caused by the much larger impact of runtime variability within applications, hardware, and the operating system (OS), as well as an increased vulnerability to failures. The architecture builds upon a node-local small OS kernel supporting a combination of specialized runtimes and a full-blown general-purpose operating system. Global platform management supports dynamically changing partitions. For FFMK, we have instantiated the architecture components with the L4 microkernel, virtualized Linux, and MPI as an application runtime for bulk-synchronous applications. We carefully split OS and runtime components such that performance-critical functionality remains undisturbed by management-related background activities. The latter include randomized gossip algorithms and decentralized decision making as the basic building blocks of global platform management. XtreemFS serves as a fault-tolerant checkpoint store that uses the local memory of all nodes to achieve scalability. Building these components and integrating them into the architecture is almost complete. In phase 2 of the project, we will complete and tune the prototype and then explore its potential by answering research questions such as: the feasibility of global platform management using decentralized decision making, the scalability of gossip algorithms, and the benefit of prediction algorithms and application-level hints for platform management. We will further scrutinize the promise of deterministic, noise-free execution on a microkernel-based system, analyze how and how often the load changes during the execution of applications, and how often checkpoints and migration are needed. We plan to cooperate closely with application developers to evolve and continuously evaluate our platform.
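Gossip-based aggregation, one of the building blocks named above, can be illustrated with a tiny push-pull averaging loop: each node repeatedly averages its local estimate (for example, of load) with a randomly chosen peer, so all estimates converge toward the global mean without a central coordinator. The following toy, single-process simulation only conveys the principle and is unrelated to the FFMK implementation.

```cpp
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Toy simulation of gossip averaging: each "node" holds a load estimate; in every
// round, every node pairs with a random peer and both adopt the pair average.
// After a logarithmic number of rounds, all values are close to the global mean.
int main() {
  std::vector<double> load = {10, 0, 4, 7, 1, 9, 3, 6};   // per-node load estimates
  std::mt19937 rng(42);
  std::uniform_int_distribution<std::size_t> pick(0, load.size() - 1);

  for (int round = 0; round < 20; ++round) {
    for (std::size_t i = 0; i < load.size(); ++i) {
      std::size_t j = pick(rng);                           // random gossip partner
      double avg = 0.5 * (load[i] + load[j]);
      load[i] = load[j] = avg;                             // push-pull exchange
    }
  }
  for (double v : load) std::cout << v << ' ';             // all close to the mean (5.0)
  std::cout << '\n';
}
```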
ESSEX-II - Equipping Sparse Solvers for Exascale
Principal Investigators |
Contact | Gerhard Wellein
Link | http://blogs.fau.de/essex/
The ESSEX project investigates programming concepts and numerical algorithms for scalable, efficient and robust iterative sparse matrix applications on exascale systems. Starting from successful blueprints and prototype solutions identified in ESSEX-I, the second phase project ESSEX-II aims at delivering a collection of broadly usable and scalable sparse eigenvalue solvers with high hardware efficiency for the computer architectures to come. Project activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. The classic abstraction boundaries separating these layers are broken in ESSEX by strongly integrating objectives: scalability, numerical reliability, fault tolerance, and holistic performance and power engineering. The basic building block library supports an elaborate MPI+X approach which is able to fully exploit hardware heterogeneity while exposing functional parallelism and data parallelism to all other software layers in a flexible way. In addition, facilities for fully asynchronous checkpointing, silent data corruption detection and correction, performance assessment, performance model validation, and energy measurements will be provided transparently. The advanced building blocks will be defined and employed by the developments at the algorithms layer. Here, ESSEX-II will provide state-of-the-art library implementations of classic linear sparse eigenvalue solvers, including block Jacobi-Davidson, the Kernel Polynomial Method (KPM), and Chebyshev filter diagonalization (ChebFD), that are ready for production use on modern heterogeneous compute nodes with best performance and numerical accuracy. Research in this direction includes the development of appropriate parallel adaptive AMG software for the block Jacobi-Davidson method. Contour integral-based approaches are also covered in ESSEX-II and will be extended in two directions: the FEAST method will be further developed for improved scalability, and the Sakurai-Sugiura method (SSM) will be extended to nonlinear sparse eigenvalue problems. These developments are strongly supported by additional Japanese project partners from computer science at the University of Tokyo and applied mathematics at the University of Tsukuba. The applications layer will deliver scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators. Extending its predecessor project, ESSEX-II adopts an additional focus on production-grade software. Although the selection of algorithms is strictly motivated by quantum physics application scenarios, the underlying research directions of algorithmic and hardware efficiency, accuracy, and resilience will radiate into many fields of computational science. Most importantly, all developments will be accompanied by an uncompromising performance engineering process that will rigorously expose any discrepancy between expected and observed resource efficiency.
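Both KPM and ChebFD rest on the same three-term Chebyshev recurrence applied to a suitably scaled sparse matrix; the short C++ sketch below shows that core with a user-supplied sparse matrix-vector product (an illustration only, not the ESSEX kernel library).

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Chebyshev recurrence t_{k+1} = 2 A t_k - t_{k-1} applied to a vector v, the common
// core of the Kernel Polynomial Method and Chebyshev filter diagonalization. 'spmv'
// applies the sparse matrix A (assumed scaled so its spectrum lies in [-1, 1]).
Vec chebyshev_apply(const std::function<void(const Vec&, Vec&)>& spmv,
                    const Vec& v, int degree) {
  Vec t_prev = v;                 // T_0(A) v = v
  Vec t_curr(v.size());
  spmv(v, t_curr);                // T_1(A) v = A v
  for (int k = 1; k < degree; ++k) {
    Vec t_next(v.size());
    spmv(t_curr, t_next);         // A t_k
    for (std::size_t i = 0; i < v.size(); ++i)
      t_next[i] = 2.0 * t_next[i] - t_prev[i];   // 2 A t_k - t_{k-1}
    t_prev.swap(t_curr);
    t_curr.swap(t_next);
  }
  return t_curr;                  // T_degree(A) v; KPM accumulates moments <v, t_k> along the way
}
```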
ScienceNode article: Another step on the stairs to exascale
Video on YouTube: Moritz Kreutzer, SPPEXA Best PhD Award 2017
EXASOLVERS - Extreme Scale Solvers for Coupled Problems
Principal Investigators |
Contact | Gabriel Wittum
Link | gepris.dfg.de/gepris/projekt/230946257
ADA-FS - Advanced Data Placement via Ad-hoc File Systems at Extreme Scales
Principal Investigators |
Contact | Wolfgang E. Nagel
Link | http://ada-fs.github.io/
For future HPC systems, data management is an essential factor. The location of the data needed during a calculation plays a central role in increasing the efficiency of HPC systems. While efficient methods for holding data close to the processor already exist, access to the parallel file system is still a bottleneck. The volume of data expected for the calculations of future applications exceeds the capacity of node-local storage and requires data to be loaded from the parallel file system during the runtime of the application. In HPC, file systems are usually a shared medium used by many users in parallel. Furthermore, the performance is limited by the interface between the central file system and the compute nodes. Thus, it is currently not possible for an application to predict the actual load on the file system infrastructure and to optimize its use of the I/O subsystem. The project aims to improve I/O performance for highly parallel applications by means of distributed ad-hoc overlay file systems. For this purpose, it examines how job-specific temporary file systems can be efficiently provided in HPC environments. These file systems are created from the resources of the compute nodes involved and are filled with the necessary data through an integration into the scheduling system of the supercomputer before the job starts. After the completion of the job, the data is migrated back into the global parallel file system. The research approach includes both the design of the file system itself and the questions of the proper scheduling strategy for planning the necessary I/O transfers.
AIMES - Advanced Computation and I/O Methods for Earth-System Simulations
Principal Investigators |
Contact | Thomas Ludwig
Link | http://wr.informatik.uni-hamburg.de/research/projects/aimes/start
With the Advanced Computation and I/O Methods for Earth-System Simulations (AIMES) project, we will address the key issues of programmability, computational efficiency and I/O limitations that are common in next-generation icosahedral earth-system models. To reach a higher level of code design and improve computational efficiency, we apply and advance concepts and tools for domain-specific languages on earth-system models. Data handling is advanced by two complementary approaches: investigating suitable formats for icosahedral data and pushing lossy compression forward. Ultimately, with the project we intend to foster the development of best practices and useful norms by cooperating on shared ideas and components. During the project, we will ensure that the developed concepts and tools are not only applicable to earth science but to other scientific domains as well.
ExaDG - High-Order Discontinuous Galerkin for the Exa-Scale
Principal Investigators |
Contact | Guido Kanschat
Link | gepris.dfg.de/gepris/projekt/279336170
The goal of the ExaDG project is to provide new high-performance implementations of finite element-related algorithms, including discontinuous Galerkin methods. State-of-the-art finite element software libraries focus on leveraging sparse matrix structures for their linear algebra because of their ease of use and the accessibility of highly efficient algebraic multigrid solvers, usually via the Trilinos or PETSc packages. Nevertheless, sparse matrix-vector products suffer from a low computational intensity of less than one floating point operation per byte of memory traffic. In prototypical applications, tensor-product-based matrix-free codes with much higher computational intensity have proven superior on current architectures, with high potential for the future. This project aims for an implementation of a completely matrix-free infrastructure for a wide range of applications in the deal.II software library, extending our previous work on matrix-free operator evaluation for continuous finite elements to discontinuous Galerkin and other elements such as Raviart-Thomas. Efficient vectorized kernels and multithreaded implementations of local operations are a central aspect of our work, in addition to the continued use and extension of our message passing environment to even larger scales. This also includes the design of appropriate data structures and access patterns for our kernels. Since the matrix-free evaluation restricts the choice of preconditioners, a central part is to develop and efficiently implement multigrid tools, including element-based and patch-based smoothers that can be approximated by tensor products. Finally, the feasibility of the framework for complex applications will be exemplified on challenging problems of incompressible fluid flow and fluid-structure interaction.
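The "less than one flop per byte" statement can be backed by a simple estimate: in a CRS-like format with double-precision values and 4-byte column indices, each nonzero costs one multiply and one add but requires at least 12 bytes of matrix data, before even counting vector accesses (a back-of-the-envelope bound under these layout assumptions):

```latex
\[
  I_{\mathrm{SpMV}} \;\lesssim\;
  \frac{2\ \text{flop}}{8\ \text{B (value)} + 4\ \text{B (column index)}}
  \;\approx\; 0.17\ \text{flop/B} \;<\; 1\ \text{flop/B}.
\]
```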
MYX - MUST Correctness Checking for YML and XMP Programs
Principal Investigators |
Contact | Matthias S. Müller
Link | gepris.dfg.de/gepris/projekt/279334242
Exascale systems challenge the programmer to write multi-level parallel programs, i.e., to employ multiple different paradigms to address each individual level of parallelism in the system. The long-term challenge is to evolve existing programming models and to develop new ones that better support application development on exascale machines. In the multi-level programming paradigm FP3C, users are able to express high-level parallelism in the YvetteML workflow language (YML) and employ parallel components written in the XcalableMP (XMP) paradigm. By developing correctness checking techniques for both paradigms, and by investigating the fundamental requirements to first design for and then verify the correctness of parallelization paradigms, MYX aims to combine the know-how and lessons learned of different areas to derive the input necessary to guide the development of future programming models and software engineering methods. XMP is a PGAS language specified by Japan’s PC Cluster Consortium for high-level programming and the main research vehicle for Japan’s post-petascale programming model research targeting exascale. YML is used to describe the parallelism of an application at a very high level, in particular to couple complex applications. YML provides a compiler to translate the YvetteML notation into XMP-parallel programs, and a just-in-time scheduler to manage the execution of the parallel programs. The MUST correctness checker can detect a wide range of issues in MPI, OpenMP and hybrid MPI+OpenMP programs by collecting program information and aggregating it in a tree-based overlay network capable of running different types of analysis. Thanks to the PnMPI profiling interface, MUST can in principle trace and analyze any MPI communication issued either directly by the application code or by any middleware library, such as the XMP runtime. In MYX we will investigate the application of scalable correctness checking methods to YML, XMP and selected features of MPI. This will result in clear guidelines on how to limit the risk of introducing errors and how best to express parallelism so that errors which, for fundamental reasons, can only be detected at runtime are actually caught, as well as in extended and scalable correctness checking methods.
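To illustrate the class of defects such runtime correctness tools target, consider a minimal MPI program with a datatype mismatch between sender and receiver; the code compiles and may even appear to run, which is exactly why errors of this kind can, in general, only be diagnosed reliably at runtime (a toy example, not taken from MYX).

```cpp
#include <mpi.h>
#include <vector>

// Toy example of a defect that static checks rarely catch: the sender transmits
// 8 doubles, but the receiver posts a buffer of 8 ints -- a datatype mismatch
// that only manifests (or silently corrupts data) at runtime.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    std::vector<double> data(8, 3.14);
    MPI_Send(data.data(), 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    std::vector<int> recvbuf(8);
    MPI_Recv(recvbuf.data(), 8, MPI_INT, 0, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);              // datatype mismatch with the sender
  }

  MPI_Finalize();
  return 0;
}
```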
ExtraPeak - Automatic Performance Modeling of HPC Applications with Multiple Model Parameters
Principal Investigators |
Contact | Felix Wolf
Link | www.vi-hps.org/projects/extrapeak/
Applications of ever-increasing complexity combined with rapidly growing volumes of data create an insatiable demand for computing power. However, the operational and procurement costs of the supercomputers needed to run them are tremendous. Minimizing the runtime and energy consumption of a code is therefore an economic imperative. Tuning complex HPC applications requires the clever exploration of their design and configuration space. Especially on supercomputers, however, this space is so large that its exhaustive traversal via performance experiments is too expensive, if not impossible. Performance models, which express performance metrics such as execution time as a function of parameters such as the number of cores or the size of the input problem, allow this space to be explored more efficiently. Unfortunately, creating performance models manually is extremely laborious for large real-world applications. Further, to ensure that applications are free of performance bugs, it is often not enough to analyze any single aspect, such as processor count or problem size. The effect that one varying parameter has on performance must be understood not only in isolation, but also in the context of the variation of other relevant parameters, including algorithmic options, input characteristics, or tuning parameters such as tiling. Recent advances in automatic empirical performance modeling, i.e., the generation of performance models from a limited set of performance experiments, aspire to bridge this gap. However, while models with one parameter can be handled quite easily, performance modeling with multiple parameters poses significant challenges, namely (i) the identification of performance-relevant parameters, (ii) the resource-aware design of the required performance experiments, (iii) the diversity of possible model functions, and (iv) the efficient traversal of a complex high-dimensional model search space. In this project, we will develop an automatic empirical approach that allows performance modeling of any combination of application execution parameters. Solution components to tackle the above challenges include prior source-code analysis and a feedback-guided process of performance-data acquisition and model generation. The goal is insightful performance models that enable a wide range of uses, from performance predictions for balanced machine design to detailed performance tuning. Our approach will help application developers understand complex performance tradeoffs and ultimately improve the performance of their code.
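For illustration, single-parameter empirical modeling tools in this line of work typically restrict candidate models to a small normal form combining polynomial and logarithmic terms (stated here as background; the exact model search space used by ExtraPeak is defined by the project itself):

```latex
\[
  f(p) \;=\; \sum_{k=1}^{n} c_k \, p^{\,i_k} \, \log_2^{\,j_k}(p),
  \qquad i_k \in I,\; j_k \in J,
\]
```

where I and J are small, predefined exponent sets. With several parameters, products of such terms for every parameter must be considered, which is precisely the explosion of the model search space that challenges (i)-(iv) above refer to.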