SPPEXA Background
Today, simulation in its broadest sense (i.e., including both numerical and discrete approaches, and covering all steps of the so-called simulation cycle, from mathematical modelling to (visual) data exploration and validation) is generally considered the third path to insight in most fields of science and engineering. It complements theory and experiment and sometimes even enables results that would otherwise be unattainable, e.g. when experiments are too costly, too time-consuming, or outright impossible. Hence, the young, trans-disciplinary field of Computational Science and Engineering (CSE) has become a key technology for science and industry. Although, of course, not all kinds of simulation involve large-scale computations on high-end systems, High-Performance Computing (HPC) is the enabling technology for CSE. Mastering the upcoming challenges summarized under the label “exascale” will therefore be the key to any future capability computing application, and it will also be crucial for learning how to deal with the commodity systems of the day after tomorrow for smaller-scale or capacity computing tasks.
Currently, the most powerful supercomputers used for simulation in research comprise up to one million processor cores and reach a peak performance of 1 to 10 PFLOP/s (10^15 to 10^16 floating point operations per second). For 2018, a peak performance of 1 EFLOP/s is expected – a factor of 10^2 to 10^3 more, an increase that, due to physical limits, can no longer be achieved by higher clock rates. Rather, for such exascale systems, about one hundred million cores are being predicted. In many fields of science that traditionally use HPC heavily, such as climate research, plasma physics, astrophysics, high-energy physics, material science, geosciences, life sciences, or neurosciences, this computational power will allow for significant breakthroughs. Other fields that have so far been less simulation-driven, such as medicine, are currently seeing a strong increase in such research. Moreover, as the possibilities grow, new fields of application will emerge. In engineering applications such as aerodynamics, combustion, or process engineering, significantly reduced response times, as well as the resulting unprecedented potential for interaction and real-time use, will help to drastically accelerate the design cycles of technological innovations.
The big overall scientific challenge is to master massive parallelism and, with its help, to let high-end applications benefit from HPC technology. Parallel programming paradigms, language constructs, hierarchical organization, dynamic load distribution, energy awareness, and the scalability of algorithms and applications are just some of the key issues that have to be tackled. There is a tremendous lack of fundamental methodology, since all existing parallelization strategies – in particular the de-facto standards MPI and OpenMP as well as their straightforward hybrid combination (sketched below) – were designed for at most thousands of processors. Current research is primarily extension-based, aiming to scale existing concepts to larger core counts. This remains an important and reasonable direction to follow, but it must not be the only path pursued. Completely new structures of organization, distribution, and communication are needed: strategies for leading global enterprises with several hundred thousand employees cannot be, and have not been, derived merely as evolutionary extensions of the concepts used for small workshops and mid-size companies. The situation is comparable to that of a 1970s designer of electronic circuits with some 10–100 transistors who is asked to design a modern computer chip with millions of such units, or to that of an aeronautical engineer who is given the task of designing a passenger aircraft propelled by some 100,000 hair dryers instead of two or four jet engines.

Moreover, many components of the software stack lack the required scalability and other essential features even for petascale computations, a fact that increasingly hinders not only efficient computations but also the underlying science itself. Finally, the impact of massive parallelism on simulation algorithms has not yet been sufficiently explored, and hardware-software co-design involving all relevant disciplines is still in its infancy. Without concerted research activities in these directions, simulation applications and their codes will lag far behind the theoretical capabilities of the latest hardware, making its sensible use hardly possible in the foreseeable future. Multi-national task forces involving members of SPPEXA’s team of initiators – the European Exascale Software Initiative (EESI) and the International Exascale Software Project (IESP) – organize workshops and prepare roadmaps for exascale software, and several countries have launched large programs addressing this issue specifically; hence, there is the additional risk that German science will be left behind. The main idea of SPPEXA is to provide a framework for such concerted and trans-disciplinary research activities in Germany.
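To make the above concrete, the following is a minimal sketch of the straightforward hybrid MPI+OpenMP combination referred to above: MPI splits the work coarsely across ranks (nodes), while OpenMP threads parallelize within each node. The partial-sum workload and all constants are illustrative assumptions, not taken from any SPPEXA code; the point is merely the two-level structure whose scaling to hundreds of millions of cores is in question.

    /* Minimal illustrative sketch (not an SPPEXA code): the "straightforward
     * hybrid combination" of MPI and OpenMP. MPI distributes coarse-grained
     * chunks across ranks; OpenMP parallelizes each chunk across the cores
     * of a node. Build e.g. with: mpicc -fopenmp hybrid.c -o hybrid */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided;
        /* Request a thread level compatible with OpenMP threads in each rank. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long N = 100000000L;             /* placeholder problem size  */
        long chunk = N / size;                 /* coarse split across ranks */
        long lo = (long)rank * chunk;
        long hi = (rank == size - 1) ? N : lo + chunk;

        /* Fine-grained split across the cores of one node. */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (long i = lo; i < hi; ++i)
            local += 1.0 / (double)(i + 1);    /* placeholder workload      */

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("partial-sum result: %f (%d ranks x up to %d threads)\n",
                   global, size, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }

Even this simple two-level pattern exposes the design questions raised above: the flat chunk distribution, the blocking global reduction, and the static thread count are exactly the kinds of constructs whose behaviour at exascale core counts is unclear.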