A4: Characterisation and Analysis of Invasive Algorithmic Patterns
Principal Investigators:
Prof. M. Bader, Dr. S. Wildermann
Scientific Researchers:
J. Spieck, B. Pourmohseni, A. Pöppl, M. Wille, Dr. J. Falk
Abstract
Project A4 characterises, investigates existing and develops novel invasive algorithmic patterns, which are motivated by applications from project area D. Using design space exploration (DSE), we precompute operating points to achieve desired quality numbers (performance, energy consumption, etc.) of applications.
In the second funding phase, we addressed three major research questions:
- How can we derive knowledge about an application’s performance and pass it to the run-time system? Together with Projects A1 and C2, we developed the ActorX10 library. The use of actor-based design for InvadeX10 applications realises a separation of computation and communication and automatically serves as a design entry for a subsequent DSE. The DSE obtains multiple operating points with different trade-offs between resource requirements and achievable quality numbers. This information is passed to the run-time system.
- How can we prepare run-time system decisions based on this knowledge, to improve the system performance w.r.t. various performance metrics? Each operating point found by DSE has Pareto-optimal quality numbers and resource claim constraints. We distil this usually huge set of Pareto-optimal operating points to a manageable and still relevant subset of points. These are passed to the run-time system, which can use the provided information for resource management.
- How can invasion-aware algorithmic patterns be developed that benefit from invasion on heterogeneous platforms? For this purpose, we developed and examined SWE-X10, a proxy application for space- and time-adaptive tsunami simulation, and a prototypical image detection chain. Both proxy applications are written in InvadeX10 using the ActorX10 library and serve to evaluate novel invasive algorithmic patterns and how to exploit operating points distilled by DSE.
The developed design flow allows the run-time system to execute applications with predictable performance (w.r.t. upper and lower bounds on various quality numbers), where applications are statically optimised under worst-case assumptions and for hardware with known performance behaviour.
In Phase III, we will focus on programming and characterising applications with varying execution phases and changing workloads to run on hardware with uncertain performance behaviour.
Synopsis
Application execution may be exposed to diverse workload scenarios depending on varying inputs, execution phases, and environmental conditions. Rather than application mappings being tailored to the worst-case or average-case scenarios, it is necessary to provide run-time enforcement strategies that enable switching between different algorithmic patterns and resource claims dynamically to cope with such variances. Additionally, on modern hardware, such uncertain performance may already result from small variations in the manufacturing process—for example, compute core instances of the same type will have to run at slightly different clock frequencies to meet a certain power limit or energy budget. Furthermore, hardware faults are expected to occur more often thus making resources temporally or, due to manufacturing variability and ageing, even permanently unavailable. Therefore, it is our goal to provide optimised and robust application execution by dynamically adapting the application behaviour and structure (i.e. the actor model) as well as the set of allocated resources by means of re-invasion and run-time requirement enforcement (RRE) to such variances and fault scenarios. Consequently, the concept of operating points needs to be extended to include such variations and provide an operational corridor for run-time requirement enforcers in the run-time system or in the invasive hardware.
Approach
Project A4’s mission is to explore and establish application characterisation in the invasive computing paradigm and to investigate and evaluate how algorithms and applications may exploit and profit from the resulting predictability features.
A major challenge is that our study of invasive proxy applications in Phase II and the experiences gained within the embedded system and HPC communities reveal that application execution is getting increasingly dynamic:
Preview: Fictional scenario of a tsunami in the Eastern Mediterranean Sea. Scenario used by invasive hardware prototype.
- Workload scenarios and execution phases of an application vary strongly, such that a static worst- or average-case characterisation for a single scenario becomes inadequate.
- Performance behaviour on modern hardware is highly uncertain, as external influences, differences in manufacturing and measures to enforce power limits or energy budgets affect quality numbers in an unforeseeable way. For example, compute core instances of the same type will have to run at slightly different clock frequencies to meet a certain power limit or energy budget. (Intel Sandy Bridge architecture) compute nodes of the SuperMUC petascale machine. Observing the the average node power of compute nodes in the SuperMUC petascale supercomputer, one can see a Gaussian distribution of the power consumption. If we assume power capping for the compute nodes, we have to expect a similar Gaussian distribution for the achievable clock speeds.
- Hardware faults are expected to occur more often thus making resources temporally or, due to manufacturing variability and ageing, even permanently unavailable.
Project A1 investigates the enforcement of non-functional requirements for the execution of one operating point. Such adaptations concentrate on modifications of the claimed resources, and thus always happen within a corridor defined by the respective operating point. Project A4 expands this approach beyond the boundaries of a single operating point by considering algorithmic adaptations of the application as well as modifications of the resource claim. Our goal is to provide optimised and robust application execution in the context of above aspects by characterising and enforcing adaptation of the application behaviour (e.g. algorithmic variants), the application structure (i.e. the actor model), and the set of allocated resources by means of re-invasion and run-time requirement enforcement (RRE) to such variances and fault scenarios.
A comprehensive summary of the major achievements of the second funding phase can be found by accessing Project A4 second phase website.
Publications
[1] | Jan Spieck, Stefan Wildermann, and Jürgen Teich. A learning-based methodology for scenario-aware mapping of soft real-time applications onto heterogeneous mpsocs. ACM Trans. Design Autom. Electr. Syst., 28(1):4:1–4:40, 2023. [ DOI | http ] |
[2] | Yakup Budanaz, Mario Wille, and Michael Bader. Asynchronous workload balancing through persistent work-stealing and offloading for a distributed actor model library. In 2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM), pages 39–51, November 14, 2022. [ DOI ] |
[3] | Jan Spieck, Stefan Wildermann, and Jürgen Teich. On Transferring Application Mapping Knowledge Between Differing MPSoC Architectures. In CODES+ISSS 2022, October 2022. |
[4] | Nidhi Anantharajaiah, Tamim Asfour, Michael Bader, Lars Bauer, Jürgen Becker, Simon Bischof, Marcel Brand, Hans-Joachim Bungartz, Christian Eichler, Khalil Esper, Joachim Falk, Nael Fasfous, Felix Freiling, Andreas Fried, Michael Gerndt, Michael Glaß, Jeferson Gonzalez, Frank Hannig, Christian Heidorn, Jörg Henkel, Andreas Herkersdorf, Benedict Herzog, Jophin John, Timo Hönig, Felix Hundhausen, Heba Khdr, Tobias Langer, Oliver Lenke, Fabian Lesniak, Alexander Lindermayr, Alexandra Listl, Sebastian Maier, Nicole Megow, Marcel Mettler, Daniel Müller-Gritschneder, Hassan Nassar, Fabian Paus, Alexander Pöppl, Behnaz Pourmohseni, Jonas Rabenstein, Phillip Raffeck, Martin Rapp, Santiago Narváez Rivas, Mark Sagi, Franziska Schirrmacher, Ulf Schlichtmann, Florian Schmaus, Wolfgang Schröder-Preikschat, Tobias Schwarzer, Mohammed Bakr Sikal, Bertrand Simon, Gregor Snelting, Jan Spieck, Akshay Srivatsa, Walter Stechele, Jürgen Teich, Furkan Turan, Isaías A. Comprés Ureña, Ingrid Verbauwhede, Dominik Walter, Thomas Wild, Stefan Wildermann, Mario Wille, Michael Witterauf, and Li Zhang. Invasive Computing. FAU University Press, August 16, 2022. [ DOI ] |
[5] | Jan Spieck, Stefan Wildermann, and Jürgen Teich. A Learning-Based Methodology for Scenario-Aware Mapping of Soft Real-Time Applications onto Heterogeneous MPSoCs. ACM Transactions on Design Automation of Electronic Systems, 2022. [ DOI ] |
[6] | Behnaz Pourmohseni, Stefan Wildermann, Fedor Smirnov, Paul Meyer, and Jürgen Teich. Task Migration Policy for Thermal-Aware Dynamic Performance Optimization in Many-Core Systems. IEEE Access, 2022. [ DOI ] |
[7] | Jan Spieck, Stefan Wildermann, and Jürgen Teich. Domain-Adaptive Soft Real-Time Hybrid Application Mapping for MPSoCs. In 2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD), September 2021. |
[8] | Behnaz Pourmohseni. System-Level Mapping, Analysis, and Management of Real-Time Applications in Many-Core Systems. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, 2021. |
[9] |
Martin Bogusz, Philipp Samfass, Alexander Pöppl, Jannis Klinkenberg, and
Michael Bader.
Evaluation of multiple hpc parallelization frameworks in a shallow
water proxy application with multi-rate local time stepping.
In The 3rd Annual Parallel Applications Workshop, Alternatives
To MPI+X, Atlanta, Georgia, USA (virtual), November 2020. IEEE/ACM sigARCH.
With a widening gap between processor speed and communication latencies, overlap of computation and communication becomes increasingly important. At the same time, variable CPU clock frequencies (e.g., with DVFS and Turbo Boost) and novel numerical techniques such as local time stepping make it challenging to balance parallel execution times, even in the case of balanced computational load. This limits parallel efficiency. In order to tackle these challenges, emerging runtime systems may be used. In this paper, we present a thorough study of four selected parallelization frameworks – Chameleon, HPX, Charm++ and UPC++ – in a proxy application for solving the shallow water equations. In addition, we augment the traditional MPI baseline variant with support for these frameworks and evaluate them in detail with respect to strong scaling efficiency and load balancing for global and local time stepping. |
[10] | Jan Spieck, Stefan Wildermann, and Jürgen Teich. Run-Time Scenario-Based MPSoC Mapping Reconfiguration Using Machine Learning Models. In 2019 ACM/IEEE 1st Workshop on Machine Learning for CAD (MLCAD), July 2020. [ DOI ] |
[11] | Tobias Schwarzer. System-level Mapping of Dataflow Applications onto MPSoCs. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, 2020. |
[12] | Jürgen Teich, Behnaz Pourmohseni, Oliver Keszöcze, Jan Spieck, and Stefan Wildermann. Run-Time Enforcement of Non-Functional Application Requirements in Heterogeneous Many-Core Systems. In Proceedings of the 25th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 629–636, January 2020. [ DOI ] |
[13] | Behnaz Pourmohseni, Fedor Smirnov, Stefan Wildermann, and Jürgen Teich. Real-Time Task Migration for Dynamic Resource Management in Many-Core Systems. In Proceedings of the Workshop on Next Generation Real-Time Embedded Systems (NG-RES), pages 5:1–5:14, 2020. [ DOI ] |
[14] | Behnaz Pourmohseni. System-level mapping, analysis, and management of real-time applications in many-core systems. Ph.D. Forum at the Design, Automation and Test in Europe Conference (DATE). Ph.D. Forum Best Poster Award, 2020. |
[15] | Behnaz Pourmohseni, Michael Glaß, Jörg Henkel, Heba Khdr, Martin Rapp, Valentina Richthammer, Tobias Schwarzer, Fedor Smirnov, Jan Spieck, Jürgen Teich, Andreas Weichslgartner, and Stefan Wildermann. Hybrid Application Mapping for Composable Many-Core Systems: Overview and Future Perspective. Journal of Low Power Electronics and Applications, 10:1–37, 2020. [ DOI ] |
[16] | Jürgen Teich, Pouya Mahmoody, Behnaz Pourmohseni, Sascha Roloff, Wolfgang Schröder-Preikschat, and Stefan Wildermann. Run-Time Enforcement of Non-functional Program Properties on MPSoCs. In Jian-Jia Chen, editor, A Journey of Embedded and Cyber-Physical Systems, pages 125–149. Springer, 2020. [ DOI ] |
[17] | J. Spieck, S. Wildermann, and J. Teich. Scenario-Based Soft Real-Time Hybrid Application Mapping for MPSoCs. In 2020 57th ACM/IEEE Design Automation Conference (DAC), pages 1–6, 2020. [ DOI ] |
[18] | Alexander Pöppl, Scott Baden, and Michael Bader. A UPC++ actor library and its evaluation on a shallow water proxy application. In Parallel Applications Workshop, Alternatives To MPI+X, Denver, Colorado, United States of America, November 2019. IEEE, IEEE. |
[19] | Jan Spieck, Stefan Wildermann, Tobias Schwarzer, Jürgen Teich, and Michael Glaß. Data-Driven Scenario-based Application Mapping for Heterogeneous Many-Core Systems. In Multicore/Many-core Systems-on-Chip (MCSoC 2019), pages 334–341, October 2019. [ DOI ] |
[20] | Tobias Schwarzer, Joachim Falk, Simone Müller, Martin Letras, Christian Heidorn, Stefan Wildermann, and Jürgen Teich. Compilation of dataflow applications for multi-cores using adaptive multi-objective optimization. ACM Transactions on Design Automation of Electronic Systems, 24(3):29:1–29:23, March 2019. [ DOI ] |
[21] | Behnaz Pourmohseni, Fedor Smirnov, Heba Khdr, Stefan Wildermann, Jürgen Teich, and Jörg Henkel. Thermally composable hybrid application mapping for real-time applications in heterogeneous many-core systems. In Proceedings of the 40th IEEE Real-Time Systems Symposium (RTSS), 2019. [ DOI ] |
[22] | Behnaz Pourmohseni, Fedor Smirnov, Stefan Wildermann, and Jürgen Teich. Isolation-aware timing analysis and design space exploration for predictable and composable many-core systems. In Proceedings of the 31th Euromicro Conference on Real-Time Systems (ECRTS 2019), pages 12:1–12:24, 2019. [ DOI ] |
[23] | Behnaz Pourmohseni, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Hard Real-Time Application Mapping Reconfiguration for NoC-Based Many-Core Systems. Real-Time Systems, 55(2):433–469, 2019. [ DOI ] |
[24] | Dirk Gabriel, Walter Stechele, and Stefan Wildermann. Resource-aware parameter tuning for real-time applications. In Martin Schoeberl, Christian Hochberger, Sascha Uhrig, Jürgen Brehm, and Thilo Pionteck, editors, Architecture of Computing Systems – ARCS 2019, pages 45–55. Springer International Publishing, 2019. [ DOI ] |
[25] | Jörg Henkel, Jürgen Teich, Stefan Wildermann, and Hussam Amrouch. Dynamic resource management for heterogeneous many-cores. In Proceedings of the International Conference on Computer-Aided Design (ICCAD), pages 60:1–60:6. ACM, November 2018. [ DOI ] |
[26] | Andreas Weichslgartner, Stefan Wildermann, Deepak Gangadharan, Michael Glaß, and Jürgen Teich. A design-time/run-time application mapping methodology for predictable execution time in mpsocs. ACM Transactions on Embedded Computing Systems (TECS), 17(5):89:1–89:25, November 2018. [ DOI ] |
[27] | Tobias Schwarzer, Sascha Roloff, Valentina Richthammer, Rami Khaldi, Stefan Wildermann, Michael Glaß, and Jürgen Teich. On the Complexity of Mapping Feasibility in Many-Core Architectures. In Proceedings of Multicore/Many-core Systems-on-Chip (MCSoC-2018), September 2018. |
[28] | Valentina Richthammer, Tobias Schwarzer, Stefan Wildermann, Jürgen Teich, and Michael Glaß. Architecture Decomposition in System Synthesis of Heterogeneous Many-Core Systems. In 55th ACM/EDAC/IEEE Design Automation Conference (DAC 2018), June 2018. |
[29] | Tobias Schwarzer, Andreas Weichslgartner, Michael Glaß, Stefan Wildermann, Peter Brand, and Jürgen Teich. Symmetry-eliminating Design Space Exploration for Hybrid Application Mapping on Many-Core Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(2):297–310, February 2018. [ DOI ] |
[30] | Andreas Weichslgartner, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Invasive Computing for Mapping Parallel Programs to Many-Core Architectures. Springer, January 15, 2018. [ DOI ] |
[31] | Alexander Pöppl, Marvin Damschen, Florian Schmaus, Andreas Fried, Manuel Mohr, Matthias Blankertz, Lars Bauer, Jörg Henkel, Wolfgang Schröder-Preikschat, and Michael Bader. Shallow water waves on a deep technology stack: Accelerating a finite volume tsunami model using reconfigurable hardware in invasive computing. In Dora B. Heras, Luc Bougé, Gabriele Mencagli, Emmanuel Jeannot, Rizos Sakellariou, Rosa M. Badia, Jorge G. Barbosa, Laura Ricci, Stephen L. Scott, Stefan Lankes, and Josef Weidendorfer, editors, Euro-Par 2017: Proceedings of the 10th Workshop on UnConventional High Performance Computing (UCHPC 2017), Lecture Notes in Computer Science (LNCS), pages 676–687, Cham, 2018. Springer International Publishing. |
[32] | Michael Glaß, Jürgen Teich, Martin Lukasiewycz, and Felix Reimann. Hybrid optimization techniques for system-level design space exploration. In Soonhoi Ha and Jürgen Teich, editors, Handbook of Hardware/Software Codesign, volume 1, pages 217–246. Springer, Dordrecht, The Netherlands, September 2017. [ DOI ] |
[33] | Hananeh Aliee. Reliability Analysis and Optimization of Embedded Systems using Stochastic Logic and Importance Measures. Dissertation, Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, March 15, 2017. |
[34] | Behnaz Pourmohseni, Michael Glaß, and Jürgen Teich. Automatic Operating Point Distillation for Hybrid Mapping Methodologies. In Proceedings of Design, Automation and Test in Europe Conference Exhibition (DATE), pages 1135–1140. IEEE, 2017. [ DOI ] |
[35] | Behnaz Pourmohseni, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Predictable run-time mapping reconfiguration for real-time applications on many-core systems. In Proceedings of the International Conference on Real-Time Networks and Systems (RTNS). IEEE, 2017. Outstanding paper award. [ DOI ] |
[36] | Jürgen Teich. Invasive computing – editorial. it – Information Technology, 58(6):263–265, November 24, 2016. [ DOI ] |
[37] | Alexander Pöppl, Michael Bader, Tobias Schwarzer, and Michael Glaß. SWE-X10: Simulating shallow water waves with lazy activation of patches using ActorX10. In Proceedings of the 2nd International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pages 32–39. IEEE, November 2016. [ .pdf ] |
[38] | Stefan Wildermann, Michael Bader, Lars Bauer, Marvin Damschen, Dirk Gabriel, Michael Gerndt, Michael Glaß, Jörg Henkel, Johny Paul, Alexander Pöppl, Sascha Roloff, Tobias Schwarzer, Gregor Snelting, Walter Stechele, Jürgen Teich, Andreas Weichslgartner, and Andreas Zwinkau. Invasive computing for timing-predictable stream processing on MPSoCs. it – Information Technology, 58(6):267–280, September 30, 2016. [ DOI ] |
[39] | Jürgen Teich, Michael Glaß, Sascha Roloff, Wolfgang Schröder-Preikschat, Gregor Snelting, Andreas Weichslgartner, and Stefan Wildermann. Language and compilation of parallel programs for *-predictable MPSoC execution using invasive computing. In Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 313–320, Lyon, France, September 2016. [ DOI ] |
[40] | Sascha Roloff, Alexander Pöppl, Tobias Schwarzer, Stefan Wildermann, Michael Bader, Michael Glaß, Frank Hannig, and Jürgen Teich. ActorX10: An actor library for X10. In Proceedings of the 6th ACM SIGPLAN X10 Workshop (X10), pages 24–29. ACM, June 14, 2016. [ DOI ] |
[41] | Andreas Weichslgartner, Stefan Wildermann, Johannes Götzfried, Felix Freiling, Michael Glaß, and Jürgen Teich. Design-time/run-time mapping of security-critical applications in heterogeneous mpsocs. In Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 153–162. ACM, May 23, 2016. [ DOI ] |
[42] |
Alexander Pöppl and Michael Bader.
Swe-x10: An actor-based and locally coordinated solver for the
shallow water equations.
In Proceedings of the 6th ACM SIGPLAN Workshop on X10, X10
2016, pages 30–31. ACM, 2016.
[ DOI ]
Keywords: APGAS, Actor Model, Parallel Algorithms, Shallow Water Equations |
[43] | Alexander Pöppl and Alexander Herz. A cache-aware performance prediction framework for gpgpu computations. In Sascha Hunold, Jesper Larsson Träff, and Francesco Versaci, editors, Euro-Par 2015: Parallel Processing Workshops, Lecture Notes in Computer Science, Berlin, Heidelberg, 2016. Springer-Verlag. |
[44] | Hans Michael Gerndt, Michael Glaß, Sri Parameswaran, and Barry L. Rountree. Dark Silicon: From Embedded to HPC Systems (Dagstuhl Seminar 16052). Dagstuhl Reports, 6(1):224–244, 2016. [ DOI ] |
[45] | Michael Bader. Performance optimzation vs. power - experiences with petascale earthquake simulations on supermuc. Talk at the Workshop on Power-Bounded HPC Performance Optimisation at Schloss Dagstuhl, August 2015. |
[46] | Alexander Breuer, Alexander Heinecke, Leonhard Rannabauer, and Michael Bader. High-order ader-dg minimizes energy- and time-to-solution of seissol. In High Performance Computing, 30th International Conference, ISC High Performance 2015, Frankfurt, Germany, July 12-16, 2015, Proceedings, volume 9137 of Lecture Notes in Computer Science, pages 340–357. Springer, July 2015. |
[47] | Michael Bader. Vectorization of riemann solvers for the shallow water equations. Minisymposium Flooding the Cores - Computing Flooding Events with Many-Core and Accelerator Technologies at SIAM Conference on Computational Science and Engineering, March 2015. |
[48] | Sebastian Graf, Felix Reimann, Michael Glaß, and Jürgen Teich. Towards scalable symbolic routing for multi-objective networked embedded system design and optimization. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS 2014), pages 2:1–2:10, October 12, 2014. [ DOI ] |
[49] | Andreas Weichslgartner, Deepak Gangadharan, Stefan Wildermann, Michael Glaß, and Jürgen Teich. Daarm: Design-time application analysis and run-time mapping for predictable execution in many-core systems. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS 2014), pages 1–10, October 2014. [ DOI ] |
[50] | Tobias Weinzierl, Michael Bader, Kristof Unterweger, and Roland Wittmann. Block fusion on dynamically adaptive spacetree grids for shallow water waves. Parallel Processing Letters, 24(3):1441006, September 2014. |
[51] | Michael Glaß, Michael Bader, Jürgen Teich, and Stefan Wildermann. Assisting run-time optimization of many-core systems by design-time characterization. HiPEAC Spring Computing Systems Week, Barcelona, Invited Talk, May 13, 2014. |
[52] | Stefan Wildermann, Michael Glaß, and Jürgen Teich. Multi-objective distributed run-time resource management for many-cores. In Proceedings of Design, Automation and Test in Europe (DATE), pages 1–6, March 2014. [ DOI ] |
[53] | Michael Bader. On the performance of adaptive mesh-based simulations on modern hpc architectures. Invited presentation at SIAM Conference on Parallel Processing in Scientific Computing - SIAM PP 2014, February 2014. [ .pdf ] |
[54] | Michael Bader. Exploiting locality properties of space-filling curves in scientific computing. Colloquium talk at TU Eindhoven, November 2013. |