TY - JOUR AB - N-body methods are one of the essential algorithmic building blocks of high-performance and parallel computing. Previous research has shown promising performance for implementing n-body simulations with pairwise force calculations on FPGAs. However, to avoid challenges with accumulation and memory access patterns, the presented designs calculate each pair of forces twice, along with both force sums of the involved particles. Also, they require large problem instances with hundreds of thousands of particles to reach their respective peak performance, limiting the applicability for strong scaling scenarios. This work addresses both issues by presenting a novel FPGA design that uses each calculated force twice and overlaps data transfers and computations in a way that allows to reach peak performance even for small problem instances, outperforming previous single precision results even in double precision, and scaling linearly over multiple interconnected FPGAs. For a comparison across architectures, we provide an equally optimized CPU reference, which for large problems actually achieves higher peak performance per device, however, given the strong scaling advantages of the FPGA design, in parallel setups with few thousand particles per device, the FPGA platform achieves highest performance and power efficiency. AU - Menzel, Johannes AU - Plessl, Christian AU - Kenter, Tobias ID - 28099 IS - 1 JF - ACM Transactions on Reconfigurable Technology and Systems SN - 1936-7406 TI - The Strong Scaling Advantage of FPGAs in HPC for N-body Simulations VL - 15 ER - TY - JOUR AB -

State-of-the-art methods in materials science such as artificial intelligence and data-driven techniques advance the investigation of photovoltaic materials.

AU - Mirhosseini, Hossein AU - Kormath Madam Raghupathy, Ramya AU - Sahoo, Sudhir K. AU - Wiebeler, Hendrik AU - Chugh, Manjusha AU - Kühne, Thomas D. ID - 32246 IS - 46 JF - Physical Chemistry Chemical Physics KW - Physical and Theoretical Chemistry KW - General Physics and Astronomy SN - 1463-9076 TI - In silico investigation of Cu(In,Ga)Se2-based solar cells VL - 22 ER - TY - JOUR AB - In scientific computing, the acceleration of atomistic computer simulations by means of custom hardware is finding ever-growing application. A major limitation, however, is that the high efficiency in terms of performance and low power consumption entails the massive usage of low precision computing units. Here, based on the approximate computing paradigm, we present an algorithmic method to compensate for numerical inaccuracies due to low accuracy arithmetic operations rigorously, yet still obtaining exact expectation values using a properly modified Langevin-type equation. AU - Rengaraj, Varadarajan AU - Lass, Michael AU - Plessl, Christian AU - Kühne, Thomas ID - 12878 IS - 2 JF - Computation TI - Accurate Sampling with Noisy Forces from Approximate Computing VL - 8 ER - TY - JOUR AB - CP2K is an open source electronic structure and molecular dynamics software package to perform atomistic simulations of solid-state, liquid, molecular, and biological systems. It is especially aimed at massively parallel and linear-scaling electronic structure methods and state-of-the-art ab initio molecular dynamics simulations. Excellent performance for electronic structure calculations is achieved using novel algorithms implemented for modern high-performance computing systems. This review revisits the main capabilities of CP2K to perform efficient and accurate electronic structure simulations. The emphasis is put on density functional theory and multiple post–Hartree–Fock methods using the Gaussian and plane wave approach and its augmented all-electron extension.
AU - Kühne, Thomas AU - Iannuzzi, Marcella AU - Ben, Mauro Del AU - Rybkin, Vladimir V. AU - Seewald, Patrick AU - Stein, Frederick AU - Laino, Teodoro AU - Khaliullin, Rustam Z. AU - Schütt, Ole AU - Schiffmann, Florian AU - Golze, Dorothea AU - Wilhelm, Jan AU - Chulkov, Sergey AU - Bani-Hashemian, Mohammad Hossein AU - Weber, Valéry AU - Borstnik, Urban AU - Taillefumier, Mathieu AU - Jakobovits, Alice Shoshana AU - Lazzaro, Alfio AU - Pabst, Hans AU - Müller, Tiziano AU - Schade, Robert AU - Guidon, Manuel AU - Andermatt, Samuel AU - Holmberg, Nico AU - Schenter, Gregory K. AU - Hehn, Anna AU - Bussy, Augustin AU - Belleflamme, Fabian AU - Tabacchi, Gloria AU - Glöß, Andreas AU - Lass, Michael AU - Bethune, Iain AU - Mundy, Christopher J. AU - Plessl, Christian AU - Watkins, Matt AU - VandeVondele, Joost AU - Krack, Matthias AU - Hutter, Jürg ID - 16277 IS - 19 JF - The Journal of Chemical Physics TI - CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations VL - 152 ER - TY - JOUR AB - We address the general mathematical problem of computing the inverse p-th root of a given matrix in an efficient way. A new method to construct iteration functions that allow calculating arbitrary p-th roots and their inverses of symmetric positive definite matrices is presented. We show that the order of convergence is at least quadratic and that adaptively adjusting a parameter q always leads to an even faster convergence. In this way, a better performance than with previously known iteration schemes is achieved. The efficiency of the iterative functions is demonstrated for various matrices with different densities, condition numbers and spectral radii.
AU - Richters, Dorothee AU - Lass, Michael AU - Walther, Andrea AU - Plessl, Christian AU - Kühne, Thomas ID - 21 IS - 2 JF - Communications in Computational Physics TI - A General Algorithm to Calculate the Inverse Principal p-th Root of Symmetric Positive Definite Matrices VL - 25 ER - TY - JOUR AU - Platzner, Marco AU - Plessl, Christian ID - 12871 JF - Informatik Spektrum SN - 0170-6012 TI - FPGAs im Rechenzentrum ER - TY - JOUR AU - Riebler, Heinrich AU - Vaz, Gavin Francis AU - Kenter, Tobias AU - Plessl, Christian ID - 7689 IS - 2 JF - ACM Trans. Archit. Code Optim. (TACO) KW - htrop TI - Transparent Acceleration for Heterogeneous Platforms with Compilation to OpenCL VL - 16 ER - TY - JOUR AU - Mertens, Jan Cedric AU - Boschmann, Alexander AU - Schmidt, M. AU - Plessl, Christian ID - 6516 IS - 4 JF - Sports Engineering SN - 1369-7072 TI - Sprint diagnostic with GPS and inertial sensor fusion VL - 21 ER - TY - JOUR AU - Luk, Samuel M. H. AU - Lewandowski, P. AU - Kwong, N. H. AU - Baudin, E. AU - Lafont, O. AU - Tignon, J. AU - Leung, P. T. AU - Chan, Ch. K. P. AU - Babilon, M. AU - Schumacher, Stefan AU - Binder, R. ID - 13348 IS - 1 JF - Journal of the Optical Society of America B SN - 0740-3224 TI - Theory of optically controlled anisotropic polariton transport in semiconductor double microcavities VL - 35 ER - TY - JOUR AB - Approximate computing has shown to provide new ways to improve performance and power consumption of error-resilient applications. While many of these applications can be found in image processing, data classification or machine learning, we demonstrate its suitability to a problem from scientific computing. Utilizing the self-correcting behavior of iterative algorithms, we show that approximate computing can be applied to the calculation of inverse matrix p-th roots which are required in many applications in scientific computing. 
Results show great opportunities to reduce the computational effort and bandwidth required for the execution of the discussed algorithm, especially when targeting special accelerator hardware. AU - Lass, Michael AU - Kühne, Thomas AU - Plessl, Christian ID - 20 IS - 2 JF - Embedded Systems Letters SN - 1943-0663 TI - Using Approximate Computing for the Calculation of Inverse Matrix p-th Roots VL - 10 ER - TY - JOUR AB - Branch and bound (B&B) algorithms structure the search space as a tree and eliminate infeasible solutions early by pruning subtrees that cannot lead to a valid or optimal solution. Custom hardware designs significantly accelerate the execution of these algorithms. In this article, we demonstrate a high-performance B&B implementation on FPGAs. First, we identify general elements of B&B algorithms and describe their implementation as a finite state machine. Then, we introduce workers that autonomously cooperate using work stealing to allow parallel execution and full utilization of the target FPGA. Finally, we explore advantages of instance-specific designs that target a specific problem instance to improve performance. We evaluate our concepts by applying them to a branch and bound problem, the reconstruction of corrupted AES keys obtained from cold-boot attacks. The evaluation shows that our work stealing approach is scalable with the available resources and provides speedups proportional to the number of workers. Instance-specific designs allow us to achieve an overall speedup of 47 × compared to the fastest implementation of AES key reconstruction so far. Finally, we demonstrate how instance-specific designs can be generated just-in-time such that the provided speedups outweigh the additional time required for design synthesis. 
AU - Riebler, Heinrich AU - Lass, Michael AU - Mittendorf, Robert AU - Löcke, Thomas AU - Plessl, Christian ID - 18 IS - 3 JF - ACM Transactions on Reconfigurable Technology and Systems (TRETS) KW - coldboot SN - 1936-7406 TI - Efficient Branch and Bound on FPGAs Using Work Stealing and Instance-Specific Designs VL - 10 ER - TY - JOUR AU - Schumacher, Jörn AU - Plessl, Christian AU - Vandelli, Wainer ID - 1589 JF - Journal of Physics: Conference Series TI - High-Throughput and Low-Latency Network Communication with NetIO VL - 898 ER - TY - JOUR AB - A broad spectrum of applications can be accelerated by offloading computation intensive parts to reconfigurable hardware. However, to achieve speedups, the number of loop iterations (trip count) needs to be sufficiently large to amortize offloading overheads. Trip counts are frequently not known at compile time, but only at runtime just before entering a loop. Therefore, we propose to generate code for both the CPU and the coprocessor, and defer the offloading decision to the application runtime. We demonstrate how a toolflow, based on the LLVM compiler framework, can automatically embed dynamic offloading decisions into the application code. We perform in-depth static and dynamic analysis of popular benchmarks, which confirm the general potential of such an approach. We also propose to optimize the offloading process by decoupling the runtime decision from the loop execution (decision slack). The feasibility of our approach is demonstrated by a toolflow that automatically identifies suitable data-parallel loops and generates code for the FPGA coprocessor of a Convey HC-1. We evaluate the integrated toolflow with representative loops executed for different input data sizes.
AU - Vaz, Gavin Francis AU - Riebler, Heinrich AU - Kenter, Tobias AU - Plessl, Christian ID - 165 JF - Computers and Electrical Engineering SN - 0045-7906 TI - Potential and Methods for Embedding Dynamic Offloading Decisions into Application Code VL - 55 ER - TY - JOUR AU - Plessl, Christian AU - Platzner, Marco AU - Schreier, Peter J. ID - 1768 IS - 5 JF - Informatik Spektrum KW - approximate computing KW - survey TI - Aktuelles Schlagwort: Approximate Computing ER - TY - JOUR AB - FPGAs are known to permit huge gains in performance and efficiency for suitable applications but still require reduced design efforts and shorter development cycles for wider adoption. In this work, we compare the resulting performance of two design concepts that in different ways promise such increased productivity. As common starting point, we employ a kernel-centric design approach, where computational hotspots in an application are identified and individually accelerated on FPGA. By means of a complex stereo matching application, we evaluate two fundamentally different design philosophies and approaches for implementing the required kernels on FPGAs. In the first implementation approach, we designed individually specialized data flow kernels in a spatial programming language for a Maxeler FPGA platform; in the alternative design approach, we target a vector coprocessor with large vector lengths, which is implemented as a form of programmable overlay on the application FPGAs of a Convey HC-1. We assess both approaches in terms of overall system performance, raw kernel performance, and performance relative to invested resources. After compensating for the effects of the underlying hardware platforms, the specialized dataflow kernels on the Maxeler platform are around 3x faster than kernels executing on the Convey vector coprocessor. 
In our concrete scenario, due to trade-offs between reconfiguration overheads and exposed parallelism, the advantage of specialized dataflow kernels is reduced to around 2.5x. AU - Kenter, Tobias AU - Schmitz, Henning AU - Plessl, Christian ID - 296 JF - International Journal of Reconfigurable Computing (IJRC) TI - Exploring Tradeoffs between Specialized Kernels and a Reusable Overlay in a Stereo-Matching Case Study VL - 2015 ER - TY - JOUR AB - The ATLAS experiment at CERN is planning full deployment of a new unified optical link technology for connecting detector front end electronics on the timescale of the LHC Run 4 (2025). It is estimated that roughly 8000 GBT (GigaBit Transceiver) links, with transfer rates up to 10.24 Gbps, will replace existing links used for readout, detector control and distribution of timing and trigger information. A new class of devices will be needed to interface many GBT links to the rest of the trigger, data-acquisition and detector control systems. In this paper FELIX (Front End LInk eXchange) is presented, a PC-based device to route data from and to multiple GBT links via a high-performance general purpose network capable of a total throughput up to O(20 Tbps). FELIX implies architectural changes to the ATLAS data acquisition system, such as the use of industry standard COTS components early in the DAQ chain. Additionally the design and implementation of a FELIX demonstration platform is presented and hardware and software aspects will be discussed. 
AU - Anderson, J AU - Borga, A AU - Boterenbrood, H AU - Chen, H AU - Chen, K AU - Drake, G AU - Francis, D AU - Gorini, B AU - Lanni, F AU - Lehmann Miotto, G AU - Levinson, L AU - Narevicius, J AU - Plessl, Christian AU - Roich, A AU - Ryu, S AU - Schreuder, F AU - Schumacher, Jörn AU - Vandelli, Wainer AU - Vermeulen, J AU - Zhang, J ID - 1775 JF - Journal of Physics: Conference Series TI - FELIX: a High-Throughput Network Approach for Interfacing to Front End Electronics for ATLAS Upgrades VL - 664 ER - TY - JOUR AB - In this article an efficient numerical method to solve multiobjective optimization problems for fluid flow governed by the Navier Stokes equations is presented. In order to decrease the computational effort, a reduced order model is introduced using Proper Orthogonal Decomposition and a corresponding Galerkin Projection. A global, derivative free multiobjective optimization algorithm is applied to compute the Pareto set (i.e. the set of optimal compromises) for the concurrent objectives minimization of flow field fluctuations and control cost. The method is illustrated for a 2D flow around a cylinder at Re = 100. AU - Peitz, Sebastian AU - Dellnitz, Michael ID - 1774 IS - 1 JF - PAMM SN - 1617-7061 TI - Multiobjective Optimization of the Flow Around a Cylinder Using Model Order Reduction VL - 15 ER - TY - JOUR AU - Torresen, Jim AU - Plessl, Christian AU - Yao, Xin ID - 1772 IS - 7 JF - IEEE Computer KW - self-awareness KW - self-expression TI - Self-Aware and Self-Expressive Systems – Guest Editor's Introduction VL - 48 ER - TY - JOUR AB - Große zylindrische Stahlprüflinge werden mittels der Methode der finiten Differenzen im Zeitbereich (engl. finite differences in time domain, FDTD) simulativ untersucht. Dabei werden Pitch-Catch-Messanordnungen verwendet. Es werden zwei Bildgebungsansätze vorgestellt: ersterer basiert auf dem Imaging Principle nach Claerbout, letzterer basiert auf gradientenbasierter Optimierung eines Zielfunktionals. 
AU - Hegler, Sebastian AU - Statz, Christoph AU - Mütze, Marco AU - Mooshofer, Hubert AU - Goldammer, Matthias AU - Fendt, Karl AU - Schwarzer, Stefan AU - Feldhoff, Kim AU - Flehmig, Martin AU - Markwardt, Ulf AU - E. Nagel, Wolfgang AU - Schütte, Maria AU - Walther, Andrea AU - Meinel, Michael AU - Basermann, Achim AU - Plettemeier, Dirk ID - 1769 IS - 9 JF - tm - Technisches Messen TI - Simulative Ultraschall-Untersuchung von Pitch-Catch-Messanordnungen für große zylindrische Stahl-Prüflinge und gradientenbasierte Bildgebung VL - 82 ER - TY - JOUR AU - Giefers, Heiner AU - Plessl, Christian AU - Förstner, Jens ID - 1779 IS - 5 JF - ACM SIGARCH Computer Architecture News KW - funding-maxup KW - tet_topic_hpc SN - 0163-5964 TI - Accelerating Finite Difference Time Domain Simulations with Reconfigurable Dataflow Computers VL - 41 ER -