TY - JOUR AB - The rise of exascale supercomputers has fueled competition among GPU vendors, driving lattice QCD developers to write code that supports multiple APIs. Moreover, new developments in algorithms and physics research require frequent updates to existing software. These challenges have to be balanced against constantly changing personnel. At the same time, there is a wide range of applications for HISQ fermions in QCD studies. This situation encourages the development of software featuring a HISQ action that is flexible, high-performing, open source, easy to use, and easy to adapt. In this technical paper, we explain the design strategy, provide implementation details, list available algorithms and modules, and show key performance indicators for SIMULATeQCD, a simple multi-GPU lattice code for large-scale QCD calculations, mainly developed and used by the HotQCD collaboration. The code is publicly available on GitHub. AU - Mazur, Lukas AU - Bollweg, Dennis AU - Clarke, David A. AU - Altenkort, Luis AU - Kaczmarek, Olaf AU - Larsen, Rasmus AU - Shu, Hai-Tao AU - Goswami, Jishnu AU - Scior, Philipp AU - Sandmeyer, Hauke AU - Neumann, Marius AU - Dick, Henrik AU - Ali, Sajid AU - Kim, Jangho AU - Schmidt, Christian AU - Petreczky, Peter AU - Mukherjee, Swagato ID - 46120 JF - Computer Physics Communications TI - SIMULATeQCD: A simple multi-GPU lattice code for QCD calculations ER - TY - JOUR AU - Altenkort, Luis AU - Eller, Alexander M. AU - Francis, Anthony AU - Kaczmarek, Olaf AU - Mazur, Lukas AU - Moore, Guy D. AU - Shu, Hai-Tao ID - 46119 IS - 1 JF - Physical Review D SN - 2470-0010 TI - Viscosity of pure-glue QCD from the lattice VL - 108 ER - TY - JOUR AB - While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and not least, benchmarks and reference implementations towards sustained HPC usage of these devices. 
As in the early days of GPUs in HPC, for workloads that can reasonably be decoupled into loosely coupled working sets, multi-accelerator support can be achieved by using standard communication interfaces like MPI on the host side. However, for performance and productivity, some applications can profit from a tighter coupling of the accelerators. FPGAs offer unique opportunities here when extending the dataflow characteristics to their communication interfaces. In this work, we extend the HPCC FPGA benchmark suite by multi-FPGA support and three missing benchmarks that particularly characterize or stress inter-device communication: b_eff, PTRANS, and LINPACK. With all benchmarks implemented for current boards with Intel and Xilinx FPGAs, we established a baseline for multi-FPGA performance. Additionally, for the communication-centric benchmarks, we explored the potential of direct FPGA-to-FPGA communication with a circuit-switched inter-FPGA network that is currently only available for one of the boards. The evaluation with parallel execution on up to 26 FPGA boards makes use of one of the largest academic FPGA installations. AU - Meyer, Marius AU - Kenter, Tobias AU - Plessl, Christian ID - 38041 JF - ACM Transactions on Reconfigurable Technology and Systems KW - General Computer Science SN - 1936-7406 TI - Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks ER - TY - JOUR AB - The non-orthogonal local submatrix method applied to electronic structure–based molecular dynamics simulations is shown to exceed 1.1 EFLOP/s in FP16/FP32-mixed floating-point arithmetic when using 4400 NVIDIA A100 GPUs of the Perlmutter system. This is enabled by a modification of the original method that pushes the sustained fraction of the peak performance to about 80%. Example calculations are performed for SARS-CoV-2 spike proteins with up to 83 million atoms. 
AU - Schade, Robert AU - Kenter, Tobias AU - Elgabarty, Hossam AU - Lass, Michael AU - Kühne, Thomas AU - Plessl, Christian ID - 45361 JF - The International Journal of High Performance Computing Applications KW - Hardware and Architecture KW - Theoretical Computer Science KW - Software SN - 1094-3420 TI - Breaking the exascale barrier for the electronic structure problem in ab-initio molecular dynamics ER - TY - JOUR AU - Hou, W AU - Yao, Y AU - Li, Y AU - Peng, B AU - Shi, K AU - Zhou, Z AU - Pan, J AU - Liu, M AU - Hu, J ID - 32183 IS - 1 JF - Frontiers of materials science SN - 2095-025x TI - Linearly shifting ferromagnetic resonance response of La0.7Sr0.3MnO3 thin film for body temperature sensors VL - 16 ER - TY - JOUR AU - Wojciechowski, M ID - 32234 JF - Data Brief SN - 2352-3409 TI - Dataset for random uniform distributions of 2D circles and 3D spheres. VL - 43 ER - TY - JOUR AB - Tailored nanoscale quantum light sources, matching the specific needs of use cases, are crucial building blocks for photonic quantum technologies. Several different approaches to realize solid-state quantum emitters with high performance have been pursued and different concepts for energy tuning have been established. However, the properties of the emitted photons are always defined by the individual quantum emitter and can therefore not be controlled with full flexibility. Here we introduce an all-optical nonlinear method to tailor and control the single photon emission. We demonstrate a laser-controlled down-conversion process from an excited state of a semiconductor quantum three-level system. Based on this concept, we realize energy tuning and polarization control of the single photon emission with a control-laser field. Our results mark an important step towards tailored single photon emission from a photonic quantum system based on quantum optical principles. AU - Jonas, B. AU - Heinze, Dirk Florian AU - Schöll, E. AU - Kallert, P. AU - Langer, T. AU - Krehs, S. 
AU - Widhalm, A. AU - Jöns, Klaus AU - Reuter, Dirk AU - Schumacher, Stefan AU - Zrenner, Artur ID - 40523 IS - 1 JF - Nature Communications KW - General Physics and Astronomy KW - General Biochemistry KW - Genetics and Molecular Biology KW - General Chemistry KW - Multidisciplinary SN - 2041-1723 TI - Nonlinear down-conversion in a single quantum dot VL - 13 ER - TY - JOUR AU - Altenkort, Luis AU - Eller, Alexander M. AU - Kaczmarek, O. AU - Mazur, Lukas AU - Moore, Guy D. AU - Shu, Hai-Tao ID - 46121 IS - 9 JF - Physical Review D SN - 2470-0010 TI - Lattice QCD noise reduction for bosonic correlators through blocking VL - 105 ER - TY - JOUR AB - A parallel hybrid quantum-classical algorithm for the solution of the quantum-chemical ground-state energy problem on gate-based quantum computers is presented. This approach is based on the reduced density-matrix functional theory (RDMFT) formulation of the electronic structure problem. For that purpose, the density-matrix functional of the full system is decomposed into an indirectly coupled sum of density-matrix functionals for all its subsystems using the adaptive cluster approximation to RDMFT. The approximations involved in the decomposition and the adaptive cluster approximation itself can be systematically converged to the exact result. The solutions for the density-matrix functionals of the effective subsystems involves a constrained minimization over many-particle states that are approximated by parametrized trial states on the quantum computer similarly to the variational quantum eigensolver. The independence of the density-matrix functionals of the effective subsystems introduces a new level of parallelization and allows for the computational treatment of much larger molecules on a quantum computer with a given qubit count. In addition, for the proposed algorithm techniques are presented to reduce the qubit count, the number of quantum programs, as well as its depth. 
The evaluation of a density-matrix functional as the essential part of our approach is demonstrated for Hubbard-like systems on IBM quantum computers based on superconducting transmon qubits. AU - Schade, Robert AU - Bauer, Carsten AU - Tamoev, Konstantin AU - Mazur, Lukas AU - Plessl, Christian AU - Kühne, Thomas ID - 33226 JF - Phys. Rev. Research TI - Parallel quantum chemistry on noisy intermediate-scale quantum computers VL - 4 ER - TY - JOUR AU - Schade, Robert AU - Kenter, Tobias AU - Elgabarty, Hossam AU - Lass, Michael AU - Schütt, Ole AU - Lazzaro, Alfio AU - Pabst, Hans AU - Mohr, Stephan AU - Hutter, Jürg AU - Kühne, Thomas AU - Plessl, Christian ID - 33684 JF - Parallel Computing KW - Artificial Intelligence KW - Computer Graphics and Computer-Aided Design KW - Computer Networks and Communications KW - Hardware and Architecture KW - Theoretical Computer Science KW - Software SN - 0167-8191 TI - Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms VL - 111 ER - TY - JOUR AU - Meyer, Marius AU - Kenter, Tobias AU - Plessl, Christian ID - 27364 JF - Journal of Parallel and Distributed Computing SN - 0743-7315 TI - In-depth FPGA Accelerator Performance Evaluation with Single Node Benchmarks from the HPC Challenge Benchmark Suite for Intel and Xilinx FPGAs using OpenCL ER - TY - JOUR AB - Recent advances in numerical methods significantly pushed forward the understanding of electrons coupled to quantized lattice vibrations. At this stage, it becomes increasingly important to also account for the effects of physically inevitable environments. In particular, we study the transport properties of the Hubbard-Holstein Hamiltonian that models a large class of materials characterized by strong electron-phonon coupling, in contact with a dissipative environment. 
Even in the one-dimensional and isolated case, simulating the quantum dynamics of such a system with high accuracy is very challenging due to the infinite dimensionality of the phononic Hilbert spaces. For this reason, the effects of dissipation on the conductance properties of such systems have not been investigated systematically so far. We combine the non-Markovian hierarchy of pure states method and the Markovian quantum jumps method with the newly introduced projected purified density-matrix renormalization group, creating powerful tensor-network methods for dissipative quantum many-body systems. Investigating their numerical properties, we find a significant speedup up to a factor $\sim 30$ compared to conventional tensor-network techniques. We apply these methods to study dissipative quenches, aiming for an in-depth understanding of the formation, stability, and quasi-particle properties of bipolarons. Surprisingly, our results show that in the metallic phase dissipation localizes the bipolarons, which is reminiscent of an indirect quantum Zeno effect. However, the bipolaronic binding energy remains mainly unaffected, even in the presence of strong dissipation, exhibiting remarkable bipolaron stability. These findings shed light on the problem of designing real materials exhibiting phonon-mediated high-$T_\mathrm{C}$ superconductivity. AU - Moroder, Mattia AU - Grundner, Martin AU - Damanet, François AU - Schollwöck, Ulrich AU - Mardazad, Sam AU - Flannigan, Stuart AU - Köhler, Thomas AU - Paeckel, Sebastian ID - 50146 JF - Physical Review B 107, 214310 (2023) TI - Stable bipolarons in open quantum systems ER - TY - JOUR AB - We develop a general decomposition of an ensemble of initial density profiles in terms of an average state and a basis of modes that represent the event-by-event fluctuations of the initial state. The basis is determined such that the probability distributions of the amplitudes of different modes are uncorrelated. 
Based on this decomposition, we quantify the different types and probabilities of event-by-event fluctuations in Glauber and Saturation models and investigate how the various modes affect different characteristics of the initial state. We perform simulations of the dynamical evolution with KoMPoST and MUSIC to investigate the impact of the modes on final-state observables and their correlations. AU - Borghini, Nicolas AU - Borrell, Marc AU - Feld, Nina AU - Roch, Hendrik AU - Schlichting, Sören AU - Werthmann, Clemens ID - 50148 JF - Phys. Rev. C 107 (2023) 034905 TI - Statistical analysis of initial state and final state response in heavy-ion collisions ER - TY - JOUR AB - RNA editing processes are strikingly different in animals and plants. Up to thousands of specific cytidines are converted into uridines in plant chloroplasts and mitochondria whereas up to millions of adenosines are converted into inosines in animal nucleo-cytosolic RNAs. It is unknown whether these two different RNA editing machineries are mutually incompatible. RNA-binding pentatricopeptide repeat (PPR) proteins are the key factors of plant organelle cytidine-to-uridine RNA editing. The complete absence of PPR mediated editing of cytosolic RNAs might be due to a yet unknown barrier that prevents its activity in the cytosol. Here, we transferred two plant mitochondrial PPR-type editing factors into human cell lines to explore whether they could operate in the nucleo-cytosolic environment. PPR56 and PPR65 not only faithfully edited their native, co-transcribed targets but also different sets of off-targets in the human background transcriptome. More than 900 of such off-targets with editing efficiencies up to 91%, largely explained by known PPR-RNA binding properties, were identified for PPR56. Engineering two crucial amino acid positions in its PPR array led to predictable shifts in target recognition. 
We conclude that plant PPR editing factors can operate in the entirely different genetic environment of the human nucleo-cytosol and can be intentionally re-engineered towards new targets. AU - Lesch, Elena AU - Schilling, Maximilian T AU - Brenner, Sarah AU - Yang, Yingying AU - Gruss, Oliver J AU - Knoop, Volker AU - Schallenberg-Rüdinger, Mareike ID - 50149 IS - 17 JF - Nucleic Acids Research KW - Genetics SN - 0305-1048 TI - Plant mitochondrial RNA editing factors can perform targeted C-to-U editing of nuclear transcripts in human cells VL - 50 ER - TY - JOUR AB - N-body methods are one of the essential algorithmic building blocks of high-performance and parallel computing. Previous research has shown promising performance for implementing n-body simulations with pairwise force calculations on FPGAs. However, to avoid challenges with accumulation and memory access patterns, the presented designs calculate each pair of forces twice, along with both force sums of the involved particles. Also, they require large problem instances with hundreds of thousands of particles to reach their respective peak performance, limiting the applicability for strong scaling scenarios. This work addresses both issues by presenting a novel FPGA design that uses each calculated force twice and overlaps data transfers and computations in a way that allows to reach peak performance even for small problem instances, outperforming previous single precision results even in double precision, and scaling linearly over multiple interconnected FPGAs. For a comparison across architectures, we provide an equally optimized CPU reference, which for large problems actually achieves higher peak performance per device, however, given the strong scaling advantages of the FPGA design, in parallel setups with few thousand particles per device, the FPGA platform achieves highest performance and power efficiency. 
AU - Menzel, Johannes AU - Plessl, Christian AU - Kenter, Tobias ID - 28099 IS - 1 JF - ACM Transactions on Reconfigurable Technology and Systems SN - 1936-7406 TI - The Strong Scaling Advantage of FPGAs in HPC for N-body Simulations VL - 15 ER - TY - JOUR AB - The defining feature of active particles is that they constantly propel themselves by locally converting chemical energy into directed motion. This active self-propulsion prevents them from equilibrating with their thermal environment (e.g. an aqueous solution), thus keeping them permanently out of equilibrium. Nevertheless, the spatial dynamics of active particles might share certain equilibrium features, in particular in the steady state. We here focus on the time-reversal symmetry of individual spatial trajectories as a distinct equilibrium characteristic. We investigate to what extent the steady-state trajectories of a trapped active particle obey or break this time-reversal symmetry. Within the framework of active Ornstein–Uhlenbeck particles we find that the steady-state trajectories in a harmonic potential fulfill path-wise time-reversal symmetry exactly, while this symmetry is typically broken in anharmonic potentials. AU - Dabelow, Lennart AU - Bo, Stefano AU - Eichhorn, Ralf ID - 32243 IS - 3 JF - Journal of Statistical Mechanics: Theory and Experiment KW - Statistics KW - Probability and Uncertainty KW - Statistics and Probability KW - Statistical and Nonlinear Physics SN - 1742-5468 TI - How irreversible are steady-state trajectories of a trapped active particle? VL - 2021 ER - TY - JOUR AU - Kaczmarek, Olaf AU - Mazur, Lukas AU - Sharma, Sayantan ID - 46122 IS - 9 JF - Physical Review D SN - 2470-0010 TI - Eigenvalue spectra of QCD and the fate of UA(1) breaking towards the chiral limit VL - 104 ER - TY - JOUR AU - Altenkort, Luis AU - Eller, Alexander M. AU - Kaczmarek, O. AU - Mazur, Lukas AU - Moore, Guy D. AU - Shu, H.-T. 
ID - 46124 IS - 1 JF - Physical Review D SN - 2470-0010 TI - Heavy quark momentum diffusion from the lattice using gradient flow VL - 103 ER - TY - JOUR AU - Altenkort, Luis AU - Eller, Alexander M. AU - Kaczmarek, O. AU - Mazur, Lukas AU - Moore, Guy D. AU - Shu, H.-T. ID - 46123 IS - 11 JF - Physical Review D SN - 2470-0010 TI - Sphaleron rate from Euclidean lattice correlators: An exploration VL - 103 ER - TY - JOUR AU - Alhaddad, Samer AU - Förstner, Jens AU - Groth, Stefan AU - Grünewald, Daniel AU - Grynko, Yevgen AU - Hannig, Frank AU - Kenter, Tobias AU - Pfreundt, Franz‐Josef AU - Plessl, Christian AU - Schotte, Merlind AU - Steinke, Thomas AU - Teich, Jürgen AU - Weiser, Martin AU - Wende, Florian ID - 24788 JF - Concurrency and Computation: Practice and Experience KW - tet_topic_hpc SN - 1532-0626 TI - The HighPerMeshes framework for numerical algorithms on unstructured grids ER - TY - JOUR AB -

The effect of traces of ethanol in supercritical carbon dioxide on the mixture's thermodynamic properties is studied by molecular simulations and Taylor dispersion measurements.

AU - Chatwell, René Spencer AU - Guevara-Carrion, Gabriela AU - Gaponenko, Yuri AU - Shevtsova, Valentina AU - Vrabec, Jadran ID - 32240 IS - 4 JF - Physical Chemistry Chemical Physics KW - Physical and Theoretical Chemistry KW - General Physics and Astronomy SN - 1463-9076 TI - Diffusion of the carbon dioxide–ethanol mixture in the extended critical region VL - 23 ER - TY - JOUR AB -

State-of-the-art methods in materials science such as artificial intelligence and data-driven techniques advance the investigation of photovoltaic materials.

AU - Mirhosseini, Hossein AU - Kormath Madam Raghupathy, Ramya AU - Sahoo, Sudhir K. AU - Wiebeler, Hendrik AU - Chugh, Manjusha AU - Kühne, Thomas D. ID - 32246 IS - 46 JF - Physical Chemistry Chemical Physics KW - Physical and Theoretical Chemistry KW - General Physics and Astronomy SN - 1463-9076 TI - In silico investigation of Cu(In,Ga)Se2-based solar cells VL - 22 ER - TY - JOUR AB - CP2K is an open source electronic structure and molecular dynamics software package to perform atomistic simulations of solid-state, liquid, molecular, and biological systems. It is especially aimed at massively parallel and linear-scaling electronic structure methods and state-of-the-art ab initio molecular dynamics simulations. Excellent performance for electronic structure calculations is achieved using novel algorithms implemented for modern high-performance computing systems. This review revisits the main capabilities of CP2K to perform efficient and accurate electronic structure simulations. The emphasis is put on density functional theory and multiple post–Hartree–Fock methods using the Gaussian and plane wave approach and its augmented all-electron extension. AU - Kühne, Thomas AU - Iannuzzi, Marcella AU - Ben, Mauro Del AU - Rybkin, Vladimir V. AU - Seewald, Patrick AU - Stein, Frederick AU - Laino, Teodoro AU - Khaliullin, Rustam Z. AU - Schütt, Ole AU - Schiffmann, Florian AU - Golze, Dorothea AU - Wilhelm, Jan AU - Chulkov, Sergey AU - Bani-Hashemian, Mohammad Hossein AU - Weber, Valéry AU - Borstnik, Urban AU - Taillefumier, Mathieu AU - Jakobovits, Alice Shoshana AU - Lazzaro, Alfio AU - Pabst, Hans AU - Müller, Tiziano AU - Schade, Robert AU - Guidon, Manuel AU - Andermatt, Samuel AU - Holmberg, Nico AU - Schenter, Gregory K. AU - Hehn, Anna AU - Bussy, Augustin AU - Belleflamme, Fabian AU - Tabacchi, Gloria AU - Glöß, Andreas AU - Lass, Michael AU - Bethune, Iain AU - Mundy, Christopher J. 
AU - Plessl, Christian AU - Watkins, Matt AU - VandeVondele, Joost AU - Krack, Matthias AU - Hutter, Jürg ID - 16277 IS - 19 JF - The Journal of Chemical Physics TI - CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations VL - 152 ER - TY - JOUR AB - In scientific computing, the acceleration of atomistic computer simulations by means of custom hardware is finding ever-growing application. A major limitation, however, is that the high efficiency in terms of performance and low power consumption entails the massive usage of low precision computing units. Here, based on the approximate computing paradigm, we present an algorithmic method to compensate for numerical inaccuracies due to low accuracy arithmetic operations rigorously, yet still obtaining exact expectation values using a properly modified Langevin-type equation. AU - Rengaraj, Varadarajan AU - Lass, Michael AU - Plessl, Christian AU - Kühne, Thomas ID - 12878 IS - 2 JF - Computation TI - Accurate Sampling with Noisy Forces from Approximate Computing VL - 8 ER - TY - JOUR AU - Riebler, Heinrich AU - Vaz, Gavin Francis AU - Kenter, Tobias AU - Plessl, Christian ID - 7689 IS - 2 JF - ACM Trans. Archit. Code Optim. (TACO) KW - htrop TI - Transparent Acceleration for Heterogeneous Platforms with Compilation to OpenCL VL - 16 ER - TY - JOUR AB - We address the general mathematical problem of computing the inverse p-th root of a given matrix in an efficient way. A new method to construct iteration functions that allow calculating arbitrary p-th roots and their inverses of symmetric positive definite matrices is presented. We show that the order of convergence is at least quadratic and that adaptively adjusting a parameter q always leads to an even faster convergence. In this way, a better performance than with previously known iteration schemes is achieved. 
The efficiency of the iterative functions is demonstrated for various matrices with different densities, condition numbers and spectral radii. AU - Richters, Dorothee AU - Lass, Michael AU - Walther, Andrea AU - Plessl, Christian AU - Kühne, Thomas ID - 21 IS - 2 JF - Communications in Computational Physics TI - A General Algorithm to Calculate the Inverse Principal p-th Root of Symmetric Positive Definite Matrices VL - 25 ER - TY - JOUR AU - Platzner, Marco AU - Plessl, Christian ID - 12871 JF - Informatik Spektrum SN - 0170-6012 TI - FPGAs im Rechenzentrum ER - TY - JOUR AB - Approximate computing has shown to provide new ways to improve performance and power consumption of error-resilient applications. While many of these applications can be found in image processing, data classification or machine learning, we demonstrate its suitability to a problem from scientific computing. Utilizing the self-correcting behavior of iterative algorithms, we show that approximate computing can be applied to the calculation of inverse matrix p-th roots which are required in many applications in scientific computing. Results show great opportunities to reduce the computational effort and bandwidth required for the execution of the discussed algorithm, especially when targeting special accelerator hardware. AU - Lass, Michael AU - Kühne, Thomas AU - Plessl, Christian ID - 20 IS - 2 JF - Embedded Systems Letters SN - 1943-0663 TI - Using Approximate Computing for the Calculation of Inverse Matrix p-th Roots VL - 10 ER - TY - JOUR AU - Mertens, Jan Cedric AU - Boschmann, Alexander AU - Schmidt, M. AU - Plessl, Christian ID - 6516 IS - 4 JF - Sports Engineering SN - 1369-7072 TI - Sprint diagnostic with GPS and inertial sensor fusion VL - 21 ER - TY - JOUR AU - Luk, Samuel M. H. AU - Lewandowski, P. AU - Kwong, N. H. AU - Baudin, E. AU - Lafont, O. AU - Tignon, J. AU - Leung, P. T. AU - Chan, Ch. K. P. AU - Babilon, M. AU - Schumacher, Stefan AU - Binder, R. 
ID - 13348 IS - 1 JF - Journal of the Optical Society of America B SN - 0740-3224 TI - Theory of optically controlled anisotropic polariton transport in semiconductor double microcavities VL - 35 ER - TY - JOUR AB - Branch and bound (B&B) algorithms structure the search space as a tree and eliminate infeasible solutions early by pruning subtrees that cannot lead to a valid or optimal solution. Custom hardware designs significantly accelerate the execution of these algorithms. In this article, we demonstrate a high-performance B&B implementation on FPGAs. First, we identify general elements of B&B algorithms and describe their implementation as a finite state machine. Then, we introduce workers that autonomously cooperate using work stealing to allow parallel execution and full utilization of the target FPGA. Finally, we explore advantages of instance-specific designs that target a specific problem instance to improve performance. We evaluate our concepts by applying them to a branch and bound problem, the reconstruction of corrupted AES keys obtained from cold-boot attacks. The evaluation shows that our work stealing approach is scalable with the available resources and provides speedups proportional to the number of workers. Instance-specific designs allow us to achieve an overall speedup of 47 × compared to the fastest implementation of AES key reconstruction so far. Finally, we demonstrate how instance-specific designs can be generated just-in-time such that the provided speedups outweigh the additional time required for design synthesis. 
AU - Riebler, Heinrich AU - Lass, Michael AU - Mittendorf, Robert AU - Löcke, Thomas AU - Plessl, Christian ID - 18 IS - 3 JF - ACM Transactions on Reconfigurable Technology and Systems (TRETS) KW - coldboot SN - 1936-7406 TI - Efficient Branch and Bound on FPGAs Using Work Stealing and Instance-Specific Designs VL - 10 ER - TY - JOUR AU - Schumacher, Jörn AU - Plessl, Christian AU - Vandelli, Wainer ID - 1589 JF - Journal of Physics: Conference Series TI - High-Throughput and Low-Latency Network Communication with NetIO VL - 898 ER - TY - JOUR AB - A broad spectrum of applications can be accelerated by offloading computation intensive parts to reconfigurable hardware. However, to achieve speedups, the number of loop iterations (trip count) needs to be sufficiently large to amortize offloading overheads. Trip counts are frequently not known at compile time, but only at runtime just before entering a loop. Therefore, we propose to generate code for both the CPU and the coprocessor, and defer the offloading decision to the application runtime. We demonstrate how a toolflow, based on the LLVM compiler framework, can automatically embed dynamic offloading decisions into the application code. We perform in-depth static and dynamic analysis of popular benchmarks, which confirm the general potential of such an approach. We also propose to optimize the offloading process by decoupling the runtime decision from the loop execution (decision slack). The feasibility of our approach is demonstrated by a toolflow that automatically identifies suitable data-parallel loops and generates code for the FPGA coprocessor of a Convey HC-1. We evaluate the integrated toolflow with representative loops executed for different input data sizes. 
AU - Vaz, Gavin Francis AU - Riebler, Heinrich AU - Kenter, Tobias AU - Plessl, Christian ID - 165 JF - Computers and Electrical Engineering SN - 0045-7906 TI - Potential and Methods for Embedding Dynamic Offloading Decisions into Application Code VL - 55 ER - TY - JOUR AB - Große zylindrische Stahlprüflinge werden mittels der Methode der finiten Differenzen im Zeitbereich (engl. finite differences in time domain, FDTD) simulativ untersucht. Dabei werden Pitch-Catch-Messanordnungen verwendet. Es werden zwei Bildgebungsansätze vorgestellt: ersterer basiert auf dem Imaging Principle nach Claerbout, letzterer basiert auf gradientenbasierter Optimierung eines Zielfunktionals. AU - Hegler, Sebastian AU - Statz, Christoph AU - Mütze, Marco AU - Mooshofer, Hubert AU - Goldammer, Matthias AU - Fendt, Karl AU - Schwarzer, Stefan AU - Feldhoff, Kim AU - Flehmig, Martin AU - Markwardt, Ulf AU - E. Nagel, Wolfgang AU - Schütte, Maria AU - Walther, Andrea AU - Meinel, Michael AU - Basermann, Achim AU - Plettemeier, Dirk ID - 1769 IS - 9 JF - tm - Technisches Messen TI - Simulative Ultraschall-Untersuchung von Pitch-Catch-Messanordnungen für große zylindrische Stahl-Prüflinge und gradientenbasierte Bildgebung VL - 82 ER - TY - JOUR AU - Torresen, Jim AU - Plessl, Christian AU - Yao, Xin ID - 1772 IS - 7 JF - IEEE Computer KW - self-awareness KW - self-expression TI - Self-Aware and Self-Expressive Systems – Guest Editor's Introduction VL - 48 ER - TY - JOUR AB - In this article an efficient numerical method to solve multiobjective optimization problems for fluid flow governed by the Navier Stokes equations is presented. In order to decrease the computational effort, a reduced order model is introduced using Proper Orthogonal Decomposition and a corresponding Galerkin Projection. A global, derivative free multiobjective optimization algorithm is applied to compute the Pareto set (i.e. 
the set of optimal compromises) for the concurrent objectives minimization of flow field fluctuations and control cost. The method is illustrated for a 2D flow around a cylinder at Re = 100. AU - Peitz, Sebastian AU - Dellnitz, Michael ID - 1774 IS - 1 JF - PAMM SN - 1617-7061 TI - Multiobjective Optimization of the Flow Around a Cylinder Using Model Order Reduction VL - 15 ER - TY - JOUR AB - FPGAs are known to permit huge gains in performance and efficiency for suitable applications but still require reduced design efforts and shorter development cycles for wider adoption. In this work, we compare the resulting performance of two design concepts that in different ways promise such increased productivity. As common starting point, we employ a kernel-centric design approach, where computational hotspots in an application are identified and individually accelerated on FPGA. By means of a complex stereo matching application, we evaluate two fundamentally different design philosophies and approaches for implementing the required kernels on FPGAs. In the first implementation approach, we designed individually specialized data flow kernels in a spatial programming language for a Maxeler FPGA platform; in the alternative design approach, we target a vector coprocessor with large vector lengths, which is implemented as a form of programmable overlay on the application FPGAs of a Convey HC-1. We assess both approaches in terms of overall system performance, raw kernel performance, and performance relative to invested resources. After compensating for the effects of the underlying hardware platforms, the specialized dataflow kernels on the Maxeler platform are around 3x faster than kernels executing on the Convey vector coprocessor. In our concrete scenario, due to trade-offs between reconfiguration overheads and exposed parallelism, the advantage of specialized dataflow kernels is reduced to around 2.5x. 
AU - Kenter, Tobias AU - Schmitz, Henning AU - Plessl, Christian ID - 296 JF - International Journal of Reconfigurable Computing (IJRC) TI - Exploring Tradeoffs between Specialized Kernels and a Reusable Overlay in a Stereo-Matching Case Study VL - 2015 ER - TY - JOUR AU - Plessl, Christian AU - Platzner, Marco AU - Schreier, Peter J. ID - 1768 IS - 5 JF - Informatik Spektrum KW - approximate computing KW - survey TI - Aktuelles Schlagwort: Approximate Computing ER - TY - JOUR AB - The ATLAS experiment at CERN is planning full deployment of a new unified optical link technology for connecting detector front end electronics on the timescale of the LHC Run 4 (2025). It is estimated that roughly 8000 GBT (GigaBit Transceiver) links, with transfer rates up to 10.24 Gbps, will replace existing links used for readout, detector control and distribution of timing and trigger information. A new class of devices will be needed to interface many GBT links to the rest of the trigger, data-acquisition and detector control systems. In this paper FELIX (Front End LInk eXchange) is presented, a PC-based device to route data from and to multiple GBT links via a high-performance general purpose network capable of a total throughput up to O(20 Tbps). FELIX implies architectural changes to the ATLAS data acquisition system, such as the use of industry standard COTS components early in the DAQ chain. Additionally the design and implementation of a FELIX demonstration platform is presented and hardware and software aspects will be discussed. 
AU - Anderson, J AU - Borga, A AU - Boterenbrood, H AU - Chen, H AU - Chen, K AU - Drake, G AU - Francis, D AU - Gorini, B AU - Lanni, F AU - Lehmann Miotto, G AU - Levinson, L AU - Narevicius, J AU - Plessl, Christian AU - Roich, A AU - Ryu, S AU - Schreuder, F AU - Schumacher, Jörn AU - Vandelli, Wainer AU - Vermeulen, J AU - Zhang, J ID - 1775 JF - Journal of Physics: Conference Series TI - FELIX: a High-Throughput Network Approach for Interfacing to Front End Electronics for ATLAS Upgrades VL - 664 ER - TY - JOUR AB - Due to the continuously shrinking device structures and increasing densities of FPGAs, thermal aspects have become the new focus for many research projects over the last years. Most researchers rely on temperature simulations to evaluate their novel thermal management techniques. However, these temperature simulations require a high computational effort if a detailed thermal model is used and their accuracies are often unclear. In contrast to simulations, the use of synthetic heat sources allows for experimental evaluation of temperature management methods. In this paper we investigate the creation of significant rises in temperature on modern FPGAs to enable future evaluation of thermal management techniques based on experiments. To that end, we have developed seven different heat-generating cores that use different subsets of FPGA resources. Our experimental results show that, according to external temperature probes connected to the FPGA’s heat sink, we can increase the temperature by an average of 81 °C. This corresponds to an average increase of 156.3 °C as measured by the built-in thermal diodes of our Virtex-5 FPGAs in less than 30 min by only utilizing about 21 percent of the slices. 
AU - Agne, Andreas AU - Hangmann, Hendrik AU - Happe, Markus AU - Platzner, Marco AU - Plessl, Christian ID - 363 IS - 8, Part B JF - Microprocessors and Microsystems TI - Seven Recipes for Setting Your FPGA on Fire – A Cookbook on Heat Generators VL - 38 ER - TY - JOUR AB - Self-aware computing is a paradigm for structuring and simplifying the design and operation of computing systems that face unprecedented levels of system dynamics and thus require novel forms of adaptivity. The generality of the paradigm makes it applicable to many types of computing systems and, previously, researchers started to introduce concepts of self-awareness to multicore architectures. In our work we build on a recent reference architectural framework as a model for self-aware computing and instantiate it for an FPGA-based heterogeneous multicore running the ReconOS reconfigurable architecture and operating system. After presenting the model for self-aware computing and ReconOS, we demonstrate with a case study how a multicore application built on the principle of self-awareness, autonomously adapts to changes in the workload and system state. Our work shows that the reference architectural framework as a model for self-aware computing can be practically applied and allows us to structure and simplify the design process, which is essential for designing complex future computing systems. AU - Agne, Andreas AU - Happe, Markus AU - Lösch, Achim AU - Plessl, Christian AU - Platzner, Marco ID - 365 IS - 2 JF - ACM Transactions on Reconfigurable Technology and Systems (TRETS) TI - Self-awareness as a Model for Designing and Operating Heterogeneous Multicores VL - 7 ER - TY - JOUR AB - The ReconOS operating system for reconfigurable computing offers a unified multi-threaded programming model and operating system services for threads executing in software and threads mapped to reconfigurable hardware. 
The operating system interface allows hardware threads to interact with software threads using well-known mechanisms such as semaphores, mutexes, condition variables, and message queues. By semantically integrating hardware accelerators into a standard operating system environment, ReconOS allows for rapid design space exploration, supports a structured application development process and improves the portability of applications. AU - Agne, Andreas AU - Happe, Markus AU - Keller, Ariane AU - Lübbers, Enno AU - Plattner, Bernhard AU - Platzner, Marco AU - Plessl, Christian ID - 328 IS - 1 JF - IEEE Micro TI - ReconOS - An Operating System Approach for Reconfigurable Computing VL - 34 ER - TY - JOUR AU - Giefers, Heiner AU - Plessl, Christian AU - Förstner, Jens ID - 1779 IS - 5 JF - ACM SIGARCH Computer Architecture News KW - funding-maxup KW - tet_topic_hpc SN - 0163-5964 TI - Accelerating Finite Difference Time Domain Simulations with Reconfigurable Dataflow Computers VL - 41 ER - TY - JOUR AU - Kasap, Server AU - Redif, Soydan ID - 1792 IS - 3 JF - IEEE Trans. on Very Large Scale Integration (VLSI) Systems TI - Novel Field-Programmable Gate Array Architecture for Computing the Eigenvalue Decomposition of Para-Hermitian Polynomial Matrices VL - 22 ER - TY - JOUR AB - Virtualization technology makes data centers more dynamic and easier to administrate. Today, cloud providers offer customers access to complex applications running on virtualized hardware. Nevertheless, big virtualized data centers become stochastic environments and the simplification on the user side leads to many challenges for the provider. He has to find cost-efficient configurations and has to deal with dynamic environments to ensure service level objectives (SLOs). We introduce a software solution that reduces the degree of human intervention to manage clouds. It is designed as a multi-agent system (MAS) and placed on top of the Infrastructure as a Service (IaaS) layer. 
Worker agents allocate resources, configure applications, check the feasibility of requests, and generate cost estimates. They are equipped with application-specific knowledge, allowing them to estimate the type and number of necessary resources. During runtime, a worker agent monitors the job and adapts its resources to ensure the specified quality of service, even in noisy clouds where the job instances are influenced by other jobs. They interact with a scheduler agent, which takes care of limited resources and does a cost-aware scheduling by assigning jobs to times with low costs. The whole architecture is self-optimizing and able to use public or private clouds. Building a private cloud needs to face the challenge to find a mapping of virtual machines (VMs) to hosts. We present a rule-based mapping algorithm for VMs. It offers an interface where policies can be defined and combined in a generic way. The algorithm performs the initial mapping at request time as well as a remapping during runtime. It deals with policy and infrastructure changes. An energy-aware scheduler and the availability of cheap resources provided by a spot market are analyzed. We evaluated our approach by building up an SaaS stack, which assigns resources in consideration of an energy function and that ensures SLOs of two different applications, a brokerage system and a high-performance computing software. Experiments were done on a real cloud system and by simulations. 
AU - Niehörster, Oliver AU - Simon, Jens AU - Brinkmann, André AU - Keller, Axel AU - Krüger, Jens ID - 1965 IS - 3 JF - Journal of Grid Computing TI - Cost-aware and SLO Fulfilling Software as a Service VL - 10 ER - TY - JOUR AU - Gesing, Sandra AU - Grunzke, Richard AU - Krüger, Jens AU - Birkenheuer, Georg AU - Wewior, Martin AU - Schäfer, Patrick AU - Schuller, Bernd AU - Schuster, Johannes AU - Herres-Pawlis, Sonja AU - Breuers, Sebastian AU - Balaskó, Ákos AU - Kozlovszky, Miklos AU - Szikszay Fabri, Anna AU - Packschies, Lars AU - Kacsuk, Peter AU - Blunk, Dirk AU - Steinke, Thomas AU - Brinkmann, André AU - Fels, Gregor AU - Müller-Pfefferkorn, Ralph AU - Jäkel, René AU - Kohlbacher, Oliver ID - 2102 IS - 4 JF - Journal of Grid Computing TI - A Single Sign-On Infrastructure for Science Gateways on a Use Case for Structural Bioinformatics VL - 10 ER - TY - JOUR AU - Thielemans, Kris AU - Tsoumpas, Charalampos AU - Mustafovic, Sanida AU - Beisel, Tobias AU - Aguiar, Pablo AU - Dikaios, Nikolaos AU - W Jacobson, Matthew ID - 2172 IS - 4 JF - Physics in Medicine and Biology TI - STIR: Software for Tomographic Image Reconstruction Release 2 VL - 57 ER - TY - JOUR AU - Redif, Soydan AU - Kasap, Server ID - 2173 IS - 12 JF - Int. Journal of Electronics TI - Parallel algorithm for computation of second-order sequential best rotations VL - 100 ER - TY - JOUR AU - Kasap, Server AU - Benkrid, Khaled ID - 2174 IS - 6 JF - Journal of Computers TI - Parallel Processor Design and Implementation for Molecular Dynamics Simulations on a FPGA Parallel Computer VL - 7 ER - TY - JOUR AU - Herres-Pawlis, Sonja AU - Birkenheuer, Georg AU - Brinkmann, André AU - Gesing, Sandra AU - Grunzke, Richard AU - Jäkel, René AU - Kohlbacher, Oliver AU - Krüger, Jens AU - Dos Santos Vieira, Ines ID - 2176 JF - Studies in Health Technology and Informatics TI - Workflow-enhanced conformational analysis of guanidine zinc complexes via a science gateway VL - 175 ER -