TY - JOUR
AB - The rise of exascale supercomputers has fueled competition among GPU vendors, driving lattice QCD developers to write code that supports multiple APIs. Moreover, new developments in algorithms and physics research require frequent updates to existing software. These challenges have to be balanced against constantly changing personnel. At the same time, there is a wide range of applications for HISQ fermions in QCD studies. This situation encourages the development of software featuring a HISQ action that is flexible, high-performing, open source, easy to use, and easy to adapt. In this technical paper, we explain the design strategy, provide implementation details, list available algorithms and modules, and show key performance indicators for SIMULATeQCD, a simple multi-GPU lattice code for large-scale QCD calculations, mainly developed and used by the HotQCD collaboration. The code is publicly available on GitHub.
AU - Mazur, Lukas
AU - Bollweg, Dennis
AU - Clarke, David A.
AU - Altenkort, Luis
AU - Kaczmarek, Olaf
AU - Larsen, Rasmus
AU - Shu, Hai-Tao
AU - Goswami, Jishnu
AU - Scior, Philipp
AU - Sandmeyer, Hauke
AU - Neumann, Marius
AU - Dick, Henrik
AU - Ali, Sajid
AU - Kim, Jangho
AU - Schmidt, Christian
AU - Petreczky, Peter
AU - Mukherjee, Swagato
ID - 46120
JF - Computer Physics Communications
TI - SIMULATeQCD: A simple multi-GPU lattice code for QCD calculations
ER -
TY - JOUR
AU - Altenkort, Luis
AU - Eller, Alexander M.
AU - Francis, Anthony
AU - Kaczmarek, Olaf
AU - Mazur, Lukas
AU - Moore, Guy D.
AU - Shu, Hai-Tao
ID - 46119
IS - 1
JF - Physical Review D
SN - 2470-0010
TI - Viscosity of pure-glue QCD from the lattice
VL - 108
ER -
TY - JOUR
AB - While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and not least, benchmarks and reference implementations towards sustained HPC usage of these devices. As in the early days of GPUs in HPC, for workloads that can reasonably be decoupled into loosely coupled working sets, multi-accelerator support can be achieved by using standard communication interfaces like MPI on the host side. However, for performance and productivity, some applications can profit from a tighter coupling of the accelerators. FPGAs offer unique opportunities here when extending the dataflow characteristics to their communication interfaces.
In this work, we extend the HPCC FPGA benchmark suite by multi-FPGA support and three missing benchmarks that particularly characterize or stress inter-device communication: b_eff, PTRANS, and LINPACK. With all benchmarks implemented for current boards with Intel and Xilinx FPGAs, we established a baseline for multi-FPGA performance. Additionally, for the communication-centric benchmarks, we explored the potential of direct FPGA-to-FPGA communication with a circuit-switched inter-FPGA network that is currently only available for one of the boards. The evaluation with parallel execution on up to 26 FPGA boards makes use of one of the largest academic FPGA installations.
AU - Meyer, Marius
AU - Kenter, Tobias
AU - Plessl, Christian
ID - 38041
JF - ACM Transactions on Reconfigurable Technology and Systems
KW - General Computer Science
SN - 1936-7406
TI - Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks
ER -
TY - CHAP
AU - Hansmeier, Tim
AU - Kenter, Tobias
AU - Meyer, Marius
AU - Riebler, Heinrich
AU - Platzner, Marco
AU - Plessl, Christian
ED - Haake, Claus-Jochen
ED - Meyer auf der Heide, Friedhelm
ED - Platzner, Marco
ED - Wachsmuth, Henning
ED - Wehrheim, Heike
ID - 45893
T2 - On-The-Fly Computing -- Individualized IT-services in dynamic markets
TI - Compute Centers I: Heterogeneous Execution Environments
VL - 412
ER -
TY - CONF
AU - Opdenhövel, Jan-Oliver
AU - Plessl, Christian
AU - Kenter, Tobias
ID - 46190
T2 - Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
TI - Mutation Tree Reconstruction of Tumor Cells on FPGAs Using a Bit-Level Matrix Representation
ER -
TY - CONF
AU - Faj, Jennifer
AU - Kenter, Tobias
AU - Faghih-Naini, Sara
AU - Plessl, Christian
AU - Aizinger, Vadym
ID - 46188
T2 - Proceedings of the Platform for Advanced Scientific Computing Conference
TI - Scalable Multi-FPGA Design of a Discontinuous Galerkin Shallow-Water Model on Unstructured Meshes
ER -
TY - CONF
AU - Prouveur, Charles
AU - Haefele, Matthieu
AU - Kenter, Tobias
AU - Voss, Nils
ID - 46189
T2 - Proceedings of the Platform for Advanced Scientific Computing Conference
TI - FPGA Acceleration for HPC Supercapacitor Simulations
ER -
TY - CONF
AB - The computation of electron repulsion integrals (ERIs) over Gaussian-type orbitals (GTOs) is a challenging problem in quantum-mechanics-based atomistic simulations. In practical simulations, several trillions of ERIs may have to be computed for every time step.
In this work, we investigate FPGAs as accelerators for the ERI computation. We use template parameters, here within the Intel oneAPI tool flow, to create customized designs for 256 different ERI quartet classes, based on their orbitals. To maximize data reuse, all intermediates are buffered in FPGA on-chip memory with customized layout. The pre-calculation of intermediates also helps to overcome data dependencies caused by multi-dimensional recurrence relations. The involved loop structures are partially or even fully unrolled for high throughput of FPGA kernels. Furthermore, a lossy compression algorithm utilizing arbitrary bitwidth integers is integrated in the FPGA kernels. To the best of our knowledge, this is the first work on ERI computation on FPGAs that supports more than just the single most basic quartet class. Also, the integration of ERI computation and compression is a novelty that is not even covered by CPU or GPU libraries so far.
Our evaluation shows that using 16-bit integer for the ERI compression, the fastest FPGA kernels exceed the performance of 10 GERIS ($10 \times 10^9$ ERIs per second) on one Intel Stratix 10 GX 2800 FPGA, with maximum absolute errors around $10^{-7}$ - $10^{-5}$ Hartree. The measured throughput can be accurately explained by a performance model. The FPGA kernels deployed on 2 FPGAs outperform similar computations using the widely used libint reference on a two-socket server with 40 Xeon Gold 6148 CPU cores of the same process technology by factors up to 6.0x and on a new two-socket server with 128 EPYC 7713 CPU cores by up to 1.9x.
AU - Wu, Xin
AU - Kenter, Tobias
AU - Schade, Robert
AU - Kühne, Thomas
AU - Plessl, Christian
ID - 43228
T2 - 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
TI - Computing and Compressing Electron Repulsion Integrals on FPGAs
ER -
TY - JOUR
AB - The non-orthogonal local submatrix method applied to electronic structure–based molecular dynamics simulations is shown to exceed 1.1 EFLOP/s in FP16/FP32-mixed floating-point arithmetic when using 4400 NVIDIA A100 GPUs of the Perlmutter system. This is enabled by a modification of the original method that pushes the sustained fraction of the peak performance to about 80%. Example calculations are performed for SARS-CoV-2 spike proteins with up to 83 million atoms.
AU - Schade, Robert
AU - Kenter, Tobias
AU - Elgabarty, Hossam
AU - Lass, Michael
AU - Kühne, Thomas
AU - Plessl, Christian
ID - 45361
JF - The International Journal of High Performance Computing Applications
KW - Hardware and Architecture
KW - Theoretical Computer Science
KW - Software
SN - 1094-3420
TI - Breaking the exascale barrier for the electronic structure problem in ab-initio molecular dynamics
ER -
TY - GEN
AB - Viscous hydrodynamics serves as a successful mesoscopic description of the
Quark-Gluon Plasma produced in relativistic heavy-ion collisions. In order to
investigate how such an effective description emerges from the underlying
microscopic dynamics, we calculate the hydrodynamic and non-hydrodynamic modes
of linear response in the sound channel from a first-principles calculation in
kinetic theory. We do this with a new approach wherein we discretize the
collision kernel to directly calculate eigenvalues and eigenmodes of the
evolution operator. This allows us to study the Green's functions at any point
in the complex frequency space. Our study focuses on scalar theory with quartic
interaction, and we find that the analytic structure of Green's functions in the
complex plane is far more complicated than just poles or cuts. This work is a
first step towards an equivalent study in QCD kinetic theory.
AU - Ochsenfeld, Stephan
AU - Schlichting, Sören
ID - 50172
T2 - arXiv:2308.04491
TI - Hydrodynamic and Non-hydrodynamic Excitations in Kinetic Theory -- A Numerical Analysis in Scalar Field Theory
ER -
TY - GEN
AB - Memory Gym presents a suite of 2D partially observable environments, namely
Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark
memory capabilities in decision-making agents. These environments, originally
with finite tasks, are expanded into innovative, endless formats, mirroring the
escalating challenges of cumulative memory games such as "I packed my bag".
This progression in task design shifts the focus from merely assessing sample
efficiency to also probing the levels of memory effectiveness in dynamic,
prolonged scenarios. To address the gap in available memory-based Deep
Reinforcement Learning baselines, we introduce an implementation that
integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This
approach utilizes TrXL as a form of episodic memory, employing a sliding window
technique. Our comparative study between the Gated Recurrent Unit (GRU) and
TrXL reveals varied performances across different settings. TrXL, on the finite
environments, demonstrates superior sample efficiency in Mystery Path and
outperforms in Mortar Mayhem. However, GRU is more efficient on Searing
Spotlights. Most notably, in all endless tasks, GRU makes a remarkable
resurgence, consistently outperforming TrXL by significant margins. Website and
Source Code: https://github.com/MarcoMeter/endless-memory-gym/
AU - Pleines, Marco
AU - Pallasch, Matthias
AU - Zimmer, Frank
AU - Preuss, Mike
ID - 50221
T2 - arXiv:2309.17207
TI - Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
ER -
TY - CHAP
AU - Alt, Christoph
AU - Kenter, Tobias
AU - Faghih-Naini, Sara
AU - Faj, Jennifer
AU - Opdenhövel, Jan-Oliver
AU - Plessl, Christian
AU - Aizinger, Vadym
AU - Hönig, Jan
AU - Köstler, Harald
ID - 46191
SN - 0302-9743
T2 - Lecture Notes in Computer Science
TI - Shallow Water DG Simulations on FPGAs: Design and Comparison of a Novel Code Generation Pipeline
ER -
TY - GEN
AB - This preprint makes the claim of having computed the $9^{th}$ Dedekind
Number. This was done by building an efficient FPGA Accelerator for the core
operation of the process, and parallelizing it on the Noctua 2 Supercluster at
Paderborn University. The resulting value is
286386577668298411128469151667598498812366. This value can be verified in two
steps. We have made the data file containing the 490M results available, each
of which can be verified separately on CPU, and the whole file sums to our
proposed value.
AU - Van Hirtum, Lennart
AU - De Causmaecker, Patrick
AU - Goemaere, Jens
AU - Kenter, Tobias
AU - Riebler, Heinrich
AU - Lass, Michael
AU - Plessl, Christian
ID - 43439
T2 - arXiv:2304.03039
TI - A computation of D(9) using FPGA Supercomputing
ER -
TY - GEN
AB - We investigate the early time development of the anisotropic transverse flow
and spatial eccentricities of a fireball with various particle-based transport
approaches using a fixed initial condition. In numerical simulations ranging
from the quasi-collisionless case to the hydrodynamic regime, we find that the
onset of $v_n$ and of related measures of anisotropic flow can be described
with a simple power-law ansatz, with an exponent that depends on the number of
rescatterings in the system. In the few-rescatterings regime we perform
semi-analytical calculations, based on a systematic expansion in powers of time
and the cross section, which can reproduce the numerical findings.
AU - Borghini, Nicolas
AU - Borrell, Marc
AU - Roch, Hendrik
ID - 32177
T2 - arXiv:2201.13294
TI - Early time behavior of spatial and momentum anisotropies in kinetic theory across different Knudsen numbers
ER -
TY - GEN
AB - We test the ability of the "escape mechanism" to create the anisotropic flow
observed in high-energy nuclear collisions. We compare the flow harmonics $v_n$
in the few-rescatterings regime from two types of transport simulations, with
$2\to 2$ and $2\to 0$ collision kernels respectively, and from analytical
calculations neglecting the gain term of the Boltzmann equation. We find that
the even flow harmonics are similar in the three approaches, while the odd
harmonics differ significantly.
AU - Bachmann, Benedikt
AU - Borghini, Nicolas
AU - Feld, Nina
AU - Roch, Hendrik
ID - 32178
T2 - arXiv:2203.13306
TI - Even anisotropic-flow harmonics are from Venus, odd ones are from Mars
ER -
TY - JOUR
AU - Hou, W
AU - Yao, Y
AU - Li, Y
AU - Peng, B
AU - Shi, K
AU - Zhou, Z
AU - Pan, J
AU - Liu, M
AU - Hu, J
ID - 32183
IS - 1
JF - Frontiers of Materials Science
SN - 2095-025X
TI - Linearly shifting ferromagnetic resonance response of La0.7Sr0.3MnO3 thin film for body temperature sensors
VL - 16
ER -
TY - JOUR
AU - Wojciechowski, M
ID - 32234
JF - Data in Brief
SN - 2352-3409
TI - Dataset for random uniform distributions of 2D circles and 3D spheres
VL - 43
ER -
TY - THES
AU - Lass, Michael
ID - 32414
TI - Bringing Massive Parallelism and Hardware Acceleration to Linear Scaling Density Functional Theory Through Targeted Approximations
ER -
TY - GEN
AB - The Julia programming language has evolved into a modern alternative to fill existing gaps in scientific computing and data science applications. Julia leverages a unified and coordinated single-language and ecosystem paradigm and has a proven track record of achieving high performance without sacrificing user productivity. These aspects make Julia a viable alternative to high-performance computing's (HPC's) existing and increasingly costly many-body workflow composition strategy in which traditional HPC languages (e.g., Fortran, C, C++) are used for simulations, and higher-level languages (e.g., Python, R, MATLAB) are used for data analysis and interactive computing. Julia's rapid growth in language capabilities, package ecosystem, and community make it a promising universal language for HPC. This paper presents the views of a multidisciplinary group of researchers from academia, government, and industry that advocate for an HPC software development paradigm that emphasizes developer productivity, workflow portability, and low barriers for entry. We believe that the Julia programming language, its ecosystem, and its community provide modern and powerful capabilities that enable this group's objectives. Crucially, we believe that Julia can provide a feasible and less costly approach to programming scientific applications and workflows that target HPC facilities. In this work, we examine the current practice and role of Julia as a common, end-to-end programming model to address major challenges in scientific reproducibility, data-driven AI/machine learning, co-design and workflows, scalability and performance portability in heterogeneous computing, network communication, data management, and community education. As a result, the diversification of current investments to fulfill the needs of the upcoming decade is crucial as more supercomputing centers prepare for the exascale era.
AU - Churavy, Valentin
AU - Godoy, William F
AU - Bauer, Carsten
AU - Ranocha, Hendrik
AU - Schlottke-Lakemper, Michael
AU - Räss, Ludovic
AU - Blaschke, Johannes
AU - Giordano, Mosè
AU - Schnetter, Erik
AU - Omlin, Samuel
AU - Vetter, Jeffrey S
AU - Edelman, Alan
ID - 36879
TI - Bridging HPC Communities through the Julia Programming Language
ER -
TY - JOUR
AB - Tailored nanoscale quantum light sources, matching the specific needs of use cases, are crucial building blocks for photonic quantum technologies. Several different approaches to realize solid-state quantum emitters with high performance have been pursued and different concepts for energy tuning have been established. However, the properties of the emitted photons are always defined by the individual quantum emitter and can therefore not be controlled with full flexibility. Here we introduce an all-optical nonlinear method to tailor and control the single photon emission. We demonstrate a laser-controlled down-conversion process from an excited state of a semiconductor quantum three-level system. Based on this concept, we realize energy tuning and polarization control of the single photon emission with a control-laser field. Our results mark an important step towards tailored single photon emission from a photonic quantum system based on quantum optical principles.
AU - Jonas, B.
AU - Heinze, Dirk Florian
AU - Schöll, E.
AU - Kallert, P.
AU - Langer, T.
AU - Krehs, S.
AU - Widhalm, A.
AU - Jöns, Klaus
AU - Reuter, Dirk
AU - Schumacher, Stefan
AU - Zrenner, Artur
ID - 40523
IS - 1
JF - Nature Communications
KW - General Physics and Astronomy
KW - General Biochemistry, Genetics and Molecular Biology
KW - General Chemistry
KW - Multidisciplinary
SN - 2041-1723
TI - Nonlinear down-conversion in a single quantum dot
VL - 13
ER -
TY - JOUR
AU - Altenkort, Luis
AU - Eller, Alexander M.
AU - Kaczmarek, Olaf
AU - Mazur, Lukas
AU - Moore, Guy D.
AU - Shu, Hai-Tao
ID - 46121
IS - 9
JF - Physical Review D
SN - 2470-0010
TI - Lattice QCD noise reduction for bosonic correlators through blocking
VL - 105
ER -
TY - GEN
AB - Electronic structure calculations have been instrumental in providing many
important insights into a range of physical and chemical properties of various
molecular and solid-state systems. Their importance to various fields,
including materials science, chemical sciences, computational chemistry and
device physics, is underscored by the large fraction of available public
supercomputing resources devoted to these calculations. As we enter the
exascale era, exciting new opportunities to increase simulation numbers, sizes,
and accuracies present themselves. In order to realize these promises, the
community of electronic structure software developers will however first have
to tackle a number of challenges pertaining to the efficient use of new
architectures that will rely heavily on massive parallelism and hardware
accelerators. This roadmap provides a broad overview of the state-of-the-art in
electronic structure calculations and of the various new directions being
pursued by the community. It covers 14 electronic structure codes, presenting
their current status, their development priorities over the next five years,
and their plans towards tackling the challenges and leveraging the
opportunities presented by the advent of exascale computing.
AU - Gavini, Vikram
AU - Baroni, Stefano
AU - Blum, Volker
AU - Bowler, David R.
AU - Buccheri, Alexander
AU - Chelikowsky, James R.
AU - Das, Sambit
AU - Dawson, William
AU - Delugas, Pietro
AU - Dogan, Mehmet
AU - Draxl, Claudia
AU - Galli, Giulia
AU - Genovese, Luigi
AU - Giannozzi, Paolo
AU - Giantomassi, Matteo
AU - Gonze, Xavier
AU - Govoni, Marco
AU - Gulans, Andris
AU - Gygi, François
AU - Herbert, John M.
AU - Kokott, Sebastian
AU - Kühne, Thomas
AU - Liou, Kai-Hsin
AU - Miyazaki, Tsuyoshi
AU - Motamarri, Phani
AU - Nakata, Ayako
AU - Pask, John E.
AU - Plessl, Christian
AU - Ratcliff, Laura E.
AU - Richard, Ryan M.
AU - Rossi, Mariana
AU - Schade, Robert
AU - Scheffler, Matthias
AU - Schütt, Ole
AU - Suryanarayana, Phanish
AU - Torrent, Marc
AU - Truflandier, Lionel
AU - Windus, Theresa L.
AU - Xu, Qimen
AU - Yu, Victor W. -Z.
AU - Perez, Danny
ID - 33493
T2 - arXiv:2209.12747
TI - Roadmap on Electronic Structure Codes in the Exascale Era
ER -
TY - CONF
AU - Karp, Martin
AU - Podobas, Artur
AU - Kenter, Tobias
AU - Jansson, Niclas
AU - Plessl, Christian
AU - Schlatter, Philipp
AU - Markidis, Stefano
ID - 46193
T2 - International Conference on High Performance Computing in Asia-Pacific Region
TI - A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays: Design, Evaluation, and Future Challenges
ER -
TY - GEN
AB - The CP2K program package, which can be considered the Swiss Army knife of
atomistic simulations, is presented with a special emphasis on ab-initio
molecular dynamics using the second-generation Car-Parrinello method. After
outlining current and near-term development efforts with regard to massively
parallel low-scaling post-Hartree-Fock and eigenvalue solvers, novel approaches
on how we plan to take full advantage of future low-precision hardware
architectures are introduced. Our focus here is on combining our submatrix
method with the approximate computing paradigm to address the imminent exascale
era.
AU - Kühne, Thomas
AU - Plessl, Christian
AU - Schade, Robert
AU - Schütt, Ole
ID - 32404
T2 - arXiv:2205.14741
TI - CP2K on the road to exascale
ER -
TY - JOUR
AB - A parallel hybrid quantum-classical algorithm for the solution of the quantum-chemical ground-state energy problem on gate-based quantum computers is presented. This approach is based on the reduced density-matrix functional theory (RDMFT) formulation of the electronic structure problem. For that purpose, the density-matrix functional of the full system is decomposed into an indirectly coupled sum of density-matrix functionals for all its subsystems using the adaptive cluster approximation to RDMFT. The approximations involved in the decomposition and the adaptive cluster approximation itself can be systematically converged to the exact result. The solutions for the density-matrix functionals of the effective subsystems involve a constrained minimization over many-particle states that are approximated by parametrized trial states on the quantum computer, similarly to the variational quantum eigensolver. The independence of the density-matrix functionals of the effective subsystems introduces a new level of parallelization and allows for the computational treatment of much larger molecules on a quantum computer with a given qubit count. In addition, techniques are presented for the proposed algorithm to reduce the qubit count, the number of quantum programs, and the program depth. The evaluation of a density-matrix functional as the essential part of our approach is demonstrated for Hubbard-like systems on IBM quantum computers based on superconducting transmon qubits.
AU - Schade, Robert
AU - Bauer, Carsten
AU - Tamoev, Konstantin
AU - Mazur, Lukas
AU - Plessl, Christian
AU - Kühne, Thomas
ID - 33226
JF - Physical Review Research
TI - Parallel quantum chemistry on noisy intermediate-scale quantum computers
VL - 4
ER -
TY - GEN
AB - Electronic structure calculations have been instrumental in providing many
important insights into a range of physical and chemical properties of various
molecular and solid-state systems. Their importance to various fields,
including materials science, chemical sciences, computational chemistry and
device physics, is underscored by the large fraction of available public
supercomputing resources devoted to these calculations. As we enter the
exascale era, exciting new opportunities to increase simulation numbers, sizes,
and accuracies present themselves. In order to realize these promises, the
community of electronic structure software developers will however first have
to tackle a number of challenges pertaining to the efficient use of new
architectures that will rely heavily on massive parallelism and hardware
accelerators. This roadmap provides a broad overview of the state-of-the-art in
electronic structure calculations and of the various new directions being
pursued by the community. It covers 14 electronic structure codes, presenting
their current status, their development priorities over the next five years,
and their plans towards tackling the challenges and leveraging the
opportunities presented by the advent of exascale computing.
AU - Gavini, Vikram
AU - Baroni, Stefano
AU - Blum, Volker
AU - Bowler, David R.
AU - Buccheri, Alexander
AU - Chelikowsky, James R.
AU - Das, Sambit
AU - Dawson, William
AU - Delugas, Pietro
AU - Dogan, Mehmet
AU - Draxl, Claudia
AU - Galli, Giulia
AU - Genovese, Luigi
AU - Giannozzi, Paolo
AU - Giantomassi, Matteo
AU - Gonze, Xavier
AU - Govoni, Marco
AU - Gulans, Andris
AU - Gygi, François
AU - Herbert, John M.
AU - Kokott, Sebastian
AU - Kühne, Thomas
AU - Liou, Kai-Hsin
AU - Miyazaki, Tsuyoshi
AU - Motamarri, Phani
AU - Nakata, Ayako
AU - Pask, John E.
AU - Plessl, Christian
AU - Ratcliff, Laura E.
AU - Richard, Ryan M.
AU - Rossi, Mariana
AU - Schade, Robert
AU - Scheffler, Matthias
AU - Schütt, Ole
AU - Suryanarayana, Phanish
AU - Torrent, Marc
AU - Truflandier, Lionel
AU - Windus, Theresa L.
AU - Xu, Qimen
AU - Yu, Victor W. -Z.
AU - Perez, Danny
ID - 46275
T2 - arXiv:2209.12747
TI - Roadmap on Electronic Structure Codes in the Exascale Era
ER -
TY - JOUR
AU - Schade, Robert
AU - Kenter, Tobias
AU - Elgabarty, Hossam
AU - Lass, Michael
AU - Schütt, Ole
AU - Lazzaro, Alfio
AU - Pabst, Hans
AU - Mohr, Stephan
AU - Hutter, Jürg
AU - Kühne, Thomas
AU - Plessl, Christian
ID - 33684
JF - Parallel Computing
KW - Artificial Intelligence
KW - Computer Graphics and Computer-Aided Design
KW - Computer Networks and Communications
KW - Hardware and Architecture
KW - Theoretical Computer Science
KW - Software
SN - 0167-8191
TI - Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms
VL - 111
ER -
TY - JOUR
AU - Meyer, Marius
AU - Kenter, Tobias
AU - Plessl, Christian
ID - 27364
JF - Journal of Parallel and Distributed Computing
SN - 0743-7315
TI - In-depth FPGA Accelerator Performance Evaluation with Single Node Benchmarks from the HPC Challenge Benchmark Suite for Intel and Xilinx FPGAs using OpenCL
ER -
TY - JOUR
AB - Recent advances in numerical methods significantly pushed forward the
understanding of electrons coupled to quantized lattice vibrations. At this
stage, it becomes increasingly important to also account for the effects of
physically inevitable environments. In particular, we study the transport
properties of the Hubbard-Holstein Hamiltonian that models a large class of
materials characterized by strong electron-phonon coupling, in contact with a
dissipative environment. Even in the one-dimensional and isolated case,
simulating the quantum dynamics of such a system with high accuracy is very
challenging due to the infinite dimensionality of the phononic Hilbert spaces.
For this reason, the effects of dissipation on the conductance properties of
such systems have not been investigated systematically so far. We combine the
non-Markovian hierarchy of pure states method and the Markovian quantum jumps
method with the newly introduced projected purified density-matrix
renormalization group, creating powerful tensor-network methods for dissipative
quantum many-body systems. Investigating their numerical properties, we find a
significant speedup up to a factor $\sim 30$ compared to conventional
tensor-network techniques. We apply these methods to study dissipative
quenches, aiming for an in-depth understanding of the formation, stability, and
quasi-particle properties of bipolarons. Surprisingly, our results show that in
the metallic phase dissipation localizes the bipolarons, which is reminiscent
of an indirect quantum Zeno effect. However, the bipolaronic binding energy
remains mainly unaffected, even in the presence of strong dissipation,
exhibiting remarkable bipolaron stability. These findings shed light on the
problem of designing real materials exhibiting phonon-mediated
high-$T_\mathrm{C}$ superconductivity.
AU - Moroder, Mattia
AU - Grundner, Martin
AU - Damanet, François
AU - Schollwöck, Ulrich
AU - Mardazad, Sam
AU - Flannigan, Stuart
AU - Köhler, Thomas
AU - Paeckel, Sebastian
ID - 50146
JF - Physical Review B
VL - 107
SP - 214310
PY - 2023
TI - Stable bipolarons in open quantum systems
ER -
TY - JOUR
AB - We develop a general decomposition of an ensemble of initial density profiles
in terms of an average state and a basis of modes that represent the
event-by-event fluctuations of the initial state. The basis is determined such
that the probability distributions of the amplitudes of different modes are
uncorrelated. Based on this decomposition, we quantify the different types and
probabilities of event-by-event fluctuations in Glauber and Saturation models
and investigate how the various modes affect different characteristics of the
initial state. We perform simulations of the dynamical evolution with KoMPoST
and MUSIC to investigate the impact of the modes on final-state observables and
their correlations.
AU - Borghini, Nicolas
AU - Borrell, Marc
AU - Feld, Nina
AU - Roch, Hendrik
AU - Schlichting, Sören
AU - Werthmann, Clemens
ID - 50148
JF - Physical Review C
VL - 107
SP - 034905
PY - 2023
TI - Statistical analysis of initial state and final state response in heavy-ion collisions
ER -
TY - JOUR
AB - RNA editing processes are strikingly different in animals and plants. Up to thousands of specific cytidines are converted into uridines in plant chloroplasts and mitochondria whereas up to millions of adenosines are converted into inosines in animal nucleo-cytosolic RNAs. It is unknown whether these two different RNA editing machineries are mutually incompatible. RNA-binding pentatricopeptide repeat (PPR) proteins are the key factors of plant organelle cytidine-to-uridine RNA editing. The complete absence of PPR mediated editing of cytosolic RNAs might be due to a yet unknown barrier that prevents its activity in the cytosol. Here, we transferred two plant mitochondrial PPR-type editing factors into human cell lines to explore whether they could operate in the nucleo-cytosolic environment. PPR56 and PPR65 not only faithfully edited their native, co-transcribed targets but also different sets of off-targets in the human background transcriptome. More than 900 of such off-targets with editing efficiencies up to 91%, largely explained by known PPR-RNA binding properties, were identified for PPR56. Engineering two crucial amino acid positions in its PPR array led to predictable shifts in target recognition. We conclude that plant PPR editing factors can operate in the entirely different genetic environment of the human nucleo-cytosol and can be intentionally re-engineered towards new targets.
AU - Lesch, Elena
AU - Schilling, Maximilian T
AU - Brenner, Sarah
AU - Yang, Yingying
AU - Gruss, Oliver J
AU - Knoop, Volker
AU - Schallenberg-Rüdinger, Mareike
ID - 50149
IS - 17
JF - Nucleic Acids Research
KW - Genetics
SN - 0305-1048
TI - Plant mitochondrial RNA editing factors can perform targeted C-to-U editing of nuclear transcripts in human cells
VL - 50
ER -
TY - JOUR
AB - N-body methods are one of the essential algorithmic building blocks of high-performance and parallel computing. Previous research has shown promising performance for implementing n-body simulations with pairwise force calculations on FPGAs. However, to avoid challenges with accumulation and memory access patterns, the presented designs calculate each pair of forces twice, along with both force sums of the involved particles. Also, they require large problem instances with hundreds of thousands of particles to reach their respective peak performance, limiting the applicability for strong scaling scenarios. This work addresses both issues by presenting a novel FPGA design that uses each calculated force twice and overlaps data transfers and computations in a way that allows it to reach peak performance even for small problem instances, outperforming previous single precision results even in double precision, and scaling linearly over multiple interconnected FPGAs. For a comparison across architectures, we provide an equally optimized CPU reference, which for large problems actually achieves higher peak performance per device. However, given the strong scaling advantages of the FPGA design, in parallel setups with a few thousand particles per device, the FPGA platform achieves the highest performance and power efficiency.
AU - Menzel, Johannes
AU - Plessl, Christian
AU - Kenter, Tobias
ID - 28099
IS - 1
JF - ACM Transactions on Reconfigurable Technology and Systems
SN - 1936-7406
TI - The Strong Scaling Advantage of FPGAs in HPC for N-body Simulations
VL - 15
ER -
TY - CONF
AU - Meyer, Marius
ID - 27365
T2 - Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
TI - Towards Performance Characterization of FPGAs in Context of HPC using OpenCL Benchmarks
ER -
TY - CONF
AU - Nickchen, Tobias
AU - Heindorf, Stefan
AU - Engels, Gregor
ID - 20886
T2 - Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
TI - Generating Physically Sound Training Data for Image Recognition of Additively Manufactured Parts
ER -
TY - JOUR
AB - The defining feature of active particles is that they constantly propel themselves by locally converting chemical energy into directed motion. This active self-propulsion prevents them from equilibrating with their thermal environment (e.g. an aqueous solution), thus keeping them permanently out of equilibrium. Nevertheless, the spatial dynamics of active particles might share certain equilibrium features, in particular in the steady state. We here focus on the time-reversal symmetry of individual spatial trajectories as a distinct equilibrium characteristic. We investigate to what extent the steady-state trajectories of a trapped active particle obey or break this time-reversal symmetry. Within the framework of active Ornstein–Uhlenbeck particles we find that the steady-state trajectories in a harmonic potential fulfill path-wise time-reversal symmetry exactly, while this symmetry is typically broken in anharmonic potentials.
AU - Dabelow, Lennart
AU - Bo, Stefano
AU - Eichhorn, Ralf
ID - 32243
IS - 3
JF - Journal of Statistical Mechanics: Theory and Experiment
KW - Statistics
KW - Probability and Uncertainty
KW - Statistics and Probability
KW - Statistical and Nonlinear Physics
SN - 1742-5468
TI - How irreversible are steady-state trajectories of a trapped active particle?
VL - 2021
ER -
TY - GEN
AB - We push the boundaries of electronic structure-based ab-initio molecular dynamics (AIMD) beyond 100 million atoms. This scale is otherwise barely reachable with classical force-field methods or novel neural network and machine learning potentials. We achieve this breakthrough by combining innovations in linear-scaling AIMD, efficient and approximate sparse linear algebra, low and mixed-precision floating-point computation on GPUs, and a compensation scheme for the errors introduced by numerical approximations. The core of our work is the non-orthogonalized local submatrix method (NOLSM), which scales very favorably to massively parallel computing systems and translates large sparse matrix operations into highly parallel, dense matrix operations that are ideally suited to hardware accelerators. We demonstrate that the NOLSM method, which is at the center point of each AIMD step, is able to achieve a sustained performance of 324 PFLOP/s in mixed FP16/FP32 precision corresponding to an efficiency of 67.7% when running on 1536 NVIDIA A100 GPUs.
AU - Schade, Robert
AU - Kenter, Tobias
AU - Elgabarty, Hossam
AU - Lass, Michael
AU - Schütt, Ole
AU - Lazzaro, Alfio
AU - Pabst, Hans
AU - Mohr, Stephan
AU - Hutter, Jürg
AU - Kühne, Thomas D.
AU - Plessl, Christian
ID - 32244
T2 - arXiv:2104.08245
TI - Towards Electronic Structure-Based Ab-Initio Molecular Dynamics Simulations with Hundreds of Millions of Atoms
ER -
TY - GEN
AB - Optical travelling wave antennas offer unique opportunities to control and selectively guide light into a specific direction, which makes them excellent candidates for optical communication and sensing. These applications require state-of-the-art engineering to reach optimized functionalities such as high directivity and radiation efficiency, low side lobe level, broadband and tunable capabilities, and compact design. In this work, we report on the numerical optimization of the directivity of optical travelling wave antennas made from low-loss dielectric materials using full-wave numerical simulations in conjunction with a particle swarm optimization algorithm. The antennas are composed of a reflector and a director deposited on a glass substrate, and an emitter placed in the feed gap between them serves as an internal source of excitation. In particular, we analysed antennas with rectangular- and horn-shaped directors made of either hafnium dioxide or silicon. The optimized antennas produce highly directional emission due to the presence of two dominant guided TE modes in the director in addition to leaky modes. These guided modes dominate the far-field emission pattern and govern the direction of the main lobe emission, which predominantly originates from the end facet of the director. Our work also provides a comprehensive analysis of the modes, radiation patterns, parametric influences, and bandwidths of the antennas that highlights their robust nature.
AU - Farheen, Henna
AU - Leuteritz, Till
AU - Linden, Stefan
AU - Myroshnychenko, Viktor
AU - Förstner, Jens
ID - 32245
T2 - arXiv:2106.02468
TI - Optimization of optical waveguide antennas for directive emission of light
ER -
TY - GEN
AB - The interaction between quantum light and matter is being intensively studied for systems that are enclosed in high-Q cavities, which strongly enhance the light-matter coupling. However, for many applications, cavities with lower Q-factors are preferred due to the increased spectral width of the cavity mode. Here, we investigate the interaction between quantum light and matter represented by a Λ-type three-level system in lossy cavities, assuming that cavity losses are the dominant loss mechanism. We demonstrate that cavity losses lead to non-trivial steady states of the electronic occupations that can be controlled by the loss rate and the initial statistics of the quantum fields. The mechanism of formation of such steady states can be understood on the basis of the equations of motion. Analytical expressions for steady states and their numerical simulations are presented and discussed.
AU - Rose, H.
AU - Tikhonova, O. V.
AU - Meier, T.
AU - Sharapova, P.
ID - 32236
T2 - arXiv:2109.00842
TI - Steady states of Λ-type three-level systems excited by quantum light in lossy cavities
ER -
TY - JOUR
AU - Kaczmarek, Olaf
AU - Mazur, Lukas
AU - Sharma, Sayantan
ID - 46122
IS - 9
JF - Physical Review D
SN - 2470-0010
TI - Eigenvalue spectra of QCD and the fate of U_A(1) breaking towards the chiral limit
VL - 104
ER -
TY - JOUR
AU - Altenkort, Luis
AU - Eller, Alexander M.
AU - Kaczmarek, Olaf
AU - Mazur, Lukas
AU - Moore, Guy D.
AU - Shu, Hai-Tao
ID - 46124
IS - 1
JF - Physical Review D
SN - 2470-0010
TI - Heavy quark momentum diffusion from the lattice using gradient flow
VL - 103
ER -
TY - JOUR
AU - Altenkort, Luis
AU - Eller, Alexander M.
AU - Kaczmarek, O.
AU - Mazur, Lukas
AU - Moore, Guy D.
AU - Shu, H.-T.
ID - 46123
IS - 11
JF - Physical Review D
SN - 2470-0010
TI - Sphaleron rate from Euclidean lattice correlators: An exploration
VL - 103
ER -
TY - CONF
AU - Kenter, Tobias
AU - Shambhu, Adesh
AU - Faghih-Naini, Sara
AU - Aizinger, Vadym
ID - 46194
T2 - Proceedings of the Platform for Advanced Scientific Computing Conference
TI - Algorithm-hardware co-design of a discontinuous Galerkin shallow-water model for a dataflow architecture on FPGA
ER -
TY - CONF
AU - Karp, Martin
AU - Podobas, Artur
AU - Jansson, Niclas
AU - Kenter, Tobias
AU - Plessl, Christian
AU - Schlatter, Philipp
AU - Markidis, Stefano
ID - 46195
T2 - 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
TI - High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection
ER -
TY - CHAP
AB - Solving partial differential equations on unstructured grids is a cornerstone of engineering and scientific computing. Nowadays, heterogeneous parallel platforms with CPUs, GPUs, and FPGAs enable energy-efficient and computationally demanding simulations. We developed the HighPerMeshes C++-embedded Domain-Specific Language (DSL) for bridging the abstraction gap between the mathematical and algorithmic formulation of mesh-based algorithms for PDE problems on the one hand and an increasing number of heterogeneous platforms with their different parallel programming and runtime models on the other hand. Thus, the HighPerMeshes DSL aims at higher productivity in the code development process for multiple target platforms. We introduce the concepts as well as the basic structure of the HighPerMeshes DSL, and demonstrate its usage with three examples, a Poisson and monodomain problem, respectively, solved by the continuous finite element method, and the discontinuous Galerkin method for Maxwell’s equations. The mapping of the abstract algorithmic description onto parallel hardware, including distributed memory compute clusters, is presented. Finally, the achievable performance and scalability are demonstrated for a typical example problem on a multi-core CPU cluster.
AU - Alhaddad, Samer
AU - Förstner, Jens
AU - Groth, Stefan
AU - Grünewald, Daniel
AU - Grynko, Yevgen
AU - Hannig, Frank
AU - Kenter, Tobias
AU - Pfreundt, Franz-Josef
AU - Plessl, Christian
AU - Schotte, Merlind
AU - Steinke, Thomas
AU - Teich, Jürgen
AU - Weiser, Martin
AU - Wende, Florian
ID - 21587
KW - tet_topic_hpc
SN - 0302-9743
T2 - Euro-Par 2020: Parallel Processing Workshops
TI - HighPerMeshes – A Domain-Specific Language for Numerical Algorithms on Unstructured Grids
ER -
TY - CHAP
AU - Ramaswami, Arjun
AU - Kenter, Tobias
AU - Kühne, Thomas
AU - Plessl, Christian
ID - 29936
SN - 0302-9743
T2 - Applied Reconfigurable Computing. Architectures, Tools, and Applications
TI - Evaluating the Design Space for Offloading 3D FFT Calculations to an FPGA for High-Performance Computing
ER -
TY - JOUR
AU - Alhaddad, Samer
AU - Förstner, Jens
AU - Groth, Stefan
AU - Grünewald, Daniel
AU - Grynko, Yevgen
AU - Hannig, Frank
AU - Kenter, Tobias
AU - Pfreundt, Franz-Josef
AU - Plessl, Christian
AU - Schotte, Merlind
AU - Steinke, Thomas
AU - Teich, Jürgen
AU - Weiser, Martin
AU - Wende, Florian
ID - 24788
JF - Concurrency and Computation: Practice and Experience
KW - tet_topic_hpc
SN - 1532-0626
TI - The HighPerMeshes framework for numerical algorithms on unstructured grids
ER -
TY - JOUR
AB - The effect of traces of ethanol in supercritical carbon dioxide on the mixture's thermodynamic properties is studied by molecular simulations and Taylor dispersion measurements.
AU - Chatwell, René Spencer
AU - Guevara-Carrion, Gabriela
AU - Gaponenko, Yuri
AU - Shevtsova, Valentina
AU - Vrabec, Jadran
ID - 32240
IS - 4
JF - Physical Chemistry Chemical Physics
KW - Physical and Theoretical Chemistry
KW - General Physics and Astronomy
SN - 1463-9076
TI - Diffusion of the carbon dioxide–ethanol mixture in the extended critical region
VL - 23
ER -
TY - CONF
AU - Karp, Martin
AU - Podobas, Artur
AU - Jansson, Niclas
AU - Kenter, Tobias
AU - Plessl, Christian
AU - Schlatter, Philipp
AU - Markidis, Stefano
ID - 29937
T2 - 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
TI - High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection
ER -
TY - CHAP
AU - Nickchen, Tobias
AU - Engels, Gregor
AU - Lohn, Johannes
ID - 18789
SN - 9783030543334
T2 - Industrializing Additive Manufacturing
TI - Opportunities of 3D Machine Learning for Manufacturability Analysis and Component Recognition in the Additive Manufacturing Process Chain
ER -
TY - JOUR
AB - State-of-the-art methods in materials science such as artificial intelligence and data-driven techniques advance the investigation of photovoltaic materials.
AU - Mirhosseini, Hossein
AU - Kormath Madam Raghupathy, Ramya
AU - Sahoo, Sudhir K.
AU - Wiebeler, Hendrik
AU - Chugh, Manjusha
AU - Kühne, Thomas D.
ID - 32246
IS - 46
JF - Physical Chemistry Chemical Physics
KW - Physical and Theoretical Chemistry
KW - General Physics and Astronomy
SN - 1463-9076
TI - In silico investigation of Cu(In,Ga)Se2-based solar cells
VL - 22
ER -