TY - JOUR
AB - The rise of exascale supercomputers has fueled competition among GPU vendors, driving lattice QCD developers to write code that supports multiple APIs. Moreover, new developments in algorithms and physics research require frequent updates to existing software. These challenges have to be balanced against constantly changing personnel. At the same time, there is a wide range of applications for HISQ fermions in QCD studies. This situation encourages the development of software featuring a HISQ action that is flexible, high-performing, open source, easy to use, and easy to adapt. In this technical paper, we explain the design strategy, provide implementation details, list available algorithms and modules, and show key performance indicators for SIMULATeQCD, a simple multi-GPU lattice code for large-scale QCD calculations, mainly developed and used by the HotQCD collaboration. The code is publicly available on GitHub.
AU - Mazur, Lukas
AU - Bollweg, Dennis
AU - Clarke, David A.
AU - Altenkort, Luis
AU - Kaczmarek, Olaf
AU - Larsen, Rasmus
AU - Shu, Hai-Tao
AU - Goswami, Jishnu
AU - Scior, Philipp
AU - Sandmeyer, Hauke
AU - Neumann, Marius
AU - Dick, Henrik
AU - Ali, Sajid
AU - Kim, Jangho
AU - Schmidt, Christian
AU - Petreczky, Peter
AU - Mukherjee, Swagato
ID - 46120
JF - Computer Physics Communications
TI - SIMULATeQCD: A simple multi-GPU lattice code for QCD calculations
ER -
TY - JOUR
AU - Altenkort, Luis
AU - Eller, Alexander M.
AU - Francis, Anthony
AU - Kaczmarek, Olaf
AU - Mazur, Lukas
AU - Moore, Guy D.
AU - Shu, Hai-Tao
ID - 46119
IS - 1
JF - Physical Review D
SN - 2470-0010
TI - Viscosity of pure-glue QCD from the lattice
VL - 108
ER -
TY - JOUR
AB - While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and not least, benchmarks and reference implementations towards sustained HPC usage of these devices. As in the early days of GPUs in HPC, for workloads that can reasonably be decoupled into loosely coupled working sets, multi-accelerator support can be achieved by using standard communication interfaces like MPI on the host side. However, for performance and productivity, some applications can profit from a tighter coupling of the accelerators. FPGAs offer unique opportunities here when extending the dataflow characteristics to their communication interfaces.
In this work, we extend the HPCC FPGA benchmark suite by multi-FPGA support and three missing benchmarks that particularly characterize or stress inter-device communication: b_eff, PTRANS, and LINPACK. With all benchmarks implemented for current boards with Intel and Xilinx FPGAs, we established a baseline for multi-FPGA performance. Additionally, for the communication-centric benchmarks, we explored the potential of direct FPGA-to-FPGA communication with a circuit-switched inter-FPGA network that is currently only available for one of the boards. The evaluation with parallel execution on up to 26 FPGA boards makes use of one of the largest academic FPGA installations.
AU - Meyer, Marius
AU - Kenter, Tobias
AU - Plessl, Christian
ID - 38041
JF - ACM Transactions on Reconfigurable Technology and Systems
KW - General Computer Science
SN - 1936-7406
TI - Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks
ER -
TY - JOUR
AB - The non-orthogonal local submatrix method applied to electronic structure–based molecular dynamics simulations is shown to exceed 1.1 EFLOP/s in FP16/FP32-mixed floating-point arithmetic when using 4400 NVIDIA A100 GPUs of the Perlmutter system. This is enabled by a modification of the original method that pushes the sustained fraction of the peak performance to about 80%. Example calculations are performed for SARS-CoV-2 spike proteins with up to 83 million atoms.
AU - Schade, Robert
AU - Kenter, Tobias
AU - Elgabarty, Hossam
AU - Lass, Michael
AU - Kühne, Thomas
AU - Plessl, Christian
ID - 45361
JF - The International Journal of High Performance Computing Applications
KW - Hardware and Architecture
KW - Theoretical Computer Science
KW - Software
SN - 1094-3420
TI - Breaking the exascale barrier for the electronic structure problem in ab-initio molecular dynamics
ER -
TY - JOUR
AU - Hou, W
AU - Yao, Y
AU - Li, Y
AU - Peng, B
AU - Shi, K
AU - Zhou, Z
AU - Pan, J
AU - Liu, M
AU - Hu, J
ID - 32183
IS - 1
JF - Frontiers of Materials Science
SN - 2095-025X
TI - Linearly shifting ferromagnetic resonance response of La0.7Sr0.3MnO3 thin film for body temperature sensors
VL - 16
ER -
TY - JOUR
AU - Wojciechowski, M
ID - 32234
JF - Data in Brief
SN - 2352-3409
TI - Dataset for random uniform distributions of 2D circles and 3D spheres
VL - 43
ER -
TY - JOUR
AB - Tailored nanoscale quantum light sources, matching the specific needs of use cases, are crucial building blocks for photonic quantum technologies. Several different approaches to realize solid-state quantum emitters with high performance have been pursued and different concepts for energy tuning have been established. However, the properties of the emitted photons are always defined by the individual quantum emitter and can therefore not be controlled with full flexibility. Here we introduce an all-optical nonlinear method to tailor and control the single photon emission. We demonstrate a laser-controlled down-conversion process from an excited state of a semiconductor quantum three-level system. Based on this concept, we realize energy tuning and polarization control of the single photon emission with a control-laser field. Our results mark an important step towards tailored single photon emission from a photonic quantum system based on quantum optical principles.
AU - Jonas, B.
AU - Heinze, Dirk Florian
AU - Schöll, E.
AU - Kallert, P.
AU - Langer, T.
AU - Krehs, S.
AU - Widhalm, A.
AU - Jöns, Klaus
AU - Reuter, Dirk
AU - Schumacher, Stefan
AU - Zrenner, Artur
ID - 40523
IS - 1
JF - Nature Communications
KW - General Physics and Astronomy
KW - General Biochemistry, Genetics and Molecular Biology
KW - General Chemistry
KW - Multidisciplinary
SN - 2041-1723
TI - Nonlinear down-conversion in a single quantum dot
VL - 13
ER -
TY - JOUR
AU - Altenkort, Luis
AU - Eller, Alexander M.
AU - Kaczmarek, Olaf
AU - Mazur, Lukas
AU - Moore, Guy D.
AU - Shu, Hai-Tao
ID - 46121
IS - 9
JF - Physical Review D
SN - 2470-0010
TI - Lattice QCD noise reduction for bosonic correlators through blocking
VL - 105
ER -
TY - JOUR
AB - A parallel hybrid quantum-classical algorithm for the solution of the quantum-chemical ground-state energy problem on gate-based quantum computers is presented. This approach is based on the reduced density-matrix functional theory (RDMFT) formulation of the electronic structure problem. For that purpose, the density-matrix functional of the full system is decomposed into an indirectly coupled sum of density-matrix functionals for all its subsystems using the adaptive cluster approximation to RDMFT. The approximations involved in the decomposition and the adaptive cluster approximation itself can be systematically converged to the exact result. The solution for the density-matrix functionals of the effective subsystems involves a constrained minimization over many-particle states that are approximated by parametrized trial states on the quantum computer similarly to the variational quantum eigensolver. The independence of the density-matrix functionals of the effective subsystems introduces a new level of parallelization and allows for the computational treatment of much larger molecules on a quantum computer with a given qubit count. In addition, techniques are presented for the proposed algorithm to reduce the qubit count, the number of quantum programs, as well as their depth. The evaluation of a density-matrix functional as the essential part of our approach is demonstrated for Hubbard-like systems on IBM quantum computers based on superconducting transmon qubits.
AU - Schade, Robert
AU - Bauer, Carsten
AU - Tamoev, Konstantin
AU - Mazur, Lukas
AU - Plessl, Christian
AU - Kühne, Thomas
ID - 33226
JF - Physical Review Research
TI - Parallel quantum chemistry on noisy intermediate-scale quantum computers
VL - 4
ER -
TY - JOUR
AU - Schade, Robert
AU - Kenter, Tobias
AU - Elgabarty, Hossam
AU - Lass, Michael
AU - Schütt, Ole
AU - Lazzaro, Alfio
AU - Pabst, Hans
AU - Mohr, Stephan
AU - Hutter, Jürg
AU - Kühne, Thomas
AU - Plessl, Christian
ID - 33684
JF - Parallel Computing
KW - Artificial Intelligence
KW - Computer Graphics and Computer-Aided Design
KW - Computer Networks and Communications
KW - Hardware and Architecture
KW - Theoretical Computer Science
KW - Software
SN - 0167-8191
TI - Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms
VL - 111
ER -