TY  - JOUR
AB  - The non-orthogonal local submatrix method applied to electronic structure–based molecular dynamics simulations is shown to exceed 1.1 EFLOP/s in FP16/FP32-mixed floating-point arithmetic when using 4400 NVIDIA A100 GPUs of the Perlmutter system. This is enabled by a modification of the original method that pushes the sustained fraction of the peak performance to about 80%. Example calculations are performed for SARS-CoV-2 spike proteins with up to 83 million atoms.
AU  - Schade, Robert
AU  - Kenter, Tobias
AU  - Elgabarty, Hossam
AU  - Lass, Michael
AU  - Kühne, Thomas
AU  - Plessl, Christian
ID  - 45361
JF  - The International Journal of High Performance Computing Applications
KW  - Hardware and Architecture
KW  - Theoretical Computer Science
KW  - Software
SN  - 1094-3420
TI  - Breaking the exascale barrier for the electronic structure problem in ab-initio molecular dynamics
ER  -

TY  - JOUR
AB  - While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and not least, benchmarks and reference implementations towards sustained HPC usage of these devices. As in the early days of GPUs in HPC, for workloads that can reasonably be decoupled into loosely coupled working sets, multi-accelerator support can be achieved by using standard communication interfaces like MPI on the host side. However, for performance and productivity, some applications can profit from a tighter coupling of the accelerators. FPGAs offer unique opportunities here when extending the dataflow characteristics to their communication interfaces. In this work, we extend the HPCC FPGA benchmark suite by multi-FPGA support and three missing benchmarks that particularly characterize or stress inter-device communication: b_eff, PTRANS, and LINPACK. With all benchmarks implemented for current boards with Intel and Xilinx FPGAs, we established a baseline for multi-FPGA performance. Additionally, for the communication-centric benchmarks, we explored the potential of direct FPGA-to-FPGA communication with a circuit-switched inter-FPGA network that is currently only available for one of the boards. The evaluation with parallel execution on up to 26 FPGA boards makes use of one of the largest academic FPGA installations.
AU  - Meyer, Marius
AU  - Kenter, Tobias
AU  - Plessl, Christian
ID  - 38041
JF  - ACM Transactions on Reconfigurable Technology and Systems
KW  - General Computer Science
SN  - 1936-7406
TI  - Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks
ER  -

TY  - JOUR
AB  - The rise of exascale supercomputers has fueled competition among GPU vendors, driving lattice QCD developers to write code that supports multiple APIs. Moreover, new developments in algorithms and physics research require frequent updates to existing software. These challenges have to be balanced against constantly changing personnel. At the same time, there is a wide range of applications for HISQ fermions in QCD studies. This situation encourages the development of software featuring a HISQ action that is flexible, high-performing, open source, easy to use, and easy to adapt. In this technical paper, we explain the design strategy, provide implementation details, list available algorithms and modules, and show key performance indicators for SIMULATeQCD, a simple multi-GPU lattice code for large-scale QCD calculations, mainly developed and used by the HotQCD collaboration. The code is publicly available on GitHub.
AU  - Mazur, Lukas
AU  - Bollweg, Dennis
AU  - Clarke, David A.
AU  - Altenkort, Luis
AU  - Kaczmarek, Olaf
AU  - Larsen, Rasmus
AU  - Shu, Hai-Tao
AU  - Goswami, Jishnu
AU  - Scior, Philipp
AU  - Sandmeyer, Hauke
AU  - Neumann, Marius
AU  - Dick, Henrik
AU  - Ali, Sajid
AU  - Kim, Jangho
AU  - Schmidt, Christian
AU  - Petreczky, Peter
AU  - Mukherjee, Swagato
ID  - 46120
JF  - Computer Physics Communications
TI  - SIMULATeQCD: A simple multi-GPU lattice code for QCD calculations
ER  -

TY  - JOUR
AU  - Altenkort, Luis
AU  - Eller, Alexander M.
AU  - Francis, Anthony
AU  - Kaczmarek, Olaf
AU  - Mazur, Lukas
AU  - Moore, Guy D.
AU  - Shu, Hai-Tao
ID  - 46119
IS  - 1
JF  - Physical Review D
SN  - 2470-0010
TI  - Viscosity of pure-glue QCD from the lattice
VL  - 108
ER  -

TY  - JOUR
AU  - Wojciechowski, M
ID  - 32234
JF  - Data Brief
SN  - 2352-3409
TI  - Dataset for random uniform distributions of 2D circles and 3D spheres.
VL  - 43
ER  -

TY  - JOUR
AU  - Meyer, Marius
AU  - Kenter, Tobias
AU  - Plessl, Christian
ID  - 27364
JF  - Journal of Parallel and Distributed Computing
SN  - 0743-7315
TI  - In-depth FPGA Accelerator Performance Evaluation with Single Node Benchmarks from the HPC Challenge Benchmark Suite for Intel and Xilinx FPGAs using OpenCL
ER  -

TY  - JOUR
AU  - Altenkort, Luis
AU  - Eller, Alexander M.
AU  - Kaczmarek, O.
AU  - Mazur, Lukas
AU  - Moore, Guy D.
AU  - Shu, Hai-Tao
ID  - 46121
IS  - 9
JF  - Physical Review D
SN  - 2470-0010
TI  - Lattice QCD noise reduction for bosonic correlators through blocking
VL  - 105
ER  -

TY  - JOUR
AU  - Hou, W
AU  - Yao, Y
AU  - Li, Y
AU  - Peng, B
AU  - Shi, K
AU  - Zhou, Z
AU  - Pan, J
AU  - Liu, M
AU  - Hu, J
ID  - 32183
IS  - 1
JF  - Frontiers of Materials Science
SN  - 2095-025X
TI  - Linearly shifting ferromagnetic resonance response of La0.7Sr0.3MnO3 thin film for body temperature sensors
VL  - 16
ER  -

TY  - JOUR
AB  - Tailored nanoscale quantum light sources, matching the specific needs of use cases, are crucial building blocks for photonic quantum technologies. Several different approaches to realize solid-state quantum emitters with high performance have been pursued and different concepts for energy tuning have been established. However, the properties of the emitted photons are always defined by the individual quantum emitter and can therefore not be controlled with full flexibility. Here we introduce an all-optical nonlinear method to tailor and control the single photon emission. We demonstrate a laser-controlled down-conversion process from an excited state of a semiconductor quantum three-level system. Based on this concept, we realize energy tuning and polarization control of the single photon emission with a control-laser field. Our results mark an important step towards tailored single photon emission from a photonic quantum system based on quantum optical principles.
AU  - Jonas, B.
AU  - Heinze, Dirk Florian
AU  - Schöll, E.
AU  - Kallert, P.
AU  - Langer, T.
AU  - Krehs, S.
AU  - Widhalm, A.
AU  - Jöns, Klaus
AU  - Reuter, Dirk
AU  - Schumacher, Stefan
AU  - Zrenner, Artur
ID  - 40523
IS  - 1
JF  - Nature Communications
KW  - General Physics and Astronomy
KW  - General Biochemistry, Genetics and Molecular Biology
KW  - General Chemistry
KW  - Multidisciplinary
SN  - 2041-1723
TI  - Nonlinear down-conversion in a single quantum dot
VL  - 13
ER  -

TY  - JOUR
AB  - A parallel hybrid quantum-classical algorithm for the solution of the quantum-chemical ground-state energy problem on gate-based quantum computers is presented. This approach is based on the reduced density-matrix functional theory (RDMFT) formulation of the electronic structure problem. For that purpose, the density-matrix functional of the full system is decomposed into an indirectly coupled sum of density-matrix functionals for all its subsystems using the adaptive cluster approximation to RDMFT. The approximations involved in the decomposition and the adaptive cluster approximation itself can be systematically converged to the exact result. The solutions for the density-matrix functionals of the effective subsystems involve a constrained minimization over many-particle states that are approximated by parametrized trial states on the quantum computer similarly to the variational quantum eigensolver. The independence of the density-matrix functionals of the effective subsystems introduces a new level of parallelization and allows for the computational treatment of much larger molecules on a quantum computer with a given qubit count. In addition, for the proposed algorithm, techniques are presented to reduce the qubit count, the number of quantum programs, as well as its depth. The evaluation of a density-matrix functional as the essential part of our approach is demonstrated for Hubbard-like systems on IBM quantum computers based on superconducting transmon qubits.
AU  - Schade, Robert
AU  - Bauer, Carsten
AU  - Tamoev, Konstantin
AU  - Mazur, Lukas
AU  - Plessl, Christian
AU  - Kühne, Thomas
ID  - 33226
JF  - Phys. Rev. Research
TI  - Parallel quantum chemistry on noisy intermediate-scale quantum computers
VL  - 4
ER  -