TY  - JOUR
AU  - Krenz, Marvin
AU  - Gerstmann, Uwe
AU  - Schmidt, Wolf Gero
ID  - 54865
IS  - 7
JF  - Physical Review Letters
SN  - 0031-9007
TI  - Defect-Assisted Exciton Transfer across the Tetracene-Si(111):H Interface
VL  - 132
ER  - 
TY  - JOUR
AU  - Schäfer, F.
AU  - Trautmann, A.
AU  - Ngo, C.
AU  - Steiner, J. T.
AU  - Fuchs, C.
AU  - Volz, K.
AU  - Dobener, F.
AU  - Stein, M.
AU  - Meier, Torsten
AU  - Chatterjee, S.
ID  - 55267
IS  - 7
JF  - Physical Review B
SN  - 2469-9950
TI  - Optical Stark effect in type-II semiconductor heterostructures
VL  - 109
ER  - 
TY  - JOUR
AB  - Most properties of solid materials are defined by their internal electric field and charge density distributions, which so far have been difficult to measure with high spatial resolution. Especially for 2D materials, the atomic electric fields influence the optoelectronic properties. In this study, the atomic-scale electric field and charge density distributions of WSe2 bi- and trilayers are revealed using an emerging microscopy technique, differential phase contrast (DPC) imaging in scanning transmission electron microscopy (STEM). For pristine material, a higher positive charge density is obtained at the selenium atomic columns than at the tungsten atomic columns and is tentatively explained by a coherent scattering effect. Furthermore, the change in the electric field distribution induced by a missing selenium atomic column is investigated. A characteristic electric field distribution is observed in the vicinity of the defect, with locally reduced magnitudes compared to the pristine lattice. This effect is accompanied by a considerable inward relaxation of the surrounding lattice, which, according to first-principles DFT calculations, is fully compatible with a missing column of Se atoms. This shows that DPC imaging, as an electric-field-sensitive technique, provides remarkable additional information beyond the purely structural analysis obtained with conventional STEM imaging.
AU  - Groll, Maja
AU  - Bürger, Julius
AU  - Caltzidis, Ioannis
AU  - Jöns, Klaus D.
AU  - Schmidt, Wolf Gero
AU  - Gerstmann, Uwe
AU  - Lindner, Jörg K. N.
ID  - 54868
JF  - Small
SN  - 1613-6810
TI  - DFT‐Assisted Investigation of the Electric Field and Charge Density Distribution of Pristine and Defective 2D WSe2 by Differential Phase Contrast Imaging
ER  - 
TY  - CHAP
AB  - Most FPGA boards in the HPC domain are well suited for parallel scaling because of the direct integration of versatile, high-throughput network ports. However, using their network capabilities is often challenging and error-prone because the whole network stack and communication patterns have to be implemented and managed on the FPGAs. This approach also involves a conceptual trade-off between the performance potential of improved communication and the resource consumption of the communication infrastructure, since the resources used on the FPGAs could otherwise be used for computations. In this work, we investigate this trade-off, first, by using synthetic benchmarks to evaluate the different configuration options of the communication framework ACCL and their impact on communication latency and throughput. Second, we use our findings to implement a shallow water simulation whose scalability heavily depends on low-latency communication. With a suitable configuration of ACCL, good scaling behavior can be shown up to all 48 FPGAs installed in the system. Overall, the results show that both the availability of inter-FPGA communication frameworks and the configurability of framework and network stack are crucial to achieving the best application performance with low-latency communication.
AU  - Meyer, Marius
AU  - Kenter, Tobias
AU  - Petrica, Lucian
AU  - O’Brien, Kenneth
AU  - Blott, Michaela
AU  - Plessl, Christian
ID  - 56606
SN  - 0302-9743
T2  - Lecture Notes in Computer Science
TI  - Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL
ER  - 
TY  - CONF
AU  - Opdenhövel, Jan-Oliver
AU  - Alt, Christoph
AU  - Plessl, Christian
AU  - Kenter, Tobias
ID  - 56605
T2  - 2024 34th International Conference on Field-Programmable Logic and Applications (FPL)
TI  - StencilStream: A SYCL-based Stencil Simulation Framework Targeting FPGAs
ER  - 
TY  - CONF
AU  - Tareen, Abdul Rehman
AU  - Meyer, Marius
AU  - Plessl, Christian
AU  - Kenter, Tobias
ID  - 56607
T2  - 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
TI  - HiHiSpMV: Sparse Matrix Vector Multiplication with Hierarchical Row Reductions on FPGAs with High Bandwidth Memory
VL  - 35
ER  - 
TY  - CONF
AB  - The computation of electron repulsion integrals (ERIs) is a key component of quantum chemical methods. The intensive computation and bandwidth demand of ERI evaluation presents a significant challenge for quantum-mechanics-based atomistic simulations with hybrid density functional theory: due to the tens of trillions of ERI computations in each time step, practical applications are usually limited to thousands of atoms. In this work, we propose SERI, a high-throughput streaming accelerator for ERI computation on HBM-based FPGAs. In contrast to prior buffer-based designs, SERI proposes a novel streaming architecture to address the on-chip buffer limitation and the floorplanning challenge, and leverages the high-bandwidth memory to overcome the bandwidth bottleneck in prior designs. Moreover, to meet the varying computation, bandwidth, and floorplanning requirements among the 55 canonical quartet classes in ERI calculation, we design an automation tool, together with an accurate performance model, to automatically customize the architecture and floorplanning strategy for each canonical quartet class to maximize its throughput. Our performance evaluation on the AMD/Xilinx Alveo U280 FPGA board shows that SERI achieves an average speedup of 9.80x over the previous best-performing FPGA design, a 3.21x speedup over a 64-core AMD EPYC 7713 CPU, and a 15.64x speedup over an Nvidia A40 GPU. It reaches a peak throughput of 23.8 GERIS ($10^9$ ERIs per second) on one Alveo U280 FPGA. SERI will be released soon at https://github.com/SFU-HiAccel/SERI.
AU  - Stachura, Philip
AU  - Li, Guanyu
AU  - Wu, Xin
AU  - Plessl, Christian
AU  - Fang, Zhenman
ID  - 56609
T2  - 2024 34th International Conference on Field-Programmable Logic and Applications (FPL)
TI  - SERI: High-Throughput Streaming Acceleration of Electron Repulsion Integral Computation in Quantum Chemistry using HBM-based FPGAs
ER  - 
TY  - CONF
AU  - Büttner, Markus
AU  - Alt, Christoph
AU  - Kenter, Tobias
AU  - Köstler, Harald
AU  - Plessl, Christian
AU  - Aizinger, Vadym
ID  - 54312
T2  - Proceedings of the Platform for Advanced Scientific Computing Conference (PASC)
TI  - Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL
ER  - 
TY  - JOUR
AB  - This manuscript makes the claim of having computed the ninth Dedekind number, D(9). This was done by accelerating the core operation of the process with an efficient FPGA design that outperforms an optimized 64-core CPU reference by 95×. The FPGA execution was parallelized on the Noctua 2 supercomputer at Paderborn University. The resulting value for D(9) is 286386577668298411128469151667598498812366. This value can be verified in two steps. We have made the data file containing the 490 M results available, each of which can be verified separately on CPU, and the whole file sums to our proposed value. The paper explains the mathematical approach in the first part, before putting the focus on a deep dive into the FPGA accelerator implementation followed by a performance analysis. The FPGA implementation was done at the register-transfer level using a dual-clock architecture and shows how we achieved an impressive FMax of 450 MHz on the targeted Stratix 10 GX 2800 FPGAs. The total compute time used was 47,000 FPGA hours.
AU  - Van Hirtum, Lennart
AU  - De Causmaecker, Patrick
AU  - Goemaere, Jens
AU  - Kenter, Tobias
AU  - Riebler, Heinrich
AU  - Lass, Michael
AU  - Plessl, Christian
ID  - 56604
IS  - 3
JF  - ACM Transactions on Reconfigurable Technology and Systems
SN  - 1936-7406
TI  - A Computation of the Ninth Dedekind Number Using FPGA Supercomputing
VL  - 17
ER  - 
TY  - JOUR
AB  - Experiments with ultracold atoms in optical lattices usually involve a weak parabolic trapping potential which merely serves to confine the atoms, but otherwise remains negligible. In contrast, we suggest a different class of experiments in which the presence of a stronger trap is an essential part of the set-up. Because the trap-modified on-site energies exhibit a slowly varying level spacing, similar to that of an anharmonic oscillator, an additional time-periodic trap modulation with judiciously chosen parameters creates nonlinear resonances which enable efficient Floquet engineering. We employ a Mathieu approximation for constructing the near-resonant Floquet states in an accurate manner and demonstrate the emergence of effective ground states from the resonant trap eigenstates. Moreover, we show that the population of the Floquet states is strongly affected by the phase of a sudden turn-on of the trap modulation, which leads to significantly modified and rich dynamics. As a guideline for further studies, we argue that the deliberate population of only the resonance-induced effective ground states will allow one to realize Floquet condensates which follow classical periodic orbits, thus providing challenging future perspectives for the investigation of the quantum–classical correspondence.
AU  - Ali, Usman
AU  - Holthaus, Martin
AU  - Meier, Torsten
ID  - 57839
IS  - 12
JF  - New Journal of Physics
SN  - 1367-2630
TI  - Floquet dynamics of ultracold atoms in optical lattices with a parametrically modulated trapping potential
VL  - 26
ER  -