---
_id: '63890'
abstract:
- lang: eng
  text: The computation of highly contracted electron repulsion integrals (ERIs) is
    essential to achieve quantum accuracy in atomistic simulations based on quantum
    mechanics. Its growing computational demands make energy efficiency a critical
    concern. Recent studies demonstrate FPGAs’ superior performance and energy efficiency
    for computing primitive ERIs, but the computation of highly contracted ERIs introduces
    significant algorithmic complexity and new design challenges for FPGA acceleration. In
    this work, we present SORCERI, the first streaming overlay acceleration for highly
    contracted ERI computations on FPGAs. SORCERI introduces a novel streaming Rys
    computing unit to calculate roots and weights of Rys polynomials on-chip, and
    a streaming contraction unit for the contraction of primitive ERIs. This shifts
    the design bottleneck from limited CPU-FPGA communication bandwidth to available
    FPGA computation resources. To address practical deployment challenges for a large
    number of quartet classes, we design three streaming overlays, together with an
    efficient memory transpose optimization, to cover the 21 most commonly used quartet
    classes in realistic atomistic simulations. To address the new computation constraints,
    we use flexible calculation stages with a free-running streaming architecture
    to achieve high DSP utilization and good timing closure. Experiments demonstrate
    that SORCERI achieves an average 5.96x, 1.99x, and 1.16x better performance per
    watt than libint on a 64-core AMD EPYC 7713 CPU, libintx on an Nvidia A40 GPU,
    and SERI, the prior best-performing FPGA design for primitive ERIs. Furthermore,
    SORCERI reaches a peak throughput of 44.11 GERIS ($10^9$ ERIs per second) that is
    1.52x, 1.13x, and 1.93x greater than libint, libintx and SERI, respectively. SORCERI
    will be released soon at https://github.com/SFU-HiAccel/SORCERI.
author:
- first_name: Philip
  full_name: Stachura, Philip
  last_name: Stachura
- first_name: Xin
  full_name: Wu, Xin
  id: '77439'
  last_name: Wu
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
- first_name: Zhenman
  full_name: Fang, Zhenman
  last_name: Fang
citation:
  ama: 'Stachura P, Wu X, Plessl C, Fang Z. SORCERI: Streaming Overlay Acceleration
    for Highly Contracted Electron Repulsion Integral Computations in Quantum Chemistry.
    In: <i>Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable
    Gate Arrays (FPGA ’26)</i>. Association for Computing Machinery; 2026:224-234.
    doi:<a href="https://doi.org/10.1145/3748173.3779198">10.1145/3748173.3779198</a>'
  apa: 'Stachura, P., Wu, X., Plessl, C., &#38; Fang, Z. (2026). SORCERI: Streaming
    Overlay Acceleration for Highly Contracted Electron Repulsion Integral Computations
    in Quantum Chemistry. <i>Proceedings of the 2026 ACM/SIGDA International Symposium
    on Field Programmable Gate Arrays (FPGA ’26)</i>, 224–234. <a href="https://doi.org/10.1145/3748173.3779198">https://doi.org/10.1145/3748173.3779198</a>'
  bibtex: '@inproceedings{Stachura_Wu_Plessl_Fang_2026, place={New York, NY, USA},
    title={SORCERI: Streaming Overlay Acceleration for Highly Contracted Electron
    Repulsion Integral Computations in Quantum Chemistry}, DOI={<a href="https://doi.org/10.1145/3748173.3779198">10.1145/3748173.3779198</a>},
    booktitle={Proceedings of the 2026 ACM/SIGDA International Symposium on Field
    Programmable Gate Arrays (FPGA ’26)}, publisher={Association for Computing Machinery},
    author={Stachura, Philip and Wu, Xin and Plessl, Christian and Fang, Zhenman},
    year={2026}, pages={224–234} }'
  chicago: 'Stachura, Philip, Xin Wu, Christian Plessl, and Zhenman Fang. “SORCERI:
    Streaming Overlay Acceleration for Highly Contracted Electron Repulsion Integral
    Computations in Quantum Chemistry.” In <i>Proceedings of the 2026 ACM/SIGDA International
    Symposium on Field Programmable Gate Arrays (FPGA ’26)</i>, 224–34. New York,
    NY, USA: Association for Computing Machinery, 2026. <a href="https://doi.org/10.1145/3748173.3779198">https://doi.org/10.1145/3748173.3779198</a>.'
  ieee: 'P. Stachura, X. Wu, C. Plessl, and Z. Fang, “SORCERI: Streaming Overlay Acceleration
    for Highly Contracted Electron Repulsion Integral Computations in Quantum Chemistry,”
    in <i>Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable
    Gate Arrays (FPGA ’26)</i>, 2026, pp. 224–234, doi: <a href="https://doi.org/10.1145/3748173.3779198">10.1145/3748173.3779198</a>.'
  mla: 'Stachura, Philip, et al. “SORCERI: Streaming Overlay Acceleration for Highly
    Contracted Electron Repulsion Integral Computations in Quantum Chemistry.” <i>Proceedings
    of the 2026 ACM/SIGDA International Symposium on Field Programmable Gate Arrays
    (FPGA ’26)</i>, Association for Computing Machinery, 2026, pp. 224–34, doi:<a
    href="https://doi.org/10.1145/3748173.3779198">10.1145/3748173.3779198</a>.'
  short: 'P. Stachura, X. Wu, C. Plessl, Z. Fang, in: Proceedings of the 2026 ACM/SIGDA
    International Symposium on Field Programmable Gate Arrays (FPGA ’26), Association
    for Computing Machinery, New York, NY, USA, 2026, pp. 224–234.'
date_created: 2026-02-06T06:43:22Z
date_updated: 2026-02-09T09:16:32Z
department:
- _id: '27'
- _id: '518'
doi: 10.1145/3748173.3779198
keyword:
- electron repulsion integrals
- quantum chemistry
- atomistic simulation
- overlay architecture
- fpga acceleration
language:
- iso: eng
main_file_link:
- url: https://dl.acm.org/doi/10.1145/3748173.3779198
page: 224-234
place: New York, NY, USA
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable
  Gate Arrays (FPGA '26)
publication_identifier:
  isbn:
  - '9798400720796'
publication_status: published
publisher: Association for Computing Machinery
status: public
title: 'SORCERI: Streaming Overlay Acceleration for Highly Contracted Electron Repulsion
  Integral Computations in Quantum Chemistry'
type: conference
user_id: '77439'
year: '2026'
...
---
_id: '64071'
abstract:
- lang: eng
  text: Stimulated by the renewed interest and recent developments in semi-empirical
    quantum chemical (SQC) methods for noncovalent interactions, we examine the properties
    of liquid water at ambient conditions by means of molecular dynamics (MD) simulations,
    both with the conventional NDDO-type (neglect of diatomic differential overlap)
    methods, e.g. AM1 and PM6, and with DFTB-type (density-functional tight-binding)
    methods, e.g. DFTB2 and GFN-xTB. Besides the original parameter sets, some specifically
    reparametrized SQC methods (denoted as AM1-W, PM6-fm, and DFTB2-iBi) targeting
    various smaller water systems ranging from molecular clusters to bulk are considered
    as well. The quality of these different SQC methods for describing liquid water
    properties at ambient conditions is assessed by comparison to well-established
    experimental data and also to BLYP-D3 density functional theory-based ab initio
    MD simulations. Our analyses reveal that static and dynamic properties of bulk
    water are poorly described by all considered SQC methods with the original parameters,
    regardless of the underlying theoretical models, with most of the methods suffering
    from too weak hydrogen bonds and hence predicting a far too fluid water with highly
    distorted hydrogen bond kinetics. On the other hand, the reparametrized force-matched
    PM6-fm method is shown to be able to quantitatively reproduce the static and dynamic
    features of liquid water, and thus can be used as a computationally efficient
    alternative to electronic structure-based MD simulations for liquid water that
    require extended length and time scales. DFTB2-iBi predicts a slightly overstructured
    water with reduced fluidity, whereas AM1-W gives an amorphous ice-like structure
    for water at ambient conditions.
author:
- first_name: Xin
  full_name: Wu, Xin
  id: '77439'
  last_name: Wu
- first_name: Hossam
  full_name: Elgabarty, Hossam
  id: '60250'
  last_name: Elgabarty
  orcid: 0000-0002-4945-1481
- first_name: Vahideh
  full_name: Alizadeh, Vahideh
  last_name: Alizadeh
- first_name: Andres
  full_name: Henao Aristizabal, Andres
  id: '67235'
  last_name: Henao Aristizabal
- first_name: Frederik
  full_name: Zysk, Frederik
  id: '14757'
  last_name: Zysk
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
- first_name: Sebastian
  full_name: Ehlert, Sebastian
  last_name: Ehlert
- first_name: Jürg
  full_name: Hutter, Jürg
  last_name: Hutter
- first_name: Thomas D.
  full_name: Kühne, Thomas D.
  id: '49079'
  last_name: Kühne
citation:
  ama: Wu X, Elgabarty H, Alizadeh V, et al. Benchmarking semi-empirical quantum chemical
    methods on liquid water. Published online 2025.
  apa: Wu, X., Elgabarty, H., Alizadeh, V., Henao Aristizabal, A., Zysk, F., Plessl,
    C., Ehlert, S., Hutter, J., &#38; Kühne, T. D. (2025). <i>Benchmarking semi-empirical
    quantum chemical methods on liquid water</i>.
  bibtex: '@article{Wu_Elgabarty_Alizadeh_Henao Aristizabal_Zysk_Plessl_Ehlert_Hutter_Kühne_2025,
    title={Benchmarking semi-empirical quantum chemical methods on liquid water},
    author={Wu, Xin and Elgabarty, Hossam and Alizadeh, Vahideh and Henao Aristizabal,
    Andres and Zysk, Frederik and Plessl, Christian and Ehlert, Sebastian and Hutter,
    Jürg and Kühne, Thomas D.}, year={2025} }'
  chicago: Wu, Xin, Hossam Elgabarty, Vahideh Alizadeh, Andres Henao Aristizabal,
    Frederik Zysk, Christian Plessl, Sebastian Ehlert, Jürg Hutter, and Thomas D.
    Kühne. “Benchmarking Semi-Empirical Quantum Chemical Methods on Liquid Water,”
    2025.
  ieee: X. Wu <i>et al.</i>, “Benchmarking semi-empirical quantum chemical methods
    on liquid water.” 2025.
  mla: Wu, Xin, et al. <i>Benchmarking Semi-Empirical Quantum Chemical Methods on
    Liquid Water</i>. 2025.
  short: X. Wu, H. Elgabarty, V. Alizadeh, A. Henao Aristizabal, F. Zysk, C. Plessl,
    S. Ehlert, J. Hutter, T.D. Kühne, (2025).
date_created: 2026-02-09T09:03:41Z
date_updated: 2026-02-09T09:17:07Z
department:
- _id: '27'
- _id: '2'
language:
- iso: eng
main_file_link:
- url: https://arxiv.org/abs/2503.11867
project:
- _id: '52'
  name: Computing Resources Provided by the Paderborn Center for Parallel Computing
status: public
title: Benchmarking semi-empirical quantum chemical methods on liquid water
type: preprint
user_id: '77439'
year: '2025'
...
---
_id: '62981'
abstract:
- lang: eng
  text: "Otus is a high-performance computing cluster that was launched in 2025 and
    is operated by the Paderborn Center for Parallel Computing (PC2) at Paderborn
    University in Germany. The system is part of the National High Performance Computing
    (NHR) initiative. Otus complements the previous supercomputer Noctua 2, offering
    approximately twice the computing power while retaining the three node types that
    were characteristic of Noctua 2: 1) CPU compute nodes with different memory capacities,
    2) high-end GPU nodes, and 3) HPC-grade FPGA nodes. On the Top500 list, which
    ranks the 500 most powerful supercomputers in the world, Otus is in position 164
    with the CPU partition and in position 255 with the GPU partition (June 2025).
    On the Green500 list, ranking the 500 most energy-efficient supercomputers in
    the world, Otus is in position 5 with the GPU partition (June 2025).\r\n\r\n\r\nThis
    article provides a comprehensive overview of the system in terms of its hardware,
    software, system integration, and its overall integration into the data center
    building to ensure energy-efficient operation. The article aims to provide unique
    insights for scientists using the system and for other centers operating HPC clusters.
    The article will be continuously updated to reflect the latest system setup and
    measurements. "
author:
- first_name: Sadaf
  full_name: Ehtesabi, Sadaf
  id: '116116'
  last_name: Ehtesabi
- first_name: Manoar
  full_name: Hossain, Manoar
  id: '114619'
  last_name: Hossain
  orcid: 0000-0002-0737-7981
- first_name: Tobias
  full_name: Kenter, Tobias
  id: '3145'
  last_name: Kenter
- first_name: Andreas
  full_name: Krawinkel, Andreas
  id: '15275'
  last_name: Krawinkel
- first_name: Lukas
  full_name: Ostermann, Lukas
  id: '69976'
  last_name: Ostermann
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
- first_name: Heinrich
  full_name: Riebler, Heinrich
  id: '8961'
  last_name: Riebler
- first_name: Stefan
  full_name: Rohde, Stefan
  id: '34009'
  last_name: Rohde
- first_name: Robert
  full_name: Schade, Robert
  id: '75963'
  last_name: Schade
  orcid: 0000-0002-6268-5397
- first_name: Michael
  full_name: Schwarz, Michael
  id: '5312'
  last_name: Schwarz
- first_name: Jens
  full_name: Simon, Jens
  id: '15273'
  last_name: Simon
- first_name: Nils
  full_name: Winnwa, Nils
  id: '61189'
  last_name: Winnwa
- first_name: Alex
  full_name: Wiens, Alex
  id: '23522'
  last_name: Wiens
  orcid: 0000-0003-1764-9773
- first_name: Xin
  full_name: Wu, Xin
  id: '77439'
  last_name: Wu
citation:
  ama: Ehtesabi S, Hossain M, Kenter T, et al. <i>Otus Supercomputer</i>. Vol 1. Paderborn
    Center for Parallel Computing (PC2); 2025. doi:<a href="https://doi.org/10.48550/ARXIV.2512.07401">10.48550/ARXIV.2512.07401</a>
  apa: Ehtesabi, S., Hossain, M., Kenter, T., Krawinkel, A., Ostermann, L., Plessl,
    C., Riebler, H., Rohde, S., Schade, R., Schwarz, M., Simon, J., Winnwa, N., Wiens,
    A., &#38; Wu, X. (2025). <i>Otus Supercomputer</i> (Vol. 1). Paderborn Center
    for Parallel Computing (PC2). <a href="https://doi.org/10.48550/ARXIV.2512.07401">https://doi.org/10.48550/ARXIV.2512.07401</a>
  bibtex: '@book{Ehtesabi_Hossain_Kenter_Krawinkel_Ostermann_Plessl_Riebler_Rohde_Schade_Schwarz_et
    al._2025, place={Paderborn}, series={PC2 Technical Report Series}, title={Otus
    Supercomputer}, volume={1}, DOI={<a href="https://doi.org/10.48550/ARXIV.2512.07401">10.48550/ARXIV.2512.07401</a>},
    publisher={Paderborn Center for Parallel Computing (PC2)}, author={Ehtesabi, Sadaf
    and Hossain, Manoar and Kenter, Tobias and Krawinkel, Andreas and Ostermann, Lukas
    and Plessl, Christian and Riebler, Heinrich and Rohde, Stefan and Schade, Robert
    and Schwarz, Michael and et al.}, year={2025}, collection={PC2 Technical Report
    Series} }'
  chicago: 'Ehtesabi, Sadaf, Manoar Hossain, Tobias Kenter, Andreas Krawinkel, Lukas
    Ostermann, Christian Plessl, Heinrich Riebler, et al. <i>Otus Supercomputer</i>.
    Vol. 1. PC2 Technical Report Series. Paderborn: Paderborn Center for Parallel
    Computing (PC2), 2025. <a href="https://doi.org/10.48550/ARXIV.2512.07401">https://doi.org/10.48550/ARXIV.2512.07401</a>.'
  ieee: 'S. Ehtesabi <i>et al.</i>, <i>Otus Supercomputer</i>, vol. 1. Paderborn:
    Paderborn Center for Parallel Computing (PC2), 2025.'
  mla: Ehtesabi, Sadaf, et al. <i>Otus Supercomputer</i>. Paderborn Center for Parallel
    Computing (PC2), 2025, doi:<a href="https://doi.org/10.48550/ARXIV.2512.07401">10.48550/ARXIV.2512.07401</a>.
  short: S. Ehtesabi, M. Hossain, T. Kenter, A. Krawinkel, L. Ostermann, C. Plessl,
    H. Riebler, S. Rohde, R. Schade, M. Schwarz, J. Simon, N. Winnwa, A. Wiens, X.
    Wu, Otus Supercomputer, Paderborn Center for Parallel Computing (PC2), Paderborn,
    2025.
date_created: 2025-12-09T09:11:04Z
date_updated: 2026-03-25T11:50:31Z
ddc:
- '004'
department:
- _id: '27'
- _id: '518'
doi: 10.48550/ARXIV.2512.07401
file:
- access_level: open_access
  content_type: application/pdf
  creator: deffel
  date_created: 2025-12-09T09:19:12Z
  date_updated: 2026-03-25T11:50:30Z
  file_id: '62982'
  file_name: 2512.07401v1.pdf
  file_size: 4535595
  relation: main_file
file_date_updated: 2026-03-25T11:50:30Z
has_accepted_license: '1'
intvolume: '         1'
keyword:
- Otus
- Supercomputer
- FPGA
- PC2
- Paderborn Center for Parallel Computing
- Noctua 2
- HPC
language:
- iso: eng
oa: '1'
page: '33'
place: Paderborn
publication_status: published
publisher: Paderborn Center for Parallel Computing (PC2)
report_number: PC2TR-2025-1
series_title: PC2 Technical Report Series
status: public
title: Otus Supercomputer
type: report
user_id: '23522'
volume: 1
year: '2025'
...
---
_id: '53663'
abstract:
- lang: eng
  text: 'Noctua 2 is a supercomputer operated at the Paderborn Center for Parallel
    Computing (PC2) at Paderborn University in Germany. Noctua 2 was inaugurated in
    2022 and is an Atos BullSequana XH2000 system. It consists mainly of three node
    types: 1) CPU Compute nodes with AMD EPYC processors in different main memory
    configurations, 2) GPU nodes with NVIDIA A100 GPUs, and 3) FPGA nodes with Xilinx
    Alveo U280 and Intel Stratix 10 FPGA cards. While CPUs and GPUs are known off-the-shelf
    components in HPC systems, the operation of a large number of FPGA cards from
    different vendors and a dedicated FPGA-to-FPGA network are unique characteristics
    of Noctua 2. This paper describes in detail the overall setup of Noctua 2 and
    gives insights into the operation of the cluster from a hardware, software and
    facility perspective.'
article_type: original
author:
- first_name: Carsten
  full_name: Bauer, Carsten
  id: '90082'
  last_name: Bauer
- first_name: Tobias
  full_name: Kenter, Tobias
  id: '3145'
  last_name: Kenter
- first_name: Michael
  full_name: Lass, Michael
  id: '24135'
  last_name: Lass
  orcid: 0000-0002-5708-7632
- first_name: Lukas
  full_name: Mazur, Lukas
  id: '90492'
  last_name: Mazur
  orcid: 0000-0001-6304-7082
- first_name: Marius
  full_name: Meyer, Marius
  id: '40778'
  last_name: Meyer
- first_name: Holger
  full_name: Nitsche, Holger
  id: '15272'
  last_name: Nitsche
- first_name: Heinrich
  full_name: Riebler, Heinrich
  id: '8961'
  last_name: Riebler
- first_name: Robert
  full_name: Schade, Robert
  id: '75963'
  last_name: Schade
  orcid: 0000-0002-6268-5397
- first_name: Michael
  full_name: Schwarz, Michael
  id: '5312'
  last_name: Schwarz
- first_name: Nils
  full_name: Winnwa, Nils
  id: '61189'
  last_name: Winnwa
- first_name: Alex
  full_name: Wiens, Alex
  id: '23522'
  last_name: Wiens
  orcid: 0000-0003-1764-9773
- first_name: Xin
  full_name: Wu, Xin
  id: '77439'
  last_name: Wu
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
- first_name: Jens
  full_name: Simon, Jens
  id: '15273'
  last_name: Simon
citation:
  ama: Bauer C, Kenter T, Lass M, et al. Noctua 2 Supercomputer. <i>Journal of large-scale
    research facilities</i>. 2024;9. doi:<a href="https://doi.org/10.17815/jlsrf-8-187">10.17815/jlsrf-8-187</a>
  apa: Bauer, C., Kenter, T., Lass, M., Mazur, L., Meyer, M., Nitsche, H., Riebler,
    H., Schade, R., Schwarz, M., Winnwa, N., Wiens, A., Wu, X., Plessl, C., &#38;
    Simon, J. (2024). Noctua 2 Supercomputer. <i>Journal of Large-Scale Research Facilities</i>,
    <i>9</i>. <a href="https://doi.org/10.17815/jlsrf-8-187">https://doi.org/10.17815/jlsrf-8-187</a>
  bibtex: '@article{Bauer_Kenter_Lass_Mazur_Meyer_Nitsche_Riebler_Schade_Schwarz_Winnwa_et
    al._2024, title={Noctua 2 Supercomputer}, volume={9}, DOI={<a href="https://doi.org/10.17815/jlsrf-8-187">10.17815/jlsrf-8-187</a>},
    journal={Journal of large-scale research facilities},
    author={Bauer, Carsten and Kenter, Tobias and Lass, Michael and Mazur, Lukas and
    Meyer, Marius and Nitsche, Holger and Riebler, Heinrich and Schade, Robert and
    Schwarz, Michael and Winnwa, Nils and et al.}, year={2024} }'
  chicago: Bauer, Carsten, Tobias Kenter, Michael Lass, Lukas Mazur, Marius Meyer,
    Holger Nitsche, Heinrich Riebler, et al. “Noctua 2 Supercomputer.” <i>Journal
    of Large-Scale Research Facilities</i> 9 (2024). <a href="https://doi.org/10.17815/jlsrf-8-187">https://doi.org/10.17815/jlsrf-8-187</a>.
  ieee: 'C. Bauer <i>et al.</i>, “Noctua 2 Supercomputer,” <i>Journal of large-scale
    research facilities</i>, vol. 9, 2024, doi: <a href="https://doi.org/10.17815/jlsrf-8-187">10.17815/jlsrf-8-187</a>.'
  mla: Bauer, Carsten, et al. “Noctua 2 Supercomputer.” <i>Journal of Large-Scale
    Research Facilities</i>, vol. 9, 2024, doi:<a href="https://doi.org/10.17815/jlsrf-8-187">10.17815/jlsrf-8-187</a>.
  short: C. Bauer, T. Kenter, M. Lass, L. Mazur, M. Meyer, H. Nitsche, H. Riebler,
    R. Schade, M. Schwarz, N. Winnwa, A. Wiens, X. Wu, C. Plessl, J. Simon, Journal
    of Large-Scale Research Facilities 9 (2024).
date_created: 2024-04-26T07:39:41Z
date_updated: 2024-04-26T08:44:30Z
ddc:
- '004'
department:
- _id: '27'
- _id: '518'
doi: 10.17815/jlsrf-8-187
file:
- access_level: open_access
  content_type: application/pdf
  creator: deffel
  date_created: 2024-04-26T07:30:20Z
  date_updated: 2024-04-26T08:35:17Z
  file_id: '53664'
  file_name: Noctua2_Supercomputer.pdf
  file_size: 3825480
  relation: main_file
file_date_updated: 2024-04-26T08:35:17Z
has_accepted_license: '1'
intvolume: '         9'
keyword:
- Noctua 2
- Supercomputer
- FPGA
- PC2
- Paderborn Center for Parallel Computing
language:
- iso: eng
oa: '1'
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: Journal of large-scale research facilities
publication_status: published
status: public
title: Noctua 2 Supercomputer
type: journal_article
user_id: '8961'
volume: 9
year: '2024'
...
---
_id: '56609'
abstract:
- lang: eng
  text: 'The computation of electron repulsion integrals (ERIs) is a key component
    for quantum chemical methods. The intensive computation and bandwidth demand for
    ERI evaluation presents a significant challenge for quantum-mechanics-based atomistic
    simulations with hybrid density functional theory: due to the tens of trillions
    of ERI computations in each time step, practical applications are usually limited
    to thousands of atoms. In this work, we propose SERI, a high-throughput streaming
    accelerator for ERI computation on HBM-based FPGAs. In contrast to prior buffer-based
    designs, SERI proposes a novel streaming architecture to address the on-chip buffer
    limitation and the floorplanning challenge, and leverages the high-bandwidth memory
    to overcome the bandwidth bottleneck in prior designs. Moreover, to meet the varying
    computation, bandwidth, and floorplanning requirements between the 55 canonical
    quartet classes in ERI calculation, we design an automation tool, together with
    an accurate performance model, to automatically customize the architecture and
    floorplanning strategy for each canonical quartet class to maximize their throughput.
    Our performance evaluation on the AMD/Xilinx Alveo U280 FPGA board shows that
    SERI achieves an average speedup of 9.80x over the previous best-performing FPGA
    design, a 3.21x speedup over a 64-core AMD EPYC 7713 CPU, and a 15.64x speedup
    over an Nvidia A40 GPU. It reaches a peak throughput of 23.8 GERIS ($10^9$ ERIs
    per second) on one Alveo U280 FPGA. SERI will be released soon at https://github.com/SFU-HiAccel/SERI.'
author:
- first_name: Philip
  full_name: Stachura, Philip
  last_name: Stachura
- first_name: Guanyu
  full_name: Li, Guanyu
  last_name: Li
- first_name: Xin
  full_name: Wu, Xin
  id: '77439'
  last_name: Wu
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
- first_name: Zhenman
  full_name: Fang, Zhenman
  last_name: Fang
citation:
  ama: 'Stachura P, Li G, Wu X, Plessl C, Fang Z. SERI: High-Throughput Streaming
    Acceleration of Electron Repulsion Integral Computation in Quantum Chemistry using
    HBM-based FPGAs. In: <i>2024 34th International Conference on Field-Programmable
    Logic and Applications (FPL)</i>. IEEE; 2024:60-68. doi:<a href="https://doi.org/10.1109/fpl64840.2024.00018">10.1109/fpl64840.2024.00018</a>'
  apa: 'Stachura, P., Li, G., Wu, X., Plessl, C., &#38; Fang, Z. (2024). SERI: High-Throughput
    Streaming Acceleration of Electron Repulsion Integral Computation in Quantum Chemistry
    using HBM-based FPGAs. <i>2024 34th International Conference on Field-Programmable
    Logic and Applications (FPL)</i>, 60–68. <a href="https://doi.org/10.1109/fpl64840.2024.00018">https://doi.org/10.1109/fpl64840.2024.00018</a>'
  bibtex: '@inproceedings{Stachura_Li_Wu_Plessl_Fang_2024, title={SERI: High-Throughput
    Streaming Acceleration of Electron Repulsion Integral Computation in Quantum Chemistry
    using HBM-based FPGAs}, DOI={<a href="https://doi.org/10.1109/fpl64840.2024.00018">10.1109/fpl64840.2024.00018</a>},
    booktitle={2024 34th International Conference on Field-Programmable Logic and
    Applications (FPL)}, publisher={IEEE}, author={Stachura, Philip and Li, Guanyu
    and Wu, Xin and Plessl, Christian and Fang, Zhenman}, year={2024}, pages={60–68}
    }'
  chicago: 'Stachura, Philip, Guanyu Li, Xin Wu, Christian Plessl, and Zhenman Fang.
    “SERI: High-Throughput Streaming Acceleration of Electron Repulsion Integral Computation
    in Quantum Chemistry Using HBM-Based FPGAs.” In <i>2024 34th International Conference
    on Field-Programmable Logic and Applications (FPL)</i>, 60–68. IEEE, 2024. <a
    href="https://doi.org/10.1109/fpl64840.2024.00018">https://doi.org/10.1109/fpl64840.2024.00018</a>.'
  ieee: 'P. Stachura, G. Li, X. Wu, C. Plessl, and Z. Fang, “SERI: High-Throughput
    Streaming Acceleration of Electron Repulsion Integral Computation in Quantum Chemistry
    using HBM-based FPGAs,” in <i>2024 34th International Conference on Field-Programmable
    Logic and Applications (FPL)</i>, 2024, pp. 60–68, doi: <a href="https://doi.org/10.1109/fpl64840.2024.00018">10.1109/fpl64840.2024.00018</a>.'
  mla: 'Stachura, Philip, et al. “SERI: High-Throughput Streaming Acceleration of
    Electron Repulsion Integral Computation in Quantum Chemistry Using HBM-Based FPGAs.”
    <i>2024 34th International Conference on Field-Programmable Logic and Applications
    (FPL)</i>, IEEE, 2024, pp. 60–68, doi:<a href="https://doi.org/10.1109/fpl64840.2024.00018">10.1109/fpl64840.2024.00018</a>.'
  short: 'P. Stachura, G. Li, X. Wu, C. Plessl, Z. Fang, in: 2024 34th International
    Conference on Field-Programmable Logic and Applications (FPL), IEEE, 2024, pp.
    60–68.'
date_created: 2024-10-14T08:44:44Z
date_updated: 2024-10-15T08:37:27Z
department:
- _id: '27'
- _id: '518'
doi: 10.1109/fpl64840.2024.00018
language:
- iso: eng
main_file_link:
- url: https://ieeexplore.ieee.org/document/10705609
page: 60-68
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: 2024 34th International Conference on Field-Programmable Logic and Applications
  (FPL)
publication_status: published
publisher: IEEE
quality_controlled: '1'
status: public
title: 'SERI: High-Throughput Streaming Acceleration of Electron Repulsion Integral
  Computation in Quantum Chemistry using HBM-based FPGAs'
type: conference
user_id: '77439'
year: '2024'
...
---
_id: '43228'
abstract:
- lang: eng
  text: "The computation of electron repulsion integrals (ERIs) over Gaussian-type
    orbitals (GTOs) is a challenging problem in quantum-mechanics-based atomistic
    simulations. In practical simulations, several trillions of ERIs may have to be computed
    for every time step.\r\nIn this work, we investigate FPGAs as accelerators for
    the ERI computation. We use template parameters, here within the Intel oneAPI
    tool flow, to create customized designs for 256 different ERI quartet classes,
    based on their orbitals. To maximize data reuse, all intermediates are buffered
    in FPGA on-chip memory with customized layout. The pre-calculation of intermediates
    also helps to overcome data dependencies caused by multi-dimensional recurrence relations.
    The involved loop structures are partially or even fully unrolled for high throughput
    of FPGA kernels. Furthermore, a lossy compression algorithm utilizing arbitrary
    bitwidth integers is integrated in the FPGA kernels. To the best of our knowledge,
    this is the first work on ERI computation on FPGAs that supports more than just
    the single most basic quartet class. Also, the integration of ERI computation
    and compression is a novelty that is not even covered by CPU or GPU libraries
    so far.\r\nOur evaluation shows that using 16-bit integer for the ERI compression,
    the fastest FPGA kernels exceed the performance of 10 GERIS ($10 \\times 10^9$
    ERIs per second) on one Intel Stratix 10 GX 2800 FPGA, with maximum absolute errors
    around $10^{-7}$ - $10^{-5}$ Hartree. The measured throughput can be accurately
    explained by a performance model. The FPGA kernels deployed on 2 FPGAs outperform
    similar computations using the widely used libint reference on a two-socket server
    with 40 Xeon Gold 6148 CPU cores of the same process technology by factors up
    to 6.0x and on a new two-socket server with 128 EPYC 7713 CPU cores by up to 1.9x."
author:
- first_name: Xin
  full_name: Wu, Xin
  id: '77439'
  last_name: Wu
- first_name: Tobias
  full_name: Kenter, Tobias
  id: '3145'
  last_name: Kenter
- first_name: Robert
  full_name: Schade, Robert
  id: '75963'
  last_name: Schade
  orcid: 0000-0002-6268-5397
- first_name: Thomas
  full_name: Kühne, Thomas
  id: '49079'
  last_name: Kühne
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
citation:
  ama: 'Wu X, Kenter T, Schade R, Kühne T, Plessl C. Computing and Compressing Electron
    Repulsion Integrals on FPGAs. In: <i>2023 IEEE 31st Annual International Symposium
    on Field-Programmable Custom Computing Machines (FCCM)</i>. ; 2023:162-173. doi:<a
    href="https://doi.org/10.1109/FCCM57271.2023.00026">10.1109/FCCM57271.2023.00026</a>'
  apa: Wu, X., Kenter, T., Schade, R., Kühne, T., &#38; Plessl, C. (2023). Computing
    and Compressing Electron Repulsion Integrals on FPGAs. <i>2023 IEEE 31st Annual
    International Symposium on Field-Programmable Custom Computing Machines (FCCM)</i>,
    162–173. <a href="https://doi.org/10.1109/FCCM57271.2023.00026">https://doi.org/10.1109/FCCM57271.2023.00026</a>
  bibtex: '@inproceedings{Wu_Kenter_Schade_Kühne_Plessl_2023, title={Computing and
    Compressing Electron Repulsion Integrals on FPGAs}, DOI={<a href="https://doi.org/10.1109/FCCM57271.2023.00026">10.1109/FCCM57271.2023.00026</a>},
    booktitle={2023 IEEE 31st Annual International Symposium on Field-Programmable
    Custom Computing Machines (FCCM)}, author={Wu, Xin and Kenter, Tobias and Schade,
    Robert and Kühne, Thomas and Plessl, Christian}, year={2023}, pages={162–173}
    }'
  chicago: Wu, Xin, Tobias Kenter, Robert Schade, Thomas Kühne, and Christian Plessl.
    “Computing and Compressing Electron Repulsion Integrals on FPGAs.” In <i>2023
    IEEE 31st Annual International Symposium on Field-Programmable Custom Computing
    Machines (FCCM)</i>, 162–73, 2023. <a href="https://doi.org/10.1109/FCCM57271.2023.00026">https://doi.org/10.1109/FCCM57271.2023.00026</a>.
  ieee: 'X. Wu, T. Kenter, R. Schade, T. Kühne, and C. Plessl, “Computing and Compressing
    Electron Repulsion Integrals on FPGAs,” in <i>2023 IEEE 31st Annual International
    Symposium on Field-Programmable Custom Computing Machines (FCCM)</i>, 2023, pp.
    162–173, doi: <a href="https://doi.org/10.1109/FCCM57271.2023.00026">10.1109/FCCM57271.2023.00026</a>.'
  mla: Wu, Xin, et al. “Computing and Compressing Electron Repulsion Integrals on
    FPGAs.” <i>2023 IEEE 31st Annual International Symposium on Field-Programmable
    Custom Computing Machines (FCCM)</i>, 2023, pp. 162–73, doi:<a href="https://doi.org/10.1109/FCCM57271.2023.00026">10.1109/FCCM57271.2023.00026</a>.
  short: 'X. Wu, T. Kenter, R. Schade, T. Kühne, C. Plessl, in: 2023 IEEE 31st Annual
    International Symposium on Field-Programmable Custom Computing Machines (FCCM),
    2023, pp. 162–173.'
date_created: 2023-03-30T11:15:40Z
date_updated: 2023-08-02T15:05:42Z
department:
- _id: '27'
- _id: '518'
doi: 10.1109/FCCM57271.2023.00026
external_id:
  arxiv:
  - '2303.13632'
language:
- iso: eng
main_file_link:
- url: https://ieeexplore.ieee.org/document/10171537
page: 162-173
project:
- _id: '52'
  name: 'PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing'
publication: 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom
  Computing Machines (FCCM)
quality_controlled: '1'
status: public
title: Computing and Compressing Electron Repulsion Integrals on FPGAs
type: conference
user_id: '75963'
year: '2023'
...
