---
_id: '15478'
abstract:
- lang: eng
  text: Stratix 10 FPGA cards have good potential for accelerating HPC workloads,
    since the Stratix 10 product line introduces devices with a large number of DSP
    and memory blocks. High-level synthesis of OpenCL code can play a fundamental
    role for FPGAs in HPC, because it allows different designs to be implemented with
    lower development effort than hand-optimized HDL. However, Stratix 10 cards are
    still hard to fully exploit using the Intel FPGA SDK for OpenCL. The implementation
    of designs with thousands of concurrent arithmetic operations often suffers from
    place and route problems that limit the maximum frequency or entirely prevent
    a successful synthesis. To overcome these issues for matrix multiplication, we
    formulate Cannon's matrix multiplication algorithm with regard to its efficient
    synthesis within the FPGA logic. We obtain a two-level
    block algorithm, where the lower level sub-matrices are multiplied using our Cannon's
    algorithm implementation. Following this design approach with multiple compute
    units, we achieve maximum frequencies close to and above 300 MHz with
    high utilization of DSP and memory blocks. This allows for performance results
    above 1 TeraFLOPS.
author:
- first_name: Paolo
  full_name: Gorlani, Paolo
  id: '72045'
  last_name: Gorlani
- first_name: Tobias
  full_name: Kenter, Tobias
  id: '3145'
  last_name: Kenter
- first_name: Christian
  full_name: Plessl, Christian
  id: '16153'
  last_name: Plessl
  orcid: 0000-0001-5728-9982
citation:
  ama: 'Gorlani P, Kenter T, Plessl C. OpenCL Implementation of Cannon’s Matrix Multiplication
    Algorithm on Intel Stratix 10 FPGAs. In: <i>Proceedings of the International Conference
    on Field-Programmable Technology (FPT)</i>. IEEE; 2019. doi:<a href="https://doi.org/10.1109/ICFPT47387.2019.00020">10.1109/ICFPT47387.2019.00020</a>'
  apa: Gorlani, P., Kenter, T., &#38; Plessl, C. (2019). OpenCL Implementation of
    Cannon’s Matrix Multiplication Algorithm on Intel Stratix 10 FPGAs. In <i>Proceedings
    of the International Conference on Field-Programmable Technology (FPT)</i>. IEEE.
    <a href="https://doi.org/10.1109/ICFPT47387.2019.00020">https://doi.org/10.1109/ICFPT47387.2019.00020</a>
  bibtex: '@inproceedings{Gorlani_Kenter_Plessl_2019, title={OpenCL Implementation
    of Cannon’s Matrix Multiplication Algorithm on Intel Stratix 10 FPGAs}, DOI={<a
    href="https://doi.org/10.1109/ICFPT47387.2019.00020">10.1109/ICFPT47387.2019.00020</a>},
    booktitle={Proceedings of the International Conference on Field-Programmable Technology
    (FPT)}, publisher={IEEE}, author={Gorlani, Paolo and Kenter, Tobias and Plessl,
    Christian}, year={2019} }'
  chicago: Gorlani, Paolo, Tobias Kenter, and Christian Plessl. “OpenCL Implementation
    of Cannon’s Matrix Multiplication Algorithm on Intel Stratix 10 FPGAs.” In <i>Proceedings
    of the International Conference on Field-Programmable Technology (FPT)</i>. IEEE,
    2019. <a href="https://doi.org/10.1109/ICFPT47387.2019.00020">https://doi.org/10.1109/ICFPT47387.2019.00020</a>.
  ieee: P. Gorlani, T. Kenter, and C. Plessl, “OpenCL Implementation of Cannon’s Matrix
    Multiplication Algorithm on Intel Stratix 10 FPGAs,” in <i>Proceedings of the
    International Conference on Field-Programmable Technology (FPT)</i>, 2019.
  mla: Gorlani, Paolo, et al. “OpenCL Implementation of Cannon’s Matrix Multiplication
    Algorithm on Intel Stratix 10 FPGAs.” <i>Proceedings of the International Conference
    on Field-Programmable Technology (FPT)</i>, IEEE, 2019, doi:<a href="https://doi.org/10.1109/ICFPT47387.2019.00020">10.1109/ICFPT47387.2019.00020</a>.
  short: 'P. Gorlani, T. Kenter, C. Plessl, in: Proceedings of the International Conference
    on Field-Programmable Technology (FPT), IEEE, 2019.'
conference:
  name: International Conference on Field-Programmable Technology (FPT)
date_created: 2020-01-09T12:54:48Z
date_updated: 2022-01-06T06:52:26Z
ddc:
- '004'
department:
- _id: '27'
- _id: '518'
doi: 10.1109/ICFPT47387.2019.00020
file:
- access_level: closed
  content_type: application/pdf
  creator: plessl
  date_created: 2020-01-09T12:53:57Z
  date_updated: 2020-01-09T12:53:57Z
  file_id: '15479'
  file_name: gorlani19_fpt.pdf
  file_size: 250559
  relation: main_file
  success: 1
file_date_updated: 2020-01-09T12:53:57Z
has_accepted_license: '1'
language:
- iso: eng
project:
- _id: '33'
  grant_number: 01IH16005
  name: HighPerMeshes
- _id: '32'
  grant_number: PL 595/2-1
  name: Performance and Efficiency in HPC with Custom Computing
publication: Proceedings of the International Conference on Field-Programmable Technology
  (FPT)
publisher: IEEE
quality_controlled: '1'
status: public
title: OpenCL Implementation of Cannon's Matrix Multiplication Algorithm on Intel
  Stratix 10 FPGAs
type: conference
user_id: '3145'
year: '2019'
...
