{"department":[{"_id":"27"},{"_id":"518"}],"user_id":"3145","_id":"65102","project":[{"name":"Computing Resources Provided by the Paderborn Center for Parallel Computing","_id":"52"}],"language":[{"iso":"eng"}],"publication":"Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis","type":"conference","status":"public","abstract":[{"lang":"eng","text":"Efficient graph processing is essential for a wide range of applications. Scalability and memory access patterns are still a challenge, especially with the Breadth-First Search algorithm. This work focuses on leveraging HPC systems with multiple GPUs available in a single node with peer-to-peer functionality of the Intel oneAPI implementation of SYCL. We propose three GPU-based load-balancing methods: work-group localisation for efficient data access, even workload distribution for higher GPU occupancy, and a hybrid strided-access approach for heuristic balancing. These methods ensure performance, portability, and productivity with a unified codebase. Our proposed methodologies outperform state-of-the-art single-GPU implementations based on CUDA on synthetic RMAT graphs. We analysed BFS performance across NVIDIA A100, Intel Max 1550, and AMD MI300X GPUs, achieving a peak performance of 153.27 GTEPS on an RMAT25-64 graph using 8 GPUs on the NVIDIA A100. Furthermore, our work demonstrates the capability to handle RMAT graphs up to scale 29, achieving superior performance on synthetic graphs and competitive results on real-world datasets."}],"author":[{"last_name":"Olgu","full_name":"Olgu, Kaan","first_name":"Kaan"},{"first_name":"Tobias","full_name":"Kenter, Tobias","id":"3145","last_name":"Kenter"},{"first_name":"Jose","full_name":"Nunez-Yanez, Jose","last_name":"Nunez-Yanez"},{"first_name":"Simon","full_name":"McIntosh-Smith, Simon","last_name":"McIntosh-Smith"},{"full_name":"Deakin, Tom","last_name":"Deakin","first_name":"Tom"}],"date_created":"2026-03-24T09:05:22Z","date_updated":"2026-03-24T09:06:33Z","publisher":"ACM","doi":"10.1145/3731599.3767570","title":"Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia","publication_status":"published","citation":{"mla":"Olgu, Kaan, et al. “Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia.” Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2025, doi:10.1145/3731599.3767570.","short":"K. Olgu, T. Kenter, J. Nunez-Yanez, S. McIntosh-Smith, T. Deakin, in: Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2025.","bibtex":"@inproceedings{Olgu_Kenter_Nunez-Yanez_McIntosh-Smith_Deakin_2025, title={Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia}, DOI={10.1145/3731599.3767570}, booktitle={Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis}, publisher={ACM}, author={Olgu, Kaan and Kenter, Tobias and Nunez-Yanez, Jose and McIntosh-Smith, Simon and Deakin, Tom}, year={2025} }","apa":"Olgu, K., Kenter, T., Nunez-Yanez, J., McIntosh-Smith, S., & Deakin, T. (2025). Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia. Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1145/3731599.3767570","ieee":"K. Olgu, T. Kenter, J. Nunez-Yanez, S. McIntosh-Smith, and T. Deakin, “Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia,” 2025, doi: 10.1145/3731599.3767570.","chicago":"Olgu, Kaan, Tobias Kenter, Jose Nunez-Yanez, Simon McIntosh-Smith, and Tom Deakin. “Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia.” In Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 2025. https://doi.org/10.1145/3731599.3767570.","ama":"Olgu K, Kenter T, Nunez-Yanez J, McIntosh-Smith S, Deakin T. Towards Efficient Load Balancing BFS on GPUs: One Code for AMD, Intel & Nvidia. In: Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM; 2025. doi:10.1145/3731599.3767570"},"year":"2025"}