Optimized HPC Workload Division Strategy on Heterogeneity Computing Platform

Authors

  • Chandrashekhar B N, Kantharaju V, Harish Kumar N, Suresh H, Geetha V

Keywords:

CPU, GPU, Heterogeneous, HPC, LINPACK, workload

Abstract

Over the past ten years, the computational power and range of applications of graphics processing units (GPUs) have expanded remarkably. The key difficulty for CPU+GPU heterogeneous clusters is allocating the HPC load effectively among the CPU and GPUs while minimizing intra- and inter-node communication costs. To address these difficulties, this work develops a systematic workload division technique. The analytic load allocation model divides the HPC workload among the CPU and GPUs efficiently by considering a number of factors. To reduce the overhead of communication between and within nodes, we use a pinned memory mechanism and a single MPI process on each node. The proposed approach uses MPI+OpenMP+CUDA to utilize CPU and GPU resources efficiently. We evaluated our method on a random dataset using well-known compute-intensive programs such as LINPACK. The experimental results show that the proposed analytic HPC workload partitioning approach performs better than the adaptive partitioning technique.
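
As a rough illustration of what dividing a workload among a CPU and GPUs can look like, the sketch below splits a number of work units in proportion to each device's measured throughput. This is not the analytic model from the paper, whose factors are not listed in this abstract; the function name `split_workload`, the device labels, and the throughput figures are all hypothetical and chosen only for the example.

```python
def split_workload(total_units, rates):
    """Split total_units across devices in proportion to their rates.

    rates maps a device label to a positive throughput estimate
    (hypothetical values, e.g. GFLOP/s from a short calibration run).
    Returns a dict of integer shares that sum to total_units.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not rates or any(r <= 0 for r in rates.values()):
        raise ValueError("rates must be non-empty and strictly positive")

    total_rate = sum(rates.values())
    # Ideal (fractional) share of each device.
    ideal = {dev: total_units * r / total_rate for dev, r in rates.items()}
    # Round down first, then hand out the leftover units one by one
    # to the devices with the largest fractional remainder.
    shares = {dev: int(x) for dev, x in ideal.items()}
    leftover = total_units - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - shares[d], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Assumed throughputs: one CPU and two GPUs of different speeds.
    print(split_workload(1000, {"cpu": 1.0, "gpu0": 4.0, "gpu1": 5.0}))
```

A static proportional split like this ignores communication cost and memory limits, which the abstract names as central concerns, so it should be read only as a starting point for intuition.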

Published

23.07.2024

How to Cite

Chandrashekhar B N. (2024). Optimized HPC Workload Division Strategy on Heterogeneity Computing Platform. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 1996–2004. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/6518

Section

Research Article