Telemetry-Guided Power Optimization for Energy-Efficient AI Datacenter Infrastructure

Authors

  • Seshadri Ravikiran Vedula

Keywords:

Telemetry Monitoring, AI Data Center Infrastructure, Energy Efficient Data Centers, Power Optimization, GPU Utilization.

Abstract

AI data centers face a growing energy challenge as cloud computing and artificial intelligence workloads expand rapidly. Servers, GPUs, networking equipment, and cooling systems now consume large amounts of electricity. This paper analyzes a telemetry-guided power optimization design intended to improve the energy efficiency of AI data center facilities. The study collects operational telemetry from servers, GPUs, network equipment, and cooling equipment and analyzes the relationship between workload and power consumption. In the experiments, mean GPU utilization was 67% and mean CPU utilization was 54%, showing that resources were used unevenly while running AI workloads. At baseline, the system had a Power Usage Effectiveness (PUE) of 1.62. Applying telemetry-driven optimization strategies, namely workload consolidation, dynamic voltage and frequency control, and adaptive cooling control, reduced power consumption from 555 kW to 480 kW, a total energy saving of 13.5%. Cooling energy use fell by 16.7%. These results support the view that monitoring and optimization strategies driven by telemetry measurements can substantially improve energy efficiency without compromising the performance of AI data centers.
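The headline figures in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's tooling: the function names are hypothetical, PUE is taken in its standard form (total facility power divided by IT equipment power), and the implied IT load assumes that the 555 kW baseline is total facility power, which the abstract does not state.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    return total_facility_kw / it_equipment_kw


# Figures reported in the abstract.
baseline_kw = 555.0
optimized_kw = 480.0
baseline_pue = 1.62

saving_pct = percent_reduction(baseline_kw, optimized_kw)
print(f"Total saving: {saving_pct:.1f}%")  # Total saving: 13.5%

# ASSUMPTION (not stated in the abstract): 555 kW is total facility power.
# Under that assumption, the IT load implied by the baseline PUE is:
implied_it_kw = baseline_kw / baseline_pue
print(f"Implied IT load: {implied_it_kw:.1f} kW")
```

The 75 kW drop from 555 kW reproduces the reported 13.5% saving. The implied IT load is illustrative only and depends entirely on the stated assumption.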

 

Published

23.07.2024

How to Cite

Seshadri Ravikiran Vedula. (2024). Telemetry-Guided Power Optimization for Energy-Efficient AI Datacenter Infrastructure. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 6044–6053. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/8296

Section

Research Article