Telemetry-Guided Power Optimization for Energy-Efficient AI Datacenter Infrastructure
Keywords:
Telemetry Monitoring, AI Data Center Infrastructure, Energy Efficient Data Centers, Power Optimization, GPU Utilization.
Abstract
Modern AI data center architectures face a growing energy challenge driven by the rapid expansion of cloud computing and artificial intelligence workloads. Large and rising amounts of electricity are now needed to power the servers, GPUs, networking equipment, and cooling systems that make up this infrastructure. This paper analyzes a telemetry-guided power optimization design that has the potential to improve the energy efficiency of AI data center facilities. The study collects operational telemetry from servers, GPUs, network equipment, and cooling equipment and analyzes the relationship between workload and power consumption. In the experiments, mean GPU utilization was 67% and mean CPU utilization was 54%, which indicates that resources were utilized unevenly while running AI workloads. At baseline, the system had a Power Usage Effectiveness (PUE) of 1.62. Applying telemetry-driven optimization strategies, namely workload consolidation, dynamic voltage and frequency scaling (DVFS), and adaptive cooling control, reduced power consumption from 555 kW to 480 kW, a 13.5% reduction in total energy use, and reduced cooling energy by 16.7%. These results indicate that monitoring and optimization strategies built on telemetry measurements can substantially improve the energy efficiency of AI data centers without compromising their performance.
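The headline figures can be checked with simple arithmetic. PUE is conventionally defined as total facility power divided by IT equipment power, so a PUE of 1.62 means that for every watt delivered to IT equipment, roughly another 0.62 W goes to cooling, power conversion, and other overhead. The minimal sketch below recomputes the 13.5% reduction from the reported 555 kW and 480 kW figures; the helper names and the 100 kW IT load used to illustrate PUE are hypothetical.

```python
def percent_reduction(before_kw: float, after_kw: float) -> float:
    """Relative reduction in power draw, as a percentage of the baseline."""
    return (before_kw - after_kw) / before_kw * 100.0


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


# Figures reported in the abstract: 555 kW before optimization, 480 kW after.
savings = percent_reduction(555.0, 480.0)
print(f"Reduction: {555.0 - 480.0:.0f} kW ({savings:.1f}%)")

# Hypothetical illustration: at PUE 1.62, an IT load of 100 kW implies a
# total facility draw of 162 kW, i.e. 62 kW of non-IT overhead.
it_kw = 100.0
total_kw = 1.62 * it_kw
print(f"PUE check: {pue(total_kw, it_kw):.2f}, overhead {total_kw - it_kw:.0f} kW")
```

The 75 kW difference divided by the 555 kW baseline gives 13.5%, matching the abstract.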
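Workload consolidation saves energy because a powered-on but lightly loaded server still draws a large share of its peak power. A minimal sketch, assuming a linear power model P(u) = P_idle + (P_peak - P_idle) · u and packing jobs with first-fit decreasing, a standard bin-packing heuristic that is not necessarily the paper's method. The idle and peak wattages, the 90% utilization cap, and the job loads are all hypothetical.

```python
from typing import List

P_IDLE_W = 300.0   # hypothetical idle draw of one powered-on GPU server (W)
P_PEAK_W = 1000.0  # hypothetical draw of one server at 100% utilization (W)
CAP_PCT = 90       # hypothetical headroom: never pack a server above 90%


def server_power(util_pct: int) -> float:
    """Linear power model: idle draw plus a part proportional to utilization."""
    return P_IDLE_W + (P_PEAK_W - P_IDLE_W) * util_pct / 100


def consolidate(loads_pct: List[int], cap_pct: int = CAP_PCT) -> List[int]:
    """First-fit decreasing bin packing.

    Jobs are placed largest first on the first server with enough room;
    a new server is opened only when no existing one fits. Returns the
    resulting utilization (%) of each server that remains powered on.
    """
    servers: List[int] = []
    for load in sorted(loads_pct, reverse=True):
        for i, used in enumerate(servers):
            if used + load <= cap_pct:
                servers[i] = used + load
                break
        else:
            servers.append(load)
    return servers


def total_power(server_utils_pct: List[int]) -> float:
    """Total draw of the powered-on servers; emptied servers are assumed off."""
    return sum(server_power(u) for u in server_utils_pct)


# Hypothetical jobs (GPU utilization %), initially one job per server.
jobs = [50, 30, 40, 20, 10, 30]
packed = consolidate(jobs)
print(f"{len(jobs)} servers -> {len(packed)} servers")
print(f"{total_power(jobs):.0f} W -> {total_power(packed):.0f} W")
```

Here the six jobs fit onto two servers at 90% each, and total draw falls from 3060 W to 1860 W. Because the model is linear, the utilization-dependent term is unchanged by packing, so the entire 1200 W saving is the idle power of the four servers that are switched off. The sketch ignores migration cost and performance interference between co-located jobs.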
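Dynamic voltage and frequency scaling saves power because CMOS dynamic power scales roughly as P ≈ C · V² · f, where C is the effective switched capacitance, V the supply voltage, and f the clock frequency. Lowering frequency alone mostly stretches runtime, but it also permits a lower voltage, and the V² term is where most of the energy saving comes from. The sketch below uses this textbook first-order model; the capacitance, job length, and voltage/frequency operating points are hypothetical, and static (leakage) power is ignored.

```python
def dynamic_power_w(c_eff_f: float, volts: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = C_eff * V^2 * f (watts)."""
    return c_eff_f * volts ** 2 * freq_hz


def dynamic_energy_j(c_eff_f: float, volts: float, cycles: float) -> float:
    """Energy for a fixed amount of work: P * (cycles / f) = C_eff * V^2 * cycles."""
    return c_eff_f * volts ** 2 * cycles


C_EFF = 2e-7        # hypothetical effective switched capacitance (farads)
WORK_CYCLES = 3e12  # hypothetical job length in clock cycles

# Hypothetical operating points: (voltage in V, frequency in Hz).
baseline = (1.00, 1.5e9)
scaled = (0.85, 1.2e9)

for name, (v, f) in [("baseline", baseline), ("scaled", scaled)]:
    p = dynamic_power_w(C_EFF, v, f)
    e = dynamic_energy_j(C_EFF, v, WORK_CYCLES)
    t = WORK_CYCLES / f  # runtime in seconds
    print(f"{name}: {p:.1f} W, {t:.0f} s, {e:.0f} J")
```

With these illustrative numbers, the scaled point draws about 42% less dynamic power, runs 25% longer, and uses about 28% less dynamic energy for the same job, since the energy ratio is simply (0.85 / 1.00)² ≈ 0.72. Leakage and the rest of the server keep drawing power during the longer runtime, so real savings are smaller and workload-dependent.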
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others copy, redistribute, remix, transform, or build upon the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made; anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.


