Secure and Cost-Efficient Deployment of Data-Intensive AI Workloads in Cloud Platforms

Authors

  • Balasubramanian Bava Jagannathan

Keywords:

Cloud Deployment, Data-Intensive Workloads, Artificial Intelligence Pipelines, Governance-Aware Optimization, Operational Observability

Abstract

Cloud infrastructure remains the primary deployment platform for data-intensive AI pipelines, with elastic compute and managed storage enabling rapid provisioning at scale. Yet engineering production-grade deployments remains a poorly solved problem. The objectives of performance, cost, security, and operational readiness are tightly coupled, but existing deployment frameworks optimize them separately and satisfy only one objective at a time. End-to-end AI pipelines comprise diverse stages (data ingestion and transformation, feature generation, model training, batch inference, online serving, and continuous monitoring) that exhibit heterogeneous resource utilization and scaling behavior. A monolithic deployment strategy cannot simultaneously meet the needs of all these components. We propose a framework that addresses data locality, elastic resource allocation, governance-aware isolation, and observability readiness at design time rather than retrofitting them after deployment. It generates candidate deployment plans along placement and scaling dimensions from stage-level workload characteristics, filters them through policy and observability feasibility gates, and emits run-time readiness artifacts for auditability and reliable run-time operation. The problem is framed as a constrained multi-objective optimization. The objectives are tail latency, total cloud cost, and a surrogate for operational risk that incorporates the exposure surface and the blast radius. We investigate the trade-offs among data locality, elasticity, and governance, and show that joint planning can reveal deployment options overlooked by several performance-first and cost-first baselines.
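The planning loop described in the abstract (generate candidates, apply feasibility gates, then trade off latency, cost, and risk) can be illustrated with a minimal sketch. This is not the paper's implementation: the placement names, the cost and latency numbers, the risk formula, and the helper names (`generate_candidates`, `passes_gates`, `pareto_front`) are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from itertools import product
from math import sqrt

# Toy placement catalogue. Every value here is an illustrative assumption,
# not a measurement or a figure from the paper.
PLACEMENTS = {
    "co_located":    dict(latency=80.0,  unit_cost=120.0, exposure=0.2, policy_ok=True,  observable=True),
    "cross_region":  dict(latency=140.0, unit_cost=90.0,  exposure=0.5, policy_ok=False, observable=True),
    "shared_tenant": dict(latency=120.0, unit_cost=70.0,  exposure=0.6, policy_ok=True,  observable=True),
    "unmanaged_vm":  dict(latency=90.0,  unit_cost=60.0,  exposure=0.4, policy_ok=True,  observable=False),
}
REPLICAS = (1, 2, 4)  # hypothetical scaling options


@dataclass(frozen=True)
class Plan:
    placement: str
    replicas: int
    tail_latency_ms: float
    cost_usd: float
    risk: float          # surrogate: exposure surface plus a blast-radius term
    policy_ok: bool
    observable: bool

    def objectives(self):
        # All three objectives are minimized.
        return (self.tail_latency_ms, self.cost_usd, self.risk)


def generate_candidates():
    """Enumerate plans over the placement x scaling grid with toy estimates."""
    plans = []
    for (name, p), r in product(PLACEMENTS.items(), REPLICAS):
        plans.append(Plan(
            placement=name,
            replicas=r,
            tail_latency_ms=p["latency"] / sqrt(r),   # assumed scaling law
            cost_usd=p["unit_cost"] * r,
            risk=p["exposure"] + 0.05 * r,            # assumed blast-radius term
            policy_ok=p["policy_ok"],
            observable=p["observable"],
        ))
    return plans


def passes_gates(plan):
    """Feasibility gates: governance policy and observability readiness."""
    return plan.policy_ok and plan.observable


def dominates(a, b):
    """True if plan a is no worse than b on every objective and better on one."""
    oa, ob = a.objectives(), b.objectives()
    return all(x <= y for x, y in zip(oa, ob)) and any(x < y for x, y in zip(oa, ob))


def pareto_front(plans):
    """Keep only plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


if __name__ == "__main__":
    feasible = [p for p in generate_candidates() if passes_gates(p)]
    for plan in pareto_front(feasible):
        print(plan.placement, plan.replicas, plan.objectives())
```

Applying the gates before the Pareto step mirrors the abstract's ordering: infeasible plans are discarded outright rather than being penalized inside the objective, so a cheap but non-compliant plan can never appear among the reported trade-offs.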

Published

14.02.2026

How to Cite

Balasubramanian Bava Jagannathan. (2026). Secure and Cost-Efficient Deployment of Data-Intensive AI Workloads in Cloud Platforms. International Journal of Intelligent Systems and Applications in Engineering, 14(1s), 1482–1489. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/8375

Section

Research Article