Automated Data Pipeline Optimization for Real-Time Machine Learning Inference

Authors

  • Bhanu Prakash Reddy Rella, Rahul Kumar Konduru

Keywords:

Machine Learning, Automation, Automated Data Pipeline, Real-Time Inference

Abstract

Growing demand for real-time machine learning (ML) requires effective data pipelines that cover data preprocessing, feature selection, and model assessment. This paper presents a system that automates these pipeline stages, which streamlines the ML process, reduces the chance of human error, and improves the accuracy of predictive models. Developed with Python, the Scikit-learn library, and Streamlit, the system supports data upload, data preprocessing, feature selection choices, and model assessment. The presented results confirm improved effectiveness and accessibility to a larger number of users. Some limitations remain, including dataset compatibility issues, computation time, and memory use; future extensions based on deep learning, real-time data streaming, and cloud deployment are expected to improve the prospects of automation in ML.
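
The stages named in the abstract (preprocessing, feature selection, model assessment) can be sketched with Scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the function names `build_pipeline` and `assess`, the choice of median imputation, `SelectKBest`, and logistic regression, and the synthetic dataset standing in for an uploaded file are all assumptions made for the example. The Streamlit upload step is omitted so the sketch runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(k_features: int = 5) -> Pipeline:
    """Chain preprocessing, feature selection, and a classifier (hypothetical choices)."""
    return Pipeline(steps=[
        ("impute", SimpleImputer(strategy="median")),   # fill missing values
        ("scale", StandardScaler()),                    # zero mean, unit variance
        ("select", SelectKBest(score_func=f_classif, k=k_features)),  # keep top-k features
        ("model", LogisticRegression(max_iter=1000)),   # assumed model for illustration
    ])


def assess(X, y, k_features: int = 5, folds: int = 5) -> float:
    """Return mean cross-validated accuracy; every step is refit inside each fold."""
    pipeline = build_pipeline(k_features)
    scores = cross_val_score(pipeline, X, y, cv=folds, scoring="accuracy")
    return float(scores.mean())


# Synthetic stand-in for an uploaded dataset (assumption, not data from the paper).
X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)

# Blank out roughly 5% of the entries to exercise the imputation step.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

mean_accuracy = assess(X, y)
print(f"Mean cross-validated accuracy: {mean_accuracy:.3f}")
```

Wrapping every stage in a single `Pipeline` means imputation, scaling, and feature selection are fitted only on the training portion of each fold, which avoids leaking information from held-out data into the assessment.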

Published

10.03.2022

How to Cite

Bhanu Prakash Reddy Rella. (2022). Automated Data Pipeline Optimization for Real-Time Machine Learning Inference. International Journal of Intelligent Systems and Applications in Engineering, 10(1), 163–176. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7522

Section

Research Article