The Role of AI in Strengthening Cybersecurity for Data Pipelines and ETL Systems

Authors

  • Manohar Reddy Sokkula

Keywords

ETL cybersecurity, anomaly detection, Autoencoder-LSTM, deep learning, intrusion detection, data pipeline security.

Abstract

In the era of big data and cloud-native architectures, Extract, Transform, Load (ETL) systems and data pipelines form the core of enterprise-level data processing and decision-making. However, their growing complexity, distributed nature, and continuous data movement have also made them prime targets for sophisticated cyberattacks. Traditional security methods such as firewalls, rule-based monitoring, and static encryption often fall short in identifying evolving threats within these dynamic environments. This research explores the integration of Artificial Intelligence (AI), particularly deep learning models, to enhance the cybersecurity posture of ETL systems. The study presents a hybrid Autoencoder-LSTM anomaly detection model designed to monitor and secure ETL workflows in real time. The model is trained on a combination of real-world network intrusion datasets such as CICIDS2018 and UNSW-NB15, along with synthetic ETL telemetry logs generated through tools like Apache NiFi and Talend. Before model training, the data is preprocessed with Min-Max normalization, which puts diverse feature sets on a common scale for consistent and efficient learning. Additionally, visual tools such as reconstruction error graphs, threshold-based detection plots, correlation heatmaps, and log activity timelines are used to interpret model outputs and highlight patterns of anomalous behavior. The results validate the model’s applicability for detecting a wide range of cyber threats, including slow-paced attacks, insider threats, and data injections within ETL processes. This paper concludes that AI-driven techniques, particularly those leveraging temporal and contextual data, offer powerful capabilities to secure ETL systems beyond the limitations of traditional methods. Future research will focus on integrating reinforcement learning for dynamic policy updates, real-time deployment in production pipelines, and federated learning for decentralized data environments. This approach promises not only enhanced security but also improved operational resilience and regulatory compliance.
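
The abstract names two steps that are easy to illustrate in isolation: Min-Max normalization of the input features, and threshold-based detection on reconstruction error. The following is a minimal Python sketch of those two steps only, not the paper's implementation. The function names, the mean-squared-error measure, and the "mean plus k standard deviations" threshold rule are all assumptions made for illustration; the abstract does not state how the threshold is chosen. The reconstructions themselves would come from the trained Autoencoder-LSTM, which is not reproduced here.

```python
from statistics import mean, stdev


def min_max_fit(rows):
    """Return per-feature (min, max) bounds learned from training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_transform(rows, bounds):
    """Scale each feature to [0, 1] using bounds from min_max_fit.

    A constant feature (max == min) is mapped to 0.0 to avoid dividing by zero.
    """
    scaled_rows = []
    for row in rows:
        scaled = []
        for value, (lo, hi) in zip(row, bounds):
            span = hi - lo
            scaled.append(0.0 if span == 0 else (value - lo) / span)
        scaled_rows.append(scaled)
    return scaled_rows


def reconstruction_error(original, reconstructed):
    """Mean squared error between a record and the model's reconstruction."""
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def fit_threshold(train_errors, k=3.0):
    """Assumed rule: flag errors above mean + k * std of errors on normal data."""
    return mean(train_errors) + k * stdev(train_errors)


def flag_anomalies(errors, threshold):
    """Return True for every record whose error exceeds the threshold."""
    return [err > threshold for err in errors]
```

In use, the bounds would be fitted on normal ETL telemetry only, the same bounds applied to live records, and the threshold derived from reconstruction errors observed on that normal data. Values outside the training range scale to below 0 or above 1, which is expected and often itself a useful signal.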

Published

19.04.2025

How to Cite

Manohar Reddy Sokkula. (2025). The Role of AI in Strengthening Cybersecurity for Data Pipelines and ETL Systems. International Journal of Intelligent Systems and Applications in Engineering, 13(1), 357 –. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7725

Section

Research Article