Multi-Agent Orchestration for Autonomous Data Pipeline Governance: Schema Evolution, Anomaly Detection, and Incident Remediation in Cloud-Native Data Platforms
Keywords:
Multi-agent systems; Data pipeline governance; Schema evolution; Anomaly detection; Autonomous remediation
Abstract
The rapid adoption of cloud-native data platforms has enabled organizations to scale data processing pipelines to unprecedented levels. However, governance mechanisms—particularly around schema evolution, anomaly detection, and incident remediation—remain largely manual, leading to increased operational risk and degraded data reliability. This paper proposes a novel multi-agent orchestration framework for autonomous data pipeline governance. The system leverages specialized agents for schema monitoring, anomaly detection, service-level agreement (SLA) tracking, and incident remediation, coordinated through a shared state and communication protocol. Evaluated against production-like workloads, the framework demonstrates significant improvements in detection latency, mean time to resolution (MTTR), and system reliability. The results suggest that agentic AI can address critical governance gaps in modern data infrastructures while maintaining safety through controlled autonomy.
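The coordination pattern described in the abstract (specialized agents sharing state, with remediation kept under controlled autonomy) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `SharedState`, `Incident`, `schema_agent`, `anomaly_agent`, `remediation_agent`, and `AUTO_REMEDIABLE`, the z-score rule, and the action names are all hypothetical choices made for illustration, and the SLA-tracking agent is omitted for brevity.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Incident:
    source: str  # name of the agent that raised the incident
    kind: str    # e.g. "schema_break" or "volume_anomaly"
    detail: str


@dataclass
class SharedState:
    """Hypothetical blackboard that every agent reads from and writes to."""
    incidents: list = field(default_factory=list)
    actions: list = field(default_factory=list)


def schema_agent(state, old_schema, new_schema):
    """Flag removed fields and changed types as breaking schema changes."""
    for name, old_type in old_schema.items():
        if name not in new_schema:
            state.incidents.append(
                Incident("schema", "schema_break", f"field removed: {name}"))
        elif new_schema[name] != old_type:
            state.incidents.append(
                Incident("schema", "schema_break", f"type changed: {name}"))


def anomaly_agent(state, history, latest, threshold=3.0):
    """Flag the latest row count if its z-score against history is too large."""
    mu, sigma = mean(history), pstdev(history)
    if sigma > 0 and abs(latest - mu) / sigma > threshold:
        state.incidents.append(
            Incident("anomaly", "volume_anomaly", f"row count {latest}"))


# Controlled autonomy (assumed policy): only allowlisted incident kinds are
# remediated automatically; everything else is escalated to a human.
AUTO_REMEDIABLE = {"volume_anomaly": "retry_upstream_job"}


def remediation_agent(state):
    """Turn each recorded incident into an automatic action or an escalation."""
    for inc in state.incidents:
        action = AUTO_REMEDIABLE.get(inc.kind)
        if action is not None:
            state.actions.append(("auto", action, inc.detail))
        else:
            state.actions.append(("escalate", "page_on_call", inc.detail))


def run_demo():
    state = SharedState()
    schema_agent(state, {"id": "long", "amount": "double"}, {"id": "string"})
    anomaly_agent(state, history=[100, 102, 98, 101, 99], latest=10)
    remediation_agent(state)
    return state
```

Calling `run_demo()` records two schema incidents and one volume anomaly. Under the assumed allowlist, only the volume anomaly receives an automatic action, while both schema breaks are escalated.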
DOI: https://doi.org/10.17762/ijisae.v14i1.8344
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


