Multi-Agent Orchestration for Autonomous Data Pipeline Governance: Schema Evolution, Anomaly Detection, and Incident Remediation in Cloud-Native Data Platforms

Praveen Kumar Dora Mallareddi

Keywords:

Multi-agent systems; Data pipeline governance; Schema evolution; Anomaly detection; Autonomous remediation

Abstract

The rapid adoption of cloud-native data platforms has enabled organizations to scale data processing pipelines to unprecedented levels. However, governance mechanisms—particularly around schema evolution, anomaly detection, and incident remediation—remain largely manual, leading to increased operational risk and degraded data reliability. This paper proposes a novel multi-agent orchestration framework for autonomous data pipeline governance. The system leverages specialized agents for schema monitoring, anomaly detection, service-level agreement (SLA) tracking, and incident remediation, coordinated through a shared state and communication protocol. Evaluated against production-like workloads, the framework demonstrates significant improvements in detection latency, mean time to resolution (MTTR), and system reliability. The results suggest that agentic AI can address critical governance gaps in modern data infrastructures while maintaining safety through controlled autonomy.
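The coordination pattern described above, specialized agents that communicate through shared state while remediation stays under controlled autonomy, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `SharedState`, `schema_agent`, `anomaly_agent`, and `remediation_agent`, the z-score test standing in for the unspecified anomaly detector, and the action labels. The SLA tracking agent is omitted for brevity.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class SharedState:
    """Hypothetical shared state that every agent reads from and writes to."""
    events: list = field(default_factory=list)

    def publish(self, kind, detail):
        self.events.append({"kind": kind, "detail": detail})


def schema_agent(state, expected, observed):
    """Flag fields that were removed or whose declared type changed."""
    for name, dtype in expected.items():
        if name not in observed:
            state.publish("schema_drift", f"missing field: {name}")
        elif observed[name] != dtype:
            state.publish("schema_drift", f"type change: {name}")


def anomaly_agent(state, history, latest, z_threshold=3.0):
    """Flag a row count far from recent history (simple z-score, an assumption)."""
    sigma = stdev(history)
    if sigma > 0 and abs(latest - mean(history)) / sigma > z_threshold:
        state.publish("anomaly", f"row count {latest} outside expected range")


def remediation_agent(state, auto_approved=("anomaly",)):
    """Controlled autonomy: act alone only on pre-approved incident kinds."""
    actions = []
    for event in state.events:
        if event["kind"] in auto_approved:
            actions.append(("quarantine_batch", event["detail"]))
        else:
            actions.append(("escalate_to_human", event["detail"]))
    return actions


state = SharedState()
schema_agent(state, {"id": "long", "amount": "double"},
             {"id": "long", "amount": "string"})
anomaly_agent(state, [1000, 1010, 990, 1005, 995], 120)
actions = remediation_agent(state)
for action in actions:
    print(action)
```

In this sketch the schema change is escalated to a human, while the row-count anomaly is handled automatically because its kind appears in the `auto_approved` allow-list. An allow-list of this sort is one simple way to bound what an agent may do without review.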

DOI: https://doi.org/10.17762/ijisae.v14i1.8344

