Observability in Stateful Workloads: Strategies for Monitoring Persistent Services in Dynamic Cloud Environments

Authors

  • Sunil Agarwal

Keywords:

Cloud, Observability, Workloads, Dynamic

Abstract

Stateful workloads are central to contemporary cloud-native designs, but their persistence makes them especially difficult to observe. This paper presents a detailed framework for monitoring such workloads, built on metric matching, failure propagation modeling, and cross-layer tracing continuity. To evaluate telemetry overhead, anomaly correlation, and dashboard-based diagnostics, we conducted experiments across PostgreSQL, Kafka, and Redis deployments. The experiments also examined how far conventional observability methods can explain state changes and long-term associations. To reduce diagnosis time and improve diagnostic reliability, we propose optimized approaches to metric reduction and unified dashboards. These strategies give DevOps teams scalable, resilient operational tools.

Published

05.09.2025

How to Cite

Sunil Agarwal. (2025). Observability in Stateful Workloads: Strategies for Monitoring Persistent Services in Dynamic Cloud Environments. International Journal of Intelligent Systems and Applications in Engineering, 13(1s), 371 –. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7859

Section

Research Article