We Handled a 200GB/day Log Volume Without Breaking the Bank: A Practical Framework for Cost-Effective Observability at Scale
Keywords:
Log Management, Observability, Telemetry Pipeline, Data Tiering, Log Filtering, Cost Optimization, Distributed Systems, Indexing, Retention Policy, Platform Engineering
Abstract
Scaling observability cost-effectively is one of the most pressing challenges facing modern engineering teams. As distributed systems grow in complexity and traffic, daily log volumes can reach hundreds of gigabytes, placing a compounding burden on storage infrastructure, indexing engines, and operational budgets. This article presents a practitioner-driven case study documenting the architectural decisions, tooling evaluations, and pipeline optimizations used to manage a sustained high-volume logging workload without exhausting financial resources or degrading system visibility. The strategies explored include log filtering and structured enrichment at the collection edge, dynamic data tiering, field-selective indexing, retention policy governance, and license-aware tooling decisions. These strategies are contextualized against established literature in distributed systems observability, telemetry pipeline design, and cloud cost optimization. The result is a repeatable, layered framework applicable to platform engineers, site reliability engineers, and infrastructure architects responsible for managing large-scale telemetry in production environments.
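As a rough illustration of how the first layers of such a framework might fit together, the following minimal Python sketch filters, enriches, and tier-routes a single JSON log line. It is not the article's implementation: the function name `process`, the dropped levels, the indexed field list, the tier names, and the retention periods are all hypothetical values chosen for the example.

```python
import json
from datetime import timedelta

# All policy values below are illustrative assumptions, not figures from the article.
DROP_LEVELS = {"DEBUG", "TRACE"}                  # filtered out at the collection edge
HOT_LEVELS = {"WARN", "ERROR", "FATAL"}           # routed to the fast, fully indexed tier
INDEXED_FIELDS = ("timestamp", "level", "service", "trace_id")
RETENTION = {
    "hot": timedelta(days=7),
    "warm": timedelta(days=30),
    "cold": timedelta(days=365),
}


def process(raw_line, host):
    """Filter, enrich, and route one log line. Returns None if the line is dropped."""
    try:
        event = json.loads(raw_line)
        if not isinstance(event, dict):
            raise ValueError("log line is not a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Unparseable lines are kept unindexed in the cheapest tier rather than lost.
        return {
            "tier": "cold",
            "retention_days": RETENTION["cold"].days,
            "indexed": {},
            "body": {"message": raw_line, "host": host},
        }

    level = str(event.get("level", "INFO")).upper()
    if level in DROP_LEVELS:
        return None                                # log filtering

    event["level"] = level
    event["host"] = host                           # structured enrichment

    tier = "hot" if level in HOT_LEVELS else "warm"  # data tiering
    # Field-selective indexing: only a small allowlist of fields is indexed.
    indexed = {k: event[k] for k in INDEXED_FIELDS if k in event}

    return {
        "tier": tier,
        "retention_days": RETENTION[tier].days,   # retention policy per tier
        "indexed": indexed,
        "body": event,
    }


if __name__ == "__main__":
    print(process('{"level": "error", "service": "api", "msg": "boom"}', "node-1"))
    print(process('{"level": "debug", "msg": "noisy"}', "node-1"))
```

In a real pipeline this logic would usually live in the configuration of a collection agent rather than in application code; the sketch only shows the order of decisions (drop, enrich, choose tier, choose indexed fields, attach retention).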
References
B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, "Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade," ACM Queue, vol. 14, no. 1, pp. 70–93, Jan.–Feb. 2016. Available: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf
Shekhar Jha, "Foundations of Observability Engineering," in International Journal of Multidisciplinary on Science and Management, 2024. Available: https://www.ijmsm.org/volume1-issue3/IJMSM-V1I3P104.pdf
Neal Leavitt, "Complex-event processing poised for growth," IEEE Computer, vol. 42, no. 4, pp. 17–20, Apr. 2009. Available: https://www.leavcom.com/pdf/CEP.pdf
Mark D. Syer, et al., "Continuous validation of performance test suites," in Proc. Int. Conf. Performance Engineering (ICPE), Prague, Czech Republic, 2014, pp. 197–208. Available: http://www.cse.yorku.ca/~zmjiang/publications/asej2016_syer.pdf
Adrian Jackson, et al., "Architectures for High Performance Computing and Data Systems using Byte-Addressable Persistent Memory," arXiv:1805.10041v1 [cs.DC] 25 May 2018. Available: https://arxiv.org/pdf/1805.10041
Hyeontaek Lim, et al., "SILT: A memory-efficient, high-performance key-value store," in Proc. 23rd ACM Symp. Operating Systems Principles (SOSP), Cascais, Portugal, 2011, pp. 1–13. Available: https://www.pdl.cmu.edu/PDL-FTP/Storage/sosp11_silt.pdf
Benjamin H. Sigelman et al., "Dapper, a large-scale distributed systems tracing infrastructure," Google, Mountain View, CA, Tech. Rep. Google-TR-2010-003, Apr. 2010. Available: https://static.googleusercontent.com/media/research.google.com/en//archive/papers/dapper-2010-1.pdf
Pinjia He, et al., "An evaluation study on log parsing and its use in log mining," in Proc. 46th IEEE/IFIP Int. Conf. Dependable Systems and Networks (DSN), Toulouse, France, 2016, pp. 654–661. Available: https://pinjiahe.github.io/files/pdf/research/DSN16.pdf
Valerio Persico, et al., "Measuring network throughput in the cloud: The case of Amazon EC2," Computer Networks, vol. 93, pp. 408–422, Dec. 2015. Available: http://wpage.unina.it/valerio.persico/pubs/tput_cloud_AWS_comnet.pdf
Seyed Ali Mirheidari, et al., "Alert correlation algorithms: A survey and taxonomy," in Proc. Int. Conf. Cyberspace Safety and Security (CSS), Zhangjiajie, China, 2013, pp. 183–197. Available: https://arxiv.org/pdf/1811.00921
Justin Zobel and Alistair Moffat, "Inverted files for text search engines," ACM Computing Surveys, vol. 38, no. 2, pp. 6–es, Jul. 2006. Available: https://dmice.ohsu.edu/bedricks/courses/cs506-problem-solving-with-large-clusters/articles/week1/zobel_invertedindex.pdf
Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems," Sebastopol, CA: O'Reilly Media, 2017. Available: https://unidel.edu.ng/focelibrary/books/Designing%20Data-Intensive%20Applications%20The%20Big%20Ideas%20Behind%20Reliable,%20Scalable,%20and%20Maintainable%20Systems%20by%20Martin%20Kleppmann%20(z-lib.org).pdf
Royal Borough of Kingston upon Thames, "Information Security and Governance Policy and Framework," Information Systems Frontiers, vol. 21, no. 4, pp. 935–949, Aug. 2019. Available: https://www.kingston.gov.uk/sites/default/files/2025-05/Information_Security_and_Governance_Policy_and_Framework___RBK__Approved_.pdf
Wei Xu, et al., "Detecting large-scale system problems by mining console logs," in Proc. 22nd ACM Symp. Operating Systems Principles (SOSP), Big Sky, MT, 2009, pp. 117–132. Available: https://www.sigops.org/s/conferences/sosp/2009/papers/xu-sosp09.pdf
Min Du, et al., "DeepLog: Anomaly detection and diagnosis from system logs through deep learning," in Proc. 2017 ACM SIGSAC Conf. Computer and Communications Security (CCS), Dallas, TX, 2017, pp. 1285–1298. Available: https://users.cs.utah.edu/~lifeifei/papers/deeplog.pdf
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


