Test Data Management Using Synthetic Data Generation Techniques

Authors

  • Srikanth Kavuri

Keywords:

Synthetic Data, Test Data Management, Software Testing, Data Privacy, Machine Learning, GDPR, Test Automation, Generative Models, Data Governance

Abstract

As software systems grow more complex and data-driven, Test Data Management (TDM) has become increasingly central to maintaining quality assurance, meeting regulatory requirements, and supporting realistic performance testing. Traditional TDM approaches such as masking production data or generating test data manually are proving insufficient under the weight of modern demands, especially in light of stringent privacy regulations like GDPR and HIPAA. This study explores the use of synthetic data generation as a scalable, privacy-preserving alternative for TDM. Drawing on techniques from statistical modeling, rule-based synthesis, and generative machine learning, we evaluate the ability of synthetic data to emulate production-like conditions without compromising sensitive information. Several data generation strategies are assessed across varied testing environments to examine their impact on test coverage, compliance, and overall software quality. The experimental findings suggest that synthetic data can improve both efficiency and security in testing workflows while minimizing legal and operational risks. The paper concludes with practical recommendations for integrating synthetic data practices into enterprise-scale TDM pipelines, highlighting considerations for governance, automation, and long-term maintainability.

DOI: https://doi.org/10.17762/ijisae.v12i23s.7956

Published

31.12.2024

How to Cite

Srikanth Kavuri. (2024). Test Data Management Using Synthetic Data Generation Techniques. International Journal of Intelligent Systems and Applications in Engineering, 12(23s), 3910 –. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7956

Section

Research Article