Test Data Management Using Synthetic Data Generation Techniques
Keywords:
Synthetic Data, Test Data Management, Software Testing, Data Privacy, Machine Learning, GDPR, Test Automation, Generative Models, Data Governance
Abstract
As software systems grow more complex and data-driven, Test Data Management (TDM) has become increasingly central to maintaining quality assurance, meeting regulatory requirements, and supporting realistic performance testing. Traditional TDM approaches such as masking production data or generating test data manually are proving insufficient under the weight of modern demands, especially in light of stringent privacy regulations like GDPR and HIPAA. This study explores the use of synthetic data generation as a scalable, privacy-preserving alternative for TDM. Drawing on techniques from statistical modeling, rule-based synthesis, and generative machine learning, we evaluate the ability of synthetic data to emulate production-like conditions without compromising sensitive information. Several data generation strategies are assessed across varied testing environments to examine their impact on test coverage, compliance, and overall software quality. The experimental findings suggest that synthetic data can improve both efficiency and security in testing workflows while minimizing legal and operational risks. The paper concludes with practical recommendations for integrating synthetic data practices into enterprise-scale TDM pipelines, highlighting considerations for governance, automation, and long-term maintainability.
DOI: https://doi.org/10.17762/ijisae.v12i23s.7956
License
Copyright (c) 2025 Srikanth Kavuri

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


