Comparative Study of GANs and Stable Diffusion for High-Quality Image Generation Using FID and a Real-World Dataset

Authors

  • Kamireddy Rammohan Rao

Keywords:

GANs; Stable Diffusion; StyleGAN2-ADA; Fréchet Inception Distance; FFHQ; image synthesis; diffusion models; real-world dataset.

Abstract

Generative adversarial networks (GANs) and Stable Diffusion are two of the most influential families of contemporary image generators, tracing the field's evolution from adversarial learning to diffusion-based synthesis. This paper presents a systematic comparison of StyleGAN2-ADA and a Stable Diffusion v1.5 pipeline fine-tuned for portrait generation on the Flickr-Faces-HQ (FFHQ) domain. The draft protocol uses the public Kaggle mirror of FFHQ, comprising 52,000 real face images at 512×512 resolution, downsampled to 256×256 to enable a controlled comparison. The primary evaluation metric is Fréchet Inception Distance (FID), supplemented by precision, recall, and per-image inference time as measures of fidelity, diversity, and deployment efficiency. In the illustrative draft results below, Stable Diffusion attains a lower FID (6.91 ± 0.21) than StyleGAN2-ADA (8.74 ± 0.32), while StyleGAN2-ADA generates images far faster (0.041 ± 0.004 s per image). The comparison thus exposes a viable trade-off between distributional quality and generation efficiency. To preserve academic integrity, the numerical values in this manuscript are deliberately demonstrative and must be replaced with the author's experimental outputs before submission.
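The FID reported in the abstract compares Gaussian fits to deep-feature statistics of the real and generated image sets. A minimal NumPy sketch of the metric itself is shown below; the feature-extraction step (Inception-v3 activations over FFHQ and the generated samples) is assumed to have already been run, and all names here are illustrative, not the paper's actual code.

```python
import numpy as np

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet Inception Distance between two Gaussians fitted to
    image features: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals the sum of square roots of the eigenvalues
    # of S1 @ S2; these are real and non-negative for PSD covariances,
    # so clip tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

def feature_stats(features):
    """Mean and covariance of an (N, D) array of per-image features."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

# Toy demonstration on random "features"; in the actual protocol these
# would be Inception-v3 pool3 activations over the FFHQ reference set
# and an equal number of generated samples.
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))
fake = rng.normal(loc=0.5, size=(1000, 8))
print(fid(*feature_stats(real), *feature_stats(real)))  # identical sets: FID ~ 0
print(fid(*feature_stats(real), *feature_stats(fake)))  # shifted sets: FID > 0
```

Note that a lower FID indicates that the generated feature distribution lies closer to the real one; in practice the score is sensitive to sample count and image resizing, which is one reason the draft protocol fixes both sets at 256×256.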

References

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014.

M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.

A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.

T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training GANs,” in Advances in Neural Information Processing Systems, 2016.

M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the International Conference on Machine Learning, 2017.

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” in Advances in Neural Information Processing Systems, 2017.

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Advances in Neural Information Processing Systems, 2017.

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in International Conference on Learning Representations, 2018.

T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” in International Conference on Learning Representations, 2018.

A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” in International Conference on Learning Representations, 2019.

T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.

T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.

T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila, “Training generative adversarial networks with limited data,” in Advances in Neural Information Processing Systems, 2020.

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, 2020.

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021.

J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in International Conference on Learning Representations, 2021.

A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in Proceedings of the International Conference on Machine Learning, 2021.

P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” in Advances in Neural Information Processing Systems, 2021.

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.

A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,” in Proceedings of the International Conference on Machine Learning, 2022.

C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. Ayan, S. S. Mahdavi, R. G. Lopes, et al., “Photorealistic text-to-image diffusion models with deep language understanding,” in Advances in Neural Information Processing Systems, 2022.

L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.

M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, and S. Gelly, “Assessing generative models via precision and recall,” in Advances in Neural Information Processing Systems, 2018.

T. Kynkäänniemi, T. Karras, S. Laine, J. Lehtinen, and T. Aila, “Improved precision and recall metric for assessing generative models,” in Advances in Neural Information Processing Systems, 2019.

G. Parmar, R. Zhang, and J.-Y. Zhu, “On aliased resizing and surprising subtleties in GAN evaluation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.

Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of the IEEE International Conference on Computer Vision, 2015.

F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, “LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint arXiv:1506.03365, 2015.

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.

NVLabs, “Flickr-Faces-HQ Dataset (FFHQ),” GitHub repository. [Online]. Available: https://github.com/NVlabs/ffhq-dataset

Kaggle, “Flickr-Faces-HQ Dataset (FFHQ),” dataset mirror. [Online]. Available: https://www.kaggle.com/datasets/arnaud58/flickrfaceshq-dataset-ffhq

Published

12.11.2025

How to Cite

Kamireddy Rammohan Rao. (2025). Comparative Study of GANs and Stable Diffusion for High-Quality Image Generation Using FID and a Real-World Dataset. International Journal of Intelligent Systems and Applications in Engineering, 13(2s), 298–305. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/8219

Issue

Section

Research Article