Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/25013
Title: Development and deployment of a generative model-based framework for text to photorealistic image generation
Authors: Pande, S
Chouhan, S
Sonavane, R
Walambe, R
Ghinea, G
Kotecha, K
Keywords: Text-to-image;Text-to-face;Face synthesis;GAN;AttnGAN
Issue Date: 23-Aug-2021
Publisher: Elsevier BV
Citation: Pande, S. et al. (2021) ‘Development and deployment of a generative model-based framework for text to photorealistic image generation’, Neurocomputing, 463, pp. 1–16. doi: 10.1016/j.neucom.2021.08.055.
Abstract: The task of generating photorealistic images from textual descriptions is quite challenging. Most existing work in this domain focuses on generating images such as flowers or birds from their textual descriptions, especially for validating generative models based on Generative Adversarial Network (GAN) variants and for recreational purposes. However, such work is limited in the domain of photorealistic face image generation, and the results obtained have not been satisfactory. This is partly due to the absence of concrete data in this domain and the large number of highly specific features/attributes involved in face generation compared to birds or flowers. In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) for fine-grained text-to-face generation that enables attention-driven multi-stage refinement by employing the Deep Attentional Multimodal Similarity Model (DAMSM). Through extensive experimentation on the CelebA dataset, we evaluated our approach using the Fréchet Inception Distance (FID) score. The outputs for the Face2Text dataset are also compared with those of the T2F GitHub project; according to the visual comparison, AttnGAN generated higher-quality images than T2F. Additionally, we compare our methodology with existing approaches, with a specific focus on the CelebA dataset, and demonstrate that our approach achieves a better FID score, facilitating more realistic image generation. One application of such an approach is criminal identification, where faces are generated from the textual description given by an eyewitness. Such a method can bring consistency and eliminate the individual biases of an artist drawing faces from the eyewitness's description. Finally, we discuss the deployment of the models on a Raspberry Pi to test how effective they would be on a standalone device, facilitating portability and timely task completion.
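Note on the evaluation metric mentioned above: the Fréchet Inception Distance compares the mean and covariance of Inception-network features extracted from real and generated images. The following is a minimal NumPy/SciPy sketch of that standard computation, not the authors' implementation; the function name fid and the commented feature-extraction step are illustrative assumptions.

import numpy as np
from scipy import linalg

def fid(mu_r, sigma_r, mu_g, sigma_g):
    # Fréchet distance between two Gaussians N(mu_r, sigma_r) and N(mu_g, sigma_g),
    # typically fitted to Inception-v3 activations of real and generated images.
    diff = mu_r - mu_g
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard small imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

# Example usage: statistics estimated from feature arrays of shape [n_images, feat_dim]
# real_feats, fake_feats = ...  (Inception feature extraction not shown here)
# mu_r, sigma_r = real_feats.mean(0), np.cov(real_feats, rowvar=False)
# mu_g, sigma_g = fake_feats.mean(0), np.cov(fake_feats, rowvar=False)
# score = fid(mu_r, sigma_r, mu_g, sigma_g)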
URI: http://bura.brunel.ac.uk/handle/2438/25013
DOI: http://dx.doi.org/10.1016/j.neucom.2021.08.055
ISSN: 0925-2312; 1872-8286
Appears in Collections:Dept of Computer Science Embargoed Research Papers

Files in This Item:
File: FullText.pdf | Description: Embargoed until 23/08/2023 | Size: 4.04 MB | Format: Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.