Multi-Stage Hybrid Text-to-Image Generation Models - Egyptian Knowledge Bank

252465

Multi-Stage Hybrid Text-to-Image Generation Models

Article

Last updated: 03 Jan 2025

Subjects

Abstract

Generative Adversarial Networks (GANs) have shown outstanding potential for creating realistic images that are difficult to distinguish from real ones, but text-to-image (conditional) generation still faces challenges. In this paper, we propose a new model called AttnDM GAN, which stands for Attentional Dynamic Memory Generative Adversarial Network, that seeks to generate realistic output that is semantically consistent with an input text description. AttnDM GAN is a three-stage hybrid of the Attentional Generative Adversarial Network (AttnGAN) and the Dynamic Memory Generative Adversarial Network (DM-GAN). The first stage, Initial Image Generation, generates low-resolution 64x64 images conditioned on the encoded input text description. The second stage, Attention Image Generation, generates higher-resolution 128x128 images, and the last stage, Dynamic Memory Based Image Refinement, refines the images to 256x256 resolution. We evaluate AttnDM GAN on the Caltech-UCSD Birds 200 dataset, where it achieves a Fréchet Inception Distance (FID) of 19.78. We also propose another model, the Dynamic Memory Attention Generative Adversarial Network (DMAttn-GAN), a variation of AttnDM GAN in which the second and third stages are swapped; its FID is 17.04.
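The stage ordering described in the abstract can be sketched as plain data. This is only an illustrative outline, not the authors' implementation: the names `Stage`, `build_pipeline`, `attndm_gan`, `dmattn_gan`, and `best_by_fid` are hypothetical, and the sketch assumes that the DMAttn-GAN variant keeps the same 64, 128, 256 resolution schedule by position when the second and third stages are swapped, which the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    resolution: int  # side length in pixels of the stage's square output


# Output resolution by stage position, as given in the abstract.
RESOLUTIONS = (64, 128, 256)

# Stage names taken from the abstract.
INITIAL = "Initial Image Generation"
ATTENTION = "Attention Image Generation"
MEMORY = "Dynamic Memory Based Image Refinement"


def build_pipeline(stage_names):
    """Pair each stage name with the resolution of its position."""
    if len(stage_names) != len(RESOLUTIONS):
        raise ValueError("expected exactly three stages")
    return [Stage(name, res) for name, res in zip(stage_names, RESOLUTIONS)]


def attndm_gan():
    # Order from the abstract: initial, attention, dynamic memory refinement.
    return build_pipeline([INITIAL, ATTENTION, MEMORY])


def dmattn_gan():
    # Second and third stages swapped. Assumption: the resolution schedule
    # stays tied to stage position rather than to the stage type.
    return build_pipeline([INITIAL, MEMORY, ATTENTION])


# FID values reported in the abstract (lower FID is better).
REPORTED_FID = {"AttnDM GAN": 19.78, "DMAttn-GAN": 17.04}


def best_by_fid(scores):
    """Return the model name with the lowest FID."""
    return min(scores, key=scores.get)
```

Calling `best_by_fid(REPORTED_FID)` selects DMAttn-GAN, matching the abstract's lower reported FID for the swapped-stage variant.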

DOI

10.21608/ijicis.2022.117124.1157

Keywords

Generative Adversarial Networks, Image Generation, Text-to-image, Computer Vision, Conditional Image Synthesis

Authors

First Name

Razan

Last Name

Bayoumi

MiddleName

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

razan.bayoumi@cis.asu.edu.eg

City

Heliopolis

Orcid

0000-0001-6924-4124

First Name

Marco

Last Name

Alfonse

MiddleName

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

marco_alfonse@cis.asu.edu.eg

City

Cairo

Orcid

0000-0003-0722-3218

First Name

Abdel-Badeeh

Last Name

Salem

MiddleName

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University

Email

absalem@cis.asu.edu.eg

City

Orcid

0000-0001-5013-4339

Volume

22

Article Issue

3

Related Issue

36337

Issue Date

2022-08-01

Receive Date

2022-01-18

Publish Date

2022-08-01

Page Start

82

Page End

91

Print ISSN

1687-109X

Online ISSN

2535-1710

Article File

IJICIS_Volume 22_Issue 3_Pages 82-91.pdf

PDF . 1.3MB

Link

https://ijicis.journals.ekb.eg/article_252465.html

Detail API

https://ijicis.journals.ekb.eg/service?article_code=252465

Order

Type

Original Article

Type Code

494

Publication Type

Journal

Publication Title

International Journal of Intelligent Computing and Information Sciences

Publication Link

https://ijicis.journals.ekb.eg/

MainTitle

Multi-Stage Hybrid Text-to-Image Generation Models

Details

Type

Article

Created At

22 Jan 2023
