334586

A Survey on Visual Question Answering Methodologies

Article

Last updated: 24 Dec 2024

Subjects

-

Tags

-

Abstract

Visual question answering (VQA) is expected to become essential to many human tasks. However, as a multimodal problem, it poses significant challenges at the core of artificial intelligence. This article summarizes the challenges in multimodal architectures that the recent surge in research has brought to light; keeping these challenges in view is necessary for improving the design of VQA systems. It then introduces recent rapid developments in methods for answering questions about images. Providing the correct response to a natural language question about an input image is a difficult multimodal task, because it requires not only extracting features from both modalities (text and image) but also attending to the relations between them. Many deep learning researchers are drawn to VQA because of the outstanding contributions deep learning has made to text, voice, and vision technologies (images and videos) in fields such as welfare, robotics, security, and medicine.

DOI

10.21608/ejle.2024.244720.1058

Keywords

Deep learning, Visual question answering, Multimodal challenges, VQA methodologies

Authors

First Name

Aya

Last Name

Al-Zoghby

MiddleName

M.

Affiliation

Department of Computer Science, Faculty of Computers and Information Science, Damietta University, Damietta, Egypt

Email

aya_el_zoghby@du.edu.eg

City

Damietta

Orcid

-

First Name

Aya

Last Name

Saleh

MiddleName

Salah

Affiliation

Computer Science, Computer and Artificial Intelligence, Damietta University, New Damietta, Damietta

Email

aya.saleh92@gmail.com

City

Damietta

Orcid

0009-0000-8970-0895

First Name

Wael

Last Name

Awad

MiddleName

Abd Elkader

Affiliation

Computer Science Department, Faculty of Computer and Artificial Intelligence, Damietta University

Email

wael_abdelkader@du.edu.eg

City

Damietta

Orcid

-

Volume

11

Article Issue

1

Related Issue

47386

Issue Date

2024-04-01

Receive Date

2023-10-25

Publish Date

2024-04-01

Page Start

57

Page End

65

Print ISSN

2356-8208

Online ISSN

2356-8216

Link

https://ejle.journals.ekb.eg/article_334586.html

Detail API

https://ejle.journals.ekb.eg/service?article_code=334586

Order

4

Type

Original Article

Type Code

1,039

Publication Type

Journal

Publication Title

The Egyptian Journal of Language Engineering

Publication Link

https://ejle.journals.ekb.eg/

Details

Type

Article

Created At

24 Dec 2024