368595

A COMPARATIVE STUDY ON REINFORCEMENT LEARNING BASED VISUAL DIALOG SYSTEMS

Article

Last updated: 23 Dec 2024

Subjects

-

Tags

-

Abstract

Recently, the intersection of vision and language has given rise to many tasks, such as visual question answering and image captioning. In particular, dialog systems grounded in a visual scene play an important role in improving human-computer interaction. At the same time, reinforcement learning has emerged as a highly successful paradigm for a variety of machine learning tasks, especially tasks that aim to develop intelligent, human-like machines. In this paper, we show how reinforcement learning is applied to conversational agents to build a powerful visual dialog agent. The Visual Dialog task requires an agent to hold a meaningful conversation in natural language about visual content. Given an image, its caption, the dialog history (question/answer pairs), and a question about the scene, the agent should comprehend the question, extract the relevant context from the history, and ground this information in the image to answer the current question correctly. Two main visual dialog tasks have been introduced: a free-form dialog task known as “Visual Dialog” and a goal-oriented dialog task formulated as a guessing game. Two datasets have been introduced to address these tasks: the VisDial dataset and the GuessWhat?! dataset. For evaluation, some approaches use accuracy, while others use four metrics proposed specifically for this task. Several approaches have been proposed for these tasks based on supervised learning, reinforcement learning, or a combination of both. This paper presents a comparative study of eleven important reinforcement learning approaches for visual dialog.

DOI

10.21608/ijicis.2024.295310.1339

Keywords

Visual Dialog, Guessing Game, GuessWhat?!, GuessWhich, Attention Mechanism

Authors

First Name

Ghada

Last Name

Elshamy

Middle Name

M.

Affiliation

Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University

Email

ghada.magdy@cis.asu.edu.eg

City

-

Orcid

0000-0002-1866-5321

First Name

Marco

Last Name

Alfonse

Middle Name

-

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

marco_alfonse@cis.asu.edu.eg

City

Cairo

Orcid

0000-0003-0722-3218

First Name

Islam

Last Name

Hegazy

Middle Name

-

Affiliation

Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University

Email

islheg@cis.asu.edu.eg

City

-

Orcid

0000-0002-1572-463X

First Name

Mostafa

Last Name

Aref

Middle Name

M.

Affiliation

Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

mostafa.aref@cis.asu.edu.eg

City

-

Orcid

0000-0002-1278-0070

Volume

24

Article Issue

2

Related Issue

48744

Issue Date

2024-06-01

Receive Date

2024-06-04

Publish Date

2024-07-01

Page Start

58

Page End

79

Print ISSN

1687-109X

Online ISSN

2535-1710

Link

https://ijicis.journals.ekb.eg/article_368595.html

Detail API

https://ijicis.journals.ekb.eg/service?article_code=368595

Order

368,595

Type

Original Article

Type Code

494

Publication Type

Journal

Publication Title

International Journal of Intelligent Computing and Information Sciences

Publication Link

https://ijicis.journals.ekb.eg/

MainTitle

A COMPARATIVE STUDY ON REINFORCEMENT LEARNING BASED VISUAL DIALOG SYSTEMS

Created At

23 Dec 2024