Bidirectional Temporal Context Fusion with Bi-Modal Semantic Features using a gating mechanism for Dense Video Captioning - Egyptian Knowledge Bank

184716

Article

Last updated: 03 Jan 2025

Abstract

Dense video captioning involves detecting events of interest in an untrimmed video and generating a textual description for each one. Many machine intelligence applications, such as video summarization, search and retrieval, and automatic video subtitling to support blind people, benefit from an automated dense caption generator. Most recent works use an encoder-decoder neural network framework that employs a 3D-CNN as an encoder to represent the frames of a detected event and an RNN as a decoder for caption generation. They follow an attention-based mechanism to learn where to focus in the encoded video frames during caption generation. Although attention-based approaches have achieved excellent results, they link visual features directly to textual captions and ignore rich intermediate and high-level video concepts such as people, objects, scenes, and actions. In this paper, we first propose to obtain a better event representation, one that discriminates between events ending at nearly the same time, by applying an attention-based fusion in which hidden states from a bidirectional LSTM sequence video encoder, which encodes the past and future context surrounding a detected event, are fused with the event's visual (R3D) features. Second, we propose to explicitly extract bi-modal semantic concepts (nouns and verbs) from a detected event segment and to balance the contributions of the proposed event representation and the semantic concepts dynamically using a gating mechanism during captioning. Experimental results demonstrate that the proposed attention-based fusion represents an event better for captioning, and that involving semantic concepts improves captioning performance.
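The gating idea mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' formulation: the function name `gated_fusion`, the weight matrices `W_e` and `W_s`, the bias `b`, and the assumption that the event representation and the semantic-concept vector have already been projected to the same size are all hypothetical choices made for illustration. The sketch computes an element-wise sigmoid gate from both inputs and uses it to blend them.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_fusion(event_repr, semantic_repr, W_e, W_s, b):
    """Blend two same-sized vectors with a learned element-wise gate.

    Hypothetical generic form (an assumption, not taken from the paper):
        gate  = sigmoid(W_e @ event + W_s @ semantic + b)
        fused = gate * event + (1 - gate) * semantic
    """
    pre = [
        pe + ps + bi
        for pe, ps, bi in zip(matvec(W_e, event_repr), matvec(W_s, semantic_repr), b)
    ]
    gate = [sigmoid(p) for p in pre]
    fused = [g * e + (1.0 - g) * s for g, e, s in zip(gate, event_repr, semantic_repr)]
    return fused, gate


# Toy inputs: random vectors stand in for real features and trained weights.
random.seed(0)
d = 4
event_repr = [random.uniform(-1, 1) for _ in range(d)]
semantic_repr = [random.uniform(-1, 1) for _ in range(d)]
W_e = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]
W_s = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]
b = [0.0] * d

fused, gate = gated_fusion(event_repr, semantic_repr, W_e, W_s, b)
```

Because each gate value lies strictly between 0 and 1, every component of `fused` is a convex combination of the corresponding components of the two inputs. In a trained captioning model the gate would typically be recomputed at each decoding step, which is what makes the balance dynamic.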

DOI

10.21608/ijicis.2021.60216.1055

Keywords

Video events proposal detection, Video to natural language, Attention-Based sentence decoder, Bidirectional LSTM, Deep learning

Authors

First Name

Noorhan

Last Name

Khaled

MiddleName

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

noorhankhaled1994@gmail.com

City

Orcid

0000-0003-3372-7495

First Name

Last Name

Aref

MiddleName

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

mostafa.aref@cis.asu.edu.eg

City

Orcid

0000-0002-1278-0070

First Name

Mohammed

Last Name

Marey

MiddleName

Affiliation

Scientific Computing Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

mohammed.marey@cis.asu.edu.eg

City

Orcid

Volume

Article Issue

Related Issue

25765

Issue Date

2021-07-01

Receive Date

2021-01-28

Publish Date

2021-07-18

Page Start

Page End

Print ISSN

1687-109X

Online ISSN

2535-1710

Article File

IJICIS_Volume 21_Issue 2_Pages 1-22.pdf

PDF, 1.9 MB

Link

https://ijicis.journals.ekb.eg/article_184716.html

Detail API

https://ijicis.journals.ekb.eg/service?article_code=184716

Order

Type

Original Article

Type Code

494

Publication Type

Journal

Publication Title

International Journal of Intelligent Computing and Information Sciences

Publication Link

https://ijicis.journals.ekb.eg/

MainTitle

Bidirectional Temporal Context Fusion with Bi-Modal Semantic Features using a gating mechanism for Dense Video Captioning

Details

Type

Article

Created At

22 Jan 2023
