184831

Data Augmentation for Arabic Speech Recognition Based on End-to-End Deep Learning

Article

Last updated: 03 Jan 2025

Subjects

-

Tags

-

Abstract

The end-to-end deep learning approach has greatly enhanced the performance of speech recognition systems. With deep learning techniques, overfitting remains the main problem when training data is scarce. Data augmentation is a suitable solution to the overfitting problem: it increases the quantity of training data and improves the robustness of the models. In this paper, we investigate a data augmentation method for enhancing Arabic automatic speech recognition (ASR) based on end-to-end deep learning. Data augmentation is applied to the original corpus to increase the training data through noise adaptation, pitch-shifting, and speed transformation. A CNN-LSTM and an attention-based encoder-decoder method are used to build the acoustic model and in the decoding phase. This method is considered state of the art in end-to-end deep learning, and to the best of our knowledge, no prior research has employed data augmentation for a CNN-LSTM and attention-based model in Arabic ASR systems. In addition, the language model is built using RNN-LM and LSTM-LM methods. The Standard Arabic Single Speaker Corpus (SASSC) without diacritics is used as the original corpus. Experimental results show that applying data augmentation improves the word error rate (WER) compared with the same approach without data augmentation. The achieved average reduction in WER is 4.55%.
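
As an illustration of two of the transformations named in the abstract, the following is a minimal sketch of noise addition and speed transformation on a raw waveform. It is not the authors' implementation: the function names `add_noise` and `change_speed`, the use of white Gaussian noise, the SNR parameter, and resampling by linear interpolation are all assumptions made for this example. Pitch-shifting is omitted because it normally relies on a dedicated signal-processing library.

```python
import numpy as np


def add_noise(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at roughly `snr_db` dB SNR.

    Hypothetical helper: the noise type and the SNR parameter are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens the clip.

    Hypothetical helper: this simple resampling changes tempo and pitch together.
    """
    n_out = int(round(len(signal) / factor))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


# Usage: build three extra training variants from one one-second 16 kHz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * t)

noisy = add_noise(clean, snr_db=10, rng=np.random.default_rng(0))
faster = change_speed(clean, 1.1)
slower = change_speed(clean, 0.9)
```

Each augmented copy keeps the original transcript, so a corpus processed this way grows in audio without new labeling effort. The SNR and speed factors shown are placeholders, not values reported in the paper.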

DOI

10.21608/ijicis.2021.73581.1086

Keywords

Arabic Speech Recognition, Data Augmentation, CNN-LSTM, RNN-LM, Attention-based Model

Authors

First Name

Hamzah

Last Name

Alsayadi

MiddleName

-

Affiliation

Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

hamzah.sayadi@cis.asu.edu.eg

City

Giza

Orcid

0000-0002-6062-0899

First Name

Abdelaziz

Last Name

Abdelhamid

MiddleName

-

Affiliation

Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

abdelaziz@cis.asu.edu.eg

City

-

Orcid

0000-0001-7080-1979

First Name

Islam

Last Name

Hegazy

MiddleName

-

Affiliation

Faculty of Computer and Information Sciences

Email

islheg@cis.asu.edu.eg

City

-

Orcid

0000-0002-1572-463X

First Name

Zaki

Last Name

Taha

MiddleName

-

Affiliation

Faculty of Computers and Information Sciences, Ain Shams University

Email

ztfayed@hotmail.com

City

-

Orcid

-

Volume

21

Article Issue

2

Related Issue

25765

Issue Date

2021-07-01

Receive Date

2021-04-22

Publish Date

2021-07-19

Page Start

50

Page End

64

Print ISSN

1687-109X

Online ISSN

2535-1710

Link

https://ijicis.journals.ekb.eg/article_184831.html

Detail API

https://ijicis.journals.ekb.eg/service?article_code=184831

Order

4

Type

Original Article

Type Code

494

Publication Type

Journal

Publication Title

International Journal of Intelligent Computing and Information Sciences

Publication Link

https://ijicis.journals.ekb.eg/

MainTitle

Data Augmentation for Arabic Speech Recognition Based on End-to-End Deep Learning

Details

Type

Article

Created At

22 Jan 2023