60081

Novel Image Preprocessing Approach for Automatic Speech Recognition

Article

Last updated: 24 Dec 2024

Subjects

-

Tags

-

Abstract

This research presents a novel approach that treats automatic speech recognition (ASR) as an image recognition problem. It introduces a hybrid 2D-Image (2DI)-Hidden Markov Model (HMM) approach to handle the preprocessing classification task in an ASR system. The focus of this research is the classification task. Because the proposed approach is novel and addresses only one task within the whole ASR system, it is evaluated by relative comparison with other popular approaches that perform the same task on the same database. A hybrid Gaussian Mixture Model (GMM)-HMM with Mel Frequency Cepstral Coefficient (MFCC) features serves as the reference. This research introduces a new method of mapping the speech signal into a two-dimensional space. The speech stream is segmented, and the contents of each segment are projected into the frequency domain using a balanced tree-structured filter. The wavelet packets technique is used to implement the filtering. The tree structure is captured as an image, and a database is constructed from the encoded images. The images are then segregated into speech classes. Hybrid Discrete Cosine Transform (DCT) based features are used for image encoding, with an HMM as the class model, and the result is evaluated against MFCC-HMM on the same classification problem. The proposed hybrid model gives better-balanced results than MFCC-HMM across the different classes. The classes considered in this research are vowels, consonants, plosives and speech silence.
The KED-TIMIT corpus is used in this research as the source of speech information. The approach shows promising results, especially in silence and vowel detection.
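
The mapping described in the abstract (segment, balanced wavelet packet tree, image, DCT features) can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the tree depth, the image layout (one row per tree level, each node's energy spread across its share of the row), and the number of retained DCT coefficients are all assumptions made for illustration, and the function names are hypothetical. The HMM class model is not covered.

```python
import math


def haar_split(x):
    """One filter-bank stage: split a signal into low-pass and high-pass halves.

    The Haar wavelet is an assumption; the abstract does not name the wavelet.
    """
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high


def packet_tree_image(frame, levels):
    """Build a levels x 2**levels grid of node energies from a balanced packet tree.

    Every node is split at every level (balanced tree). Nodes are kept in
    natural filter-bank order, not sorted by frequency. The frame length is
    assumed to be divisible by 2**levels.
    """
    width = 2 ** levels
    nodes = [list(frame)]
    image = []
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
        span = width // len(nodes)  # cells covered by each node in this row
        row = []
        for node in nodes:
            energy = sum(v * v for v in node)
            row.extend([energy] * span)
        image.append(row)
    return image


def dct2(image):
    """Unnormalised 2D DCT-II of a small image (direct O(N^4) evaluation)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for r in range(rows):
                cr = math.cos(math.pi * (r + 0.5) * u / rows)
                for c in range(cols):
                    cc = math.cos(math.pi * (c + 0.5) * v / cols)
                    total += image[r][c] * cr * cc
            out[u][v] = total
    return out


def frame_features(frame, levels=3, keep=2):
    """Encode one speech frame as the keep x keep lowest-order DCT coefficients."""
    coeffs = dct2(packet_tree_image(frame, levels))
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# Usage: a synthetic 64-sample frame stands in for a real speech segment.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = frame_features(frame, levels=3, keep=2)
```

In a full system, a sequence of such per-frame feature vectors would be passed to the HMM class models; that stage is omitted here.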

DOI

10.21608/ejle.2018.60081

Keywords

English Phone Recognition, Automatic Speech Recognition (ASR), Mel-Scale, DCT, Wavelet packets, HTK, BTE and MFCC

Authors

First Name

Amr

Last Name

Gody

Middle Name

M.

Affiliation

Electrical Engineering Department, Faculty of Engineering, Fayoum University

Email

amg00@fayoum.edu.eg

City

Fayoum

Orcid

0000-0003-2079-9860

First Name

Youssra

Last Name

Emam

Middle Name

Abdelmoniem

Affiliation

Communications and Electronics Department, Faculty of Engineering - Fayoum University

Email

eng.ussraemam@yahoo.com

City

Fayoum, Egypt

Orcid

-

First Name

Nashaat

Last Name

Hussein

Middle Name

M.

Affiliation

Electronics & Communication Engineering, Faculty of Engineering, Fayoum University, Egypt

Email

nmh01@fayoum.edu.eg

City

Fayoum, Egypt

Orcid

-

Volume

5

Article Issue

2

Related Issue

9007

Issue Date

2018-09-01

Receive Date

2018-04-23

Publish Date

2018-09-01

Page Start

1

Page End

15

Print ISSN

2356-8208

Online ISSN

2356-8216

Link

https://ejle.journals.ekb.eg/article_60081.html

Detail API

https://ejle.journals.ekb.eg/service?article_code=60081

Order

1

Type

Original Article

Type Code

1,039

Publication Type

Journal

Publication Title

The Egyptian Journal of Language Engineering

Publication Link

https://ejle.journals.ekb.eg/

Main Title

Novel Image Preprocessing Approach for Automatic Speech Recognition

Details

Type

Article

Created At

22 Jan 2023