411082

Comparative Study on End-to-End Speech Recognition Using Pre-trained Models

Article

Last updated: 15 Feb 2025

Subjects

-

Tags

Electrical Engineering

Abstract

Pre-trained models (PTMs) are widely available in speech and audio signal processing. A PTM provides a set of initial weights and biases that can be adjusted for a particular task, which makes PTMs a popular starting point for machine learning model development. Representations from pre-trained models have achieved state-of-the-art performance in speech recognition, natural language processing, and other applications. Embeddings obtained from these models serve as inputs to learning algorithms for a variety of downstream tasks. This study compares pre-trained models on Automatic Speech Recognition (ASR). The literature review indicates that self-supervised models based on Wav2Vec2.0 and fully supervised models such as Whisper are currently the main paradigms for ASR. The study evaluates and compares these approaches to assess how well they perform across a wide range of test scenarios. It aims to serve as a practical guide for understanding, using, and generating PTMs for different NLP tasks.

DOI

10.21608/fuje.2024.312102.1089

Keywords

PTMs, ASR, Wav2vec2, Whisper, speech recognition, natural language processing

Authors

First Name

Martha

Last Name

Ghobrial

MiddleName

F.

Affiliation

Electronics and Communication Department, Fayoum University, Fayoum, Egypt

Email

marthafikryghobrial@gmail.com

City

-

Orcid

-

First Name

Amr

Last Name

Gody

MiddleName

M.

Affiliation

Kyman Faryes Faculty of Engineering

Email

amg00@fayoum.edu.eg

City

Fayoum

Orcid

0000-0003-2079-9860

First Name

Sayed

Last Name

Muhammad

MiddleName

T.

Affiliation

Computers and Systems Engineering Department, Faculty of Engineering, Fayoum University, Fayoum, Egypt

Email

stm11@fayoum.edu.eg

City

Fayoum

Orcid

-

Volume

8

Article Issue

1

Related Issue

53725

Issue Date

2025-01-01

Receive Date

2024-08-13

Publish Date

2025-01-01

Page Start

131

Page End

142

Print ISSN

2537-0626

Online ISSN

2537-0634

Link

https://fuje.journals.ekb.eg/article_411082.html

Detail API

http://journals.ekb.eg?_action=service&article_code=411082

Order

411,082

Type

Original Article

Type Code

651

Publication Type

Journal

Publication Title

Fayoum University Journal of Engineering

Publication Link

https://fuje.journals.ekb.eg/

MainTitle

Comparative Study on End-to-End Speech Recognition Using Pre-trained Models

Details

Type

Article

Created At

15 Feb 2025