376340

An Efficient Speaker Diarization Pipeline for Conversational Speech

Article

Last updated: 05 Jan 2025

Subjects

-

Tags

Applied and Basic Science.

Abstract

In audio signal processing, accurate and efficient diarization of conversational speech remains a challenging task, particularly in environments with substantial speaker overlap and diverse acoustic conditions. This paper introduces a comprehensive speaker diarization pipeline that substantially improves both the performance and the efficiency of processing conversational speech. The pipeline comprises six components: Voice Activity Detection (VAD), Speaker Overlap Detection (SOD), speaker separation, robust speaker embedding, clustering, and post-processing. VAD first discriminates between speech and non-speech segments, reducing the amount of audio passed to later stages. SOD then identifies segments in which speakers overlap, and a speaker separation model splits this overlapping speech into distinct streams. A key enhancement in the pipeline is the integration of robust speaker embedding and clustering techniques, which capture speaker-specific characteristics to improve the grouping of speech segments. Finally, the post-processing stage refines these segments to enforce temporal consistency and improve overall diarization accuracy. We evaluated the pipeline on multiple benchmark datasets and observed significant reductions in Diarization Error Rate (DER) compared with existing methods. These results confirm the effectiveness of incorporating detailed speaker embeddings and clustering in a diarization system, particularly for real-world conversational speech. The pipeline is relevant to applications that require accurate speaker attribution, such as automated transcription services, meeting analysis, and assistive communication technologies.
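As an illustration only, the following Python sketch shows what the clustering and post-processing stages described in the abstract might look like. It is not the authors' implementation. It assumes that one speaker embedding has already been extracted for each VAD segment, substitutes a simple average-linkage agglomerative clustering on cosine similarity, and applies a hypothetical gap-merging rule as post-processing; VAD, overlap detection, and speaker separation are not modeled. The names `Segment`, `cluster_embeddings`, and `merge_adjacent`, the 0.7 similarity threshold, and the 0.5 s gap are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    label: int    # speaker cluster index


def cosine(a, b):
    # Cosine similarity between two non-zero embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cluster_embeddings(embeddings, threshold=0.7):
    """Hypothetical stand-in for the clustering stage: average-linkage
    agglomerative clustering that repeatedly merges the most similar pair
    of clusters until no pair's mean cosine similarity reaches `threshold`."""
    clusters = [[i] for i in range(len(embeddings))]

    def similarity(c1, c2):
        total = sum(cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -2.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair  # j > i, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]

    # Number clusters in order of their earliest segment for stable labels.
    labels = [0] * len(embeddings)
    for k, cluster in enumerate(sorted(clusters, key=min)):
        for idx in cluster:
            labels[idx] = k
    return labels


def merge_adjacent(segments, max_gap=0.5):
    """Hypothetical post-processing rule: merge consecutive segments that
    share a speaker label and are separated by at most `max_gap` seconds."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev and prev.label == seg.label and seg.start - prev.end <= max_gap:
            merged[-1] = Segment(prev.start, max(prev.end, seg.end), seg.label)
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    # Toy VAD segments with made-up 2-D embeddings (real embeddings are high-dimensional).
    times = [(0.0, 1.0), (1.2, 2.0), (2.5, 3.5), (3.6, 4.0)]
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = cluster_embeddings(embs)
    print(labels)  # [0, 0, 1, 1]
    segments = [Segment(s, e, lab) for (s, e), lab in zip(times, labels)]
    print(merge_adjacent(segments))
```

Average linkage is used here only because it is simple to state; the abstract does not specify which clustering algorithm or post-processing rules the pipeline uses, and a real system would typically tune the threshold on held-out data.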

DOI

10.21608/bjas.2024.284482.1414

Keywords

speaker diarization, speaker separation, voice activity detection, optimization

Authors

First Name

Wael

Last Name

Sultan

Middle Name

Ali

Affiliation

Department of Basic Engineering Sciences, Benha Faculty of Engineering, Benha University, Benha, Egypt

Email

wael.ali@bhit.bu.edu.eg

City

-

Orcid

-

First Name

Mourad

Last Name

Semary

Middle Name

Samir

Affiliation

Department of Basic Engineering Sciences, Benha Faculty of Engineering, Benha University, Benha, Egypt

Email

mourad.semary@bhit.bu.edu.eg

City

-

Orcid

-

First Name

Sherif

Last Name

Abdou

Middle Name

Mahdy

Affiliation

Information Technology Department, Faculty of Artificial Intelligence, Cairo University, Cairo, Egypt

Email

sh.ma.abdou@gmail.com

City

-

Orcid

-

Volume

9

Article Issue

5

Related Issue

46897

Issue Date

2024-05-01

Receive Date

2024-05-03

Publish Date

2024-05-29

Page Start

141

Page End

146

Print ISSN

2356-9751

Online ISSN

2356-976X

Link

https://bjas.journals.ekb.eg/article_376340.html

Detail API

https://bjas.journals.ekb.eg/service?article_code=376340

Order

16

Type

Original Research Papers

Type Code

1,647

Publication Type

Journal

Publication Title

Benha Journal of Applied Sciences

Publication Link

https://bjas.journals.ekb.eg/

Created At

28 Dec 2024