Advancing Arabic Scientific Text Analysis: Evaluating Machine Learning Models for Named Entity Recognition

Article

Last updated: 05 Jan 2025

Subjects

-

Tags

Computer and Technology Science.
Engineering Sciences.

Abstract

Named entity recognition in Arabic text, particularly in the scientific and medical domains, presents unique challenges because of the language's rich morphology, the scarcity of resources, and dialectal diversity. This study evaluates the efficacy of Conditional Random Fields (CRF), Support Vector Machines (SVM), and Stochastic Gradient Descent (SGD) models for named entity recognition in Arabic scientific texts. The models were implemented on a self-collected dataset of Arabic thesis abstracts. The named entities annotated in the dataset are proteins, DNA, RNA, cell types, and cell lines. Focusing on the scientific domain, our comparative analysis reveals significant performance differences among the models, with hybrid approaches showing promising results. SGD, SVM, and CRF achieved F1-scores of 0.96, 0.91, and 0.80, respectively. These results demonstrate the effectiveness of the proposed models. The research contributes to Arabic natural language processing by highlighting the strengths of each model and guiding the future selection and development of named entity recognition models.
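To make the reported F1-scores concrete, the following is a minimal sketch of how an F1-score can be computed for a named entity tagger. It rests on assumptions not stated in the abstract: that entities are encoded with BIO tags, that a predicted entity counts as correct only when its span and type both match exactly, and that scores are micro-averaged over all entities. The function names, the label strings `protein` and `DNA`, and the toy sequences are hypothetical and are not taken from the authors' dataset or code.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact span matches."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example (invented for illustration): the tagger finds the protein
# mention but misses the DNA mention.
gold = [["B-protein", "I-protein", "O", "B-DNA"]]
pred = [["B-protein", "I-protein", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case one of the two gold entities is recovered with no false positives, so precision is 1.0, recall is 0.5, and F1 is about 0.67. The abstract does not say whether its scores are entity-level or token-level, so this sketch should be read as one plausible convention rather than the evaluation the authors used.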

DOI

10.21608/bjas.2024.279914.1377

Keywords

Arabic Named Entity Recognition, Entity Extraction, Arabic NLP, Machine Learning

Authors

First Name

Nourhan

Last Name

Marzouk

MiddleName

-

Affiliation

Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt

Email

norhan.marzoq17@fci.bu.edu.eg

City

-

Orcid

0009-0000-4310-8706

First Name

Hamada

Last Name

Nayel

MiddleName

-

Affiliation

Department of Computer Science, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt

Email

hamada.ali@fci.bu.edu.eg

City

-

Orcid

0000-0002-2768-4639

First Name

Ahmed

Last Name

Elsawy

MiddleName

-

Affiliation

Computer Science Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt

Email

ahmed.el_sawy@fci.bu.edu.eg

City

-

Orcid

-

Volume

9

Article Issue

5

Related Issue

46897

Issue Date

2024-05-01

Receive Date

2024-03-27

Publish Date

2024-05-01

Page Start

45

Page End

48

Print ISSN

2356-9751

Online ISSN

2356-976X

Link

https://bjas.journals.ekb.eg/article_355171.html

Detail API

https://bjas.journals.ekb.eg/service?article_code=355171

Order

5

Type

Original Research Papers

Type Code

1,647

Publication Type

Journal

Publication Title

Benha Journal of Applied Sciences

Publication Link

https://bjas.journals.ekb.eg/

Details

Type

Article

Created At

28 Dec 2024