Machine Learning and Feature Selection Approaches for Categorizing Arabic Text: Analysis, Comparison, and Proposal

Article

Last updated: 24 Dec 2024

Subjects

-

Tags

-

Abstract

This work applies several classification approaches to the categorization of Arabic text, using two datasets as test beds. A comparative study evaluates the performance of the adopted classifiers. Several feature selection methods are also analyzed and evaluated. Selecting the most significant features matters because a very large number of features can degrade text classification performance. A second comparative study evaluates the adopted feature selection methods for classifying Arabic documents.
Moreover, the feature selection approaches are modified by combining the chosen methods. A novel method is also proposed for selecting the most appropriate features; it constructs the features using semantic fusion and multiple words (SF-MW). The adopted feature selection methods are then compared with the proposed one.
The experimental results show that the SVM classifier performed best, ahead of the KNN and NB classifiers. Combining the adopted feature selection methods gave better results than using any of them individually. The proposed feature selection method (SF-MW) is promising: it reduced the number of features and achieved higher classification accuracy. The accuracy improvement was about 22% on the two chosen Arabic test beds, which contain 1246 and 1500 documents respectively. The proposed method is also expected to be efficient for other Arabic and English datasets.
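
To make the idea of combining feature selection methods concrete, here is a minimal sketch in plain Python. It rests on several assumptions: the abstract does not name the individual feature selection methods, so chi-square and document frequency are used here only as common stand-ins; taking the union of each method's top-k terms is just one possible reading of the "amalgamation"; and the toy corpus, the function names, and the value of `k` are all hypothetical. It does not implement the proposed SF-MW method.

```python
from collections import Counter

# Toy corpus (hypothetical): each document is a list of already-tokenized terms.
docs = [
    ["match", "goal", "team"],
    ["team", "goal", "win"],
    ["market", "price", "bank"],
    ["bank", "price", "team"],
]
labels = ["sport", "sport", "economy", "economy"]


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, size in class_sizes.items():
            # 2x2 contingency table: term present/absent vs. inside/outside the class.
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c = size - a            # term absent, document in class
            d = (n - size) - b      # term absent, document outside class
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def document_frequency(docs):
    """Count the number of documents in which each term occurs."""
    df = Counter()
    for d in docs:
        df.update(set(d))
    return dict(df)


def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def combined_top_k(docs, labels, k):
    """Assumed combination rule: union of the top-k terms from both methods."""
    by_chi = top_k(chi_square_scores(docs, labels), k)
    by_df = top_k(document_frequency(docs), k)
    return sorted(set(by_chi) | set(by_df))


print(combined_top_k(docs, labels, 3))  # ['bank', 'goal', 'price', 'team']
```

On this toy corpus, chi-square favors terms confined to one class (`bank`, `goal`, `price`), while document frequency also brings in the widespread term `team`, so the union retains four of the seven vocabulary terms. Whether such a combination helps on real Arabic data is an empirical question that this sketch does not answer.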

DOI

10.21608/ejle.2020.29313.1006

Keywords

Classification Algorithms, Feature Selection, Multiple-Arabic-Words, Semantic Fusion, Measurable Evaluation Criteria

Authors

First Name

Ayat

Last Name

Elnahas

MiddleName

-

Affiliation

Department of Research Informatics, Electronics Research Institute, Cairo, Egypt

Email

eng_ayatelnahas@yahoo.com

City

-

Orcid

-

First Name

Nawal

Last Name

Elfishawy

MiddleName

-

Affiliation

Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menoufia, Egypt

Email

nelfishawy@hotmail.com

City

Menouf, Egypt

Orcid

-

First Name

Mohamed

Last Name

Nour

MiddleName

-

Affiliation

Department of Research Informatics, Electronics Research Institute, Cairo, Egypt

Email

mnour99@hotmail.com

City

-

Orcid

-

First Name

Maha

Last Name

Tolba

MiddleName

-

Affiliation

Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menoufia, Egypt

Email

mahatolba@yahoo.com

City

-

Orcid

-

Volume

7

Article Issue

2

Related Issue

17123

Issue Date

2020-09-01

Receive Date

2020-05-06

Publish Date

2020-09-15

Page Start

1

Page End

19

Print ISSN

2356-8208

Online ISSN

2356-8216

Link

https://ejle.journals.ekb.eg/article_112830.html

Detail API

https://ejle.journals.ekb.eg/service?article_code=112830

Order

1

Type

Original Article

Type Code

1,039

Publication Type

Journal

Publication Title

The Egyptian Journal of Language Engineering

Publication Link

https://ejle.journals.ekb.eg/

Details

Type

Article

Created At

22 Jan 2023