A Comparative Study for Arabic Text Classification Based on BOW and Mixed Words Representations

Article

Last updated: 04 Jan 2025

Subjects

-

Tags

-

Abstract

This paper compares two feature representation methods for Arabic text classification: bag of words (BOW), meaning word-level unigrams, and mixed words. The mixed-words representation combines bag-of-words unigrams with pairs of adjacent words in different proportions. The main objective is to measure the accuracy of each method and to determine which representation is more accurate for Arabic text classification. Each method is applied with normalization and stemming. The results show that the mixed-words representation achieves the highest accuracy, 98.61%, when normalization is used.
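
The two representations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `bigram_share` parameter, and the rule of keeping the most frequent adjacent-word pairs are assumptions made for illustration, since the abstract does not state how the proportions are chosen. Normalization and stemming are assumed to have been applied to the tokens beforehand.

```python
from collections import Counter


def unigram_features(tokens):
    """Bag of words: count each word-level unigram."""
    return Counter(tokens)


def bigram_features(tokens):
    """Count each pair of adjacent words, joined by a single space."""
    return Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))


def mixed_features(tokens, bigram_share=0.5):
    """Mix all unigrams with a share of the adjacent-word pairs.

    bigram_share is a hypothetical knob (not taken from the paper): the
    fraction of distinct bigrams, most frequent first, that is kept
    alongside the full unigram set.
    """
    features = unigram_features(tokens)
    bigrams = bigram_features(tokens)
    keep = int(len(bigrams) * bigram_share)
    # Add the retained bigram counts to the unigram counts.
    features.update(dict(bigrams.most_common(keep)))
    return features


if __name__ == "__main__":
    # Placeholder tokens standing in for normalized, stemmed Arabic words.
    tokens = "a b a b c".split()
    print(unigram_features(tokens))
    print(mixed_features(tokens, bigram_share=0.5))
```

Varying `bigram_share` between 0 and 1 moves the feature set from pure bag of words to unigrams plus every adjacent-word pair, which is one plausible reading of "different proportions".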

DOI

10.21608/ijci.2016.33954

Keywords

Arabic Text Categorization, Frequency Ratio Accumulation Method, Term and Document Frequency, Features Selection, Bag of Words and Mixed Words

Authors

First Name

Rouhia

Last Name

Sallam

MiddleName

M.

Affiliation

Faculty of Applied Sciences, Taiz University, Yemen

Email

rouhia79@yahoo.com

City

-

Orcid

-

First Name

Hamdy

Last Name

Mousa

MiddleName

-

Affiliation

Faculty of Computers and Information, Menoufia University

Email

hamdimmm@hotmail.com

City

-

Orcid

0000-0001-9503-9124

First Name

Mahmoud

Last Name

Hussien

MiddleName

-

Affiliation

Faculty of Computers and Information, Menofia University, Egypt

Email

mahmoud.hussein@ci.menofia.edu.eg

City

-

Orcid

0000-0002-3742-7548

Volume

5

Article Issue

1

Related Issue

5673

Issue Date

2016-06-01

Receive Date

2016-01-05

Publish Date

2016-06-01

Page Start

24

Page End

34

Print ISSN

1687-7853

Online ISSN

2735-3257

Link

https://ijci.journals.ekb.eg/article_33954.html

Detail API

https://ijci.journals.ekb.eg/service?article_code=33954

Order

3

Type

Original Article

Type Code

877

Publication Type

Journal

Publication Title

IJCI. International Journal of Computers and Information

Publication Link

https://ijci.journals.ekb.eg/

Details

Type

Article

Created At

22 Jan 2023