254708

Arabic Semantic-Based Textual Similarity

Article

Last updated: 28 Dec 2024

Subjects

-

Tags

Engineering Sciences

Abstract

Textual similarity is one of the most important aspects of information retrieval. This paper presents several techniques for
measuring semantic textual similarity, along with the factors that influence them. Two hybrid approaches for measuring the
degree of similarity between two Arabic text snippets are presented. The first approach combines word-based and vector-based
similarity methods to construct a semantic word space for each word of the input text. The words are represented in their
lemma forms to capture all semantically related words. The semantic word spaces are then used to find the best matching
between the words of the two input texts, from which the degree of similarity between the two snippets is computed.
The second approach combines semantic and syntactic methods. Its core is the basic Levenshtein edit distance, modified to
measure the edit cost at the token level rather than the character level. The semantic word spaces are added to this approach
so that semantic features complement the syntactic ones, and additional techniques are embedded to overcome problems of the
syntactic approach, such as sensitivity to word order. The Pearson correlation coefficient is used to evaluate the two proposed
approaches against two benchmark datasets. The experiments achieved correlations of 0.7212 and 0.7589 for the two proposed
approaches on two different datasets.
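
As an illustration of the token-level Levenshtein idea described in the abstract, the following minimal Python sketch computes an edit distance over word tokens instead of characters and treats semantically related words as a free substitution. It is not the authors' implementation. The `related` lookup is a hypothetical stand-in for the paper's semantic word spaces, the zero cost for related words and the normalization by the longer text are assumptions made for this example, and the English tokens are placeholders for Arabic lemmas.

```python
from typing import Dict, List, Set


def token_edit_distance(a: List[str], b: List[str],
                        related: Dict[str, Set[str]]) -> float:
    """Levenshtein distance computed over tokens rather than characters.

    `related` is a hypothetical stand-in for a semantic word space:
    it maps a lemma to the set of lemmas considered semantically related.
    """
    def sub_cost(x: str, y: str) -> float:
        # Assumption: identical or related tokens substitute at no cost.
        if x == y or y in related.get(x, set()) or x in related.get(y, set()):
            return 0.0
        return 1.0

    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [float(i)]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,               # delete x
                curr[j - 1] + 1.0,           # insert y
                prev[j - 1] + sub_cost(x, y) # substitute x with y
            ))
        prev = curr
    return prev[-1]


def similarity(a: List[str], b: List[str],
               related: Dict[str, Set[str]]) -> float:
    """Map the distance to [0, 1]; normalizing by the longer text is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - token_edit_distance(a, b, related) / longest


# Placeholder tokens; a real system would use lemmatized Arabic words.
text_a = ["the", "boy", "ate", "apple"]
text_b = ["the", "child", "ate", "fruit"]
word_space = {"boy": {"child"}}
score = similarity(text_a, text_b, word_space)
```

In this sketch, "boy" and "child" match through the lookup, so only the "apple"/"fruit" substitution contributes to the distance. Plain edit distance remains sensitive to word order, which is the kind of syntactic limitation the abstract says the second approach addresses with additional techniques.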

DOI

10.21608/bjas.2022.254708

Keywords

Arabic Text Similarity, Semantic Similarity, Lexical Similarity, Word Embedding, Permutation Feature, Negation Effect

Authors

First Name

Shimaa

Last Name

Ismail

Middle Name

-

Affiliation

Faculty of Computers and Artificial Intelligence, Benha Univ., Benha, Egypt

Email

-

City

-

Orcid

-

First Name

AbdelWahab

Last Name

Alsammak

Middle Name

-

Affiliation

Faculty of Engineering Shoubra, Benha Univ., Benha, Egypt

Email

-

City

-

Orcid

-

First Name

Tarek

Last Name

Elshishtawy

Middle Name

-

Affiliation

Faculty of Computers and Artificial Intelligence, Benha Univ., Benha, Egypt

Email

-

City

-

Orcid

-

Volume

7

Article Issue

4

Related Issue

34935

Issue Date

2022-04-01

Receive Date

2022-04-15

Publish Date

2022-04-01

Page Start

133

Page End

142

Print ISSN

2356-9751

Online ISSN

2356-976X

Link

https://bjas.journals.ekb.eg/article_254708.html

Detail API

https://bjas.journals.ekb.eg/service?article_code=254708

Order

18

Type

Original Research Papers

Type Code

1,647

Publication Type

Journal

Publication Title

Benha Journal of Applied Sciences

Publication Link

https://bjas.journals.ekb.eg/

Main Title

Arabic Semantic-Based Textual Similarity

Details

Type

Article

Created At

23 Jan 2023