384213

Constructing and Augmenting a Bidirectional Paraphrases Dataset from an English-Arabic Subtitling Parallel Corpus

Article

Last updated: 24 Dec 2024

Subjects

-

Tags

-

Abstract

Paraphrasing is one of the major, and most challenging, tasks in the deep semantic analysis of natural languages. In this paper, we present a novel algorithm that operates on a large parallel text corpus and automatically generates paraphrases in both natural languages of the corpus. Like several earlier algorithms, ours exploits the bidirectional translation provided by large parallel text corpora to infer pairs of synonymous phrases; however, our algorithm is simpler and more efficient. Moreover, it is the only one that constructs the whole paraphrase during its run, with no need for further post-processing. We implemented and ran our algorithm on the English-Arabic portion of the 2018 version of the OpenSubtitles (OPUS) parallel text corpora, and statistical evaluation of random samples showed that the semantic quality among the phrases of the automatically generated paraphrases was very high.
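
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it builds on: treating a shared translation as a pivot to group phrases that are likely synonymous, in both directions. It is not the authors' method. The function names (`pivot_paraphrases`, `bidirectional_paraphrases`) and the toy sentence pairs are hypothetical, and exact string matching on the pivot is an assumed simplification.

```python
from collections import defaultdict


def pivot_paraphrases(pairs):
    """Group first-side phrases that share an identical second-side translation.

    `pairs` is an iterable of (phrase, translation) tuples. Exact string
    matching on the translation is an assumption made for this sketch.
    """
    by_pivot = defaultdict(set)
    for phrase, translation in pairs:
        by_pivot[translation.strip()].add(phrase.strip())
    # A pivot yields a paraphrase group only if it has two or more distinct phrases.
    return [sorted(group) for group in by_pivot.values() if len(group) > 1]


def bidirectional_paraphrases(pairs):
    """Run the pivot grouping in both directions of an (English, Arabic) pair list."""
    english_groups = pivot_paraphrases(pairs)
    arabic_groups = pivot_paraphrases([(ar, en) for en, ar in pairs])
    return english_groups, arabic_groups


# Hypothetical toy data standing in for aligned subtitle lines.
toy_pairs = [
    ("Thank you very much", "شكرا جزيلا"),
    ("Thanks a lot", "شكرا جزيلا"),
    ("Thanks a lot", "شكرا كثيرا"),
    ("Let's go", "هيا بنا"),
]

english_groups, arabic_groups = bidirectional_paraphrases(toy_pairs)
print(english_groups)
print(arabic_groups)
```

In this toy input, the two English lines that share one Arabic translation form an English group, and the two Arabic lines that share one English translation form an Arabic group. A real corpus would additionally need normalization and filtering of noisy alignments, which this sketch leaves out.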

DOI

10.21608/ejle.2024.308019.1070

Keywords

bidirectional semantic augmentation, paraphrase, paraphrasing, phrase, semantic analysis

Authors

First Name

Mohamed

Last Name

Ahmed

MiddleName

Attia

Affiliation

RDI; www.rdi-eg.ai

Email

mohamed.attia.nlp@gmail.com

City

-

Orcid

-

First Name

Fahad

Last Name

AlGhamdi

MiddleName

-

Affiliation

Al-Baha University, Al-Baha, Saudi Arabia

Email

fghamdi@bu.edu.sa

City

-

Orcid

-

First Name

Abdelati

Last Name

Hawwari

MiddleName

-

Affiliation

Datalex4ai, Santa Clara, California, USA

Email

abdelati@datalex4ai.com

City

-

Orcid

-

Volume

11

Article Issue

2

Related Issue

50689

Issue Date

2024-10-01

Receive Date

2024-07-29

Publish Date

2024-10-01

Page Start

1

Page End

12

Print ISSN

2356-8208

Online ISSN

2356-8216

Link

https://ejle.journals.ekb.eg/article_384213.html

Detail API

https://ejle.journals.ekb.eg/service?article_code=384213

Order

1

Type

Original Article

Type Code

1,039

Publication Type

Journal

Publication Title

The Egyptian Journal of Language Engineering

Publication Link

https://ejle.journals.ekb.eg/

Details

Type

Article

Created At

24 Dec 2024