349220

Richness Lost in Machine Translationese:

Article

Last updated: 24 Dec 2024

Subjects

-

Tags

-

Abstract

Neural Machine Translation (NMT) may have been pronounced faster and better than human translation. However, NMT inherently overgeneralizes the more frequent patterns in its training data at the expense of the less frequent ones, a phenomenon dubbed “machine translationese”. This machine translationese has been observed to reflect some controversial asymmetries. One usually overlooked facet of this machine bias is the loss of “lexical richness”. Only recently have the generated translations been noticed to be disproportionately deformed and impoverished by NMT's tendency to overgeneralize. Lexical richness, notwithstanding its worth, has not received the same attention as lexical accuracy and error measurement, and, more importantly, it has received no attention at all in under-researched language pairs such as Arabic–English. This study aims to shed light on lexical richness in the output of Arabic-into-English NMT as opposed to human translation (HT), answering the question: Does HT exhibit more lexical richness than NMT does? The study adopts the most agreed-upon definition of lexical richness as a superordinate term that includes “lexical diversity”, “lexical density”, and “lexical sophistication”; all three are statistical metrics that gauge the lexical richness of a text. The study analyses the outputs of two NMT systems, Google Translate and Microsoft Translator, in terms of lexical richness, using both quantitative and qualitative methods, and then compares the results with those of the HT output. The corpus of the study comprises a news subcorpus and a literary subcorpus.

DOI

10.21608/ejle.2024.267336.1064

Keywords

Lexical density, lexical diversity, lexical richness, lexical sophistication, neural machine translation

Authors

First Name

Radwa

Last Name

Kotait

MiddleName

-

Affiliation

English Department, Faculty of Al-Alsun, Ain Shams University

Email

radwa_kotait@alsun.asu.edu.eg

City

-

Orcid

0000-0003-4355-6300

Volume

11

Article Issue

1

Related Issue

47386

Issue Date

2024-04-01

Receive Date

2024-02-02

Publish Date

2024-04-01

Page Start

66

Page End

85

Print ISSN

2356-8208

Online ISSN

2356-8216

Link

https://ejle.journals.ekb.eg/article_349220.html

Detail API

https://ejle.journals.ekb.eg/service?article_code=349220

Order

5

Type

Original Article

Type Code

1,039

Publication Type

Journal

Publication Title

The Egyptian Journal of Language Engineering

Publication Link

https://ejle.journals.ekb.eg/

Details

Type

Article

Created At

24 Dec 2024