Imbalanced Data Oversampling Technique Based on Convex Combination Method

Article

Last updated: 24 Dec 2024

Abstract

Classification is the task of predicting a label for a given set of inputs. This task becomes difficult when the dataset is imbalanced. Most existing machine learning classifiers struggle with imbalanced data because it biases them strongly towards the majority class, which can reduce prediction accuracy for the minority class. Data oversampling is one of the most important techniques for balancing data, particularly when the dataset is small and/or imbalanced. Synthetic Minority Over-sampling Technique (SMOTE), Borderline-SMOTE, Adaptive Synthetic (ADASYN), and Weighted SMOTE (W-SMOTE) are the most popular oversampling techniques. However, the main drawback of SMOTE and ADASYN is that they increase the overlap between classes, so the generated samples are not representative of the original data distribution. Borderline-SMOTE may neglect some important samples when generating new ones. To overcome these problems, in this paper we propose a new oversampling method that uses convex combinations to generate new samples of the minority class. The convex combination allows us to produce new samples that follow the original data distribution. We evaluated our approach on four standard imbalanced datasets: Yeast, Glass Identification, Paw, and Wisconsin Prognostic Breast Cancer (WPBC). The experimental results show that the proposed method gives better performance in terms of accuracy, precision, recall, F1-measure, and area under the curve (AUC).
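
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of oversampling by convex combination, not the authors' exact method. It assumes that each synthetic sample is a weighted average of `k` randomly chosen minority-class rows, with non-negative weights that sum to 1 drawn from a Dirichlet distribution. The function name `convex_oversample` and the parameters `k` and `seed` are hypothetical and chosen for illustration.

```python
import numpy as np

def convex_oversample(X_min, n_new, k=3, seed=0):
    """Return n_new synthetic rows, each a convex combination of k minority rows.

    Illustrative assumption: parents are picked uniformly at random and the
    weights come from a flat Dirichlet distribution. The paper may select
    parents and weights differently.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n)  # cannot combine more distinct rows than exist

    # For each synthetic sample, pick k distinct minority rows
    # (argsort of random keys gives a random permutation per row).
    idx = np.argsort(rng.random((n_new, n)), axis=1)[:, :k]

    # Dirichlet weights are non-negative and sum to 1 along each row,
    # which is exactly the condition for a convex combination.
    w = rng.dirichlet(np.ones(k), size=n_new)      # shape (n_new, k)

    # Weighted sum of the k parent rows for every synthetic sample.
    return np.einsum("ij,ijd->id", w, X_min[idx])  # shape (n_new, n_features)
```

Because the weights are non-negative and sum to 1, every generated point lies inside the convex hull of the minority samples it was built from, so it cannot fall outside the region already occupied by the minority class.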

DOI

10.21608/ijci.2021.72508.1047

Keywords

Imbalanced dataset, Oversampling, SMOTE, ADASYN, Borderline-SMOTE

Authors

First Name

Mohammed

Last Name

Elnahas

MiddleName

Moustafa

Affiliation

Computer Science Department, Faculty of Computers and Information, Menoufia University

Email

m.moustafa.elnahas@gmail.com

City

-

Orcid

-

First Name

Mahmoud

Last Name

Hussein

MiddleName

-

Affiliation

Computer Science Department, Faculty of Computers and Information, Menoufia University

Email

mahmoud.hussein@ci.menofia.edu.eg

City

Shebin Elkom

Orcid

0000-0002-3742-7548

First Name

Arabi

Last Name

Keshk

MiddleName

-

Affiliation

Faculty of Computers and Information, Menoufia University

Email

arabikeshk@yahoo.com

City

-

Orcid

-

Volume

9

Article Issue

1

Related Issue

29711

Issue Date

2022-01-01

Receive Date

2021-04-15

Publish Date

2022-01-01

Page Start

15

Page End

28

Print ISSN

1687-7853

Online ISSN

2735-3257

Link

https://ijci.journals.ekb.eg/article_174318.html

Detail API

https://ijci.journals.ekb.eg/service?article_code=174318

Order

3

Type

Original Article

Type Code

877

Publication Type

Journal

Publication Title

IJCI. International Journal of Computers and Information

Publication Link

https://ijci.journals.ekb.eg/

Details

Type

Article

Created At

22 Jan 2023