A Novel Scalable and Effective Partitioning Approach for Big Data Reduction

Article

Last updated: 04 Jan 2025

Subjects

-

Tags

-

Abstract

The continuous growth of data size makes traditional instance selection methods ineffective at reducing big training datasets on a single machine. Recent approaches to this problem partition the training dataset into subsets before applying an instance selection method to each subset separately. However, the performance of instance selection methods applied to the subsets degrades, especially as the number of subsets increases. In this work, we propose a novel, scalable, and effective automated partitioning approach called overlapped distance-based class-balance partitioning. The approach distributes the training instances among the subsets based on a given distance metric and ensures equal representation of the data classes in each subset. Moreover, an instance may be assigned to two subsets if it satisfies a dynamic threshold. We implement the proposed approach and empirically test its scalability and effectiveness using the condensed nearest neighbor method on eight standard datasets. The proposed approach is compared empirically and analytically with the stratification partitioning approach and with a non-overlapped version of our approach with respect to 1) reduction rate, classification accuracy, and effectiveness metrics, and 2) scalability as the number of subsets increases. The comparison results demonstrate that our approach is more scalable and effective than the other partitioning approaches on these standard datasets.
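
As a rough illustration of the partition-then-reduce pipeline the abstract describes, the following minimal sketch applies Hart's condensed nearest neighbor rule inside each subset of a stratified (class-balanced, random) partition and reports the reduction rate. It covers only the stratification baseline: the overlapped distance-based assignment and its dynamic threshold are not specified in the abstract and are not reproduced here. All function names (`condensed_nn`, `stratified_partition`, `reduce_by_partitions`, `reduction_rate`), the Euclidean distance, and the synthetic two-class data are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def condensed_nn(X, y):
    """Hart's condensed nearest neighbor: return indices of retained instances."""
    keep = [0]  # seed the store with the first instance
    changed = True
    while changed:  # repeat passes until no instance is added
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # Classify instance i with 1-NN over the current store.
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:
                keep.append(i)  # misclassified, so it joins the store
                changed = True
    return np.array(sorted(keep))


def stratified_partition(y, n_subsets, seed=0):
    """Deal the shuffled indices of each class round-robin over the subsets."""
    rng = np.random.default_rng(seed)
    subsets = [[] for _ in range(n_subsets)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for pos, i in enumerate(idx):
            subsets[pos % n_subsets].append(int(i))
    return [np.array(s, dtype=int) for s in subsets]


def reduce_by_partitions(X, y, n_subsets, seed=0):
    """Run CNN inside each subset and merge the retained global indices."""
    selected = []
    for part in stratified_partition(y, n_subsets, seed):
        if len(part) == 0:
            continue  # more subsets than instances: nothing to reduce
        local = condensed_nn(X[part], y[part])
        selected.extend(part[local].tolist())
    return np.array(sorted(selected))


def reduction_rate(n_original, n_selected):
    """Fraction of training instances removed by the reduction."""
    return 1.0 - n_selected / n_original


# Synthetic two-class data (an assumption; the paper uses eight standard datasets).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

selected = reduce_by_partitions(X, y, n_subsets=4)
print(len(selected), reduction_rate(len(X), len(selected)))
```

Because each subset is reduced in isolation, the retained set from one subset knows nothing about instances in the others; this is the effect the abstract points to when it says performance degrades as the number of subsets grows.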

DOI

10.21608/ijci.2019.35122

Keywords

Big Data, Data mining, Data Reduction, Instance Selection, Data Partitioning

Authors

First Name

M.

Last Name

Malhat

Middle Name

G.

Affiliation

Computer Science Dept., Faculty of Computers and Information, Menoufia University, Egypt

Email

m.gmalhat@yahoo.com

City

-

Orcid

0000-0002-0136-4805

First Name

M.

Last Name

Elmenshawy

Middle Name

-

Affiliation

Computer Science Dept., Faculty of Computers and Information, Menofia University, Egypt

Email

mohamed.elmenshawy@ci.menofia.edu.eg

City

-

Orcid

-

First Name

Hamdy

Last Name

Mousa

Middle Name

-

Affiliation

Faculty of Computer and Information, Menoufia University

Email

hamdimmm@hotmail.com

City

-

Orcid

0000-0001-9503-9124

First Name

A.

Last Name

Elsisi

Middle Name

B.

Affiliation

Computer Science Dept., Faculty of Computers and Information, Menofia University, Egypt

Email

ashrafelsisi@hotmail.com

City

-

Orcid

-

Volume

6

Article Issue

1

Related Issue

5795

Issue Date

2019-01-01

Receive Date

2018-08-01

Publish Date

2019-01-01

Page Start

9

Page End

19

Print ISSN

1687-7853

Online ISSN

2735-3257

Link

https://ijci.journals.ekb.eg/article_35122.html

Detail API

https://ijci.journals.ekb.eg/service?article_code=35122

Order

2

Type

Original Article

Type Code

877

Publication Type

Journal

Publication Title

IJCI. International Journal of Computers and Information

Publication Link

https://ijci.journals.ekb.eg/

Main Title

A Novel Scalable and Effective Partitioning Approach for Big Data Reduction

Details

Type

Article

Created At

22 Jan 2023