
Missing Value Management: Weighted Heuristic Similarity Estimation for Numeric Values

Article

Last updated: 28 Dec 2024

Abstract

For businesses and technologies that handle massive volumes of data, such as the Internet of Things (IoT) and digital banking, it is crucial that all processed data values are accurately recorded; values that are not recorded must be replaced using a reliable imputation method. Missing value imputation is especially important in big data applications, where data volumes tend to grow exponentially and data structures change rapidly. This study proposes a reasonable distance function that is more effective in determining the best replacement values for missing data before a classifier is applied to the target dataset. In essence, the Weighted Heuristic Similarity Estimation (WHSE) mechanism consumes substantial effort in practical application fields. The WHSE method was benchmarked using the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics. The evaluation was conducted using three distinct classifiers: Nearest-Neighbor (NN), Linear Regression (LR), and Multi-Layer Perceptron (MLP). WHSE was applied to two datasets, Iris and Forest Fires, to estimate its impact on replacing missing values. Consequently, the WHSE formula can direct the applied classifier to achieve at least similar, if not ideal, performance regardless of the characteristics of the imputed data. The WHSE method is expected to be scalable, stable, and applicable in big data analytics.
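
The abstract names the evaluation metrics (MAE and RMSE) but does not state the WHSE formula itself. The block below is therefore only a minimal sketch of the general idea of distance-based imputation scored with those two metrics. It is not the authors' method: the weighted Euclidean distance, the nearest-donor rule, the function names (`weighted_distance`, `impute_nearest`, `mae`, `rmse`), the feature weights, and the sample rows are all assumptions made up for illustration.

```python
import math

def mae(true_vals, est_vals):
    """Mean Absolute Error between true and estimated values."""
    return sum(abs(t - e) for t, e in zip(true_vals, est_vals)) / len(true_vals)

def rmse(true_vals, est_vals):
    """Root Mean Square Error between true and estimated values."""
    squared = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals))
    return math.sqrt(squared / len(true_vals))

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over features observed in both rows.

    This is a stand-in distance, NOT the WHSE formula from the paper.
    Features that are missing (None) in either row are skipped.
    """
    total, used = 0.0, 0
    for x, y, w in zip(a, b, weights):
        if x is None or y is None:
            continue
        total += w * (x - y) ** 2
        used += 1
    return math.sqrt(total) if used else math.inf

def impute_nearest(rows, weights):
    """Fill each missing value from the closest row that has it observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for k, r in enumerate(rows) if k != i and r[j] is not None]
            if not donors:
                continue  # no row can supply this feature; leave it missing
            best = min(donors, key=lambda r: weighted_distance(row, r, weights))
            filled[i][j] = best[j]
    return filled

# Illustrative rows (made-up numbers); None marks a missing value.
rows = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, None],
    [6.5, 3.0, 5.8],
]
weights = [1.0, 1.0, 1.0]  # hypothetical per-feature weights

filled = impute_nearest(rows, weights)

# Score the imputed value against a held-out "true" value (also made up).
true_vals = [6.0]
est_vals = [filled[2][2]]
print("MAE:", mae(true_vals, est_vals))
print("RMSE:", rmse(true_vals, est_vals))
```

In a benchmark of the kind the abstract describes, the imputed dataset would then be passed to a classifier, and the same two metrics would be computed on its predictions.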

DOI

10.21608/mjcis.2019.320869

Keywords

Rough Sets, Information Gain, Missing Values, Missing Value Imputation, Machine Learning

Authors

First Name

O. M.

Last Name

Elzeki

MiddleName

-

Affiliation

Faculty of Computers and Information, Computer Science Dept., Mansoura University, Egypt

Email

omar_m_elzeki@mans.edu.eg

City

-

Orcid

-

First Name

M. F.

Last Name

Alrahmawy

MiddleName

-

Affiliation

Faculty of Computers and Information, Computer Science Dept., Mansoura University, Egypt

Email

-

City

-

Orcid

-

First Name

S.

Last Name

Elmogy

MiddleName

-

Affiliation

Faculty of Computers and Information, Computer Science Dept., Mansoura University, Egypt

Email

-

City

-

Orcid

-

Volume

15

Article Issue

1

Related Issue

43865

Issue Date

2019-06-01

Receive Date

2023-10-10

Publish Date

2019-06-01

Page Start

45

Page End

52

Print ISSN

2090-1666

Online ISSN

2090-1674

Link

https://mjcis.journals.ekb.eg/article_320869.html

Detail API

https://mjcis.journals.ekb.eg/service?article_code=320869

Order

320869

Type

Original Research Articles.

Type Code

1784

Publication Type

Journal

Publication Title

Mansoura Journal for Computer and Information Sciences

Publication Link

https://mjcis.journals.ekb.eg/

MainTitle

Missing Value Management: Weighted Heuristic Similarity Estimation for Numeric Values

Details

Type

Article

Created At

28 Dec 2024