358167

Simulation-Based Assessment of Classification Methods: Statistical Models vs. Machine Learning Algorithms

Article

Last updated: 28 Dec 2024

Subjects

-

Tags

-

Abstract

Existing studies have evaluated classification techniques primarily on real datasets whose statistical features are unreported or unknown. This simulation-based study compares the performance of statistical models (logistic regression, probit regression, and discriminant analysis) with machine learning algorithms (support vector machines, classification and regression trees, and k-nearest neighbors) to assess their suitability for classification tasks. Simulated datasets are used so that their statistical characteristics can be controlled, and the real Pima Indian Diabetes dataset is used to verify the findings. The outcomes of this study can guide practitioners and researchers in selecting the most appropriate modeling technique for their specific needs, ultimately enhancing the accuracy and reliability of classification across various domains. The results revealed that the two statistical models, probit and logit, outperformed the other methods in most simulation scenarios. Notably, these well-grounded, theory-based models yielded the most accurate predictions in 78.5% (logit) and 83.6% (probit) of the simulated scenarios. The probit model performed best when the binary response variable was balanced (τ=0.50) and when it was highly imbalanced (τ=0.90). For the real dataset, the performance metrics indicate that the logit model, followed by the probit model, predicted best, which is consistent with the outcome of the simulation study.
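As a rough illustration of the kind of simulation the abstract describes, the sketch below generates a binary response with a controllable class proportion τ and compares one statistical model (a logit fitted by gradient ascent) with one machine learning method (k-nearest neighbors) on held-out data. It is not the authors' design: the single predictor, the slope `beta=1.5`, the sample sizes, the seeds, the use of accuracy as the metric, and all function names are assumptions made for this example, and the intercept shift only approximately produces the target proportion τ.

```python
import math
import random


def simulate(n, tau, beta=1.5, seed=0):
    """Draw (x, y) pairs from a logistic model with P(y=1) roughly equal to tau."""
    rng = random.Random(seed)
    # Assumption: shifting the intercept by logit(tau) steers the class balance;
    # the marginal proportion is only approximately tau.
    intercept = math.log(tau / (1 - tau))
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-(intercept + beta * x)))
        data.append((x, 1 if rng.random() < p else 0))
    return data


def fit_logit(train, steps=500, lr=0.5):
    """Fit intercept and slope by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(train)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in train:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(predict, test):
    """Share of test points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


def run_scenario(tau, n_train=400, n_test=200, seed=1):
    """Simulate one scenario and score both classifiers on the held-out part."""
    data = simulate(n_train + n_test, tau, seed=seed)
    train, test = data[:n_train], data[n_train:]
    b0, b1 = fit_logit(train)
    logit_acc = accuracy(lambda x: 1 if b0 + b1 * x > 0 else 0, test)
    knn_acc = accuracy(lambda x: knn_predict(train, x), test)
    return {"tau": tau, "logit": logit_acc, "knn": knn_acc}


# The two class proportions highlighted in the abstract.
results = [run_scenario(tau) for tau in (0.50, 0.90)]
for row in results:
    print(row)
```

A fuller study would repeat each scenario many times, vary more data characteristics than τ, and report several performance metrics, since accuracy alone can be misleading when the response is highly imbalanced.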

DOI

10.21608/esju.2024.260404.1025

Keywords

Classification, Logistic Regression, Probit Regression, Discriminant Analysis, Support Vector Machines, Classification and Regression Trees, K-Nearest Neighbors, Machine Learning

Authors

First Name

Reham

Last Name

Beram

MiddleName

-

Affiliation

Department of Statistics, Mathematics and Insurance, Faculty of Business, Alexandria University, Alexandria, Egypt.

Email

reham.beram@gmail.com

City

Alexandria

Orcid

0000-0002-3935-1460

First Name

Ahmed

Last Name

El-Kotory

MiddleName

-

Affiliation

Department of Statistics, Mathematics and Insurance, Faculty of Business, Alexandria University, Alexandria, Egypt.

Email

ahmed.elkatory@alexu.edu.eg

City

-

Orcid

-

Volume

68

Article Issue

1

Related Issue

46950

Issue Date

2024-06-01

Receive Date

2024-01-03

Publish Date

2024-06-01

Page Start

91

Page End

124

Print ISSN

0542-1748

Online ISSN

2786-0086

Link

https://esju.journals.ekb.eg/article_358167.html

Detail API

https://esju.journals.ekb.eg/service?article_code=358167

Order

7

Type

Original Article

Type Code

1,914

Publication Type

Journal

Publication Title

The Egyptian Statistical Journal

Publication Link

https://esju.journals.ekb.eg/

Details

Type

Article

Created At

28 Dec 2024