392966

Leveraging Data Mining for Inference and Prediction in Lung Cancer Research

Article

Last updated: 29 Dec 2024

Subjects

-

Tags

Applied Statistics and Econometrics
Data Science

Abstract

Lung cancer is the second most common cancer worldwide, with an estimated 2.21 million new diagnoses and 1.8 million deaths in 2020, according to the WHO. Early detection, accurate diagnosis, and successful treatment improve survival rates. This study included 270 patients with lung cancer and 39 without lung cancer. Logistic regression was used for inference, to analyze the association between the variables and lung cancer. Linear Discriminant Analysis, Quadratic Discriminant Analysis, Logistic Regression, k-Nearest Neighbors (KNN), Decision Tree, Bagging, Random Forest, and Support Vector Machine were used for prediction, to estimate the likelihood of an individual developing lung cancer based on these factors. In terms of accuracy, 5-fold cross-validation showed higher accuracy than the validation set approach. The Logistic Regression model had the highest accuracy at 93.54%, followed by Linear Discriminant Analysis at 92.09%, the Support Vector Machine at 91.29%, Bagging and Random Forest at 90.90% and 91.23% respectively, Quadratic Discriminant Analysis at 89.97%, Decision Tree at 89.97%, the KNN-10 model at 17.74%, and lastly the KNN-5 model at 16.12%. The logistic regression model identified key associations between lung cancer and the following factors: allergy, peer pressure, swallowing difficulty, smoking, chronic disease, alcohol consumption, yellow fingers, fatigue, and coughing. The accuracy rankings of the models differed between the 5-fold cross-validation and validation set approaches. Notably, the logistic regression model consistently demonstrated superior performance, achieving an accuracy of 93.54%.
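The two evaluation schemes compared in the abstract, a single validation set and 5-fold cross-validation, can be illustrated with a minimal sketch. It is not the authors' code or data: the records are synthetic binary features generated for illustration (only the class sizes, 270 and 39, come from the abstract), the classifier is a plain k-nearest-neighbors vote using Hamming distance, and all function names (`make_synthetic`, `knn_predict`, `validation_set_accuracy`, `k_fold_accuracy`), the 70/30 split, and the feature probabilities are assumptions. The accuracies it prints say nothing about the study's results.

```python
import random

def make_synthetic(n_pos=270, n_neg=39, n_features=5, seed=0):
    """Hypothetical stand-in data: binary features with class-dependent rates."""
    rng = random.Random(seed)
    rows = []
    for label, n, p in ((1, n_pos, 0.7), (0, n_neg, 0.3)):
        for _ in range(n):
            x = [1 if rng.random() < p else 0 for _ in range(n_features)]
            rows.append((x, label))
    rng.shuffle(rows)
    return rows

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (Hamming distance)."""
    nearest = sorted(train, key=lambda row: sum(a != b for a, b in zip(row[0], x)))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes >= k else 0  # ties go to class 1

def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def validation_set_accuracy(rows, k, test_fraction=0.3):
    """Validation set approach: fit on one part, score once on the held-out part."""
    cut = int(len(rows) * (1 - test_fraction))
    return accuracy(rows[:cut], rows[cut:], k)

def k_fold_accuracy(rows, k, n_folds=5):
    """k-fold cross-validation: each fold is held out once; fold scores are averaged."""
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, held_out, k))
    return sum(scores) / n_folds, scores

data = make_synthetic()
results = {}
for k in (5, 10):
    holdout = validation_set_accuracy(data, k)
    cv_mean, cv_scores = k_fold_accuracy(data, k)
    results[k] = (holdout, cv_mean, cv_scores)
    print(f"KNN-{k}: validation set = {holdout:.3f}, 5-fold CV = {cv_mean:.3f}")
```

Because the validation set approach scores the model on one random split, its estimate can shift noticeably with the split, whereas cross-validation averages over five held-out folds. That is one plausible reason model rankings can differ between the two approaches, as the abstract reports.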

DOI

10.21608/cjmss.2024.309277.1068

Keywords

Lung Cancer, Logistic regression, classification, accuracy, validation, Data mining

Authors

First Name

Md Nurul

Last Name

Raihen

MiddleName

-

Affiliation

Department of Mathematics and Computer Science, Fontbonne University, USA

Email

nraihen@fontbonne.edu

City

Saint Louis

Orcid

0000-0003-2680-0658

First Name

Shakera

Last Name

Begum

MiddleName

-

Affiliation

Department of Statistics, Western Michigan University, USA

Email

shakerabgm@gmail.com

City

-

Orcid

-

First Name

Sultana

Last Name

Akter

MiddleName

-

Affiliation

Institute for Data Science and Informatics, University of Missouri Columbia, USA

Email

sa4kf@umsystem.edu

City

-

Orcid

-

First Name

Md Nazmul

Last Name

Sardar

MiddleName

-

Affiliation

Senior Officer, Product Development at Radiant Nutraceuticals Ltd, Bangladesh

Email

nazmulsardar91@gmail.com

City

-

Orcid

-

Volume

4

Article Issue

1

Related Issue

50936

Issue Date

2025-04-01

Receive Date

2024-08-30

Publish Date

2025-04-01

Page Start

139

Page End

161

Print ISSN

2974-3435

Online ISSN

2974-3443

Link

https://cjmss.journals.ekb.eg/article_392966.html

Detail API

https://cjmss.journals.ekb.eg/service?article_code=392966

Order

392966

Type

Original Article

Type Code

2545

Publication Type

Journal

Publication Title

Computational Journal of Mathematical and Statistical Sciences

Publication Link

https://cjmss.journals.ekb.eg/

Details

Type

Article

Created At

29 Dec 2024