414893

Enhancing Fraud Detection in Imbalanced Datasets: A Comparative Study of Machine Learning and Deep Learning Algorithms with SMOTE Preprocessing

Article

Last updated: 09 Mar 2025

Subjects

-

Tags

Artificial Intelligence
Deep learning

Abstract

Fraud detection has become a critical challenge, particularly with the growth of e-commerce. Financial institutions are under increasing pressure to build robust systems that limit the significant economic losses caused by fraudulent activity. A key difficulty in detecting credit card fraud is class imbalance: fraudulent transactions are far fewer than legitimate ones, so models often struggle to recognize fraud. Several techniques have been developed to address this issue. The Synthetic Minority Oversampling Technique (SMOTE) is widely used to create synthetic minority-class instances and balance the dataset. Other strategies include under-sampling, which reduces the number of legitimate transactions, and cost-sensitive learning, which assigns different costs to misclassifications to prioritize fraud detection. Advanced SMOTE variants, such as Borderline-SMOTE and ADASYN, refine the balancing further by focusing on samples that are harder to classify. This paper examines how data preprocessing affects the performance of several machine learning and deep learning algorithms. The key preprocessing steps are data cleaning, normalization, feature selection, and SMOTE application. Cleaning and normalization ensure data quality and comparability, feature selection reduces dimensionality, and SMOTE directly addresses the class imbalance. The preprocessed data are evaluated using Support Vector Machines (SVM), Random Forests (RF), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM). Each algorithm is assessed for its ability to detect fraud after preprocessing. Comparative analyses confirm the effectiveness of SMOTE, showing improved performance across all algorithms. Accuracy, precision, recall, and F1 score are all high, with CNN achieving the highest performance (95% accuracy and 94% F1 score), followed by RF, LSTM, and SVM. Although SMOTE improved SVM performance, SVM did not match the levels of CNN or RF. These findings highlight the significant improvements that data preprocessing can yield and provide valuable insights for improving fraud detection systems.
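
The oversampling step described in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' implementation: the function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration, and only NumPy is used. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minority samples.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy data (assumed): 95 "legitimate" rows and 5 "fraud" rows, two features.
rng = np.random.default_rng(42)
X_legit = rng.normal(0.0, 1.0, size=(95, 2))
X_fraud = rng.normal(3.0, 0.5, size=(5, 2))

# Generate enough synthetic fraud rows to match the legitimate class.
X_synth = smote_oversample(X_fraud, n_new=len(X_legit) - len(X_fraud))
X_fraud_balanced = np.vstack([X_fraud, X_synth])
print(X_fraud_balanced.shape)  # (95, 2)
```

In practice, oversampling of this kind is usually applied to the training split only, so that evaluation metrics are computed on the original class distribution; whether the paper follows that protocol is not stated in the abstract.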

DOI

10.21608/mjcis.2025.313097.1007

Keywords

Fraud detection, SMOTE, SVM, CNN, LSTMs

Authors

First Name

Walaa

Last Name

Salem

MiddleName

Salah

Affiliation

Information Systems Department, Faculty of Computer & Information Sciences - Mansoura University

Email

walaa79@mans.edu.eg

City

Mansoura

Orcid

0009-0009-4281-4903

First Name

Ibrahim

Last Name

El-Hasnony

MiddleName

-

Affiliation

Information Systems Department, Faculty of Computer & Information Sciences - Mansoura University

Email

ibrahimhesin2005@mans.edu.eg

City

Mansoura

Orcid

-

First Name

Ahmed

Last Name

Abu Elfetouh

MiddleName

-

Affiliation

Information Systems Department, Faculty of Computer & Information Sciences - Mansoura University

Email

elfetouh@gmail.com

City

Mansoura

Orcid

-

First Name

Amira

Last Name

Rezk

MiddleName

-

Affiliation

Information Systems Department, Faculty of Computer & Information Sciences - Mansoura University

Email

amira_rezk@mans.edu.eg

City

Mansoura

Orcid

-

Volume

20

Article Issue

1

Related Issue

53821

Issue Date

2025-06-01

Receive Date

2024-09-24

Publish Date

2025-06-01

Page Start

1

Page End

21

Print ISSN

2090-1666

Online ISSN

2090-1674

Link

https://mjcis.journals.ekb.eg/article_414893.html

Detail API

http://journals.ekb.eg?_action=service&article_code=414893

Order

414,893

Type

Original Research Articles.

Type Code

1,784

Publication Type

Journal

Publication Title

Mansoura Journal for Computer and Information Sciences

Publication Link

https://mjcis.journals.ekb.eg/

Details

Type

Article

Created At

09 Mar 2025