A Comparative Study for Different Resampling Techniques for Imbalanced datasets

Elsobky, Alaa Mahmoud; Keshk, Arabi ELsaid; Malhat, Mohamed Gaber

doi:10.21608/ijci.2023.236287.1136


Document Type : Original Article

Authors

Menoufia


Abstract

Imbalanced data is a significant challenge for researchers in supervised machine learning. Current data mining algorithms are not effective at processing imbalanced data: classification accuracy drops because predictions for the minority classes are inaccurate. Classifying imbalanced data has therefore received significant attention, and the use of sampling techniques to improve classification performance has been an important consideration in related work. In this paper, a comparative study of six sampling algorithms is performed. The algorithms come from three families of sampling techniques: two oversampling algorithms, two undersampling algorithms, and two algorithms that combine oversampling and undersampling. The oversampling techniques are random oversampling and SMOTE, the undersampling techniques are random undersampling and NearMiss, and the combined techniques are SMOTE-Tomek and SMOTE-ENN. The study examines the impact of these sampling algorithms on the performance of three classifiers: SVM, KNN, and logistic regression. Cross-validation experiments on 12 standard datasets show that the SMOTE-ENN sampling algorithm achieves significant improvements compared with the other algorithms.
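To make the resampling ideas in the abstract concrete, the following is a minimal sketch of three of the named techniques (random oversampling, random undersampling, and SMOTE-style interpolation) in plain Python. It is not the authors' implementation: the function names, the toy data, and the choice of k = 3 neighbours are all assumptions made for illustration, and the SMOTE function is a simplified version that assumes numeric features and at least two minority samples.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=3, seed=0):
    """Simplified SMOTE: add synthetic minority points by interpolating between
    a minority sample and one of its k nearest minority neighbours.
    Assumes numeric features and at least two minority samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    pool = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(pool)
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: six majority points (class 0), three minority points (class 1).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (2.0, 2.0), (2.1, 2.2), (2.2, 1.9)]
y = [0] * 6 + [1] * 3

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote_like(X, y, minority=1)

print(Counter(y_over), Counter(y_under), Counter(y_smote))
```

Random oversampling only repeats existing points, whereas the SMOTE-style function creates new points on the line segments between minority neighbours. In practice a library would normally be used instead of hand-written code; for example, the imbalanced-learn package provides RandomOverSampler, SMOTE, RandomUnderSampler, NearMiss, SMOTETomek, and SMOTEENN, though the abstract does not state which implementation the study relied on.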

Keywords

IJCI. International Journal of Computers and Information

Volume 10, Issue 3
Special issue for the proceedings of ICCI 2023 conference
November 2023
Pages 147-156

