An Ensemble Machine Learning With Feature Selection Methods for Detecting Phishing Attacks

Document Type : Original Article

Authors

1 Information Technology, Computers and Information, Menoufia University, Menoufia, Egypt

2 Information Technology Department, Faculty of Computers and Information, Menoufia University, Egypt

3 Information Technology Department, Faculty of Computers and Information, Menoufia University, Egypt

Abstract

Internet-based attacks have increased with the growth of the internet and its applications. Phishing attacks are among the most common threats facing internet users, especially during the COVID-19 pandemic with the shift to remote working. Phishing is a security attack in which attackers obtain people's personal information, such as user IDs and passwords, through fake websites, emails, and malicious URLs. They then use this information to carry out further attacks and commit illegal transactions. In this work, an ensemble machine learning technique is proposed to distinguish legitimate websites from phishing ones. The datasets used for evaluation are representative of real-world phishing and benign websites. Dataset1 (phishing website dataset) consists of 11,055 URLs (6,157 phishing and 4,898 legitimate), and Dataset2 (Phishing_Legitimate_full) consists of 10,000 URLs (5,000 phishing and 5,000 legitimate). Ensemble methods with voting algorithms were used to build the ensemble models and improve detection accuracy. The results indicate that the proposed ensemble model (KNN+RF+PART) achieved the highest accuracy, 98.65%, compared with existing ensemble models. This is attributed to the strong performance of the individual algorithms in detecting phishing, which makes the combination effective. In addition, feature selection methods were used to choose the top-ranked features to improve classification performance. Applying feature selection to individual classifiers significantly enhanced their performance, with accuracy reaching up to 98.50%, compared with the same classifiers without feature selection. Thus, the proposed framework can be a viable alternative for predicting phishing attacks.
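
As a rough illustration of the voting idea described in the abstract, the sketch below combines the label predictions of three classifiers by hard (majority) voting. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names majority_vote and accuracy are hypothetical, the prediction lists are made-up toy values rather than results from Dataset1 or Dataset2, and the abstract does not state which voting rule was actually used.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by hard (majority) voting.

    predictions: list of equal-length label sequences, one per classifier.
    With an odd number of classifiers and binary labels no tie can occur;
    otherwise Counter.most_common breaks ties by first-seen order.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all prediction lists must have the same length")
    # zip(*predictions) yields, for each sample, the votes of all classifiers.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up labels (assumption): 1 = phishing, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0]
knn_pred = [1, 0, 0, 1, 0, 1]   # hypothetical output of a KNN model
rf_pred = [1, 0, 1, 1, 1, 0]    # hypothetical output of a random forest
part_pred = [0, 0, 1, 1, 0, 0]  # hypothetical output of a PART rule learner

ensemble_pred = majority_vote([knn_pred, rf_pred, part_pred])

for name, pred in [("KNN", knn_pred), ("RF", rf_pred),
                   ("PART", part_pred), ("Ensemble", ensemble_pred)]:
    print(f"{name}: accuracy = {accuracy(y_true, pred):.2f}")
```

In this toy case each classifier errs on a different sample, so the majority vote corrects every individual mistake. Real classifiers often make correlated errors, so the gain from voting is typically smaller.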

Keywords