ANALYSIS OF SINGLE AND ENSEMBLE MACHINE LEARNING CLASSIFIERS FOR PHISHING ATTACKS DETECTION

Authors

  • Oyelakin A. M., Department of Computer Science, Faculty of Natural and Applied Sciences, Al-Hikmah University, Ilorin, Nigeria
  • Alimi O. M., Department of Computer Science, Faculty of Natural and Applied Sciences, Al-Hikmah University, Ilorin, Nigeria
  • Mustapha I. O., Department of Computer Science, Faculty of Natural and Applied Sciences, Al-Hikmah University, Ilorin, Nigeria
  • Ajiboye I. K., Abdulraheem College of Advanced Studies, Igbaja, Nigeria

DOI:

https://doi.org/10.15282/ijsecs.7.2.2021.5.0088

Keywords:

Phishing Attacks, Internet Security, Ensemble Machine Learning Algorithms, Classification

Abstract

Phishing attacks are used in many ways to harvest the confidential information of unsuspecting internet users. Several machine learning techniques have been proposed to stem the tide of such attacks. However, few studies have compared single and ensemble machine learning models for the classification of phishing attacks. This study carried out a performance analysis of selected single and ensemble machine learning (ML) classifiers for phishing classification. The focus is to investigate how these algorithms behave when classifying phishing attacks in the chosen dataset. Logistic Regression and Decision Trees were chosen as the single classifiers, while a simple voting technique and Random Forest were used as the ensemble algorithms. Accuracy, precision, recall and F1-score were used as performance metrics. Logistic Regression recorded an accuracy of 0.86, a precision of 0.89, a recall of 0.87 and an F1-score of 0.81. The Decision Trees classifier achieved an accuracy of 0.87, a precision of 0.83, a recall of 0.88 and an F1-score of 0.81. The voting ensemble achieved an accuracy of 0.92, a precision of 0.90, a recall of 0.92 and an F1-score of 0.92. Random Forest recorded 0.98, 0.97, 0.98 and 0.97 for accuracy, precision, recall and F1-score respectively. In these experiments, Random Forest outperformed the simple voting (averaging) ensemble and the two single classifiers for phishing URL detection. The study established that the ensemble techniques used in the experiments were more effective for phishing URL identification than the single classifiers.
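
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses a synthetic dataset from make_classification as a stand-in for the phishing data, and assumes the voting ensemble combines the two single classifiers by averaging their predicted probabilities (soft voting). The abstract does not state the ensemble members, hyperparameters or train/test split, so those choices are hypothetical, and the scores it prints will not match the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data standing in for a phishing dataset
# (label 1 = phishing, 0 = legitimate). Sizes are arbitrary.
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Two single classifiers and two ensembles, mirroring the abstract.
# Assumption: the voting ensemble averages the probabilities of LR and DT.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Voting (soft)": VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=0)),
        ],
        voting="soft",
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split
# with the four metrics named in the abstract.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    summary = ", ".join(f"{metric}={value:.2f}" for metric, value in scores.items())
    print(f"{name}: {summary}")
```

Evaluating all four models on the same held-out split keeps the comparison like-for-like; with real data, cross-validation would give a more stable estimate than a single split.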

 


Published

2021-10-11

How to Cite

Oyelakin, A. M., Alimi, O. M., Mustapha, I. O., & Ajiboye, I. K. (2021). ANALYSIS OF SINGLE AND ENSEMBLE MACHINE LEARNING CLASSIFIERS FOR PHISHING ATTACKS DETECTION. International Journal of Software Engineering and Computer Systems, 7(2), 44–49. https://doi.org/10.15282/ijsecs.7.2.2021.5.0088