Improving Intrusion Detection Performance with Genetic Algorithm-Based Feature Extraction and Ensemble Machine Learning Methods
Keywords:
EM’s Classifier, GA Feature Selection, UNSW-NB15 dataset, Intrusion detection
Abstract
The Internet of Things (IoT) has transformed daily life by offering enhanced accessibility, connectivity, and convenience. It enables vast amounts of data to flow among interconnected devices, creating networks that are susceptible to diverse attacks and intrusions. Developing an efficient intrusion detection system (IDS) for IoT networks is challenging for two main reasons: the massive volume of aggregated data and the diverse nature of IoT devices. Traditional IDS approaches struggle to handle and analyze this data in real time, so there is growing demand for advanced IDS techniques that leverage machine learning (ML) or deep learning (DL) methods. This study focuses on intrusion detection in IoT networks using the UNSW-NB15 dataset, a well-known, publicly available dataset that is widely used for evaluating IDS algorithms. The main purpose of the work is to enhance intrusion detection performance by integrating genetic algorithm (GA)-based feature extraction with ensemble machine learning algorithms (EM’s). Feature extraction is a crucial step in an IDS because it reduces the dimensionality of the dataset while retaining relevant information. Genetic algorithms, known for their optimization capabilities, are employed to search for a subset of features that maximizes the discriminatory power of the IDS. To this end, a framework is proposed that integrates genetic algorithms with several ensemble ML techniques: random forests, Extra-Trees, XGBoost, AdaBoost, and stacking. The GA selects a subset of features from the UNSW-NB15 dataset, and the ensemble ML models are then trained on the selected features and evaluated by their accuracy.
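The abstract gives no implementation details, so the following is only a minimal sketch of the wrapper idea it describes: a GA evolves binary feature masks, and each mask is scored by the accuracy of a classifier trained on the selected columns. Everything specific here is an assumption made for illustration: the data are synthetic rather than UNSW-NB15, a simple nearest-centroid classifier stands in for the ensemble models (random forests, Extra-Trees, XGBoost, AdaBoost, stacking), and all function names, the per-feature penalty, and the GA settings are hypothetical.

```python
import random

def make_data(n=200, n_features=8, informative=(0, 3), seed=0):
    """Synthetic two-class data (assumption): only `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(train, valid, mask):
    """Validation accuracy of a nearest-centroid classifier on the masked features.
    Stand-in for the ensemble models named in the abstract."""
    (X_tr, y_tr), (X_va, y_va) = train, valid
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for c in set(y_tr):
        rows = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, lab in zip(X_va, y_va):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == lab
    return correct / len(y_va)

def ga_select(fitness, n_features, pop_size=20, generations=15, p_mut=0.1, seed=1):
    """Basic GA over bit masks: tournament selection, one-point crossover,
    bit-flip mutation, and elitism (the best mask always survives)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n_features - 1)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b  # bit-flip mutation
                     for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

X, y = make_data()
train, valid = (X[:150], y[:150]), (X[150:], y[150:])

def fitness(mask):
    # Accuracy minus a small (hypothetical) penalty per selected feature.
    return centroid_accuracy(train, valid, mask) - 0.01 * sum(mask)

best = ga_select(fitness, n_features=8)
print("selected feature indices:", [j for j, bit in enumerate(best) if bit])
print("validation accuracy:", centroid_accuracy(train, valid, best))
```

In a setting closer to the one described, the fitness function would instead train one of the ensemble classifiers on the masked UNSW-NB15 columns; the GA loop itself would not need to change.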
Copyright (c) 2023 Gunupusala Satyanarayana, Kaila Shahu Chatrapathi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers may share and adapt the material provided they give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.