BDT: A Novel Approach to Handle Imbalanced Data in Machine Learning Models
Keywords:
Data Imbalance, Machine Learning, Under-Sampling, Over-Sampling, Model Performance, Algorithm Adjustment, Imbalanced Data Correction Technique
Abstract
In machine learning and data science, imbalanced datasets present a significant challenge, often leading to biased models and inaccurate predictions. This research introduces a novel technique aimed at mitigating the effects of data imbalance and thereby enhancing model performance across various metrics. Through a rigorous examination of existing imbalance correction methods, this study identifies key gaps and proposes an innovative approach, the Balanced Data Technique (BDT), which combines under-sampling, over-sampling, and algorithmic adjustment methods in a unique framework. Employing a comprehensive experimental setup across multiple imbalanced datasets, the technique demonstrates superior performance compared with established methods, as evidenced by improved accuracy, precision, and recall scores. This paper details the development of the technique, from its theoretical underpinnings through practical implementation and testing. The implications of this research are far-reaching, offering potential improvements in fields where imbalanced data is prevalent. By addressing this fundamental issue, the proposed technique contributes to the advancement of more equitable and effective machine learning models.
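The abstract positions BDT as a combination of three ingredients: over-sampling the minority class, under-sampling the majority class, and an algorithmic adjustment to the learner. The paper's own implementation is not reproduced here; the sketch below is a hypothetical illustration of how such ingredients are commonly chained using scikit-learn and the imbalanced-learn library. The sampler ratios, the class-weighted random forest, and the synthetic dataset are illustrative assumptions, not the authors' BDT.

```python
# Minimal, hypothetical sketch of combining over-sampling, under-sampling,
# and an algorithmic adjustment (class weighting). Not the authors' BDT.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

# Synthetic imbalanced data (roughly 9:1 majority-to-minority ratio).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=42
)
print("Original class counts:", Counter(y))

pipeline = Pipeline(
    steps=[
        # Over-sample the minority class part of the way toward balance ...
        ("smote", SMOTE(sampling_strategy=0.5, random_state=42)),
        # ... then under-sample the majority class toward balance ...
        ("under", RandomUnderSampler(sampling_strategy=0.8, random_state=42)),
        # ... and apply an algorithmic adjustment via class weights.
        ("clf", RandomForestClassifier(class_weight="balanced", random_state=42)),
    ]
)

# F1 on the minority class is more informative than accuracy alone here.
scores = cross_val_score(pipeline, X, y, scoring="f1", cv=5)
print("Cross-validated F1: %.3f" % scores.mean())
```

Scoring with F1 (and, in practice, precision and recall) rather than accuracy alone mirrors the evaluation emphasis the abstract describes for imbalanced problems.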
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license allows readers to share and adapt the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and, if they remix, transform, or build upon the material, distribute their contributions under the same license as the original.