The Impact of Enhanced Space Forests with Classifier Ensembles on Biomedical Dataset Classification

Authors

  • Zeynep Hilal Kilimci Dogus University
  • Sevinc Ilhan Omurca Kocaeli University

DOI:

https://doi.org/10.18201/Ilhan

Keywords:

Classifier Ensembles, Enhanced Space Forests, Ensemble Algorithms

Abstract

In this paper, we aim to improve the classification accuracy of classifier ensembles by investigating the contribution of enhanced space forests on biomedical datasets. To this end, the study focuses on enhancing the feature space with two of the most popular feature selection techniques, namely information gain (IG) and chi-square (CHI). After these methods are applied to the feature space, the training phase uses both the original features and the most significant ones; that is, the new training dataset is constructed by combining the original features and the new ones. Training is then performed on the enhanced feature space with a well-known classification algorithm, the decision tree. Finally, three types of ensemble algorithms are applied: bagging, random subspace, and random forest. A wide range of comparative experiments is conducted on 36 publicly available and widely used datasets from the UCI Machine Learning Repository to observe the impact of enhanced space forests with classifier ensembles. The experimental results demonstrate that the proposed enhanced space forests achieve higher classification accuracy than state-of-the-art studies. The improvement of approximately 1% to 3% in classification accuracy indicates that the proposed technique is effective.
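The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, reads "combining the original features and the new ones" as appending the top-ranked columns to the original feature matrix, uses `mutual_info_classif` as a stand-in for information gain, and runs on the breast cancer dataset bundled with scikit-learn rather than the 36 datasets used in the paper. The names `enhance`, `make_ensembles`, and the value `K = 10` are hypothetical.

```python
from functools import partial

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

K = 10  # hypothetical number of top-ranked features to append


def enhance(X_train, y_train, X_test, score_func, k=K):
    """Append the k best-scoring columns to the original feature matrix.

    Assumption: the "enhanced" space is original features + selected features.
    The selector is fitted on the training split only.
    """
    selector = SelectKBest(score_func=score_func, k=k).fit(X_train, y_train)
    X_train_enh = np.hstack([X_train, selector.transform(X_train)])
    X_test_enh = np.hstack([X_test, selector.transform(X_test)])
    return X_train_enh, X_test_enh


def make_ensembles(seed=0):
    """Build the three decision-tree ensembles named in the abstract."""
    tree = DecisionTreeClassifier(random_state=seed)
    return {
        # Bagging: each tree sees a bootstrap sample of the rows.
        "bagging": BaggingClassifier(tree, n_estimators=25, random_state=seed),
        # Random subspace: each tree sees all rows but a random half of the columns.
        "random_subspace": BaggingClassifier(
            tree, n_estimators=25, bootstrap=False, max_features=0.5,
            random_state=seed,
        ),
        "random_forest": RandomForestClassifier(n_estimators=25, random_state=seed),
    }


X, y = load_breast_cancer(return_X_y=True)  # non-negative features, as chi2 requires
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

scorers = {
    "IG": partial(mutual_info_classif, random_state=0),  # stand-in for information gain
    "CHI": chi2,
}

results = {}
for fs_name, scorer in scorers.items():
    X_tr_enh, X_te_enh = enhance(X_tr, y_tr, X_te, scorer)
    for ens_name, model in make_ensembles().items():
        model.fit(X_tr_enh, y_tr)
        results[(fs_name, ens_name)] = model.score(X_te_enh, y_te)

for key, acc in sorted(results.items()):
    print(key, round(acc, 3))
```

Comparing these scores against the same ensembles trained on `X_tr` alone would give a rough sense of whether the enhanced space helps on a given dataset; a single train/test split is only indicative, and the paper's reported 1% to 3% gain should not be expected to reproduce from this sketch.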

Author Biographies

Zeynep Hilal Kilimci, Dogus University

Computer Engineering

Sevinc Ilhan Omurca, Kocaeli University

Computer Engineering

Published

29.06.2018

How to Cite

Kilimci, Z. H., & Ilhan Omurca, S. (2018). The Impact of Enhanced Space Forests with Classifier Ensembles on Biomedical Dataset Classification. International Journal of Intelligent Systems and Applications in Engineering, 6(2), 144–150. https://doi.org/10.18201/Ilhan

Section

Research Article