Feature Selection using FFS and PCA in Biomedical Data Classification with AdaBoost-SVM

Abstract: Recently, there has been an increasing trend toward proposing computer aided diagnosis systems for biomedical pattern recognition. In this paper, a computer aided diagnosis method that aims at higher classification accuracy is developed to classify biomedical datasets. The process comprises two types of machine learning algorithms: feature selection and classification. First, features were selected from the biomedical datasets; then the selected features were classified by a hybrid AdaBoost-Support Vector Machines (SVM) classifier structure. For feature selection, the Forward Feature Selection (FFS) and Principal Component Analysis (PCA) algorithms were used, and the performance of the feature selection algorithms was tested with the AdaBoost-SVM classifier. The advantages and disadvantages of these algorithms were then evaluated. The Wisconsin Breast Cancer (WBC), Pima Diabetes (PD) and Heart (Statlog) biomedical datasets were taken from the UCI database, and Electrocardiogram (ECG) signals were taken from the Physionet ECG Database; these were used to test the proposed hybrid structure. The two hybrid structures and other studies in the literature were compared with our findings. The obtained results show that the proposed hybrid structure achieves high classification accuracy for biomedical data classification.


Introduction
For the last fifty years, researchers in the field of Biomedical Engineering have tried to develop Computer Aided Diagnosis systems, which generally use artificial intelligence techniques for detecting biomedical problems. In some situations, a biomedical problem is the identification of an illness from examination results; in others, it is the interpretation of a signal. In the literature, many classification techniques have been proposed for both problems, including Decision Trees, Boosting, Artificial Neural Networks, and SVM. There are also implementations based on ensemble classifiers (Naive-Bayes classifiers, AdaBoost, Bagging, Rotation Forest) to increase classification accuracy. The main idea is to find, among many weak classifiers, those with the highest performance and to increase their weights in the ensemble. Some studies and their results in the literature are briefly presented as follows: Yuan and Ma [1] proposed an AdaBoost-Genetic Algorithm system and tested it on benchmark datasets; the best performance was 97.39% on the Breast Cancer dataset and 83.09% on the Heart (Statlog) dataset. Dhakate et al. [2] introduced an ensemble feature selection approach using best-first-search feature selection to reduce the noise in the dataset; AdaBoost, Boosting and Bagging algorithms were used as ensemble classifiers, and the results were compared. The best performance for Breast Cancer with AdaBoost was 74.47%. Yunlong and Feng [3] designed an AdaBoost-kNN structure in which statistical regularities were obtained by AdaBoost and the kNN algorithm was then run on the feature space; the classification accuracy of the proposed system was 96.44% on Breast Cancer and 78.52% on Pima Diabetes. Chen et al. [4] modified the traditional AdaBoost method for One-Class Support Vector Machines.
They tested the proposed algorithm on binary-class datasets from the UCI benchmark; the maximum classification performance on Breast Cancer was 97.03%. Lahiri and Biswas [5] proposed a new AdaBoost algorithm in which several learners were trained by ANNs on subsets of the original feature space; the classification performance was 97.1% on the Breast Cancer dataset and 87.4% on the Heart (Statlog) dataset. Huaxiang and Jing [6] proposed a fuzzy-boosting system with the C4.5 algorithm as the base classifier, and the proposed system obtained better results than the AdaBoost and Bagging algorithms; its classification performance was 96.75% on Breast Cancer and 77.32% on Pima Diabetes. Chen and Zhang [7] proposed a multiple Classifiers Ensemble based on Feature Selection (FSCE) to improve classification performance; the method was tested on UCI benchmark datasets [8] and compared with the AdaBoost algorithm, and the best classification accuracy on the Breast Cancer dataset was 97.13%. Ghavidel et al. [9] proposed a new ensemble classifier generation method that aims to create more diverse base classifiers while making them more accurate. In their approach, training data for the base classifiers were built by taking bootstrap samples of the original training set, and each resulting classifier was treated as a weak classifier in the AdaBoost algorithm to constitute a strong classifier for binary classification problems. The proposed algorithm was applied to UCI datasets, and the best classification accuracy was 97.37% on Breast Cancer and 69.22% on Pima Diabetes. Shu and Wang [11] proposed a new AdaBoost-AC (accelerated) method for classification, in which the algorithm acquires the weights of the weak classifiers.
With this method, the classification performance on the Breast Cancer dataset was 75.36%. In the literature, AdaBoost is mostly used for image classification, but there are also many studies on biomedical data classification. In our study, we developed a hybrid algorithm for biomedical data classification: features were first selected from the dataset and then classified by the proposed AdaBoost-SVM algorithm. Two different feature selection algorithms were used, and the performance results obtained from AdaBoost-SVM were compared. The aim of this study is to assess the performance of AdaBoost-SVM, combined with feature selection algorithms, on the classification of discontinuous datasets such as Breast Cancer and Pima Diabetes and of continuous datasets such as ECG. The paper has four parts. The first part reviews studies in the literature relevant to feature selection and classification algorithms. The second part presents the feature selection and classification methods used. The third part summarizes and discusses the experimental results. The paper concludes with a discussion of the obtained results and suggestions for further research.

Materials and Methods
The proposed system is formed by a feature selection algorithm and a classifier system implemented with AdaBoost and Support Vector Machines. The datasets were first processed by the feature selection algorithms, FFS and PCA; then the classification was done by the AdaBoost-SVM hybrid classifier structure. The datasets used are presented in Table 1.
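The forward selection step can be illustrated with a minimal sketch. Greedy forward selection itself is the only part taken from the text; the function names, the nearest-centroid scorer (a stand-in for the paper's AdaBoost-SVM evaluation), and the data split are all illustrative assumptions:

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of a simple nearest-centroid classifier (illustrative scorer)."""
    classes = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return (pred == y_te).mean()

def forward_feature_selection(X_tr, y_tr, X_te, y_te, n_select):
    """Greedily add, one at a time, the feature that most improves the scorer."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    for _ in range(n_select):
        scores = [nearest_centroid_accuracy(X_tr[:, selected + [f]], y_tr,
                                            X_te[:, selected + [f]], y_te)
                  for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the actual pipeline, the scorer would be the classifier under study rather than a nearest-centroid rule.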

Principal Component Analysis
Principal Component Analysis is a method used to describe the instances in a dataset and to express the similarities and differences within the dataset. PCA is a powerful method for analyzing data [13], because in high-dimensional datasets it is hard to find patterns among instances and a graphical representation is not possible.
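The projection onto principal components can be sketched in a few lines of NumPy. This is a generic eigendecomposition-based sketch, not the paper's exact implementation; the function name is illustrative:

```python
import numpy as np

def pca_reduce(X, k):
    """Project data onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # sort descending by variance
    components = eigvecs[:, order[:k]]       # top-k principal directions
    return Xc @ components, eigvals[order[:k]]
```

The returned eigenvalues indicate how much variance each retained component explains, which is how the number of components is typically chosen.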

Ensemble Classifier AdaBoost
AdaBoost is an ensemble classifier method that creates a strong classifier by combining weak classifiers. In each iteration, the algorithm calls a simple learning algorithm, called the base learner, and creates a classifier; a weight coefficient is then assigned to this classifier. The final classification result is obtained by weighted voting based on the weight coefficients of the weak classifiers: if a weak learner's error is low, its weight in the final vote is high. The weak learners only need to predict slightly better than random guessing, so there is great flexibility in the design of the weak learner set [14].
The algorithm is as follows [15]:

Step 1: Given training samples (x_i, y_i) with y_i ∈ {−1, +1}, i = 1, …, m, the sample weights are initialized as D_1(i) = 1/m.

Step 2: For t = 1, …, T, a weak learner h_t is trained on distribution D_t, its weighted error ε_t = Σ_i D_t(i) [h_t(x_i) ≠ y_i] is computed, and the base learner weight is set as α_t = (1/2) ln((1 − ε_t)/ε_t). Weights are updated:

D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t,

where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.

Step 3: The final classifier is obtained by weighted voting:

H(x) = sign(Σ_t α_t h_t(x)).  (Equation 4)
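The boosting loop can be sketched with decision stumps standing in for the SVM weak learners of the proposed method; the stump search and all names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    """A decision stump: threshold one feature, with a sign polarity."""
    return polarity * np.where(X[:, feat] <= thresh, 1.0, -1.0)

def fit_adaboost(X, y, T=10):
    """Minimal AdaBoost with decision stumps as the weak learners."""
    m = len(y)
    D = np.full(m, 1.0 / m)                  # Step 1: uniform sample weights
    ensemble = []
    for _ in range(T):
        best = None
        # exhaustively search stumps for the lowest weighted error
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for polarity in (1.0, -1.0):
                    pred = stump_predict(X, feat, thresh, polarity)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, polarity, pred)
        err, feat, thresh, polarity, pred = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # base learner weight
        D *= np.exp(-alpha * y * pred)       # Step 2: weight update
        D /= D.sum()                         # normalize (the Z_t factor)
        ensemble.append((alpha, feat, thresh, polarity))
    return ensemble

def predict_adaboost(ensemble, X):
    """Step 3: weighted voting over the weak learners."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)
```

The weight update concentrates mass on misclassified samples, so each round's weak learner focuses on the hardest remaining examples.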

Support Vector Machines
Support Vector Machines (SVM) is a learning method with high performance in various applications. SVM is based on two main ideas. The first is to map the feature vectors into a high-dimensional space by a non-linear transformation and to use a linear classifier in this new space. The second is to find a hyperplane that separates the dataset with a large margin: among the infinitely many possible hyperplanes, the one that separates the data as well as possible is chosen. Many hyperplanes have similar performance on the training set, but their generalization performance on new data can differ significantly [16].

Experimental Results
In this study, an AdaBoost-SVM ensemble classifier was presented to classify biomedical datasets. The most effective features of each dataset were chosen and dimension reduction was performed; then the datasets were classified by the SVM-based AdaBoost classifier structure. Two feature selection algorithms, Forward Feature Selection and Principal Component Analysis, were applied to the biomedical data. 10-fold cross-validation was applied in all datasets and experiments. In the feature selection stage, the features selected in more folds than a threshold value over the 10 folds form the new dataset. The optimum threshold value was found experimentally to be 5. Using this threshold value, 6 features were selected from the Breast Cancer dataset (Fig. 1). The selected features were classified by the AdaBoost-SVM structure, and the performance of the classifier was evaluated by sensitivity, specificity and accuracy rates. The AdaBoost classifier algorithm has parameters such as the base learner weight coefficients and the weak learner errors. In this study, the parameters for each of the features were obtained during the training process; therefore, during the test process, each feature of the test dataset was tested with the parameters obtained for it during training. The experiments were implemented on an ASUS N550JK notebook with an Intel(R) Core(TM) i7-4700HQ CPU @ 2.40 GHz.
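The fold-frequency thresholding described above can be sketched as follows; the function and variable names are illustrative:

```python
import numpy as np

def select_stable_features(fold_selections, n_features, threshold=5):
    """Keep the features chosen in strictly more than `threshold` of the CV folds."""
    counts = np.zeros(n_features, dtype=int)
    for selected in fold_selections:        # one list of feature indices per fold
        counts[list(selected)] += 1
    return [f for f in range(n_features) if counts[f] > threshold]
```

Counting selections across folds favors features that are consistently useful rather than ones picked by chance in a single split.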

Testing Each of Features by Their Parameters
In this method, the SVM-based AdaBoost classifier was trained on the training set, with SVM used as the weak learner in all base learners. During the training process, all features of the first pattern are first classified by the weak learners of the first base learner. Each base learner contains as many weak learners as there are features: the 1st feature is classified by the 1st weak learner, the 2nd feature by the 2nd weak learner, and so on. This process is repeated for all base learners. After the training process, the weak learner errors and base learner weights are obtained, and the weak learner structures are kept as trained weak classifiers to be used in the test stage. In the test stage, a pattern is classified using the trained weak classifiers. The classification result h ∈ {−1, +1} is multiplied by the weight of the base learner (α). After the classification process is completed for all base learners, the final result is obtained by weighted voting according to Equation 4. The number of base learners and the number of weak classifiers were held the same. The training scheme of this method is presented in Fig. 2.
During the decision process, the test dataset is given as input together with the parameters obtained after training: the weak learner structures kept as trained weak classifiers and the base learner weights are used to obtain the classification result. In each base learner, for a given pattern, the classification results of its features are obtained from the weak learners. For m features, it is decided which class each feature belongs to after classification: if more than half of the m features are assigned to the +1 class, the pattern is classified as +1; otherwise, it is classified as −1. This process is repeated for all instances and base learners. After the test stage, the final classification result is obtained according to Equation 4. The scheme of the test process is given in Fig. 3.
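The decision rule above can be sketched as follows: the per-feature votes inside each base learner are combined by a feature-majority vote, and the base learner decisions are then combined by weighted voting. The array layout and names are illustrative assumptions:

```python
import numpy as np

def classify_pattern(feature_predictions, alphas):
    """Classify one pattern.

    feature_predictions: array of shape (n_base_learners, n_features)
        holding the ±1 vote of each per-feature weak learner.
    alphas: array of shape (n_base_learners,) of base learner weights.
    """
    # majority vote over features inside each base learner;
    # a tie (not more than half voting +1) falls to the -1 class
    per_learner = np.where(feature_predictions.sum(axis=1) > 0, 1.0, -1.0)
    # weighted voting over the base learner decisions
    return 1 if np.dot(alphas, per_learner) > 0 else -1
```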

Experiments on Discontinuous Dataset
In this study, two different data types, continuous-time and discontinuous-time, were used. The Wisconsin Breast Cancer, Pima Diabetes and Heart (Statlog) datasets were taken from the UCI database and used as discontinuous data. These datasets were classified by the FFS-AdaBoost-SVM and PCA-AdaBoost-SVM structures, and the performances of the structures are presented. According to the experimental results, with the FFS-AdaBoost-SVM structure the optimum feature number was found to be 4 for the Breast Cancer, Pima Diabetes and Heart (Statlog) datasets. The accuracy rates (ACC) of the FFS-AdaBoost-SVM and PCA-AdaBoost-SVM structures on the three datasets are shown in Fig. 4 for different numbers of selected features. As Fig. 4 shows, the best performance on the Breast Cancer dataset was obtained with 4 features: the 4 most effective features were selected by the FFS algorithm and classified by the AdaBoost-SVM structure, giving an accuracy rate of 97.74%. Likewise, the best performance on the Pima Diabetes data was obtained with 4 features selected by the FFS algorithm and classified by the AdaBoost-SVM classifier; the optimum accuracy was 75.26%. The FFS algorithm also achieved its best performance with 4 features on the Heart (Statlog) dataset, and these features were classified by the AdaBoost-SVM classifier; the best accuracy was 89.54%. Similarly, feature selection was performed by the PCA algorithm on the Breast Cancer dataset, and the selected four features were classified by AdaBoost-SVM; the accuracy rate was 92.07%. According to Fig. 4, four features were selected from the Pima Diabetes dataset by the PCA algorithm, and the classification was done by the AdaBoost-SVM structure.
The classification performance was 74.89% on the Pima Diabetes dataset. Following the same feature selection procedure as with the FFS algorithm, four features of the Heart (Statlog) dataset were selected by the PCA algorithm and classified by AdaBoost-SVM; the optimum accuracy rate was 74.36%. In both structures, performed with the FFS and PCA algorithms, the number of base learners was held the same as the number of weak learners. On the Breast Cancer dataset, FFS yielded 5.7% higher performance than PCA and also ran in a shorter time. On the Pima Diabetes dataset, FFS yielded 0.4% higher performance than PCA, but PCA ran faster. On the Heart (Statlog) dataset, FFS gave around 15% higher performance than PCA, but PCA ran faster. The parameters of the FFS-AdaBoost-SVM structure were also used in the PCA-AdaBoost-SVM structure, and the numbers of base learners and weak classifiers for each dataset were held the same. The classification results of FFS-AdaBoost-SVM and PCA-AdaBoost-SVM on the Wisconsin Breast Cancer, Pima Diabetes and Heart (Statlog) datasets are presented in Table 2.
As seen in Table 2, the FFS-AdaBoost-SVM structure outperformed the PCA-AdaBoost-SVM structure on all three discontinuous datasets.

Experiments on Continuous Dataset
The Electrocardiogram (ECG) dataset was used as the continuous-time dataset in this experiment. The ECG is a record of the electrical activity of the heart: the electrical signals of the heart are measured by surface electrodes and electronic devices, and the obtained data are transformed into an ECG wave with a characteristic pattern [17]. Three different ECG signal types (Right Bundle Branch Block, Left Bundle Branch Block and Normal Sinus Rhythm) were used; these data were taken from the Physionet ECG Database [18]. Each signal was sampled at 360 Hz with 11-bit resolution over a 10 mV range. The presented classifier algorithms have binary structures; for that reason, a multi-class classification strategy was needed, and the one-against-all (OAA) method was preferred. The OAA method works as follows: let the class set be Ψ = {ω_1, ω_2, …, ω_M}, where M is the number of different classes in the dataset. A binary classifier is trained to distinguish each class ω_i from the remaining classes (Ψ − {ω_i}), i = 1, 2, …, M. Thereby, M classifiers are trained, one for each class [19]. First, dimension reduction was performed on the ECG dataset by applying the FFS and PCA feature selection methods. The ECG dataset was divided into 5, 8, 10, 20, 25 and 50 parts, and feature selection was done for each partitioning. The selected features were then classified by AdaBoost-SVM, and the best results were obtained when the dataset was divided into 8 parts; therefore, the dataset was divided into 8 and feature selection was done on the divided parts. The scheme of the process is presented in Fig. 5.
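The OAA scheme can be sketched as follows. The binary learner here is a toy nearest-centroid scorer standing in for the paper's AdaBoost-SVM classifier, and all names are illustrative assumptions:

```python
import numpy as np

def train_oaa(X, y, train_binary):
    """One-against-all: train one binary model per class (class c vs. the rest)."""
    models = {}
    for c in np.unique(y):
        yb = np.where(y == c, 1.0, -1.0)   # relabel: class c -> +1, rest -> -1
        models[c] = train_binary(X, yb)
    return models

def predict_oaa(models, score_fn, X):
    """Assign each sample to the class whose binary model scores it highest."""
    classes = sorted(models)
    scores = np.column_stack([score_fn(models[c], X) for c in classes])
    return np.array([classes[i] for i in scores.argmax(axis=1)])

# Toy binary learner: remember the +1 and -1 class centroids ...
def centroid_fit(X, yb):
    return X[yb == 1].mean(axis=0), X[yb == -1].mean(axis=0)

# ... and score by how much closer a sample is to the +1 centroid.
def centroid_score(model, X):
    mu_pos, mu_neg = model
    return np.linalg.norm(X - mu_neg, axis=1) - np.linalg.norm(X - mu_pos, axis=1)
```

Using a continuous score (rather than a hard ±1 label) to pick the winning class resolves cases where several of the M binary classifiers claim the sample.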

Taking the Number of Features and the Number of AdaBoost Base Learners as the Same
The new dataset obtained after feature selection (FFS or PCA) was presented as input to the classifier structure. The number of base learners in the classifier structure can be held the same as the number of features, or it can be varied; the number of weak classifiers in each base learner is always held the same as the number of features. Table 3 presents the results obtained when the number of features and the number of base learners were held the same. For 17, 33, 43 and 70 features, the results show that the classifier structure with FFS has higher performance than the structure with PCA; in all experiments, FFS performed better, and PCA was better only in processing time. The accuracy rates of the FFS-AdaBoost-SVM and PCA-AdaBoost-SVM structures for different numbers of features are presented in Fig. 6.

Taking the Number of Features and the Number of AdaBoost Base Learners as Different
In this approach, the number of base learners was held different from the number of features, while the number of weak learners in each base learner was held the same as the number of features. The obtained results are presented in Table 4(a) and Table 4(b), and the graphical results for both structures are presented in Fig. 7. As seen in Table 4(a), the highest classification rate on the FFS-AdaBoost-SVM structure was 97.37%, obtained with 10 base learners at 33 and 43 features. The highest classification rate on the PCA-AdaBoost-SVM structure was 71.05%, achieved with 8 base learners at 17 features, as seen in Table 4(b). PCA is better than FFS only in some total training and test times, as seen in Table 4.

Discussion and Conclusion
In this study, an SVM-based AdaBoost ensemble classifier system was designed, and feature selection algorithms were added to the system to increase the classification performance. Dimension reduction was performed on the biomedical datasets by the FFS and PCA feature selection algorithms, which reduced the training and test times. Without feature selection, the total training and test time was 701.6 seconds on the WBC dataset, 1285.2 seconds on the PD dataset, 925.1 seconds on the Heart (Statlog) dataset, and 2804.7 seconds on the ECG signals. The designed system was tested with continuous- and discontinuous-time datasets, and the feature selection-classifier structure was run 30 times (10-fold CV). In the experiments on the discontinuous datasets, the number of base learners was held the same as the number of features. With 4 features, the best classification accuracy obtained by the FFS-AdaBoost-SVM structure was 97.74% on the WBC dataset, 75.26% on the Pima Diabetes dataset, and 89.54% on the Heart (Statlog) dataset. In the experiments on the continuous dataset, the number of base learners was set by two different methods and the system performance was observed. In the first method, the number of base learners was held the same as the number of features; here the best classification accuracy was 98.24%, obtained with 17 features by the FFS-AdaBoost-SVM structure, as seen in Table 3(a).
In the second method, the number of base learners was investigated experimentally; the optimal numbers of base learners were found to be 5, 8 and 10. These were tested with 17, 33, 43 and 70 features selected from the ECG dataset. The best classification accuracy was 97.37%, obtained with 10 base learners and 33 or 43 features by the FFS-AdaBoost-SVM structure, as seen in Table 4(a).
The results of this study are compared with the literature in Table 5, which shows that the proposed method is more effective than other studies in classifying the biomedical datasets used. For the Breast Cancer dataset, our study has the best classification accuracy in Table 5; the closest accuracy value, 97.35%, was obtained in [10]. For the Pima Diabetes dataset, the best classification accuracy is 77.34% in [3], which is higher than ours, but our results are better than those of [10] on the discontinuous datasets. For the Heart (Statlog) dataset, our study has the best classification accuracy; the closest value is 87.4% in [5]. For the ECG dataset, our study has the highest classification accuracy, as seen in Table 5. It can also be seen in Table 5 that the studies in the literature did not test their algorithms on both discontinuous and continuous datasets, whereas in this study the proposed method was tested on both data types.