A Two Stage Hybrid Ensemble Classifier Based Diagnostic Tool for Chronic Kidney Disease Diagnosis Using Optimally Selected Reduced Feature Set

This paper presents the idea of applying a two stage hybrid ensemble classifier to improve the prediction accuracy of Machine Learning based automated diagnosis of chronic kidney disease, using the values of an optimally selected subset of clinical and physiological parameters. Chronic kidney disease is a general term for various heterogeneous disorders affecting the structure and function of the kidney, and it has a high mortality rate. In a two stage hybrid ensemble classifier, the potential of individual classification algorithms is combined. In addition, the authors optimally selected 8 parameters of prime importance from the set of 24 parameters of the dataset used for the study. The selected parameters (features) represent the intersection of two sets: one containing medically essential parameters arranged in decreasing order of their contribution to the diagnosis, and the other containing parameters ranked in decreasing order of their contribution to the Machine Learning classification process. The results indicate that the two stage hybrid ensemble is a very effective method for classification of chronic kidney disease. On the optimally selected reduced feature set (with 8 parameters) as well as on the complete feature set (with 24 parameters), the ensemble classifier achieved a (2-class) predictive accuracy of 100%, sensitivity of 1, precision of 1, specificity of 1 and F-value of 1. The GUI based diagnostic tool developed on the basis of the proposed ensemble can assist doctors in cross-validating their findings from the initial screening of chronic kidney disease using fewer clinical parameters, thus helping them attend to the needs of more patients in less time.


Introduction
John McCarthy coined the term "Artificial Intelligence" and defined it as the "science and engineering of making intelligent machines". Artificial Intelligence, commonly known as AI, is sometimes also referred to as "Synthetic Intelligence" [1]. AI is the branch of science and engineering that is concerned with the computational understanding of intelligent behaviour and with the creation of artefacts that exhibit such behaviour. Programs which enable computers to function in the ways that make people seem intelligent are called artificial intelligent systems [2]. The field of Artificial Intelligence was founded with the intention of imparting a central property of humans, i.e. intelligence, to machines. Machine Learning is a branch of artificial intelligence which aims at providing computational methods for accumulating, changing and updating knowledge in intelligent systems. In the words of Ethem Alpaydın, "Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future or descriptive to gain knowledge from data, or both" [3]. Machine learning can be used effectively for diagnosis, prognosis, and prediction of recurrence of diseases or medical disorders. Life threatening diseases like diabetes [4], lung diseases [5], heart diseases [6] and cancer [7] can be diagnosed with great accuracy by feeding medical datasets pertaining to these diseases, obtained from various sources, to Machine Learning based systems that learn from these datasets and predict future outcomes with notable accuracy.
The primary motive of machine learning in medicine is developing artificial intelligent systems that can assist a medical doctor in performing expert diagnosis. Chronic kidney disease (CKD) is a general term for heterogeneous disorders affecting the structure and function of the kidney [8]. It is a heterogeneous condition whose clinical manifestations and course depend on the cause and type (pathology), severity, rate of progression, and comorbid conditions [9]. The definition of chronic kidney disease is based on the presence of kidney damage (i.e. albuminuria) or decreased kidney function for 3 months or more [10]. Kidney failure is one of the most serious outcomes of chronic kidney disease, the main reasons being the complications of reduced kidney function. Dialysis and transplantation are the only viable treatment options when the symptoms of kidney failure are severe, i.e. in end stage renal disease. Complications can occur at any stage, often lead to death with no progression to kidney failure, and can arise from adverse effects of interventions to prevent or treat the disease [8]. CKD is an internationally recognized public health problem affecting nearly 10% of the world population [11]. The contribution currently desired from AI in the medical sciences is programs that can assist a medical expert in performing more accurate and quicker diagnosis. By combining sciences like statistics and probability, these programs try to find patterns in the training data (i.e. pattern recognition) and then use these patterns to classify the test data into one of the possible categories (outcomes).

Literature Review
The use of machine learning algorithms in the medical domain is increasing day by day for solving problems by analyzing and interpreting large volumes of data [4]. A number of researchers have used Machine Learning algorithms to solve problems in the field of medicine.
• Igor Kononenko [12] presented a view on the use of Machine Learning techniques 1) in the past, for the interpretation of medical data, 2) in the current scenario, for intelligent analysis of medical data, and 3) in the future, for assisting physicians in the diagnosis of medical disorders. The author suggested integrating machine learning techniques with existing instrumentation to promote the acceptance of machine learning in medicine.
• Hardik Maniya et al. [13] compared the Naïve Bayes classifier and KNN for diagnosis of tuberculosis; the implementation was done using the C language and the Weka tool. The medical dataset used had 19 attributes and 154 instances. The authors classified the patients affected by tuberculosis into two categories (least probable and most probable) and achieved nearly 78% accuracy with a low false negative rate.
• R Bharat Rao et al. [14] developed a computer aided diagnosis (CAD) system named LungCAD. The system used a classification algorithm for the detection of pulmonary nodules from CT thorax studies. The clinical approach for the diagnosis of coin lesions used chest x-ray and CT scan. LungCAD greatly assisted clinicians in improving their diagnostic accuracy and was approved by the FDA in 2006.
• Abid Sarwar et al. [4] performed a comparative analysis of Artificial Neural Network, Naïve Bayes and KNN algorithms for type II diabetes in terms of detection accuracy. The results showed that the Artificial Neural Network, with 96% prediction accuracy, performed better than Naïve Bayes with 95% and KNN with 91%.
• Yasodha et al. [15] analyzed a database of diabetic patients using the Weka tool. The authors considered different algorithms such as REP Tree, Bayes Network, J48 and Random Tree classifiers for the study and compared the outputs. The main objective of the study was to develop a diabetic expert system which, given a patient's daily glucose rate and insulin dosages as inputs, would predict the patient's insulin dosage for the next day.
• Bekir Karlik [16] compared Backpropagation and Naive Bayes classifiers for diagnosing hepatitis disease. Hepatitis is the general term for inflammation of the liver. The most common causes of hepatitis are the hepatotropic viruses (such as hepatitis A, B, and C) and alcohol abuse. In practice, both of these methods often compete well with more sophisticated classifiers. The performance of both methods was reported for each of the hepatitis classification tasks. The overall accuracies of the diagnosis systems were 98% and 97% respectively.
• Huda Yasin [17] proposed a method for investigating factors which are more pervasive for the risk of hepatitis C virus. The dataset was obtained from the machine learning repository of the University of California. The authors compared the proposed method with nearly 20 classification techniques, including Naïve Bayes, GRNN and CART, and showed that the proposed method had the highest accuracy rate of 89.6%. The proposed method worked by using only 37% of the total fields, indicating low feature complexity.
• L.C. van der Gaag et al. [18] developed a decision-support system for patient-specific therapy selection for esophageal cancer. The system predicts the stage of the cancer, which helps the oncologist to start with the correct treatment plan for the patient. The kernel of the system is a probabilistic network that describes the presentation characteristics of cancer of the esophagus and the pathophysiological processes of invasion and metastasis. Results showed that for 85% of the patients, the network predicted the correct cancer stage.
• Babak Sokouti et al. [19] proposed a Levenberg-Marquardt feedforward MLP neural network (LMFFNN) to classify cervical cell images obtained from 100 patients, including healthy, low-grade intraepithelial squamous lesion and high-grade intraepithelial squamous lesion cases. This neural network, along with extracted cell image features, is a new model for cervical cell image classification. Based on the results, cervical cell images were classified with a 100% correct classification rate using the proposed method. Moreover, the rates of sensitivity and specificity were calculated as 100% using the LMFFNN method. There was good agreement between the expert decision and the values obtained from the ANN model.
• Shivajirao M. Jadhav et al. [20] proposed a system that used ECG recordings for the detection of arrhythmias in the human heart by training a multilayer perceptron (MLP) artificial neural network (ANN) on an ECG dataset. The ECG dataset was extracted from the University of California at Irvine (UCI) data repository and contains 452 instances with 279 attributes. The proposed system classifies the patterns into two classes: 1) normal and 2) abnormal. The dataset was also used to train a modular artificial neural network. The system with the MLP model showed 86.67% classification accuracy and 93.75% sensitivity, and the modular ANN showed a classification accuracy of 93.1%.
• T. Manju et al. [21] proposed a hybrid system based on a multilayer feed forward neural network (MLFFN) and a genetic algorithm (GA) for assisting medical doctors in predicting heart disease. Cardiac arrest (heart attack) is a major cause of death in the world; its major causes are smoking, high blood pressure, unhealthy diet, obesity and diabetes. The dataset used in the study was collected from the University of California at Irvine (UCI) repository and consists of data of 270 patients. The feed forward ANN is trained using back propagation, and the weights associated with the connections between the nodes of the neural network are optimized using the genetic algorithm. The accuracy of the system was 79.7% on the training dataset and 89.67% on the testing dataset.
• S. Vijayarani et al. [22] compared Support Vector Machine (SVM) and Artificial Neural Network (ANN) for predicting kidney disease. The performance metrics used were accuracy and execution time. Analysis of the results showed that ANN performed better in terms of classification accuracy, whereas SVM required less execution time than ANN. The authors considered ANN to be better than SVM when both performance metrics were considered together.
• Andrew Kusiak et al. [23] elicited knowledge about the interaction between many of the measured parameters and kidney dialysis patient survival using data pre-processing, data transformations and a data mining approach. Two different data mining algorithms were employed for extracting knowledge in the form of decision rules, which in turn were used by a decision making algorithm that predicts the survival of new unseen patients. Important parameters identified by data mining were interpreted for their medical significance. The approach reduced the cost and effort of selecting patients for clinical studies, since patients can be selected based on the prediction results and the most significant parameters discovered.
Various other research works and studies have also been carried out on machine learning based diagnosis of diseases like cancer [24,25], diabetes [26,27], heart diseases [28,29] and kidney disease [29][30][31].

Chronic Kidney Disease data
The medical dataset used to carry out this research was obtained from the UCI data repository [32]. The dataset was created from data obtained from Dr. P. Soundarapandian of Apollo Hospitals, Tamil Nadu, and contains data of 400 people from the southern part of India with ages ranging from 2 to 90 years. There are twenty four parameters in total, most of which are clinical in nature and a few of which are physiological. On the basis of these 24 attributes, each instance is assigned one of two class labels, i.e. suffering or not suffering from chronic kidney disease. Table 1 summarizes the various parameters chosen and their allowed values. Among the twenty four parameters, a few are of prime importance for the diagnosis of CKD. In a patient suffering from CKD, the values of serum creatinine and blood urea are usually elevated. The specific gravity of urine remains fixed due to the inability of the kidneys to dilute or concentrate urine. High blood pressure can be the result of CKD, or in some cases high blood pressure may itself lead to CKD over time. Long standing diabetes can also be a cause of CKD. The hemoglobin level falls and the patient is most of the time anaemic. Table 2 lists all the twenty four parameters in descending order of their medical relevance as well as of their contribution to the machine learning classification process for the diagnosis of CKD.

Methodology
The work carried out in this paper is an extension of the preliminary comparative analysis that the authors carried out earlier [33]. In total, the authors considered twenty five different classifiers for the development of the two stage hybrid ensemble classifier. Most of the candidate classifiers are based on the following techniques: decision tree, support vector machine (SVM), K-nearest neighbour, artificial neural networks and ensemble methods. These algorithms were selected for analysis and study because of their popularity in the recent relevant literature and their good performance even with smaller amounts of training data. A short description of the algorithms selected for the study is given below.

Decision Tree Classifiers
DT classifiers classify data by making use of tree structured algorithms [34]. The underlying algorithm begins with the training samples and their corresponding class labels. The training set is partitioned recursively into subsets based on a feature value. Each internal node represents a test on an attribute, and each edge (branch) represents an outcome of the test. A decision tree classifier identifies the class label of an unknown sample by following a path from the root to a leaf, which represents the class label for that sample. The feature (attribute) that is selected as the root node is the one that best divides the training data (Fig. 1).

Decision tree variants used:

Simple Tree
In this variant of decision tree, the maximum number of splits is four and Gini's diversity index is used as the split criterion.

Medium Tree
In this variant of decision tree, the maximum number of splits is twenty and Gini's diversity index is used as the split criterion.

Complex Tree
In this variant of decision tree, the maximum number of splits is one hundred and Gini's diversity index is used as the split criterion.
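As an illustration of the split criterion shared by the three tree variants, the following minimal Python sketch computes Gini's diversity index, 1 - sum(p_i^2) over the class proportions p_i, for a node and for a candidate split. The function names and the toy values are hypothetical and are not taken from the authors' MATLAB implementation.

```python
from collections import Counter


def gini_index(labels):
    """Gini's diversity index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini index of the two child nodes produced by
    splitting a numeric feature at `threshold` (hypothetical helper)."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


# Toy example: a split that separates the two classes perfectly has index 0.
toy_values = [1.0, 1.2, 3.5, 4.0]
toy_labels = ["notckd", "notckd", "ckd", "ckd"]
print(gini_index(toy_labels))                   # impurity before the split
print(split_gini(toy_values, toy_labels, 2.0))  # impurity after the split
```

A tree learner would evaluate such a weighted index for many candidate thresholds and keep the split with the lowest value, stopping when the configured maximum number of splits is reached.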

Support Vector Classification (SVC)
SVC revolves around the notion of a "margin" on either side of a hyperplane that divides two data classes. Maximizing the margin, i.e. creating the largest possible distance between the hyperplane and the instances on either side of it, reduces an upper bound on the expected generalization error. It works on two types of data, i.e. linearly separable data and linearly non-separable data. In the case of linearly separable data, a single hyperplane in the original feature space is sufficient to separate the classes; in the latter case it is not, and a kernel function is used to map the data into a space in which a separating hyperplane can be found.

SVM variants used:

Linear SVM
In this variant of SVM, Linear Kernel Function has been used with kernel scale set to 1.

Quadratic SVM
In this variant of SVM, Quadratic Kernel Function has been used with kernel scale set to 2.

Cubic SVM
In this variant of SVM, Cubic Kernel Function has been used with kernel scale set to 2.5.

Fine Gaussian SVM
In this variant of SVM, Gaussian Kernel Function has been used with kernel scale set to 1.2.

Medium Gaussian SVM
In this variant of SVM, Gaussian Kernel Function has been used with kernel scale set to 4.9.

Coarse Gaussian SVM
In this variant of SVM, Gaussian Kernel Function has been used with kernel scale set to 20.
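To make the role of the kernel scale in the three Gaussian variants concrete, the sketch below evaluates a Gaussian kernel in Python. It assumes that the kernel scale divides every feature value before the squared Euclidean distance is taken; this convention, the function name and the sample points are assumptions made for illustration and are not stated in the paper.

```python
import math


def gaussian_kernel(x, y, kernel_scale=1.0):
    """Gaussian (RBF) kernel between two feature vectors.

    Assumption: each feature difference is divided by `kernel_scale`
    before the squared Euclidean distance is computed.
    """
    squared_distance = sum(((a - b) / kernel_scale) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance)


# The same pair of points looks increasingly similar as the scale grows,
# which is why a coarse scale gives a smoother decision boundary.
point_a, point_b = [0.0, 0.0], [1.0, 2.0]
for scale in (1.2, 4.9, 20.0):  # fine, medium and coarse scales from the text
    print(scale, gaussian_kernel(point_a, point_b, scale))
```

Under this assumption a small kernel scale lets the classifier react to fine detail in the data, while a large one makes distant points appear similar.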

Discriminant analysis (DA)
DA classifiers work under the assumption that different classes generate data based on different Gaussian distributions. In the training phase, the fitting function estimates the parameters of a Gaussian distribution for each class. To predict the classes (class labels) of new data, the trained classifier finds the class with the smallest misclassification cost. There are mainly two types of discriminant analysis classifiers, namely the Linear Discriminant Analysis (LDA) classifier and the Quadratic Discriminant Analysis classifier. The Quadratic Discriminant Analysis classifier can be considered a generalization of LDA. In this study, both Linear and Quadratic Discriminant classifiers have been used, and in both models diagonal covariance is used for regularization.

K-nearest neighbour (KNN)
KNN is a classification technique which classifies test objects on the basis of their closest training examples. It is also termed a lazy-learning algorithm. KNN is a non-parametric algorithm, which means that it does not assume anything about the underlying data distribution. The Euclidean distance is calculated between the test sample and every sample in the training data, and the test sample is then assigned to the class to which most of its K closest training neighbours belong. K is usually a small positive integer. As the value of K increases, it becomes increasingly difficult to distinguish between the various classes. Cross-validation and other heuristic techniques are used to choose an optimal value of K.

KNN variants used:

Fine KNN
In this variant of KNN, number of neighbours has been taken as one, Euclidean distance has been used as distance metric and equal distance weight has been used.

Medium KNN
In this variant of KNN, number of neighbours has been taken as ten, Euclidean distance has been used as distance metric and equal distance weight has been used.

Coarse KNN
In this variant of KNN, number of neighbours has been taken as one hundred, Euclidean distance has been used as distance metric and equal distance weight has been used.

Cosine KNN
In this variant of KNN, number of neighbours has been taken as ten, Cosine distance has been used as distance metric and equal distance weight has been used.

Cubic KNN
In this variant of KNN, number of neighbours has been taken as ten, Minkowski distance has been used as distance metric and equal distance weight has been used.

Weighted KNN
In this variant of KNN, number of neighbours has been taken as ten, Euclidean distance has been used as distance metric and squared inverse distance weight has been used.
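The KNN variants above differ only in the number of neighbours, the distance metric and the vote weighting. The following Python sketch shows one way these three choices could be exposed. The function names and toy data are hypothetical, and treating the Cubic KNN metric as the Minkowski distance with exponent 3 is an assumption suggested by the variant's name.

```python
import math
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 is Euclidean, p=3 is assumed for Cubic KNN."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    """One minus the cosine similarity (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k=10, distance=minkowski, weighted=False):
    """Label `query` by a vote among its k nearest training samples.

    weighted=False gives every neighbour an equal vote; weighted=True
    weights each vote by the squared inverse of its distance.
    """
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: distance(pair[0], query))[:k]
    votes = defaultdict(float)
    for sample, label in neighbours:
        d = distance(sample, query)
        votes[label] += 1.0 / (d * d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data (illustrative only, not from the CKD dataset).
X = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
y = ["notckd", "notckd", "ckd", "ckd", "ckd"]
print(knn_predict(X, y, [1.1, 1.0], k=1))                  # fine: one neighbour
print(knn_predict(X, y, [1.1, 1.0], k=5))                  # equal votes over all five
print(knn_predict(X, y, [1.1, 1.0], k=5, weighted=True))   # squared inverse weights
```

With equal weights and a large K, the majority class can outvote the nearby samples, which illustrates why a large K makes the classes harder to distinguish; squared inverse weighting reduces this effect.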

Artificial neural network (ANN)
ANN is a methodology inspired by the biological network of neurons. It is a powerful data-modelling tool capable of capturing, representing and simulating complex relationships between inputs and outputs by performing multiple parallel computations. ANNs are analytical tools which try to emulate the "learning" process of the cognitive system and the neurobiological functions of the human brain. In an ANN, the neurons are grouped into different layers: an input layer, one or more hidden layers, and an output layer. Fig. 2 shows a neural network with two hidden layers. Learning is achieved by repeatedly adjusting the numerical weights associated with the interconnecting edges between different artificial neurons. In addition, an activation function is used that converts a neuron's weighted input to its output activation. In this study, two versions of the Feed Forward Back-Propagation Neural Network (FFBPNN) have been used. One of them uses the Levenberg-Marquardt (LM) back propagation training function along with the gradient descent weight and bias learning function, and the other uses the gradient descent training function along with the gradient descent weight and bias learning function.

Ensemble method (EM)
In this method, the potentials of various individual classifiers are fused together. Using an ensemble method increases performance by combining the classifying ability of individual classifiers, so the chances of misclassifying a particular instance are reduced significantly, which provides greater accuracy to the overall classification process. The different learners can be combined in a number of ways. They can work in parallel on all of the inputs, and their outputs can be combined in some way. If an instance is wrongly classified by an individual classifier, the error is corrected by the right classification done by the other individual classifiers. Alternatively, a multistage combination will train the base learners on different subsets of the input data. For example, the AdaBoost algorithm first trains an initial learner, and then trains subsequent learners on data that the first learner misclassifies. This way, the weaknesses of each base learner are made up for by the next learner [35]. Fig. 3 illustrates the general working of an ensemble method in which all individual classifiers work in parallel.

Boosted Trees
In this variant AdaBoost method is used, decision tree is the learner type with maximum number of splits being twenty and number of learners used is thirty.

Bagged Trees
In this variant bagging method is used, decision tree is the learner type and number of learners used is thirty.

Subspace Discriminant
In this variant Subspace method is used, discriminant analysis is the learner type, number of learners used is thirty and subspace dimension is twelve.

Subspace KNN
In this variant Subspace method is used, nearest neighbours is the learner type with number of learners being thirty and subspace dimension is twelve.

RUSBoosted Trees
In this variant RUSBoost method is used, decision tree is the learner type, number of learners is thirty, maximum number of splits is twenty and learning rate is 0.1.

Two stage Hybrid Ensemble Classifier
The ensemble classifier proposed in this study uses the classification potential of different individual classifiers collectively. When some of these individual classifiers are themselves ensemble methods, the resulting method becomes a two stage hybrid ensemble classifier. Fig. 4 illustrates the proposed method graphically. The values of all the performance metrics of the twenty five candidate classifiers were first evaluated and analyzed after applying them to the optimally selected reduced feature set. To select the individual classifiers to be used in the ensemble method, the sensitivity and specificity, i.e. the true positive rate and true negative rate respectively, of each classifier were analyzed. The motive was to select two sets of classifiers: one that would ensure a high true positive rate (sensitivity) and another with a high true negative rate (specificity). Based on the values of these two performance metrics for all the candidate classifiers, the authors selected the ensemble using bagged trees, Linear SVM and the ensemble using boosted trees. The ensemble using bagged trees has a true negative rate (specificity) of 100% (i.e. 1), and Linear SVM has a true positive rate (sensitivity) of 99.2% (i.e. 0.992). The ensemble using boosted trees, which has both a reasonably high true positive rate of 98% (i.e. 0.98) and a true negative rate of 100% (i.e. 1), makes up for the comparatively low true negative rate (specificity) of Linear SVM of 85.33% (i.e. 0.8533). Its inclusion ensures that conflicts between the class labels assigned by the other two classifiers are resolved and the chances of wrong classification are minimized.

In general, a two stage ensemble classifier can be represented as:

$$C_{TEC} = \mathrm{Mode}(C_1, C_2, C_3, \ldots, C_n)$$

where $C_{TEC}$ is the class label assigned by the two stage hybrid ensemble classifier and $C_i$ is the class label assigned by the $i$-th individual classifier. In particular, the two stage ensemble classifier used in this study can be represented as:

$$C_{TEC} = \mathrm{Mode}(C_{E.Bagged}, C_{LSVM}, C_{E.Boosted})$$

where $C_{TEC}$ is the class label assigned by the two stage hybrid ensemble classifier, $C_{E.Bagged}$ is the class label assigned by the ensemble using bagged trees, $C_{LSVM}$ is the class label assigned by the Linear SVM classifier, and $C_{E.Boosted}$ is the class label assigned by the ensemble using boosted trees.
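The mode-based combination rule can be sketched in a few lines of Python. The three stand-in classifiers below are hypothetical threshold rules on made-up feature names, used only to show how the votes are combined; they are not the trained bagged trees, Linear SVM or boosted trees models.

```python
from collections import Counter


def mode_vote(labels):
    """Return the most frequent class label among the individual votes."""
    return Counter(labels).most_common(1)[0][0]


def two_stage_classify(instance, classifiers):
    """Apply every stage-one classifier to the instance and take the mode."""
    return mode_vote([classify(instance) for classify in classifiers])


# Hypothetical stand-ins for the three selected classifiers.
def bagged_stub(instance):
    return "ckd" if instance["feature_a"] > 1.0 else "notckd"


def linear_svm_stub(instance):
    return "ckd" if instance["feature_b"] > 0.5 else "notckd"


def boosted_stub(instance):
    return "ckd" if instance["feature_a"] + instance["feature_b"] > 2.0 else "notckd"


members = [bagged_stub, linear_svm_stub, boosted_stub]
# The first two stubs disagree here, and the third one breaks the tie.
print(two_stage_classify({"feature_a": 0.8, "feature_b": 0.9}, members))
```

With three voters and two classes, a tie cannot occur, so the third classifier always resolves any disagreement between the other two.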

Optimally Selected Reduced Feature Set
The medical dataset considered for this study consists of twenty four parameters (features). In the course of this study, each individual parameter was evaluated in terms of its contribution towards the result, i.e. diagnosing whether a patient suffers from chronic kidney disease or not, both in terms of its medical relevance and in terms of its contribution to the Machine Learning classification process. To rank the parameters by medical relevance, help was sought from medical experts in the concerned field. To rank the parameters by their contribution to the Machine Learning classification process, the ranker method of attribute evaluation with 5-fold cross validation in WEKA (version 3.6.13) was used. It is a filter approach which ranks the attributes with respect to their information gain. This ranking is independent of any specific learning algorithm. Table 2 lists all twenty four parameters in descending order of their contribution towards the diagnosis of chronic kidney disease, in terms of both medical relevance and the Machine Learning classification process. After this, an optimal subset of 8 parameters was extracted from the complete set of 24 parameters; these are the parameters which meet the requirements in terms of both medical relevance and the machine learning classification process. Every parameter (feature) that is part of the optimally selected reduced feature set was evaluated on two fronts: its medical relevance along with the cost incurred in the clinical test needed to obtain its value, and its contribution to the machine learning classification process. All eight parameters included in the reduced feature set rank high in terms of both medical relevance and the classification process.
All the parameters used in the reduced feature set are given in Table 3. Two of the eight parameters (blood pressure and hypertension) do not require any clinical tests. The reduced number of parameters requiring clinical tests means a considerable reduction in the cost incurred by the patient.
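The two ingredients of the selection procedure, ranking by information gain and intersecting the two rankings, can be illustrated with the following Python sketch. It handles nominal features only (numeric features would first need discretization), and the feature names, the cut-off `top_n` and the toy rankings are placeholders rather than the contents of Table 2.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on a nominal feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(medical_rank, ml_rank, top_n):
    """Keep the features that appear in the top_n of both rankings,
    listed in the order of the medical ranking (assumed cut-off rule)."""
    ml_top = set(ml_rank[:top_n])
    return [name for name in medical_rank[:top_n] if name in ml_top]


labels = ["ckd", "ckd", "notckd", "notckd"]
print(information_gain(["yes", "yes", "no", "no"], labels))  # fully informative
print(information_gain(["yes", "no", "yes", "no"], labels))  # uninformative

# Placeholder rankings, most important first.
print(select_features(["p1", "p2", "p3", "p4"], ["p2", "p4", "p1", "p3"], top_n=2))
```

The information gain ranking depends only on the data, not on a particular classifier, which is what makes it a filter approach.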

Implementation and Results
All the twenty five candidate classifiers and the two stage hybrid ensemble were applied to the complete as well as the reduced feature set using 5-fold cross validation. Cross-validation is used to give a good estimate of the predictive accuracy of the final classifier trained with all the data. The procedure includes selecting a number of folds to partition the data set, followed by these steps:
1. Partition the data into k disjoint sets or folds.
2. For each fold:
   a. The classifier is trained using the out-of-fold observations.
   b. Model performance is assessed using the in-fold data.
3. The average test error over all folds is calculated.
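The cross-validation steps can be sketched in Python as follows. The shuffling seed, the helper names and the convention that `train_fn` returns a callable model are assumptions made for this illustration; the sketch does not stratify the folds by class and assumes there are at least k samples.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Partition the sample indices into k disjoint folds after shuffling."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, train_fn, k=5, seed=0):
    """Return the average test error over k folds.

    `train_fn(X_train, y_train)` must return a function that maps one
    sample to a predicted label (a hypothetical interface).
    """
    fold_errors = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train on the out-of-fold observations.
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Assess performance on the in-fold data.
        wrong = sum(1 for i in fold if model(X[i]) != y[i])
        fold_errors.append(wrong / len(fold))
    return sum(fold_errors) / len(fold_errors)


# With 400 instances and k=5, each fold holds 80 instances.
print([len(fold) for fold in k_fold_indices(400, k=5)])
```

Every instance is used for testing exactly once, so each one receives a class label from a model that did not see it during training.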
After this, the methodology of each classifier differs, but the last step, which is common to all the classifiers, is assigning a class label to every single instance of the dataset. MATLAB 2016a was used for the implementation of the individual classifiers as well as the two stage hybrid ensemble. The feature set was imported into the MATLAB environment from a Microsoft Excel sheet. The performance metrics used for the evaluation of results are predictive accuracy, sensitivity, precision, specificity and F-score. These performance metrics are explained below:
1. Predictive accuracy of Z% shows that the classifier is able to classify Z% of instances correctly.

Table 4 lists all the classifiers, including the two stage hybrid ensemble classifier, along with their values of sensitivity, precision and specificity on the complete as well as the reduced feature set, whereas Fig. 6 and Fig. 7 illustrate them graphically. As can be seen from the results, the two stage hybrid ensemble classifier outperformed all the individual classifiers on both the complete and the reduced feature set. On both feature sets the two stage hybrid ensemble achieved a predictive accuracy of 100%, sensitivity of 1, precision of 1 and specificity of 1. Among the individual classifiers, the ensemble method using bagged trees performed best, with a predictive accuracy of 99.2%, sensitivity of 0.992, precision of 1 and specificity of 1 on the complete feature set, and a predictive accuracy of 99.2%, sensitivity of 0.984, precision of 1 and specificity of 1 on the reduced feature set. The effectiveness of the optimally selected reduced feature set can also be seen from the fact that the performance of most of the individual classifiers improved when they were trained using the reduced feature set instead of the complete feature set. As the dimensionality of data increases, classification problems become significantly harder, i.e. a high number of features can lead to lower classification accuracy. The classification accuracy achieved with reduced feature sets is often significantly better than with the complete feature set [37].

A GUI based diagnostic tool based on the two stage hybrid ensemble classifier was developed. It predicts whether a patient is suffering from chronic kidney disease or not when the user supplies all 8 attributes through a user friendly GUI (Graphical User Interface). The diagnostic tool was developed using MATLAB 2016a. Of the 8 parameters that the user needs to enter as input, four are numeric and the rest are nominal. The diagnostic tool in execution is shown in Fig. 8.
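For reference, the five performance metrics used in this section can be computed from the counts of a two-class confusion matrix, as in the following Python sketch. The standard definitions are assumed (sensitivity as the true positive rate, specificity as the true negative rate, F-score as the harmonic mean of precision and sensitivity), and the counts in the example are illustrative rather than taken from Table 4.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics from true/false positive and negative counts.

    No guard against zero denominators is included, so every class must
    be present and at least one positive prediction must be made.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)   # true negative rate
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f_score": f_score,
    }


# Illustrative counts only.
print(classification_metrics(tp=8, fp=2, tn=9, fn=1))
# A classifier with no errors scores 1 on every metric.
print(classification_metrics(tp=5, fp=0, tn=5, fn=0))
```

A classifier with no false positives and no false negatives scores 1 on every metric, which corresponds to the values reported for the two stage hybrid ensemble.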

Conclusion
Chronic kidney disease is a disease with a high mortality rate and a worldwide health crisis; five to ten percent of the population worldwide suffers from it. A majority of cases in developing and underdeveloped nations are not diagnosed in time or remain undiagnosed, mainly due to a poor doctor-patient ratio and poverty. This is one of the prime reasons that a higher percentage of these cases come from developing and underdeveloped nations than from developed nations, where the majority of people go through routine check-ups and diagnosis. More than 80% of all patients who receive treatment for kidney failure are in affluent countries with universal access to health care and large elderly populations [38]. In developing and underdeveloped nations, the cost of the clinical tests a patient has to go through acts as a deterrent to visiting a doctor for timely and regular check-ups. By reducing the number of parameters the classification system requires for diagnosis without hampering its performance, one major problem, i.e. the cost incurred in going through a number of clinical tests, can be addressed to a great extent. A reduced number of parameters means fewer clinical tests for the patient, and fewer clinical tests mean a lower cost. Keeping this in mind, every parameter added to the optimally selected reduced feature set was evaluated in terms of the cost incurred by a patient in going through the clinical test that provides its value, in addition to its medical relevance and its role in the classification process. It can also be seen from the results of the individual classifiers that their classification performance improved on the optimally reduced feature set as compared to the complete feature set.
The GUI based diagnostic tool using the two stage hybrid ensemble classifier developed by the authors can contribute to timely and accurate diagnosis of this disease by assisting doctors in cross-checking their diagnostic findings in a relatively short time with a minimal number of clinical tests, thus helping a doctor to attend to and diagnose more patients than would be possible if the diagnosis process were carried out entirely manually. The dataset contained some missing values. Missing numeric and discrete integer values were replaced by the attribute mean over all the instances with the same class label as the instance under consideration, and missing nominal values were replaced using the attribute mode. In the future, this study could be extended to perform multi-stage diagnosis of chronic kidney disease by including the GFR (glomerular filtration rate) as a parameter.
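The handling of missing values described above can be sketched in Python as follows. Missing entries are represented by `None`, and the sketch assumes that the mode for nominal attributes is, like the mean, computed among instances sharing the same class label; the text states this explicitly only for the mean, so this is an assumption, as are the function name and the toy rows.

```python
from collections import Counter


def impute_by_class(rows, labels, column, numeric):
    """Return a copy of `rows` with missing values in `column` filled in.

    Numeric columns use the mean and nominal columns use the mode of the
    observed values among rows with the same class label (assumed for mode).
    """
    filled = [list(row) for row in rows]
    for cls in set(labels):
        observed = [row[column] for row, label in zip(rows, labels)
                    if label == cls and row[column] is not None]
        if not observed:
            continue  # nothing to estimate from, so leave the gaps as they are
        if numeric:
            fill = sum(observed) / len(observed)
        else:
            fill = Counter(observed).most_common(1)[0][0]
        for row, label in zip(filled, labels):
            if label == cls and row[column] is None:
                row[column] = fill
    return filled


# Toy rows: [numeric attribute, nominal attribute] (illustrative values).
rows = [[1.0, "yes"], [3.0, None], [None, "yes"], [10.0, "no"], [None, "no"]]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd"]
step_one = impute_by_class(rows, labels, column=0, numeric=True)
step_two = impute_by_class(step_one, labels, column=1, numeric=False)
print(step_two)
```

Because the fill values depend on the class label, this form of imputation uses the diagnosis itself and therefore applies to labelled data only, not to a new patient whose class is unknown.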