Diagnosis of Mesothelioma Disease Using Different Classification Techniques

Mesothelioma, a disease of the pleura and peritoneum, is an asbestos-related environmental disease seen in underdeveloped countries. Although its incidence is lower than that of lung cancer, the public concern it raises is very high. In this study, 9 different data mining classification algorithms were applied to the Mesothelioma data set, which was obtained from real patients at Dicle University Faculty of Medicine and uploaded to the UCI Machine Learning Repository, and the results were compared. The results show that the Artificial Neural Network (ANN) achieved the highest correct classification ratio, 99.0740%.


Introduction
Malignant mesothelioma (MM) is a very aggressive tumor of the pleura. These tumors are associated with exposure to asbestos, as well as with simian virus 40 (SV40) infection and genetic predisposition. Molecular mechanisms and rural life may also play a role in the development of mesothelioma. Soil mixtures containing asbestos are found mainly in Turkey and Greece.
Along with the development of computer technologies, the amount of data in use is growing rapidly every day. According to one estimate, the amount of data in the world doubles every 20 months [1]. In the past decade, data mining has become established as a discipline for analyzing data to gain useful and meaningful information, and it has attracted attention in academia as well as in industry, economics, and business. Data mining methods, which are used in many fields such as health, basic sciences, banking, finance, and market research, comprise many algorithms [2]. The purpose of data mining is to extract hidden, meaningful, and useful information by analyzing large data sets. The information extracted from a data set has a certain degree of accuracy and is not deterministic. The success of the algorithms is measured by criteria such as accuracy, precision, sensitivity, and F-measure, which quantify the performance of the generated models. Although empirical comparison of data mining algorithms is common in scientific studies, such studies have been criticized in the academic literature on the grounds that they do not produce objective and definite results [3]. One criticism is that the performance of a model depends on the user who builds it, because that user carries out modelling steps such as data pre-processing, parameter selection, and the choice of training and test sets.
Another criticism is that studies in which a new algorithm is compared with existing algorithms are not objective because of the developer's bias. The final criticism is that the great majority of comparative studies in the academic literature do not use real data, so their evaluations have not produced correct results [4]. Despite all these criticisms, the need to compare algorithms is commonly accepted, and such comparisons have a place in academic studies and current practice for both implementation and development. There are different studies in the literature that compare data mining algorithms. Some of these studies aim to establish the acceptability of a newly developed algorithm by comparing it with previous algorithms on different data sets. Within the scope of the European StatLog project, a comprehensive study compared different data mining classification algorithms to find out which of them better meet industry needs; it was published as "Machine Learning, Neural and Statistical Classification". In that study, statistical, machine learning, and ANN methods were compared on different data sets. It concluded that different algorithms produce better models on different data sets [5]. Other similar comparative studies have reached different results [6,7,8,9,10,11].
In this study, different data mining classification algorithms were used to diagnose Mesothelioma disease, and these algorithms were compared on this specific data set. This paper is organized as follows. In section 2, brief information is presented about the classification algorithms used in this study. In section 3, the properties of the Mesothelioma dataset are explained. In section 4, the comparative analysis of the results is given. In section 5, a general evaluation of the study and some suggestions are presented.

Theoretical Background
In this study, 9 different data mining classification algorithms were used: J48, Bayes Net, SMO, LMT, Logistic, Multi Class Classifier, Random Committee, PART, and ANN. Brief information about each of them is given in the following subsections.

J48
The J48 classifier is WEKA's implementation of the C4.5 decision tree algorithm. The decision tree approach is one of the most useful methods for classification problems. With this technique, a tree that models the classification process is constructed. Once the tree is built, it is applied to each record in the database and yields a classification for that record. J48 can produce either a pruned or an unpruned C4.5 decision tree [12,13].
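C4.5 selects the attribute test at each node with an entropy-based criterion, the gain ratio. The following is a minimal sketch of that criterion for a nominal attribute, not WEKA's code; the function names and the toy records are assumptions made for illustration and are not taken from the data set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio obtained by splitting `labels` on the nominal attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition induced by the attribute.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records (illustration only).
exposure = ["yes", "yes", "no", "no", "yes", "no"]
diagnosis = ["patient", "patient", "healthy", "healthy", "patient", "healthy"]
print(gain_ratio(exposure, diagnosis))
```

In this toy case the attribute separates the two classes perfectly, so the split removes all class uncertainty. An attribute with a single value carries no information and gets a gain ratio of zero.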

Bayes Net
A Bayesian network (BN) is a directed acyclic graph (DAG) in which nodes are random variables and directed edges represent probabilistic dependencies among the variables. Each node, together with its parents, is associated with a conditional probability distribution (CPD), which quantifies the effect of the parents on the node. A BN provides a compact representation of a joint probability distribution over the set of random variables [14].
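The compact representation follows from the factorization of the joint distribution into one CPD per node. As a minimal sketch, consider a hypothetical two-node network Exposure -> Disease; the variable names and every probability below are invented for illustration and do not come from the study.

```python
# Hypothetical two-node network: Exposure -> Disease.
# All probabilities are made up for illustration.
p_exposure = {True: 0.3, False: 0.7}            # P(Exposure)
p_disease_given_exposure = {                     # P(Disease | Exposure)
    True: {True: 0.4, False: 0.6},
    False: {True: 0.05, False: 0.95},
}

def joint(exposure, disease):
    """P(Exposure, Disease) = P(Exposure) * P(Disease | Exposure)."""
    return p_exposure[exposure] * p_disease_given_exposure[exposure][disease]

def posterior_exposure(disease=True):
    """P(Exposure = True | Disease) by enumeration over the joint distribution."""
    numerator = joint(True, disease)
    denominator = sum(joint(e, disease) for e in (True, False))
    return numerator / denominator

print(posterior_exposure(True))
```

The same enumeration extends to larger networks by multiplying one CPD entry per node, each conditioned on the values of that node's parents.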

SMO
SMO implements John Platt's sequential minimal optimization algorithm for training a support vector classifier. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. It also normalizes all attributes by default. (In that case, the coefficients in the output are based on the normalized data, not the original data; this is important for interpreting the classifier.) Multi-class problems are solved using pairwise classification (1-vs-1 and, if logistic models are built, pairwise coupling according to [15]). To obtain proper probability estimates, the option that fits logistic regression models to the outputs of the support vector machine should be used. In the multi-class case, the predicted probabilities are coupled using Hastie and Tibshirani's pairwise coupling method [16,17,18].
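The three preprocessing steps mentioned above can be sketched as follows. This is an illustrative sketch, not WEKA's code: the function names are hypothetical, missing numeric values are assumed to be replaced by the column mean, and normalization is assumed to rescale each attribute to [0, 1].

```python
from statistics import mean

def replace_missing(column):
    """Replace None in a numeric column by the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]

def min_max_normalize(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def nominal_to_binary(column):
    """Turn a nominal column into one 0/1 indicator column per distinct value."""
    categories = sorted(set(column))
    return {c: [1 if x == c else 0 for x in column] for c in categories}

print(min_max_normalize(replace_missing([1.0, None, 3.0])))
print(nominal_to_binary(["a", "b", "a"]))
```

Because the classifier is trained on the rescaled columns, its coefficients refer to the normalized scale, which is why the normalization matters when the output is interpreted.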

LMT
The Logistic Model Tree (LMT) is a standard decision tree structure with logistic regression functions at the leaves. As in traditional decision trees, a test on an attribute is associated with each inner node. For a nominal attribute with k values, the node has k child nodes, and the instances are sorted into the k branches depending on the attribute value. For a numerical attribute, the node has 2 child nodes and the test consists of comparing the attribute value with a threshold. If the attribute value of the instance is less than the threshold, the instance is sorted down the left branch; otherwise, it is sorted down the right branch [19].
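The routing of an instance through such a tree can be sketched as follows. The dictionary-based node layout, the attribute names, and all coefficients are hypothetical, and each leaf holds a simple two-class logistic function for brevity; this is an assumption of the sketch, not a description of the actual LMT implementation.

```python
import math

def predict_proba(node, instance):
    """Sort `instance` down the tree, then apply the leaf's logistic function."""
    while "attr" in node:                       # inner node
        value = instance[node["attr"]]
        if "threshold" in node:                 # numeric test: two children
            node = node["left"] if value < node["threshold"] else node["right"]
        else:                                   # nominal test: one child per value
            node = node["children"][value]
    # Leaf: linear combination of attributes passed through the logistic function.
    z = node["bias"] + sum(w * instance[a] for a, w in node["weights"].items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical tree: a nominal test on "gender", then a numeric test on "age".
tree = {
    "attr": "gender",
    "children": {
        "female": {"bias": -1.0, "weights": {"crp": 0.02}},
        "male": {
            "attr": "age", "threshold": 50.0,
            "left": {"bias": -2.0, "weights": {}},
            "right": {"bias": 0.0, "weights": {"crp": 0.02}},
        },
    },
}

print(predict_proba(tree, {"gender": "male", "age": 40.0, "crp": 10.0}))
```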

Logistic
Logistic is a class for building and using a multinomial logistic regression model with a ridge estimator. There are some modifications, however, compared to the paper of le Cessie and van Houwelingen [20,21]. If there are k classes for n instances with m attributes, the parameter matrix B to be calculated is an m*(k-1) matrix. To find the matrix B that minimizes L, the negative multinomial log-likelihood with a ridge penalty, a quasi-Newton method is used to search for the optimized values of the m*(k-1) variables. Before the optimization procedure is applied, the matrix B is flattened ('squeezed') into a vector of length m*(k-1). Although the original logistic regression does not deal with instance weights, the algorithm is modified slightly to handle them.
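The role of the m*(k-1) parameter matrix can be illustrated with a short sketch. It assumes the usual parameterization in which the last class is the reference class with an implicit all-zero parameter vector, and it takes the objective to be the negative log-likelihood plus a ridge penalty on the squared parameters; the function names are hypothetical and instance weights are omitted.

```python
import math

def class_probabilities(x, B):
    """Class probabilities for one instance.

    x: list of m attribute values.
    B: list of k-1 parameter vectors, each of length m; the last class is
       the reference class with an implicit all-zero parameter vector.
    """
    scores = [math.exp(sum(b_i * x_i for b_i, x_i in zip(b, x))) for b in B]
    denominator = 1.0 + sum(scores)
    return [s / denominator for s in scores] + [1.0 / denominator]

def ridge_penalized_nll(X, y, B, ridge):
    """Negative log-likelihood plus ridge times the sum of squared parameters."""
    nll = -sum(math.log(class_probabilities(x, B)[label])
               for x, label in zip(X, y))
    return nll + ridge * sum(b_i ** 2 for b in B for b_i in b)

# Two classes (k = 2), two attributes (m = 2): B holds a single parameter vector.
print(class_probabilities([1.0, 2.0], [[0.0, 0.0]]))
```

A quasi-Newton optimizer would then search over the flattened entries of B for the values that minimize this objective.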

Multi Class Classifier
Although the SVM was originally developed for two-class problems, it can be extended to multi-class classification by two approaches. One is to combine a number of two-class SVMs in a certain manner to form a multi-class classifier; the other is to solve a single multi-class classification function directly on the training samples [22]. The decision functions of the latter approach are difficult to solve, and its training and testing processes are long. The first approach is therefore more practical, and various algorithms are derived from it, such as the one-against-rest method and the one-against-one method [23].
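The two decomposition schemes can be sketched as follows. The binary classifiers here are simple hypothetical threshold functions standing in for trained two-class SVMs; the function names and the three-class toy problem are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, binary_classifiers):
    """Majority vote over all pairwise (one-against-one) classifiers.

    binary_classifiers maps a class pair (a, b) to a function that
    returns either a or b for the instance x.
    """
    votes = Counter(binary_classifiers[pair](x)
                    for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

def one_vs_rest_predict(x, scorers):
    """Pick the class whose 'class against rest' scorer gives the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))

# Hypothetical stand-ins for trained binary classifiers on a single number.
classes = ["low", "mid", "high"]
pairwise = {
    ("low", "mid"): lambda x: "low" if x < 1 else "mid",
    ("low", "high"): lambda x: "low" if x < 2 else "high",
    ("mid", "high"): lambda x: "mid" if x < 3 else "high",
}
scorers = {
    "low": lambda x: -abs(x - 0),
    "mid": lambda x: -abs(x - 2),
    "high": lambda x: -abs(x - 4),
}

print(one_vs_one_predict(2.5, classes, pairwise))
print(one_vs_rest_predict(3.6, scorers))
```

With k classes, the one-against-one scheme trains k(k-1)/2 binary classifiers, whereas the one-against-rest scheme trains k.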

Random Committee
Random Committee is a class for building an ensemble of randomizable base classifiers. Each base classifier is built using a different random number seed (but based on the same data). The final prediction is a straight average of the predictions generated by the individual base classifiers [24].
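The averaging scheme can be sketched as follows. The base classifier used here, a randomly placed threshold on a single attribute, is a hypothetical stand-in chosen to keep the sketch short; it is not the base learner used in the study.

```python
import random

def make_random_stump(data, seed):
    """Hypothetical randomizable base classifier: a threshold on attribute 0,
    drawn at random (seeded) from the training values."""
    rng = random.Random(seed)
    threshold = rng.choice([x[0] for x, _ in data])
    above = [y for x, y in data if x[0] >= threshold]
    below = [y for x, y in data if x[0] < threshold]
    # Fraction of positive labels on each side; 0.5 if a side is empty.
    p_above = sum(above) / len(above) if above else 0.5
    p_below = sum(below) / len(below) if below else 0.5
    return lambda x: p_above if x[0] >= threshold else p_below

def random_committee(data, n_members=10):
    """Same data for every member; only the random seed differs."""
    members = [make_random_stump(data, seed) for seed in range(1, n_members + 1)]
    # Final prediction: straight average of the members' predictions.
    return lambda x: sum(m(x) for m in members) / len(members)

# Toy training data: ([attribute], label) pairs.
data = [([0.0], 0), ([1.0], 0), ([2.0], 1), ([3.0], 1)]
committee = random_committee(data)
print(committee([3.0]), committee([0.0]))
```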

PART
PART is a class that generates a PART decision list. In each iteration, it builds a partial C4.5 decision tree and turns the best leaf into a rule [16].
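A decision list is applied by checking the rules in order and returning the class of the first rule that matches. The sketch below shows only this application step, not the rule induction itself; the rules, attribute names, and thresholds are invented for illustration.

```python
def classify_with_decision_list(rules, default_class, instance):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, predicted_class in rules:
        if all(test(instance) for test in conditions):
            return predicted_class
    return default_class

# Hypothetical rules; each rule is (list of conditions, predicted class).
rules = [
    ([lambda r: r["exposure"] == "yes", lambda r: r["duration"] > 30], "patient"),
    ([lambda r: r["crp"] < 20], "healthy"),
]

record = {"exposure": "yes", "duration": 40, "crp": 55}
print(classify_with_decision_list(rules, "healthy", record))
```

Because the rules are ordered, a later rule only applies to the instances that no earlier rule has covered.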

Artificial Neural Network
ANN is an information technology inspired by the way the human brain processes information. An ANN mimics the way a simple biological nervous system works. The imitated nerve cells (neurons) connect to each other in various ways to form a network. These networks have the capacity to learn, to memorize, and to reveal the relationships between data. In other words, ANNs provide solutions to problems that normally require a person's natural abilities to think and observe. The basic reason a person can solve such problems is the ability of the human brain, and therefore of the human being, to learn by experience or by trial [25]. In biological systems, learning occurs through the adjustment of synaptic connections between neurons. People begin learning at birth, and the brain develops continuously throughout this process. As we live and gain experience, synaptic connections are adjusted and new connections are formed; this is how learning occurs. The same applies to ANNs. Learning happens through training with examples; in other words, the input/output data are processed by a training algorithm that repeatedly adjusts the connection weights until convergence is achieved. ANNs are mathematical systems consisting of many processing units (neurons) connected together in a weighted fashion. A processing unit is an equation that is often referred to as a transfer function. It receives signals from other neurons, combines them, transforms them, and generates a numerical result. In general, the processing units correspond roughly to real neurons [25]. At the heart of neural computation are the concepts of distributed, adaptive, and nonlinear processing. ANNs operate differently from traditional processors.
In conventional processors, a single central processing unit performs each operation in turn. ANNs consist of a large number of simple processing units, each of which deals with one part of a larger problem. In its simplest form, a processing unit weights a set of inputs, transforms the result nonlinearly, and generates an output value. At first glance, the way the processing units work looks deceptively simple. The power of neural computation comes from the dense connections between the processing units, which share the total processing load. In these systems, more reliable learning is provided by the back-propagation method [25]. In most ANNs, neurons with similar characteristics are organized in layers and their transfer functions are run simultaneously. Almost all networks have data-receiving neurons and output-generating neurons. The mathematical function, the main element of an ANN, is shaped by the architecture of the network. More specifically, the basic structure of the function is determined by the magnitudes of the weights and the operation of the processing elements. The behaviour of an ANN, that is, how it relates input to output, is influenced first of all by the transfer functions of the neurons, by how the neurons are connected to each other, and by the weights of these connections [25].
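The operation of a processing unit and of a layered network can be sketched as follows. The layer sizes follow the 41-42-1 topology and the logistic activation reported in the experimental study; the weights are random and untrained, and the helper names are hypothetical, so the sketch shows only the forward computation, not the trained model or the training algorithm.

```python
import math
import random

def logistic(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One processing unit: weighted sum of the inputs, then a nonlinear transform."""
    return logistic(bias + sum(w * x for w, x in zip(weights, inputs)))

def forward(inputs, layers):
    """Propagate `inputs` through the layers; each layer is a list of (weights, bias)."""
    activations = inputs
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations

def random_layer(n_in, n_out, rng):
    """Layer of n_out units with small random (untrained) weights."""
    return [([rng.uniform(-0.5, 0.5) for _ in range(n_in)], rng.uniform(-0.5, 0.5))
            for _ in range(n_out)]

rng = random.Random(0)
network = [random_layer(41, 42, rng), random_layer(42, 1, rng)]  # 41-42-1
output = forward([0.0] * 41, network)
print(output)
```

Training would repeatedly adjust the weights and biases in `network` to reduce the difference between the output and the known class of each training example.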

Material and Methods
In this study, the "Mesothelioma disease data set", which was prepared at Dicle University Faculty of Medicine and uploaded to the UCI (University of California, Irvine) Machine Learning Repository, was used. The data set contains 324 patient records, each with 34 features. These include age, gender, city, asbestos exposure, type of MM, duration of asbestos exposure, diagnosis method, side, duration of symptoms, respiratory distress (dyspnea), ache on chest, weakness, habit of cigarette, performance status, white blood cell count (WBC), hemoglobin (HGB), platelet count (PLT), sedimentation, blood lactic dehydrogenase (LDH), alkaline phosphatase (ALP), total protein, albumin, glucose, pleural lactic dehydrogenase, pleural protein, pleural albumin, pleural glucose, pleural effusion, pleural thickness on tomography, pleural level of acidity (pH), and C-reactive protein (CRP). Each record also has a variable that represents the diagnostic class. Of the 324 records, 228 were identified as healthy and 96 as patients [26]. The ANN classification was implemented in Alyuda NeuroIntelligence 2.2, whereas the other algorithms were implemented in WEKA (The University of Waikato).

Experimental Study
First, the J48 algorithm, a C4.5 decision tree, was applied to the Mesothelioma dataset; the results are shown in Table 1. As Table 1 shows, 283 of the 324 samples in the database were correctly classified, so the correct classification ratio (CCR) of the J48 algorithm is 87.3457%. The Bayes Net algorithm was applied to the Mesothelioma dataset and the results are shown in Table 2. As Table 2 shows, 286 of the 324 samples were correctly classified, so the CCR of the Bayes Net algorithm is 88.2716%. The SMO algorithm was applied to the Mesothelioma dataset and the results are shown in Table 3. As Table 3 shows, 288 of the 324 samples were correctly classified, so the CCR of the SMO algorithm is 88.8889%. The LMT algorithm was applied to the Mesothelioma dataset and the results are shown in Table 4. As Table 4 shows, 289 of the 324 samples were correctly classified, so the CCR of the LMT algorithm is 89.1975%. The Logistic algorithm was applied to the Mesothelioma dataset and the results are shown in Table 5. As Table 5 shows, 290 of the 324 samples were correctly classified, so the CCR of the Logistic algorithm is 89.5062%. The Multi Class Classifier was applied to the Mesothelioma dataset and the results are shown in Table 6. As Table 6 shows, 290 of the 324 samples were correctly classified, so the CCR of the Multi Class Classifier is 89.5062%. The Random Committee algorithm was applied to the Mesothelioma dataset and the results are shown in Table 7. As Table 7 shows, 292 of the 324 samples were correctly classified, so the CCR of the Random Committee algorithm is 90.1235%.
The PART algorithm was applied to the Mesothelioma dataset and the results are shown in Table 8. As Table 8 shows, 294 of the 324 samples in the database were correctly classified, so the correct classification ratio of the PART algorithm is 90.7407%. Finally, the ANN was applied to the Mesothelioma dataset. Of the 324 samples, 220 were used for training, 52 for validation, and 52 for testing the network. The network topology used was 41-42-1, and the logistic function was used as the input and output activation function. The quick propagation algorithm was used to train the network, and the results are shown in Table 9. As Table 9 shows, the overall correct classification ratio of the ANN is 99.0740%. The ROC curve of the ANN application is presented in Fig 2.
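The correct classification ratios of the eight WEKA classifiers follow directly from the counts of correctly classified samples given above. The short sketch below reproduces them; the function and variable names are chosen for illustration, and the ANN is omitted because the text reports only its ratio, not its count.

```python
TOTAL_SAMPLES = 324

# Correctly classified sample counts as reported in the text.
correct_counts = {
    "J48": 283,
    "Bayes Net": 286,
    "SMO": 288,
    "LMT": 289,
    "Logistic": 290,
    "Multi Class Classifier": 290,
    "Random Committee": 292,
    "PART": 294,
}

def correct_classification_ratio(correct, total=TOTAL_SAMPLES):
    """Percentage of correctly classified samples."""
    return 100.0 * correct / total

for name, correct in correct_counts.items():
    print(f"{name}: {correct_classification_ratio(correct):.4f}%")
```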