Systematic Literature Review of Mixed Variables Classification
Keywords:
Classification; Location Model; Smoothed Location Model; Mixed Variables; Systematic Literature Review.
Abstract
Classification is widely used across many fields, and numerous techniques are available for assigning objects to distinct classes. Among these, the parametric approach, specifically the location model and its smoothed counterpart, has gained significant attention, particularly for data containing a mix of continuous and categorical variables. This paper systematically reviews literature from the Scopus and ScienceDirect databases, focusing on parametric classification with mixed variables, i.e., the location model and the smoothed location model. A total of 70 articles were selected through systematic review procedures aligned with relevant thematic areas. These articles were analysed and discussed in the context of mixed-variable classification based on the location model and the smoothed location model across diverse data scenarios. Compared with traditional reviews, a systematic literature review offers a structured review process and clear priorities that help mitigate research bias. The review reveals that, in most studies, the smoothed location model demonstrates superior classification performance compared to other techniques. However, literature specifically addressing applications of the location model and the smoothed location model for classifying objects with mixed variables remains limited. Consequently, it is recommended that future research consider both models when dealing with classification tasks involving mixed variables.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.