Hybrid Gene Selection Method Using Graph Theory and Chaotic Bee Colony Optimization

Authors

  • C. Kondalraj, R. Murugesan

Keywords:

Graph Theory, Gene Selection, Chaotic Bee Colony Optimization, Bioinformatics, Multi-Objective Optimization

Abstract

Gene expression data in bioinformatics often combine high dimensionality with a limited number of samples, which degrades the performance of data mining and machine learning algorithms. Gene selection methods mitigate this problem by identifying relevant genes and discarding irrelevant or redundant ones, but traditional methods may struggle to select optimal gene subsets accurately and efficiently. This paper introduces a hybrid gene selection approach that combines graph theory with Chaotic Bee Colony Optimization (CBCO). First, a filter method based on the Fisher score reduces the gene pool. Next, the remaining genes are represented as nodes in a graph, with edges representing the relationships between genes. Graph K-means clustering then groups the genes into clusters, which promotes diversity among the selected genes. The CBCO algorithm optimizes the gene subset against multiple criteria: classification error, node and edge centrality, specificity, and the number of genes selected. A repair operator ensures that at least one gene is chosen from each cluster, making the resulting solutions more robust. Evaluation on gene expression datasets shows higher classification accuracy and smaller selected gene subsets than state-of-the-art methods: on average across datasets, the proposed method improves accuracy by 5% and reduces the number of selected genes by 30%. By integrating graph-based clustering with multi-objective CBCO, the hybrid method improves classification accuracy while reducing computational overhead, demonstrating its potential for improving bioinformatics analyses.
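Two of the steps named in the abstract, the Fisher score filter and the repair operator, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (fisher_scores, top_k_genes, repair) are hypothetical, the Fisher score uses the common between-class over within-class variance form, and the rule of filling an empty cluster with its highest-scoring gene is an assumption, since the abstract only states that each cluster must contribute at least one gene.

```python
import numpy as np


def fisher_scores(X, y):
    """Fisher score per gene: between-class scatter divided by within-class scatter.

    X has shape (n_samples, n_genes); y holds one class label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant avoids division by zero for constant genes.
    return between / (within + 1e-12)


def top_k_genes(X, y, k):
    """Indices of the k genes with the largest Fisher score (filter step)."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]


def repair(mask, cluster_ids, scores):
    """Return a copy of mask in which every cluster has at least one selected gene.

    Assumption: an unrepresented cluster receives its highest-scoring gene.
    """
    mask = np.array(mask, dtype=bool)  # np.array copies, so the input is untouched
    cluster_ids = np.asarray(cluster_ids)
    scores = np.asarray(scores, dtype=float)
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        if not mask[members].any():
            mask[members[np.argmax(scores[members])]] = True
    return mask
```

In a full pipeline along the lines the abstract describes, repair would be applied to each candidate subset produced by the optimizer before its objectives are evaluated, so that no cluster is left unrepresented.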

Published

12.06.2024

How to Cite

C. Kondalraj. (2024). Hybrid Gene Selection Method Using Graph Theory and Chaotic Bee Colony Optimization. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 2199 –. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/6569

Section

Research Article