Importance of Artificial Intelligence in Neural Networks: Speech Signal Segmentation Using K-Means Clustering with Kernelized Deep Belief Networks
Keywords: speech processing, segmentation, deep learning, K-means C, KDBN

Abstract
Over the past few decades there has been a great deal of research on the use of machine learning (ML) for speech processing applications, particularly voice recognition. This study proposes a novel method for speech signal processing and segmentation based on deep learning architectures. The input speech signal is collected from a crime scene and pre-processed using K-means clustering (K-means C), which clusters the fragments of the input signal and removes noise and signal artifacts. Segmentation of the processed signal is then carried out using kernel-based deep belief networks (KDBN). Experimental results demonstrate that the proposed method performs well on the input speech signal in terms of both weighted accuracy (WA) and unweighted accuracy (UA).
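To make the pre-processing step concrete, the sketch below shows one plausible reading of the K-means C stage: short-time frames of the signal are clustered by simple energy features, and the lowest-energy cluster is treated as noise and suppressed. The frame length, hop size, feature choice, and two-cluster speech/noise split are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_preprocess(signal, frame_len=512, hop=256, n_clusters=2):
    # Slice the waveform into overlapping short-time frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])

    # Per-frame features: log energy and zero-crossing rate (illustrative choice).
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    feats = np.column_stack([energy, zcr])

    # Cluster the frames; with n_clusters=2 this is a speech/noise split.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(feats)

    # Assumption: the cluster with the lowest mean energy is noise; zero it out.
    noise = min(range(n_clusters), key=lambda k: energy[labels == k].mean())
    cleaned = signal.astype(float).copy()
    for i, lab in enumerate(labels):
        if lab == noise:
            cleaned[i * hop : i * hop + frame_len] = 0.0
    return cleaned, labels
```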
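The KDBN segmenter itself is not specified in detail here. As a hedged stand-in, the following sketch stacks restricted Boltzmann machines (the "deep belief" part, via scikit-learn's BernoulliRBM) and places an RBF-kernel SVM on top to assign each pre-processed frame a segment label; the layer sizes, the kernel choice, and all hyperparameters are assumptions rather than the paper's configuration.

```python
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def build_kdbn_standin(layer_sizes=(256, 128)):
    # A stack of RBMs learns a deep-belief-style feature hierarchy;
    # inputs should be scaled to [0, 1] for BernoulliRBM.
    steps = [(f"rbm{i}", BernoulliRBM(n_components=n, learning_rate=0.05,
                                      n_iter=20, random_state=0))
             for i, n in enumerate(layer_sizes)]
    # A kernelized classifier on top labels each frame with a segment class.
    steps.append(("svm", SVC(kernel="rbf", C=1.0)))
    return Pipeline(steps)

# Usage (hypothetical frame features X in [0, 1], segment labels y):
# model = build_kdbn_standin()
# model.fit(X_train, y_train)
# segment_labels = model.predict(X_test)
```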
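The two reported metrics have standard definitions in the speech literature: weighted accuracy (WA) is the overall fraction of correctly labelled frames, while unweighted accuracy (UA) averages recall over classes so that rare segment types count equally. A minimal computation with scikit-learn:

```python
from sklearn.metrics import accuracy_score, recall_score

def wa_ua(y_true, y_pred):
    wa = accuracy_score(y_true, y_pred)       # weighted accuracy: overall hit rate
    ua = recall_score(y_true, y_pred,         # unweighted accuracy: mean per-class
                      average="macro")        # recall (i.e., balanced accuracy)
    return wa, ua
```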
Copyright (c) 2023 V. Kakulapati, Gagandeep Singh Gill, Chandramma R., Sambhrant Srivastava, Meenakshi Sharma, Vijay Kumar

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.