Empowering Accented Speech Analysis in Malayalam Through Cutting-Edge Fusion of Self-Supervised Learning and Autoencoders
Keywords: autoencoders, self-supervised learning, human-computer interface, accented speech recognition, Malayalam speech recognition

Abstract
This research explores the application of autoencoders to accented speech data in the Malayalam language. The primary objective is to leverage autoencoders to learn a compressed representation of the input data and to use that representation to train machine learning models, improving accuracy rates and reducing word error rates (WER). The study follows a two-step process. First, an autoencoder neural network is trained to encode the accented speech data into a lower-dimensional latent-space representation: the encoder captures the essential features and patterns in the data, and the decoder reconstructs the original input from the compressed code. Second, the trained encoder is used to generate inputs for several machine learning models, including logistic regression, decision tree classifier, support vector machine (SVM), random forest classifier (RFC), K-nearest neighbors (KNN), stochastic gradient descent (SGD), and multilayer perceptron (MLP). The encoded features serve as inputs to these models, enabling them to learn from a compact representation of the accented speech data. Experimental results indicate that the machine learning models trained on the encoded features achieve higher accuracy rates than traditional approaches, demonstrating the effectiveness of autoencoders in capturing and representing the significant characteristics of accented speech. Moreover, the use of the encoded representation also yields lower word error rates, indicating improved performance in transcribing and recognizing accented Malayalam speech. These findings showcase the potential of autoencoders to improve the overall accuracy and efficiency of speech-processing tasks for accented languages.
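The two-step pipeline described above can be illustrated with a minimal sketch. The code below is illustrative, not the authors' implementation: it assumes the accented Malayalam utterances have already been reduced to fixed-length feature vectors (for example, averaged MFCCs), and every name, dimension, and hyperparameter (SpeechAutoencoder, INPUT_DIM, LATENT_DIM, the placeholder data) is a hypothetical choice for demonstration. Only two of the listed classifiers are shown for brevity.

```python
# Minimal sketch of the two-step pipeline: (1) train an autoencoder on
# speech feature vectors, (2) train classifiers on the encoded features.
# All data, sizes, and hyperparameters below are placeholders.
import torch
import torch.nn as nn
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

INPUT_DIM, LATENT_DIM = 40, 8  # hypothetical feature / latent sizes

class SpeechAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress input features into a latent representation
        self.encoder = nn.Sequential(
            nn.Linear(INPUT_DIM, 16), nn.ReLU(),
            nn.Linear(16, LATENT_DIM),
        )
        # Decoder: reconstruct the original features from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 16), nn.ReLU(),
            nn.Linear(16, INPUT_DIM),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_autoencoder(model, X, epochs=50, lr=1e-3):
    # Step 1: minimize reconstruction error so the latent space
    # captures the salient structure of the speech features.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), X)
        loss.backward()
        opt.step()
    return model

# Placeholder data standing in for extracted speech features and labels.
X_train = torch.randn(200, INPUT_DIM)
y_train = torch.randint(0, 5, (200,)).numpy()

ae = train_autoencoder(SpeechAutoencoder(), X_train)

# Step 2: feed the encoded (compressed) features to downstream classifiers.
with torch.no_grad():
    Z_train = ae.encoder(X_train).numpy()

for clf in (SVC(), RandomForestClassifier()):
    clf.fit(Z_train, y_train)
    print(type(clf).__name__, accuracy_score(y_train, clf.predict(Z_train)))
```

In the setup described in the abstract, the same encoded features would be passed to each of the listed classifiers (logistic regression, decision tree, SVM, RFC, KNN, SGD, MLP) and evaluated on held-out data rather than the training set used here for compactness.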