Automatic Speech Emotion Recognition Using Hybrid Deep Learning Techniques
Keywords:
Automatic Speech Emotion Recognition, Deep Learning, Human-Computer Interaction, Convolutional Neural Network, Long Short-Term Memory

Abstract
The advancement of deep learning techniques for speech emotion recognition is an emerging field of research. Speech recognition technologies are significantly reshaping human-computer interaction, where one of the central challenges is developing an interface that can perceive and respond as accurately as a human. Automatic Speech Emotion Recognition (ASER) systems address this challenge by extracting salient information from voice signals and classifying it into emotional categories. Recent advances in deep learning have substantially improved the performance of ASER systems. The ASER literature covers numerous methods for deriving emotions from signals, including well-established speech analysis and classification approaches; more recently, deep learning methods have been proposed as alternatives to these conventional techniques. The main goal of this research is to analyze different emotions from speech using deep learning. Because deep networks learn feature representations automatically, they are widely preferred for emotion classification over traditional machine learning systems that rely on manual feature extraction before classifying the emotional state. To extract features and identify emotions from the input data, the authors implemented an efficient hybrid deep learning architecture combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). By training and testing the proposed network on a standard dataset, the authors achieved high classification accuracy.
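To make the hybrid architecture concrete, the following is a minimal sketch of a CNN+LSTM emotion classifier in PyTorch. It assumes MFCC features as input; the layer sizes, kernel widths, and the eight-class emotion set are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical CNN+LSTM sketch for speech emotion recognition.
# Assumption: input is a batch of MFCC sequences, shape (batch, n_mfcc, time).
import torch
import torch.nn as nn

class CNNLSTMEmotionNet(nn.Module):
    def __init__(self, n_mfcc=40, hidden=128, n_emotions=8):
        super().__init__()
        # 1D convolutions extract local spectral patterns from the MFCC frames
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The LSTM models the temporal evolution of the CNN feature maps
        self.lstm = nn.LSTM(input_size=128, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, x):               # x: (batch, n_mfcc, time)
        feats = self.cnn(x)             # (batch, 128, time // 4)
        feats = feats.transpose(1, 2)   # (batch, time // 4, 128) for the LSTM
        _, (h, _) = self.lstm(feats)    # h: (1, batch, hidden), last hidden state
        return self.classifier(h[-1])   # (batch, n_emotions) logits

model = CNNLSTMEmotionNet()
dummy = torch.randn(4, 40, 200)         # 4 utterances, 40 MFCCs, 200 frames
logits = model(dummy)
print(logits.shape)                     # torch.Size([4, 8])
```

The CNN front-end replaces the manual feature engineering of traditional pipelines, while the final LSTM hidden state summarizes the utterance before classification.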
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.