Explainable AI Models for Voice-Based Content Classification in Streaming Television Platforms
Keywords:
Explainable AI, Voice Classification, Audio Features, Streaming Platforms, Speech Analysis, Deep Learning

Abstract
The rapid growth of streaming television platforms has produced an enormous volume of audiovisual content that must be classified efficiently and transparently for recommendation, moderation, and accessibility. In this paper, we propose an explainable artificial intelligence (XAI) system for voice-based content classification that identifies program categories from speech signals extracted from streaming media. Our framework combines deep audio feature extraction with interpretable machine learning techniques, yielding both strong predictive performance and a transparent decision-making process. Speech fragments are first processed into Mel-frequency cepstral coefficients (MFCCs) and spectral embeddings, which are then fed to a hybrid architecture that integrates a convolutional neural network (CNN) with a transformer-based attention layer. To strengthen interpretability, SHAP-driven explanation components highlight the acoustic features and temporal voice patterns that most strongly influence classification decisions. The system is evaluated on a multi-genre streaming dataset comprising news, sports, entertainment, documentaries, and advertisements. The experimental results demonstrate both strong performance and interpretability gains: the proposed model achieves 92.6% classification accuracy, 11.4% higher than baseline audio classifiers, and the explainability module correctly identifies essential speech cues with 89% explanation consistency. Furthermore, inference latency is reduced by 27%, enabling near-real-time deployment, and the model improves genre recommendation precision by 18% in simulated streaming scenarios.
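To make the pipeline concrete, the following is a minimal sketch of the feature-extraction and hybrid-model stages described above. It is not the authors' implementation: the choice of librosa and PyTorch, and every layer size and hyperparameter shown here, are assumptions for illustration only.

```python
# Sketch of the MFCC front end and CNN + transformer-attention hybrid
# described in the abstract. All hyperparameters are assumed, not published.
import librosa
import numpy as np
import torch
import torch.nn as nn

N_MFCC = 40      # assumed MFCC dimensionality
N_CLASSES = 5    # news, sports, entertainment, documentaries, advertisements

def extract_mfcc(path: str, sr: int = 16000, n_frames: int = 256) -> torch.Tensor:
    """Load a speech fragment and return a fixed-size (n_mfcc, n_frames) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every clip has the same shape.
    if mfcc.shape[1] < n_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, n_frames - mfcc.shape[1])))
    return torch.from_numpy(mfcc[:, :n_frames]).float()

class CNNTransformerClassifier(nn.Module):
    """CNN front end over the MFCC matrix, followed by a transformer-style
    self-attention layer over the remaining time steps."""
    def __init__(self, n_mfcc: int = N_MFCC, d_model: int = 128,
                 n_classes: int = N_CLASSES):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.proj = nn.Linear(64 * (n_mfcc // 4), d_model)
        self.attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mfcc, n_frames) -> add a channel axis for the CNN.
        h = self.cnn(x.unsqueeze(1))          # (batch, 64, n_mfcc/4, n_frames/4)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, time, 64 * n_mfcc/4)
        h = self.attn(self.proj(h))           # self-attention over time steps
        return self.head(h.mean(dim=1))       # pooled logits, one per genre
```

Under these assumptions, a clip would be classified with `model(extract_mfcc("clip.wav").unsqueeze(0))`, yielding one logit per genre.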
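The SHAP-driven explanation stage can be sketched in the same spirit. The paper does not state which SHAP variant it uses; `shap.GradientExplainer` is one plausible choice for a gradient-based PyTorch model, and the `explain_clip` helper below is hypothetical.

```python
# Hedged sketch of the SHAP explanation step: attribute a genre prediction
# back to individual MFCC coefficients and time frames.
import shap
import torch

def explain_clip(model, background: torch.Tensor, clip: torch.Tensor,
                 top_k: int = 5):
    """Return the top_k (MFCC coefficient, time frame) cells driving the
    predicted genre. background: (N, n_mfcc, n_frames) reference clips."""
    model.eval()
    explainer = shap.GradientExplainer(model, background)
    # NOTE: depending on the SHAP version, shap_values may be a per-class
    # list (assumed here) or a single stacked array with a class axis.
    shap_values = explainer.shap_values(clip.unsqueeze(0))
    pred = model(clip.unsqueeze(0)).argmax(dim=1).item()
    attribution = torch.from_numpy(shap_values[pred][0]).abs()  # (n_mfcc, n_frames)
    idx = attribution.flatten().topk(top_k).indices
    # Convert flat indices back to (coefficient, frame) pairs.
    n_frames = attribution.shape[1]
    return [(int(i) // n_frames, int(i) % n_frames) for i in idx]
```

Ranking absolute SHAP values over the MFCC grid in this way is one straightforward reading of the paper's claim that the explanation module highlights the acoustic features and temporal patterns behind each decision.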