Explainable AI Models for Voice-Based Content Classification in Streaming Television Platforms
Keywords:
Explainable AI, Voice Classification, Audio Features, Streaming Platforms, Speech Analysis, Deep Learning

Abstract
The rapid growth of streaming television platforms has produced an enormous volume of audiovisual content that must be classified efficiently and transparently for recommendation, moderation, and accessibility. In this paper, we propose an explainable artificial intelligence (XAI) system for voice-based content classification that identifies program categories from speech signals extracted from streaming media. Our framework combines deep audio feature extraction with interpretable machine learning techniques, yielding both strong predictive performance and a transparent decision-making process. Speech fragments are first processed into Mel-frequency cepstral coefficients (MFCCs) and spectral embeddings, which are then fed to a hybrid architecture that integrates a convolutional neural network (CNN) with a transformer-based attention layer. To strengthen interpretability, SHAP-driven explanation components highlight the acoustic features and temporal voice patterns that most strongly influence classification decisions. The system is evaluated on a multi-genre streaming dataset comprising news, sports, entertainment, documentaries, and advertisements. The experimental results demonstrate both strong performance and interpretability gains: the proposed model achieves 92.6% classification accuracy, 11.4% higher than baseline audio classifiers, and the explainability module correctly identifies essential speech cues with 89% explanation consistency. Furthermore, inference latency is reduced by 27%, enabling near-real-time deployment, and the model improves genre recommendation precision by 18% in simulated streaming scenarios.
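To make the pipeline concrete, the following is a minimal sketch of the feature-extraction and hybrid-model stages described above. It is not the authors' implementation: the choice of librosa and PyTorch, and every layer size and hyperparameter shown here, are assumptions for illustration only.

```python
# Sketch of the MFCC front end and CNN + transformer-attention hybrid
# described in the abstract. All hyperparameters are assumed, not published.
import librosa
import numpy as np
import torch
import torch.nn as nn

N_MFCC = 40      # assumed MFCC dimensionality
N_CLASSES = 5    # news, sports, entertainment, documentaries, advertisements

def extract_mfcc(path: str, sr: int = 16000, n_frames: int = 256) -> torch.Tensor:
    """Load a speech fragment and return a fixed-size (n_mfcc, n_frames) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every clip has the same shape.
    if mfcc.shape[1] < n_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, n_frames - mfcc.shape[1])))
    return torch.from_numpy(mfcc[:, :n_frames]).float()

class CNNTransformerClassifier(nn.Module):
    """CNN front end over the MFCC matrix, followed by a transformer-style
    self-attention layer over the remaining time steps."""
    def __init__(self, n_mfcc: int = N_MFCC, d_model: int = 128,
                 n_classes: int = N_CLASSES):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.proj = nn.Linear(64 * (n_mfcc // 4), d_model)
        self.attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mfcc, n_frames) -> add a channel axis for the CNN.
        h = self.cnn(x.unsqueeze(1))          # (batch, 64, n_mfcc/4, n_frames/4)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, time, 64 * n_mfcc/4)
        h = self.attn(self.proj(h))           # self-attention over time steps
        return self.head(h.mean(dim=1))       # pooled logits, one per genre
```

Under these assumptions, a clip would be classified with `model(extract_mfcc("clip.wav").unsqueeze(0))`, yielding one logit per genre.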
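The SHAP-driven explanation stage can be sketched in the same spirit. The paper does not state which SHAP variant it uses; `shap.GradientExplainer` is one plausible choice for a gradient-based PyTorch model, and the `explain_clip` helper below is hypothetical.

```python
# Hedged sketch of the SHAP explanation step: attribute a genre prediction
# back to individual MFCC coefficients and time frames.
import shap
import torch

def explain_clip(model, background: torch.Tensor, clip: torch.Tensor,
                 top_k: int = 5):
    """Return the top_k (MFCC coefficient, time frame) cells driving the
    predicted genre. background: (N, n_mfcc, n_frames) reference clips."""
    model.eval()
    explainer = shap.GradientExplainer(model, background)
    # NOTE: depending on the SHAP version, shap_values may be a per-class
    # list (assumed here) or a single stacked array with a class axis.
    shap_values = explainer.shap_values(clip.unsqueeze(0))
    pred = model(clip.unsqueeze(0)).argmax(dim=1).item()
    attribution = torch.from_numpy(shap_values[pred][0]).abs()  # (n_mfcc, n_frames)
    idx = attribution.flatten().topk(top_k).indices
    # Convert flat indices back to (coefficient, frame) pairs.
    n_frames = attribution.shape[1]
    return [(int(i) // n_frames, int(i) % n_frames) for i in idx]
```

Ranking absolute SHAP values over the MFCC grid in this way is one straightforward reading of the paper's claim that the explanation module highlights the acoustic features and temporal patterns behind each decision.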