Automating Document Narration: A Deep Learning-Based Speech Captioning System for Visually Impaired Persons
Keywords: Assistive Technology, Deep Learning, LSTM, Image Captioning, Optical Character Recognition (OCR), Text-to-Speech (TTS)

Abstract
This paper presents an innovative deep learning-based speech captioning system designed to enhance document accessibility for visually impaired individuals. Leveraging a modular architecture that integrates Convolutional Neural Networks (ResNet50), Long Short-Term Memory (LSTM) networks, Optical Character Recognition (OCR), and Text-to-Speech (TTS) technology, the system transforms both textual and visual content from printed documents into real-time, natural-sounding audio. The proposed framework employs image preprocessing and intelligent segmentation techniques to distinguish between text and image regions, followed by content-specific processing—text is extracted via Tesseract OCR, while visual regions are described using an image captioning model based on ResNet-LSTM integration. The summarized content is then converted into speech using the Google TTS API. A custom-built hardware assembly with a mobile-mounted camera, adjustable alignment, and portable design ensures ease of use in real-world settings.
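The content-specific routing described above (text regions to OCR, image regions to the captioning model, results joined into one narration) can be sketched in a few lines. The region dictionary format and the stub `ocr`/`captioner` callables below are illustrative assumptions; in the actual system these would be Tesseract OCR and the ResNet-LSTM captioner.

```python
# Minimal sketch of the content-specific processing step: each segmented
# region is routed to OCR (text) or image captioning (image), and the
# results are joined into a single narration string for the TTS stage.

def narrate_document(regions, ocr, captioner):
    """Route each region to the appropriate processor and join results."""
    parts = []
    for region in regions:
        if region["kind"] == "text":
            parts.append(ocr(region["pixels"]))
        elif region["kind"] == "image":
            parts.append("Image: " + captioner(region["pixels"]))
        else:
            raise ValueError(f"unknown region kind: {region['kind']}")
    return " ".join(parts)

# Stub processors standing in for Tesseract and the ResNet-LSTM model.
fake_ocr = lambda pixels: "Hello world."
fake_captioner = lambda pixels: "a dog on grass"

regions = [
    {"kind": "text", "pixels": None},
    {"kind": "image", "pixels": None},
]
print(narrate_document(regions, fake_ocr, fake_captioner))
# → Hello world. Image: a dog on grass
```

Keeping the processors as injected callables makes the routing logic independent of any particular OCR engine or captioning model, which matches the modular architecture the paper describes.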
Experimental evaluation on a 100-document dataset demonstrates high accuracy: 94% for text recognition, 91% for image detection, and 89% for caption generation. Quality assessments reveal minimal error margins, affirming the system’s reliability and effectiveness. This study underscores the potential of AI-driven multimodal solutions in promoting inclusive information access and enabling independent navigation of printed materials by visually impaired users. Future work will focus on real-time enhancements and participatory design inputs from end users to further optimize the system’s usability and impact.
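The abstract notes that extracted content is summarized before being spoken, but does not name the summarization method. As an assumption, a minimal extractive approach scores each sentence by average word frequency and keeps the top-scoring ones in their original order:

```python
# Illustrative extractive summarization sketch (the paper does not
# specify its summarization method; word-frequency scoring is an
# assumption). Sentences are scored by the average corpus frequency
# of their words, and the top k are kept in original order.
import re
from collections import Counter

def summarize(text, k=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    # Preserve the original sentence order in the output.
    return " ".join(s for s in sentences if s in top)

text = "The cat sat on the mat. The cat purred. Dogs bark loudly."
print(summarize(text, k=1))
# → The cat purred.
```

Shorter summaries reduce listening time, which matters for a TTS-based reader; any extractive or abstractive summarizer could be dropped into this slot of the pipeline.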
Copyright (c) 2025 Pritam Langde, Shrinivas Patil, Prachi Langde

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.