Leveraging Contextual Factors for Word Sense Disambiguation in Hindi Language
Keywords:
Word Sense Disambiguation, MuRIL, INLTK
Abstract
This study presents an unsupervised model for word sense disambiguation, the task of determining the intended meaning of a word within a sentence. Identifying the correct sense with high precision is essential for applications such as machine translation, information retrieval, question answering, sentiment analysis, summarization, and language generation. In recent years, little work has been done in this field for Indian languages specifically. The unavailability of large labelled corpora poses a major challenge to applying large language models to this disambiguation task. Our approach uses MuRIL, a BERT-based deep learning model, and measures the Euclidean distance between the synsets of words with multiple senses, achieving an accuracy of 89%. In addition, we have curated a dataset, based on the Indian theories of meaning, that uses contextual factors to disambiguate the exact meaning of a word. The outcomes of this study offer valuable insights into the capabilities of language models applied to Indian languages and their potential for reducing linguistic ambiguity.
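The distance-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the vectors are built or compared in detail, so the following is an assumption: each candidate sense and the sentence context are represented by one vector each, and the sense closest to the context in Euclidean distance is chosen. The function name `nearest_sense`, the sense labels, and the two-dimensional toy vectors are hypothetical; in practice the vectors would presumably come from an embedding model such as MuRIL.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def nearest_sense(context_vec: Vector, sense_vecs: Dict[str, Vector]) -> str:
    """Return the label of the sense vector closest to the context vector.

    Closeness is plain Euclidean distance. This is an illustrative
    selection rule, not necessarily the exact procedure of the paper.
    """
    if not sense_vecs:
        raise ValueError("at least one candidate sense is required")
    return min(sense_vecs, key=lambda label: math.dist(context_vec, sense_vecs[label]))


# Toy example for an ambiguous Hindi word with two senses ("gold" vs. "sleep").
# The vectors are hand-made placeholders, not real model embeddings.
candidate_senses = {
    "gold": [1.0, 0.0],
    "sleep": [0.0, 1.0],
}
context_about_jewellery = [0.8, 0.2]

print(nearest_sense(context_about_jewellery, candidate_senses))  # gold
```

Euclidean distance is sensitive to vector magnitude, so a real pipeline might normalise the embeddings first; whether the authors do so is not stated in the abstract.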

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.