Designing of a Novel Framework for Marathi Natural Language Processing: MR-LIWC2015
Keywords:
English LIWC, LIWC, Marathi, Marathi LIWC, Marathi translation, NLP, Natural language processing, Sentiment analysis, translation, translation procedure, translation process
Abstract
Linguistic analysis has gained significant prominence as a means of understanding human behaviour, emotions, and psychological states in domains including psychology, the social sciences, and computational linguistics. Linguistic Inquiry and Word Count (LIWC), a widely used tool developed by the American social psychologist James W. Pennebaker and his team at the University of Texas at Austin, enables automated linguistic analysis of text and provides insights into its psychological and emotional dimensions. Originally developed for English, LIWC has been adapted to several other languages, including German, Dutch, Spanish, Chinese, Turkish, and French. Its applicability nevertheless remains restricted to English and a few other languages, which limits its use in multilingual contexts. In particular, the tool is not yet available for Marathi, a major language spoken by the people of Maharashtra, India. This paper presents a novel framework for the development and evaluation of a Marathi translation of the LIWC dictionary, aiming to extend its utility to the Marathi-speaking population. The Marathi version is based on the English LIWC-2015. The work is unique in that it is the first LIWC translation for any Indian language. Development comprises several steps: initial translation and wildcard (*) expansion, dictionary expansion, linguistic analysis, wordlist development, cultural adaptation, wordlist validation, refinement, equivalence research, addition of summary variables, and compilation of the final dictionary in the official LIWC format. The Marathi LIWC is evaluated on a diverse dataset of Marathi text samples encompassing social media posts, speech transcripts, blogs, short stories, and book summaries.
The performance of the translated dictionary is assessed by its ability to accurately capture the linguistic features, emotional tones, and psychological constructs present in Marathi text. To evaluate its effectiveness, the dataset was analyzed using both the original English LIWC and the newly developed Marathi LIWC. The results demonstrate that the Marathi LIWC remains aligned with the linguistic and psychological dimensions underlying the original LIWC while accommodating the specifics of the Marathi language. The translated dictionary showed promising reliability and validity in capturing linguistic and psychological features of Marathi texts.
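To make the dictionary mechanics mentioned in the abstract concrete, the following is a minimal sketch of how a LIWC-style dictionary with wildcard (*) entries can be applied to Marathi text. It is an illustration under stated assumptions, not the authors' implementation: the three dictionary entries, the whitespace tokenizer, and the function names are all hypothetical, and the entries are not taken from MR-LIWC2015. The category labels `posemo`, `negemo`, and `i` follow LIWC2015 naming.

```python
import unicodedata
from collections import Counter

# Hypothetical toy dictionary: illustrative entries only, NOT from MR-LIWC2015.
# A trailing "*" marks a wildcard entry that matches any token with that prefix.
TOY_DICTIONARY = {
    "आनंद*": ["posemo"],
    "दुःख*": ["negemo"],
    "मी": ["i"],
}

# Punctuation stripped from token edges (assumption: a simple whitespace tokenizer).
PUNCTUATION = ".,!?;:\"'()।"


def normalize(text):
    """Apply Unicode NFC so dictionary entries and tokens compare consistently."""
    return unicodedata.normalize("NFC", text)


def tokenize(text):
    """Split on whitespace and strip edge punctuation; drop empty tokens."""
    tokens = (tok.strip(PUNCTUATION) for tok in normalize(text).split())
    return [tok for tok in tokens if tok]


def categories_for(token, dictionary):
    """Return the categories of every entry that matches the token."""
    matched = []
    for entry, categories in dictionary.items():
        entry = normalize(entry)
        if entry.endswith("*"):
            # Wildcard entry: prefix match on the stem before the asterisk.
            if token.startswith(entry[:-1]):
                matched.extend(categories)
        elif token == entry:
            matched.extend(categories)
    return matched


def category_percentages(text, dictionary):
    """Percentage of tokens in the text that fall into each category."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        # Count each category at most once per token.
        for category in set(categories_for(token, dictionary)):
            counts[category] += 1
    return {cat: 100.0 * n / len(tokens) for cat, n in counts.items()}


scores = category_percentages("मी आनंदी आहे.", TOY_DICTIONARY)
print(scores)
```

In this sketch the wildcard entry "आनंद*" matches the inflected token "आनंदी", which is why wildcard expansion matters for a morphologically rich language such as Marathi: a single stem entry can cover many surface forms, but it can also over-match, so expanded forms would still need manual validation. The tokenizer deliberately avoids the regular expression class `\w`, which in Python's `re` module does not match Devanagari vowel signs.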
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.