Comparative Analysis of Lexical Chain, Bidirectional Encoder Representations from Transformers (BERT), and Graph-based Approaches for Condensation of Low Resource Language Documents
Keywords:
Automatic Summarization, BERT, Extractive Summarization, Graph-based Approach, Low Resource Languages, Lexical Chain

Abstract
Language is humans' primary means of communication, and around 7,000 different languages are spoken in the world. Among them, Low Resource Languages (LRLs) are those that lack the linguistic resources required to build statistical NLP applications. Writing is the most common way that people express and store their thoughts, and technological developments are making the world smaller by making distant communication more accessible. Owing to the rise in internet usage, fresh textual content is created every second, and not all of it is useful. In light of this, document condensation, or summarization, is becoming an increasingly important task. There are two approaches to creating summaries: extractive and abstractive. An extractive summary keeps essential phrases and sentences from the original document, whereas an abstractive summary is created by rewording its main sentences. Summarization becomes more difficult for LRL documents. This study focuses on the condensation, or automatic summarization, of LRL documents using lexical chain, BERT, and graph-based approaches.
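To make the graph-based extractive idea concrete, below is a minimal Python sketch of a TextRank-style summarizer: sentences become graph nodes, word-overlap (Jaccard) similarity provides edge weights, and PageRank centrality ranks sentences for extraction. The `networkx` dependency, the Jaccard similarity measure, and the naive regex sentence splitter are illustrative assumptions, not the exact method evaluated in the paper.

```python
# Minimal sketch of graph-based extractive summarization (TextRank-style).
# Illustration only, not the paper's exact method; assumes the input can be
# split on sentence-final punctuation and uses word overlap for simplicity.
import itertools
import re

import networkx as nx


def sentence_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def summarize(text: str, n_sentences: int = 2) -> str:
    # Naive sentence splitting; a real system would use an LRL-aware tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        weight = sentence_similarity(sentences[i], sentences[j])
        if weight > 0:
            graph.add_edge(i, j, weight=weight)
    # PageRank scores each sentence by its centrality in the similarity graph.
    scores = nx.pagerank(graph, weight="weight")
    # Pick the top-ranked sentences, then restore their document order.
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
    return " ".join(sentences[i] for i in top)
```

For LRL documents, the word-overlap measure above could plausibly be replaced with multilingual sentence embeddings (e.g., from BERT), which is one way BERT-based extractive variants differ from this simple sketch.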