Ina-SASet: Dataset for Developing Indonesian Sentiment Lexicon for Extracting Consumer Preference Based on Fine Grained Sentiment Analysis Technique

Authors

  • Bagus Setya Rintyarna, Wiwik Suharso, Abadi Sanosra, Ika Safitri Windiarti

Keywords:

Sentiment analysis; Lexicon; Consumer preference; Twitter.

Abstract

We collected and built a dataset called Ina-SASet, a dataset for developing an Indonesian sentiment lexicon for extracting consumer preference based on a fine-grained sentiment analysis technique. The data were gathered from Twitter between January 1, 2022 and December 31, 2022, since Twitter is a popular microblogging platform that offers a huge repository of user-generated text. We used the Twitter API with several related keywords to crawl the data. In total, 121,327 tweets were successfully retrieved. We then pre-processed the text by 1) splitting it into sentences, 2) tokenizing the split sentences, 3) removing stop words using an online stop word collection, and 4) stemming the tokens by matching them against the Kamus Besar Bahasa Indonesia (KBBI) online repository. The stemmed tokens were then saved into a MySQL database. This work is intended to support the creation of an Indonesian sentiment lexicon resource for consumer preference extraction.
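
The four pre-processing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop word set, the root word set (a tiny stand-in for a KBBI lookup), the affix lists, the table name `stemmed_token`, and all function names are hypothetical. SQLite from the standard library is used in place of MySQL so the example runs without a database server.

```python
import re
import sqlite3

# Hypothetical stand-ins: a real pipeline would load a full stop word
# collection and check candidate roots against the KBBI repository.
STOP_WORDS = {"yang", "dan", "di", "ini", "itu", "saya"}
ROOT_WORDS = {"beli", "suka", "harga", "produk", "murah", "bagus"}
PREFIXES = ("mem", "men", "me", "ber", "di", "ter")
SUFFIXES = ("nya", "kan", "an", "i")


def split_sentences(text):
    """Step 1: split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Step 2: lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def remove_stop_words(tokens):
    """Step 3: drop tokens found in the stop word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Step 4: strip at most one prefix and one suffix, then accept the
    first candidate that matches the root word dictionary."""
    if token in ROOT_WORDS:
        return token
    candidates = [token]
    candidates += [token[len(p):] for p in PREFIXES if token.startswith(p)]
    for cand in list(candidates):
        candidates += [cand[:-len(s)] for s in SUFFIXES if cand.endswith(s)]
    for cand in candidates:
        if cand in ROOT_WORDS:
            return cand
    return token  # no dictionary match: leave the token unchanged


def preprocess(text):
    """Run steps 1-4 and return (sentence_index, stemmed_token) rows."""
    rows = []
    for idx, sentence in enumerate(split_sentences(text)):
        for token in remove_stop_words(tokenize(sentence)):
            rows.append((idx, stem(token)))
    return rows


def save(rows, conn):
    """Store stemmed tokens (SQLite here; the paper reports MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stemmed_token "
        "(sentence_id INTEGER, token TEXT)"
    )
    conn.executemany("INSERT INTO stemmed_token VALUES (?, ?)", rows)
    conn.commit()
```

For instance, `preprocess("Saya membeli produk ini. Harganya murah dan bagus!")` yields rows such as `(0, "beli")` and `(1, "harga")`, which `save(rows, sqlite3.connect(":memory:"))` then writes to the table. Dictionary-matched stemming of this kind depends entirely on the coverage of the root word list, which is why the abstract's use of KBBI matters.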

Published

06.08.2024

How to Cite

Bagus Setya Rintyarna. (2024). Ina-SASet: Dataset for Developing Indonesian Sentiment Lexicon for Extracting Consumer Preference Based on Fine Grained Sentiment Analysis Technique. International Journal of Intelligent Systems and Applications in Engineering, 12(23s), 295–300. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/6739
