Customer Gender Prediction Model based on E-Commerce Data
Keywords:
Data analytics, Machine learning, Predictive modelling, Decision-making, Advanced analytics, Data management, Data forecast, Business Value, Big data
Abstract
Customer demographic data, such as gender and age, provide valuable information for marketing and for personalizing the web applications of e-commerce service providers. However, online consumers often do not supply this information because of privacy and safety concerns. In this article, we propose a method for predicting a customer's gender from their catalogue browsing data on e-commerce systems, including the date and time of access and the list of categories and items viewed. We use a machine learning approach and explore many features derived from catalogue viewing data to predict the gender of viewers. Experiments carried out on the PAKDD Data Mining Competition datasets achieved good results: a balanced accuracy of 92.3% and a macro F1 of 91.2%. These results indicate that basic features, such as viewing time and product/category features, combined with advanced features, such as product/category sequences and transfers, effectively improve customer gender prediction.
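The two metrics reported above can be illustrated with a minimal sketch. Balanced accuracy is the mean of the per-class recalls, and macro F1 is the unweighted mean of the per-class F1 scores, so both give the minority class the same weight as the majority class. The function names and the toy labels below are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} computed one-vs-rest for each class."""
    counts = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def balanced_accuracy(y_true, y_pred, labels):
    """Mean of per-class recall; a class with no true samples scores 0."""
    counts = per_class_counts(y_true, y_pred, labels)
    recalls = [tp / (tp + fn) if (tp + fn) else 0.0
               for tp, _fp, fn in counts.values()]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2*tp / (2*tp + fp + fn)."""
    counts = per_class_counts(y_true, y_pred, labels)
    f1_scores = []
    for tp, fp, fn in counts.values():
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical, deliberately imbalanced toy labels (8 female, 2 male).
y_true = ["female"] * 8 + ["male"] * 2
y_pred = ["female"] * 7 + ["male"] + ["male", "female"]
labels = ["female", "male"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                             # 0.8
print(balanced_accuracy(y_true, y_pred, labels))  # 0.6875
print(macro_f1(y_true, y_pred, labels))           # 0.6875
```

On this toy example plain accuracy is 0.8, while both class-balanced metrics are 0.6875, because the weaker performance on the minority class is not averaged away by the majority class.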
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.