Named Entity Recognition Driven Synthesis of IT Job Descriptions in Morocco: A Comparative Analysis of BERT and BiLSTM Models

Authors

  • Zineb Elkaimbillah, Zineb Mcharfi, Mohamed Khoual, Bouchra El Asri

Keywords:

BERT, BiLSTM, Information Technology, Job descriptions, Named Entity Recognition, Summarization.

Abstract

The information technology (IT) sector, characterized by its dynamism and diversity, presents a major challenge for jobseekers and recruiters alike, who must navigate long lists of job offers to extract relevant information. This article proposes a new approach to this challenge: integrating Named Entity Recognition (NER) into the synthesis of IT job descriptions. This exploration of the IT sector contributes to the optimization of job search processes and recruitment strategies specific to the sector. Our approach covers the conceptualization, data preparation and training of BERT (Bidirectional Encoder Representations from Transformers) and BiLSTM (Bidirectional Long Short-Term Memory) models, and compares the performance of the two NER models through an in-depth evaluation. Its originality lies in using NER as the cornerstone of automatic synthesis: extracting crucial information such as organizations, locations and job titles makes the summarization process simpler and more efficient. The results underline the potential of NER to make the complex information contained in IT job advertisements more accessible and comprehensible. By automating the extraction of relevant entities, including job titles, company names, work locations, responsibilities, required technical and non-technical skills, diplomas and required years of experience, we facilitate the job search process. Our evaluations show that BERT models outperform BiLSTM models in terms of accuracy and overall named entity recognition performance, demonstrating their superiority for this specific task.
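The abstract makes NER the cornerstone of the synthesis and compares the two models on entity extraction. The following is a minimal, hypothetical sketch (not the authors' code) of how a tagger's output could be turned into a structured job-offer summary and scored. It assumes the model emits one BIO tag per token; the label names (JOB_TITLE, ORG, LOC, SKILL, EXPERIENCE), the function names and the example offer are illustrative assumptions, since this page does not list the paper's tag set or data. Scoring uses exact-match entity-level precision, recall and F1, a common NER convention; the paper's own metric is described here only as "accuracy".

```python
from collections import defaultdict


def tag_spans(tags):
    """Decode a BIO tag sequence into (entity_type, start, end) spans, end exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # token continues the currently open entity
        if etype is not None:
            spans.append((etype, start, i))
        start, etype = None, None
        if tag.startswith(("B-", "I-")):
            # Lenient decoding: an I- tag that does not continue the open span starts a new one.
            start, etype = i, tag[2:]
    return spans


def summarize(tokens, tags):
    """Group extracted entity mentions by type into a structured summary (duplicates dropped)."""
    summary = defaultdict(list)
    for etype, start, end in tag_spans(tags):
        text = " ".join(tokens[start:end])
        if text not in summary[etype]:
            summary[etype].append(text)
    return dict(summary)


def entity_f1(gold_seqs, pred_seqs):
    """Exact-match entity-level precision, recall and F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(tag_spans(gold)), set(tag_spans(pred))
        tp += len(g & p)  # an entity counts only if type and boundaries both match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical, hand-tagged job-offer fragment (not from the paper's dataset).
    tokens = "Senior Java Developer at Acme in Casablanca , 3 years experience , Spring Boot required".split()
    gold = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "B-ORG", "O", "B-LOC", "O",
            "B-EXPERIENCE", "I-EXPERIENCE", "I-EXPERIENCE", "O", "B-SKILL", "I-SKILL", "O"]
    # Hypothetical model prediction that misses the company and truncates the skill.
    pred = list(gold)
    pred[4] = "O"
    pred[13] = "O"
    print(summarize(tokens, gold))
    # {'JOB_TITLE': ['Senior Java Developer'], 'ORG': ['Acme'], 'LOC': ['Casablanca'],
    #  'EXPERIENCE': ['3 years experience'], 'SKILL': ['Spring Boot']}
    print(entity_f1([gold], [pred]))  # precision 3/4, recall 3/5
```

Because both BERT and BiLSTM taggers can be reduced to per-token tag sequences, the same decoding and scoring code can compare them. For BERT, subword predictions would first have to be mapped back to words, for example by keeping the tag of each word's first subword; that preprocessing step is an assumption, not something stated on this page.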

Published

22.08.2024

How to Cite

Zineb Elkaimbillah. (2024). Named Entity Recognition Driven Synthesis of IT Job Descriptions in Morocco: A Comparative Analysis of BERT and BiLSTM Models. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 2859 –. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/6769

Issue

Vol. 12 No. 4 (2024)

Section

Research Article