Faster Model Improvements through Weakly Supervised Labels

Authors

  • Ashish Bansal

Keywords:

machine learning, weakly supervised learning, supervised learning, NLP

Abstract

Deep neural networks are becoming omnipresent in natural language processing (NLP) applications. However, they require large amounts of labeled training data, which is often only available for English. This is a major challenge for the many languages and domains where labeled data is scarce. In recent years, a variety of methods have been proposed to tackle this situation. This paper gives an overview of approaches that help train NLP models in resource-lean scenarios, covering both ideas for increasing the amount of labeled data and methods following the popular pre-train and fine-tune paradigm.

Supervised learning techniques construct predictive models by learning from a large number of training examples, where each training example has a label indicating its ground-truth output. Though current techniques have achieved great success, in many tasks it is difficult to obtain strong supervision information, such as fully ground-truth labels, due to the high cost of the data-labeling process. It is therefore desirable for machine-learning techniques to work with weak supervision.

This paper outlines the advantages of weakly supervised learning for collecting more robust data quickly and with fewer resources, focusing on three typical types of weak supervision: incomplete supervision, where only a subset of the training data is labeled; inexact supervision, where the training data carry only coarse-grained labels; and inaccurate supervision, where the given labels are not always ground truth.
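The three settings above can be made concrete with toy data. In this hypothetical sketch (not taken from the paper), feature vectors are abstracted to single floats and labels to strings; the variable names are illustrative only.

```python
# Incomplete supervision: only a subset of examples carries a label.
incomplete = [(0.1, "neg"), (0.9, "pos"), (0.4, None), (0.7, None)]

# Inexact supervision: labels exist only at a coarser granularity,
# e.g. one label per bag of instances rather than per instance.
inexact = [([0.1, 0.2, 0.8], "pos"),   # bag labeled "pos": some instance is positive
           ([0.1, 0.3, 0.2], "neg")]   # bag labeled "neg": no instance is positive

# Inaccurate supervision: every example is labeled, but some labels are wrong.
inaccurate = [(0.1, "neg"), (0.9, "neg"), (0.7, "pos")]  # 0.9 is mislabeled
```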

The main focus is on the weak supervision technique in which a smaller dataset is used to train a classifier model, and that model is then used to assign weak labels to new data; these predicted labels may be accurate only to some extent. The method involves a human in the loop who reviews the predicted labels and corrects the wrong predictions, creating additional data points for training a new weak-labeler model. Applied iteratively, this technique has helped researchers quickly create more ground-truth data that can be used to train better-performing models.
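The iterative loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy one-dimensional centroid classifier stands in for a real model, and an `oracle` function simulates the human reviewer who corrects wrong predictions. All names here are hypothetical.

```python
def train(examples):
    """Fit one centroid per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def weak_label(model, xs):
    """Predict a (possibly noisy) weak label: the nearest class centroid."""
    return [min(model, key=lambda y: abs(x - model[y])) for x in xs]

# Small seed dataset with ground-truth labels.
seed = [(0.1, "neg"), (0.2, "neg"), (0.9, "pos")]
unlabeled = [0.15, 0.8, 0.95, 0.05]

def oracle(x):
    """Stand-in for the human reviewer's judgment."""
    return "pos" if x >= 0.5 else "neg"

model = train(seed)
for _ in range(2):
    # 1. Weak-label the new data with the current model.
    preds = weak_label(model, unlabeled)
    # 2. Human-in-the-loop review: keep correct labels, fix wrong ones.
    reviewed = [(x, p if p == oracle(x) else oracle(x))
                for x, p in zip(unlabeled, preds)]
    # 3. Fold the reviewed points back in and retrain the weak labeler.
    seed = seed + reviewed
    model = train(seed)
```

Each pass grows the labeled pool at the cost of a review step rather than labeling from scratch, which is why the loop converges on a larger ground-truth set faster than manual annotation alone.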



Published

12.06.2024

How to Cite

Ashish Bansal. (2024). Faster Model Improvements through Weakly Supervised Labels. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 4806–4810. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7186

Issue

Section

Research Article