
Semi-Supervised Learning in NLP: Leveraging Large-Scale Unlabeled Data for Model Training

EasyChair Preprint no. 12270

8 pages
Date: February 24, 2024

Abstract

This paper explores the techniques and methodologies employed in semi-supervised learning for NLP, focusing on how large-scale unlabeled data can be effectively leveraged to enhance model training. The theoretical foundations of semi-supervised learning, including methods such as self-training, co-training, and multi-view learning, are discussed, highlighting their applications and effectiveness in NLP tasks. Additionally, recent advancements in neural network architectures, such as pre-training and fine-tuning strategies, which have significantly contributed to the success of semi-supervised learning in NLP, are reviewed. Furthermore, challenges and future directions in semi-supervised learning for NLP, including scalability, domain adaptation, and robustness to noisy data, are examined.

Keyphrases: semi-supervised learning
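
To make the self-training idea mentioned in the abstract concrete, the following minimal sketch (not taken from the paper) shows the basic loop: a classifier is trained on a small labeled seed set, pseudo-labels the unlabeled pool wherever its confidence exceeds a threshold, folds those examples back into the training set, and repeats. It assumes scikit-learn and SciPy are installed; the tiny sentiment texts are placeholder data chosen only to make the loop runnable.

import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled seed set and unlabeled pool (placeholder data).
labeled_texts = ["great movie, loved it", "terrible plot, boring",
                 "wonderful acting throughout", "awful and dull"]
labels = np.array([1, 0, 1, 0])
unlabeled_texts = ["fantastic and fun", "dreadful pacing",
                   "an enjoyable watch", "not worth the time"]

# Fit the vectorizer on all available text so features are shared.
vectorizer = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
X_l = vectorizer.transform(labeled_texts)
X_u = vectorizer.transform(unlabeled_texts)

threshold = 0.6              # accept only reasonably confident pseudo-labels
for _ in range(5):           # a few self-training rounds
    clf = LogisticRegression().fit(X_l, labels)
    if X_u.shape[0] == 0:
        break
    probs = clf.predict_proba(X_u)
    confident = probs.max(axis=1) >= threshold
    if not confident.any():
        break                # no confident predictions left; stop early
    # Move confidently pseudo-labeled examples into the labeled set.
    X_l = vstack([X_l, X_u[confident]])
    labels = np.concatenate([labels, probs[confident].argmax(axis=1)])
    X_u = X_u[~confident]

print("labeled set grew to", X_l.shape[0], "examples")

The confidence threshold is the central design choice here: if it is set too low, self-training can amplify its own mistakes, which relates directly to the robustness-to-noisy-data challenge raised in the abstract.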

BibTeX entry
BibTeX does not have a dedicated entry type for preprints; the following is a workaround that produces a correct reference:
@Booklet{EasyChair:12270,
  author = {Kurez Oroy and Danny Robert},
  title = {Semi-Supervised Learning in NLP: Leveraging Large-Scale Unlabeled Data for Model Training},
  howpublished = {EasyChair Preprint no. 12270},
  year = {EasyChair, 2024}}