
Adversarial Attacks on BERT-Based Fake News Detection Models

EasyChair Preprint 14949

15 pages · Date: September 20, 2024

Abstract

The rise of fake news poses significant challenges to the integrity of information dissemination, necessitating robust detection mechanisms. BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art model in natural language processing, has shown promising results in identifying fake news. However, its susceptibility to adversarial attacks—deliberate perturbations designed to mislead models—raises concerns about its reliability. This paper explores the vulnerabilities of BERT-based fake news detection models to various adversarial attacks, including textual perturbations and gradient-based methods. We examine the impact of these attacks on model performance, highlighting a significant reduction in detection accuracy. Furthermore, we discuss potential defenses, such as adversarial training, input transformation techniques, and the development of more robust model variants. By addressing these adversarial challenges, we aim to enhance the resilience of fake news detection systems, ensuring more reliable and trustworthy automated news verification. This study underscores the necessity for ongoing research and innovation to fortify NLP models against adversarial threats in real-world applications.

Keyphrases: Adversarial Attacks, Fake News Detection, Gradient-Based Attacks, Paraphrasing Attacks, Textual Perturbations
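
Illustration: the sketch below is a minimal example of the kind of textual-perturbation attack described in the abstract, in which a small, meaning-preserving edit shifts a BERT-based fake-news classifier's prediction. It is not the authors' code; the checkpoint path, the label convention (index 1 = fake), and the word-importance-plus-character-swap heuristic are assumptions made for demonstration only.

# Minimal sketch of a black-box textual perturbation attack on a
# BERT-based fake-news classifier (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "path/to/fine-tuned-bert-fake-news"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def fake_probability(text: str) -> float:
    """Return the model's probability that `text` is fake news (label 1 assumed)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def perturb(text: str) -> str:
    """Greedy heuristic: find the word whose removal changes the score the most,
    then swap two of its inner characters (a common character-level perturbation)."""
    words = text.split()
    base = fake_probability(text)
    importance = [
        abs(base - fake_probability(" ".join(words[:i] + words[i + 1:])))
        for i in range(len(words))
    ]
    i = max(range(len(words)), key=lambda k: importance[k])
    w = words[i]
    if len(w) > 3:  # e.g. "election" -> "eelction"
        w = w[0] + w[2] + w[1] + w[3:]
    words[i] = w
    return " ".join(words)

headline = "Officials confirm the election results were falsified overnight"
print("original score: ", fake_probability(headline))
print("perturbed score:", fake_probability(perturb(headline)))

Comparing the two scores shows whether the single-word character swap is enough to push the classifier across its decision threshold, which is the failure mode the paper attributes to such perturbations.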

BibTeX entry
BibTeX has no dedicated entry type for preprints; the following workaround produces the correct reference:
@booklet{EasyChair:14949,
  author       = {Ayuns Luz and Edwin Frank},
  title        = {Adversarial Attacks on BERT-Based Fake News Detection Models},
  howpublished = {EasyChair Preprint 14949},
  year         = {EasyChair, 2024}}