Embedding Layout in Text for Document Understanding Using Large Language Models

EasyChair Preprint 12130

14 pages • Date: February 15, 2024

Mohammad Minouei, Mohammad Reza Soheili and Didier Stricker

Abstract

In this paper, we address the challenge of effectively using Large Language Models (LLMs) for Visually Rich Document Understanding (VRDU), a key part of intelligent document processing systems. While LLMs excel at a variety of Natural Language Processing (NLP) tasks, their use for extracting information from complex structured documents such as invoices and forms remains limited. This limitation arises from the difficulty of understanding these documents in context, largely because layout information is missing. Our research aims to unlock the potential of LLMs for VRDU by encoding OCR data in an HTML format that preserves the spatial layout needed for accurate information extraction. The empirical results show a notable improvement of more than 20 percent over baseline performance. This research highlights the potential of LLMs in VRDU and sets the stage for further innovations in automated document processing.
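
As a rough illustration of the idea of embedding layout in text, the sketch below turns OCR words with positions into an HTML table that could be placed in an LLM prompt. The abstract does not specify the actual HTML encoding, so everything here is an assumption: the function name `ocr_to_html`, the `(text, x, y)` input format, and the `row_tol` row-grouping threshold are hypothetical and are not taken from the paper.

```python
from html import escape


def ocr_to_html(words, row_tol=10):
    """Render OCR words as an HTML table (illustrative sketch only).

    words:   list of (text, x, y) tuples, where x and y are the top-left
             pixel coordinates of each word (hypothetical input format).
    row_tol: maximum vertical distance, in pixels, between a word and the
             first word of a row for both to share that row (assumed value).
    """
    rows = []
    # Visit words top to bottom, then left to right.
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(y - rows[-1]["y"]) <= row_tol:
            rows[-1]["cells"].append((x, text))  # same visual line
        else:
            rows.append({"y": y, "cells": [(x, text)]})  # start a new line

    lines = ["<table>"]
    for row in rows:
        # Order the cells of each row by horizontal position.
        cells = "".join(f"<td>{escape(t)}</td>" for _, t in sorted(row["cells"]))
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("Total:", 40, 202),
        ("Invoice", 40, 50),
        ("$120.00", 300, 200),
        ("#1234", 300, 52),
    ]
    print(ocr_to_html(sample))
```

With the sample input, "Invoice" and "#1234" end up in one table row and "Total:" and "$120.00" in the next, so the key and value that sit side by side on the page stay adjacent in the text given to the model.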

Keyphrases: Information Extraction, Large Language Model, Document Understanding

Links:

https://easychair.org/publications/preprint/dwvK

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:12130,
  author    = {Mohammad Minouei and Mohammad Reza Soheili and Didier Stricker},
  title     = {Embedding Layout in Text for Document Understanding Using Large Language Models},
  howpublished = {EasyChair Preprint 12130},
  year      = {EasyChair, 2024}}
