Applications of LLM and NLP in the Retrieval and Analysis of Institutional Publications

10 pages. Published: November 6, 2025

Abstract

Higher education institutions increasingly need advanced tools to understand and analyze their research output. This paper explores how Large Language Models (LLMs) and Natural Language Processing (NLP) can be combined into a system for institutional publication analysis at the University of Münster. We demonstrate how Retrieval Augmented Generation (RAG) pipelines can improve the discovery and analysis of research documents through semantic search and automated information processing. Our current approach combines vector-based document representation with graph-based modeling to construct a RAG pipeline for publication retrieval and analysis. Although still in early development, the system is designed to support several use cases, including the identification of research trends and collaboration patterns and the generation of summaries and reports on specific research topics. We share practical insights from implementing this system and discuss technical solutions to common challenges in processing and analyzing scientific publications.
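
To make the combination of vector-based retrieval and graph-based modeling concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The publication records, the bag-of-words `embed` function (a stand-in for a learned embedding model), and the helper names `retrieve`, `expand_by_coauthors`, and `build_prompt` are all hypothetical. The sketch ranks publications by cosine similarity to a query, widens the result set along shared-author links, and assembles the context that a RAG pipeline would pass to an LLM.

```python
import math
from collections import Counter

# Hypothetical sample records; a real system would load institutional metadata.
PUBLICATIONS = {
    "p1": {"title": "Graph neural networks for citation analysis", "authors": ["A", "B"]},
    "p2": {"title": "Semantic search over research documents", "authors": ["B", "C"]},
    "p3": {"title": "Protein folding with deep learning", "authors": ["D"]},
}

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    """Vector step: return the ids of the k titles most similar to the query."""
    q = embed(query)
    ranked = sorted(
        PUBLICATIONS,
        key=lambda pid: cosine(q, embed(PUBLICATIONS[pid]["title"])),
        reverse=True,
    )
    return ranked[:k]

def expand_by_coauthors(seed_ids):
    """Graph step: add publications that share an author with any seed."""
    seed_authors = {a for pid in seed_ids for a in PUBLICATIONS[pid]["authors"]}
    return sorted(
        pid for pid, pub in PUBLICATIONS.items() if seed_authors & set(pub["authors"])
    )

def build_prompt(query, pub_ids):
    """Generation step input: the retrieved titles become context for an LLM."""
    context = "\n".join(f"- {PUBLICATIONS[pid]['title']}" for pid in pub_ids)
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    query = "semantic search documents"
    hits = expand_by_coauthors(retrieve(query))
    print(build_prompt(query, hits))
```

In this sketch the graph is implicit in the shared-author test; a production pipeline would more plausibly keep embeddings in a vector store and collaboration links in a graph database, and would send the assembled prompt to an LLM for the final summary.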

Keyphrases: artificial intelligence (AI), bibliometrics, large language models (LLM), natural language processing (NLP), retrieval augmented generation (RAG), scientometrics

In: Laurence Desnos, Raimund Vogl, Lazaros Merakos, Carmen Diaz, Janina Mincer-Daszkiewicz and Stuart Mclellan (editors). Proceedings of EUNIS 2025 annual congress in Belfast, vol 107, pages 60-69.

BibTeX entry
@inproceedings{EUNIS2025:Applications_LLM_NLP_Retrieval,
  author    = {Luis Filipe de Araújo Pessoa and Raimund Vogl},
  title     = {Applications of LLM and NLP in the Retrieval and Analysis of Institutional Publications},
  booktitle = {Proceedings of EUNIS 2025 annual congress in Belfast},
  editor    = {Laurence Desnos and Raimund Vogl and Lazaros Merakos and Carmen Diaz and Janina Mincer-Daszkiewicz and Stuart Mclellan},
  series    = {EPiC Series in Computing},
  volume    = {107},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {/publications/paper/r25J},
  doi       = {10.29007/v3v7},
  pages     = {60-69},
  year      = {2025}}