FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability

EasyChair Preprint no. 8001

9 pages • Date: May 22, 2022

Abstract

In today’s world, the advancement and spread of the Internet and digitalization have resulted in most information being openly accessible. This holds true for financial services as well. Investors make data-driven decisions by analysing publicly available information such as annual reports of listed companies, details regarding asset allocation of mutual funds, etc. These financial documents often contain unfamiliar financial terms. In such cases, it becomes important to look at their definitions. However, not all definitions are equally readable. Readability largely depends on the structure, complexity and constituent terms that make up a definition. This creates the need for automatically evaluating the readability of definitions of financial terms. This paper presents a dataset, FinRAD, consisting of financial terms, their definitions and embeddings. In addition to standard readability scores (like “Flesch Reading Index (FRI)”, “Automated Readability Index (ARI)”, “SMOG Index Score (SIS)”, “Dale-Chall formula (DCF)”, etc.), it also contains the readability scores (AR) assigned based on the sources from which the terms have been collected. We manually inspect a sample from it to ensure the quality of the assignment. Subsequently, we show that these rule-based standard readability scores do not correlate well with the assigned readability scores of definitions of financial terms. Finally, we present a few neural baselines using transformer-based architectures to automatically calculate the readability scores. The pre-trained FinBERT model fine-tuned on the FinRAD corpus performs the best (AU-ROC = 0.9927, F1 = 0.9610). This corpus can be downloaded from https://github.com/sohomghosh/FinRAD_Financial_Readability_Assessment_Dataset
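To make concrete what a rule-based readability score looks like, here is a minimal sketch of the Automated Readability Index using its standard published formula, ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The abstract does not say how FinRAD computes its scores, so the function name and the naive regex tokenization below are assumptions for illustration only, not the authors' implementation.

```python
import re


def automated_readability_index(text: str) -> float:
    """Standard ARI formula with a deliberately naive tokenizer (assumption).

    Words are runs of letters/digits; sentences are split on ., ! and ?.
    Decimals and abbreviations will therefore be over-split.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


if __name__ == "__main__":
    # Two made-up definitions, not taken from the dataset.
    simple = "A bond is a loan. It pays interest."
    dense = ("A collateralized obligation securitizes heterogeneous "
             "receivables into subordinated tranches.")
    print(automated_readability_index(simple))
    print(automated_readability_index(dense))
```

Because the formula depends only on character, word and sentence counts, it cannot tell a familiar long word from domain jargon, which is consistent with the abstract's observation that such scores correlate poorly with the assigned readability of financial definitions.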

Keyphrases: Financial Dataset, Financial texts, Natural Language Processing, readability

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:8001,
  author = {Sohom Ghosh and Shovon Sengupta and Sudip Kumar Naskar and Sunny Kumar Singh},
  title = {FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability},
  howpublished = {EasyChair Preprint no. 8001},

  year = {EasyChair, 2022}}