"How good is good enough?" Establishing quality thresholds for the automatic text analysis of retro-digitized comics

EasyChair Preprint no. 573

10 pages. Date: October 11, 2018

Abstract

Stylometry in the form of simple statistical text analysis has proven to be a powerful tool for text classification, e.g. in the form of authorship attribution. When analyzing retro-digitized comics, manga and graphic novels, the researcher is confronted with the problem that automated text recognition (ATR) still leads to results that have comparatively high error rates, while the manual transcription of texts remains highly time-consuming. In this paper, we present an approach and measures that specify whether stylometry based on unsupervised ATR will produce reliable results for a given dataset of comics images.

Keyphrases: ATR, automatic text analysis, graphic novels, OCR, stylometric analysis, text analysis

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:573,
  author = {Rita Hartel and Alexander Dunst},
  title = {"How good is good enough?" Establishing quality thresholds for the automatic text analysis of retro-digitized comics},
  howpublished = {EasyChair Preprint no. 573},
  doi = {10.29007/1zvp},
  year = {EasyChair, 2018}}