Optical Character Recognition for a Redaction System Using Machine Learning Techniques.

EasyChair Preprint 3495

12 pages • Date: May 28, 2020

M. Kanchana, Muskan Sharma and Hrithik Somani

Abstract

This paper presents the use of Optical Character Recognition (OCR) in an automatic redaction system. A redactor is a system that takes an electronic document as input from the user and identifies sensitive information, mainly nouns, such as person names, country names, gender, credit card information, phone numbers, email IDs, and any other confidential information that must not be shown to the recipient of the document. The user first inputs a document, usually an image. The image is pre-processed and passed to the OCR engine, which extracts the text from it. Text extraction is therefore the first step toward identifying sensitive information. Redaction is a major application of OCR, since the information in a document can be read with the help of an OCR engine.
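
To make the redaction step concrete, here is a minimal sketch of what could happen to the text once OCR has extracted it. It is not the authors' method: the keyphrases indicate that the paper relies on named entity recognition, whereas this sketch substitutes plain regular expressions and covers only three of the listed categories (email IDs, credit card numbers, phone numbers). The `PATTERNS` table, the `redact` function, and the placeholder labels are hypothetical names introduced for illustration, and the OCR step itself is assumed to have already produced the input string.

```python
import re

# Hypothetical patterns for illustration only; the paper does not specify
# its detection rules, and real documents need more robust matching.
# Order matters: card numbers are checked before the looser phone pattern.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # A digit run of at least 9 characters, allowing spaces and hyphens.
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each match of a sensitive pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    # Stand-in for text returned by an OCR engine.
    extracted = "Mail jane.doe@example.com or call 555-123-4567."
    print(redact(extracted))
```

Names, countries, and gender cannot be captured reliably by patterns like these, which is presumably why a learned entity recognizer is the better fit for those categories.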

Keyphrases: Named Entity Recognition, Natural Language Processing, Optical Character Recognition, machine learning

Links:

https://easychair.org/publications/preprint/Mxzz

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:3495,
  author    = {M. Kanchana and Muskan Sharma and Hrithik Somani},
  title     = {Optical Character Recognition for a Redaction System Using Machine Learning Techniques.},
  howpublished = {EasyChair Preprint 3495},
  year      = {EasyChair, 2020}}
