IJDAR (2001) 3: 125–137

OCRSpell: an interactive spelling correction system for OCR errors in text

Kazem Taghva, Eric Stofsky

Information Science Research Institute, University of Nevada, Las Vegas, Las Vegas, NV 89154-4021, USA; e-mail: taghva@isri.unlv.edu

Received August 16, 2000 / Revised October 6, 2000

Abstract. In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate words through the use of information gathered from multiple knowledge sources. This system for text correction is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is presented as well.

Keywords: OCR-Spellcheckers – Information retrieval – Error correction – ...

...tion of all the characters [16]. In fact, Damerau determined that 80% of all misspellings can be corrected by the above approach [7]. However, this sample contained errors that were typographical in nature. For OCR text, the above procedure cannot be relied upon to deliver corrected text, for many reasons:

– In OCR text, word isolation is much more difficult, since errors can include the substitution and insertion of numbers, punctuation, and other nonalphabetic characters.
– Device mappings are not guaranteed to be one-to-one. For example, the substitution of iii for ...
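To make the idea of non-one-to-one device mappings and Bayesian candidate ranking concrete, the following is a minimal sketch under stated assumptions. It is not OCRSpell's actual algorithm, which also relies on dynamic mappings, approximate string matching, and n-gram analysis. The confusion table `DEVICE_MAPPINGS`, its probabilities, and the helper names `candidates` and `rank` are all hypothetical and chosen only for illustration. Each candidate is generated by applying one mapping, such as OCR output "rn" for an intended "m", and is then scored with a noisy-channel product P(word) · P(ocr_word | word), which is proportional to the posterior P(word | ocr_word).

```python
# Hypothetical OCR confusion table: (ocr_string, intended_string) -> assumed
# probability that the device printed ocr_string where intended_string was meant.
# The mappings are not one-to-one: one character may become two, and vice versa.
DEVICE_MAPPINGS = {
    ("rn", "m"): 0.30,
    ("m", "rn"): 0.05,
    ("cl", "d"): 0.20,
    ("1", "l"): 0.25,
    ("0", "o"): 0.25,
    ("ii", "u"): 0.10,
}


def candidates(ocr_word, lexicon):
    """Return {lexicon word: channel probability} reachable by undoing one mapping."""
    found = {}
    for (wrong, right), prob in DEVICE_MAPPINGS.items():
        start = ocr_word.find(wrong)
        while start != -1:
            # Replace this one occurrence of the OCR string with the intended string.
            fixed = ocr_word[:start] + right + ocr_word[start + len(wrong):]
            if fixed in lexicon:
                # Keep the most probable mapping if several yield the same word.
                found[fixed] = max(found.get(fixed, 0.0), prob)
            start = ocr_word.find(wrong, start + 1)
    return found


def rank(ocr_word, word_counts):
    """Order candidates by P(word) * P(ocr_word | word), a noisy-channel score."""
    total = sum(word_counts.values())
    scored = {
        word: (word_counts[word] / total) * prob
        for word, prob in candidates(ocr_word, word_counts).items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

For example, with the made-up counts `{"clam": 1, "darn": 4, "modern": 5}`, the OCR token "clarn" yields both "clam" (undoing "rn" → "m") and "darn" (undoing "cl" → "d"). Although the "rn" mapping has the higher assumed channel probability, the word prior places "darn" first. This is the kind of trade-off that collection- and document-level confusion statistics are meant to inform.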