Brigham Young University
Computer Science


William (Bill) Lund's Qualifying Presentation

ABSTRACT:

This paper shows the degree to which the optical character recognition (OCR) output from poor-quality documents can be improved by combining the results of multiple OCR engines into an aligned lattice of word hypotheses. On a collection of poor-quality, mid-twentieth-century typewritten documents, using three OCR engines reduces the word error rate (WER) by close to 40% on average. Additionally, a novel admissible heuristic for the A* algorithm is developed; it substantially reduces the portion of the state space that must be explored to identify all optimal alignments of the OCR text outputs, a necessary step toward constructing the word hypothesis lattice. On average, only 0.0079% of the state space is explored to identify all optimal alignments of the OCR output for a document in the collection.
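The abstract does not give the alignment cost model or the new heuristic itself, so what follows is only a minimal Python sketch under stated assumptions. It aligns the word sequences produced by several OCR engines with A*. Each alignment column is scored with a hypothetical sum-of-pairs cost: a mismatched word pair costs 1, and a word opposite a gap costs 1. The search uses the textbook length-difference bound as its admissible heuristic, which is not the heuristic developed in this work. The names `align`, `column_cost`, and `heuristic` are illustrative only. Unlike the work described above, the sketch returns a single optimal alignment rather than all of them. It also reports how many states it expanded out of the full state space, the kind of figure the abstract quotes.

```python
import heapq
from itertools import combinations, product
from math import prod


def column_cost(column):
    """Hypothetical sum-of-pairs cost of one alignment column (None marks a gap)."""
    cost = 0
    for x, y in combinations(column, 2):
        if x is None and y is None:
            continue  # gap opposite gap: free
        if x is None or y is None or x != y:
            cost += 1  # insertion/deletion or word substitution
    return cost


def heuristic(state, lengths):
    """Standard length-difference bound (NOT the heuristic from this work).

    Every pair of sequences still needs at least |remaining_p - remaining_q|
    gap columns, so this never overestimates (admissible) and changes by at
    most the step cost per move (consistent).
    """
    remaining = [n - i for n, i in zip(lengths, state)]
    return sum(abs(a - b) for a, b in combinations(remaining, 2))


def align(sequences):
    """A* multiple alignment of word sequences.

    Returns (cost, columns, expanded, total_states) for ONE optimal alignment.
    """
    lengths = [len(s) for s in sequences]
    start, goal = tuple(0 for _ in sequences), tuple(lengths)
    # A move advances any non-empty subset of the sequences by one word.
    moves = [m for m in product((0, 1), repeat=len(sequences)) if any(m)]
    g = {start: 0}
    parent = {start: None}
    frontier = [(heuristic(start, lengths), 0, start)]
    closed = set()
    while frontier:
        _, cost, state = heapq.heappop(frontier)
        if state in closed:
            continue  # stale heap entry
        closed.add(state)
        if state == goal:
            break
        for move in moves:
            nxt = tuple(i + m for i, m in zip(state, move))
            if any(i > n for i, n in zip(nxt, lengths)):
                continue  # would run past the end of some sequence
            column = tuple(seq[i] if m else None
                           for seq, i, m in zip(sequences, state, move))
            new_cost = cost + column_cost(column)
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                parent[nxt] = (state, column)
                f = new_cost + heuristic(nxt, lengths)
                heapq.heappush(frontier, (f, new_cost, nxt))
    # Walk parent pointers back from the goal to recover the aligned columns.
    columns, state = [], goal
    while parent[state] is not None:
        state, column = parent[state]
        columns.append(column)
    total_states = prod(n + 1 for n in lengths)
    return g[goal], columns[::-1], len(closed), total_states


if __name__ == "__main__":
    ocr = ["the quick brown fox".split(),
           "the quiek brown fox".split(),
           "the quick brown fox".split()]
    cost, columns, expanded, total = align(ocr)
    print(cost, columns)
    print(f"expanded {expanded} of {total} states "
          f"({100 * expanded / total:.2f}%)")
```

Each column of the resulting alignment is one slot of candidate words from the three engines, which is roughly how a word-hypothesis lattice can be read off an alignment. Because the state space grows as the product of the sequence lengths, the fraction of states a heuristic lets A* skip matters far more for full pages than for this toy input.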
