BYU CS students recently won second place in an international competition for their handwriting recognition software.
CS professor Bill Barrett and a group of grad students worked on their submission for months, competing against student teams from around the world and even a professional company.
The competition, held as part of the International Conference on Frontiers in Handwriting Recognition (ICFHR), required teams to work with part of a historical document. Each team needed to create a computer system that could scan the document and produce an accurate transcription.
The BYU computer science team included CS graduate students Curtis Wigington, Lucas Pinto, Seth Stewart and Brian Davis.
To do this, the team needed to find a way for the software to recognize each character and word, and they faced several challenges along the way.
“The primary one was probably the bleed-through,” Davis said. “It’s like when you press really hard with a pen on paper and you can see the ink through the page.”
Another challenge was the archaic German dialect used in the document. If the document had been written in a modern language, the team could have programmed the software to know what to expect. But with older languages, as in this case, a computer has a harder time predicting what is written.
Instead, the team drew on their knowledge of how the human eye recognizes visual information. They created an algorithm that allowed the software to recognize patterns in words and character styles, much as the human eye identifies different shapes.
“For the other problems with the documents we just had to explore combinations of algorithms to see what could help overcome those issues,” Wigington said. “Most of it was just a lot of trial and error to figure out what worked and what didn't.”
Once submissions were in, the competition measured how accurately each team's software could transcribe a different excerpt from the same document the teams had used to develop their systems.
Accuracy was scored as the percentage of characters and words the software missed. The BYU CS team missed only 5 percent of characters and about 20 percent of words, placing just behind the team from Germany that won first place.
“We were very pleased to place second in this competition, especially because we were relative newcomers to using this kind of network for handwriting recognition,” Barrett said.
“I was amazed at how well it went, given how little time we had to put it together and the fact that I could barely read any of it, and somehow [the algorithm] was reading it successfully,” Wigington said.
The long-term goal is to get below 1 percent of missed characters or words, Stewart said. That level of accuracy would allow the technology to improve indexing efforts and make it possible to search for more meaningful information.
But no matter how advanced the technology becomes, there may always be a need for human interaction to aid the process, Stewart said.
Even so, the applications for this sort of technology are far-reaching. BYU has a particular interest in developing this field because of the LDS Church’s involvement in Family History Indexing.
“We have been working on the problem of handwriting recognition in historical documents for many years,” Barrett said. “Much of this was motivated by the desire to assist the Church and FamilySearch.”
Barrett attributes much of the team’s success in the competition to the students’ hard work and competitive edge. Since the competition, the students have continued refining their software and have used the technology to transcribe other documents.
“We have now applied this handwriting recognition technology to thousands of authors and multiple languages on documents written centuries apart and have achieved results that surpass the best known [technology] that we are aware of,” Barrett said.