Computing That Serves

Improving the Quality of Neural Machine Translation Using Terminology Tokenization

Duane K. Dougal: MS Thesis Proposal
Friday, September 15, 3:00PM
3350 TMCB
Advisor: Deryle Lonsdale

Most organizations use a growing number of domain- or organization-specific words and phrases, so any translation process, whether human or automated, must be able to apply these multilingual terminology collections accurately and efficiently.
However, comparatively little work has explored the use of vetted terminology as an input to machine translation (MT) to improve results. No established process currently exists for integrating terminology into MT as a general practice, and in particular none exists for neural machine translation (NMT) that ensures individual terms are translated consistently with an approved terminology collection.
This thesis will focus on tokenization as a method of inserting terminology, of evaluating terminology insertion, or both.
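
As a rough illustration only, one common way to picture terminology tokenization is to replace approved source terms with placeholder tokens before translation and restore the approved target terms afterward. The sketch below is not the method proposed in the thesis; the function names, the placeholder format, and the sample term pairs are all hypothetical, and the translation step itself is left out.

```python
import re

def mask_terms(source, termbase):
    """Replace each approved source term with a placeholder token.

    termbase maps source-language terms to approved target-language terms
    (hypothetical data structure, chosen for illustration).
    """
    mapping = {}
    # Longest terms first, so multiword terms are matched before shorter ones.
    ordered = sorted(termbase.items(), key=lambda kv: -len(kv[0]))
    for i, (src_term, tgt_term) in enumerate(ordered):
        token = f"__TERM{i}__"
        pattern = re.compile(r"\b" + re.escape(src_term) + r"\b", re.IGNORECASE)
        source, count = pattern.subn(token, source)
        if count:
            mapping[token] = tgt_term
    return source, mapping

def unmask_terms(translated, mapping):
    """Swap each placeholder in the MT output for its approved target term."""
    for token, tgt_term in mapping.items():
        translated = translated.replace(token, tgt_term)
    return translated

def term_consistency(translated, mapping):
    """Fraction of expected target terms that appear in the final output."""
    if not mapping:
        return 1.0
    hits = sum(1 for tgt_term in mapping.values() if tgt_term in translated)
    return hits / len(mapping)

# Hypothetical English-to-Spanish term pairs.
termbase = {"hard drive": "disco duro", "keyboard": "teclado"}
masked, mapping = mask_terms("Replace the hard drive and the keyboard.", termbase)
```

The masked sentence would then be passed to an MT system that is assumed to copy the placeholders through unchanged. Applying unmask_terms to its output inserts the approved terms, and term_consistency gives a simple check of whether they all survived.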