Online Learning for Hybrid Text Corpus Simplification and Named Entity Recognition

September 08, 2021

Monday, September 13th at 2pm

Advisor: Mark Clement

MS Thesis Defense for Brandon Bingham


Traditionally, state-of-the-art results in named entity recognition (NER) have relied on large labeled datasets. Unfortunately, building these datasets is a challenge for human labelers because of the sheer volume of data to label and the repetitiveness of the task. Besides being costly, this process can introduce labeling errors into the dataset, which can reduce the final accuracy of the model.

This proposal introduces a novel approach to training NER systems in new domains and/or new languages using a combination of online learning, text corpus simplification, and named entity recognition. I show that combining these systems into one significantly reduces the cognitive load and time required of experts tasked with labeling training data for domains for which no suitable knowledge base exists.