Online Learning for Hybrid Text Corpus Simplification and Named Entity Recognition
September 08, 2021
Monday September 13th at 2pm
Advisor: Mark Clement
MS Thesis Defense for Brandon Bingham
Abstract:
Traditionally, state-of-the-art results in named entity recognition (NER) have relied on large labeled datasets. Unfortunately, building these datasets is challenging for human labelers because of the volume of data to label and the repetitiveness of the process. Besides being costly, this process can introduce errors into the dataset, which can reduce the final accuracy of the model.
This proposal introduces a novel approach to training NER systems in new domains and/or new languages using a combination of online learning, text corpus simplification, and named entity recognition. I show that combining these systems into one significantly reduces the cognitive load and time required of experts tasked with labeling training data in domains for which no suitable knowledge base exists.