Brigham Young University
Computer Science


George Busby's MS Thesis Proposal


Joint or Pipeline?: A Comparison of Approaches for Multi-Layered Sequence Annotation Tasks in NLP 

 

ABSTRACT:

 

We hypothesize that joint approaches do not offer enough of a gain over pipeline approaches to justify their additional computational burden, and we propose a much-needed comparison of techniques for three Natural Language Processing (NLP) tasks involving multi-layered sequence annotation. Multi-layered sequence annotation tasks involve labeling a sequence with multiple levels of annotation, either jointly or in a cascaded (pipeline) manner. The three tasks addressed in this thesis are Syriac diacritization, segmentation, and morphological tagging; Arabic diacritization, segmentation, and morphological tagging; and English part-of-speech tagging and phrase chunking. This is the first work on a probabilistic morphological analyzer for Syriac. We will compare several techniques on these tasks in terms of accuracy, training time, and test time. This type of comparative analysis benefits those interested in machine learning techniques for multi-layered sequence labeling, in particular those working on sequence tasks that can be modeled as sub-sequence tasks. The techniques we will compare are 1-best greedy pipelines, n-best pipelines, sampling-based inference, Factorial Hidden Markov Models (FHMMs), Dynamic Conditional Random Fields (DCRFs), cross-product space sequence tagging, and search-based approaches. We will also compare different sub-task interdependencies, sub-task Markov orders, annotation orderings, and feature sets for all tasks.
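To make the pipeline/joint distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation. It assumes a two-layer tagging model whose log-score factors into a hypothetical layer-1 scorer `s1` and a layer-2 scorer `s2` that is conditioned on the layer-1 label at the same position (think POS tags and chunk tags). The label sets, toy scores, and all function names are illustrative. `pipeline_decode` is a 1-best greedy pipeline: it runs first-order Viterbi on layer 1, then on layer 2 given the predicted layer-1 labels. `joint_decode` is cross-product space tagging: a single Viterbi pass over (layer-1, layer-2) label pairs. Because each Viterbi step compares every pair of labels, the joint pass costs roughly (|L1|·|L2|)² score evaluations per position, versus |L1|² + |L2|² for the pipeline. This is the kind of computational burden weighed against accuracy above.

```python
import itertools
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Label = Hashable


def viterbi(n: int, labels: Sequence[Label],
            score: Callable[[int, Optional[Label], Label], float]) -> List[Label]:
    """First-order Viterbi. score(i, prev, cur) is the log-score of label `cur`
    at position i given label `prev` at i - 1 (prev is None when i == 0).
    Each position compares all label pairs: O(n * |labels|^2) score calls."""
    if n == 0:
        return []
    best = [{y: score(0, None, y) for y in labels}]
    back: List[dict] = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: best[i - 1][p] + score(i, p, y))
            best[i][y] = best[i - 1][prev] + score(i, prev, y)
            back[i][y] = prev
    y = max(labels, key=lambda lab: best[n - 1][lab])
    path = [y]
    for i in range(n - 1, 0, -1):  # follow back-pointers to position 0
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hypothetical toy scores (log domain); layer 1 ~ e.g. POS tags, layer 2 ~ chunk tags.
LABELS1 = ["A", "B"]
LABELS2 = ["X", "Y"]
EMIT1 = {"A": 0.0, "B": -0.1}
EMIT2 = {("A", "X"): -2.0, ("A", "Y"): -2.0, ("B", "X"): 0.0, ("B", "Y"): -3.0}


def s1(i: int, prev1: Optional[Label], cur1: Label) -> float:
    """Hypothetical layer-1 score; this toy version ignores transitions."""
    return EMIT1[cur1]


def s2(i: int, cur1: Label, prev2: Optional[Label], cur2: Label) -> float:
    """Hypothetical layer-2 score, conditioned on the layer-1 label at position i."""
    return EMIT2[(cur1, cur2)]


def pipeline_decode(n: int) -> Tuple[List[Label], List[Label]]:
    """1-best greedy pipeline: decode layer 1 alone, then layer 2 given it."""
    y1 = viterbi(n, LABELS1, s1)
    y2 = viterbi(n, LABELS2, lambda i, p, c: s2(i, y1[i], p, c))
    return y1, y2


def joint_decode(n: int) -> Tuple[List[Label], List[Label]]:
    """Cross-product space tagging: one Viterbi pass over (layer1, layer2) pairs."""
    pairs = list(itertools.product(LABELS1, LABELS2))  # |L1| * |L2| joint labels

    def score(i, prev, cur):
        p1, p2 = prev if prev is not None else (None, None)
        return s1(i, p1, cur[0]) + s2(i, cur[0], p2, cur[1])

    path = viterbi(n, pairs, score)
    return [a for a, _ in path], [b for _, b in path]


def total_score(y1: List[Label], y2: List[Label]) -> float:
    """Combined model score (layer 1 + layer 2) of a full two-layer labeling."""
    return sum(s1(i, y1[i - 1] if i else None, y1[i])
               + s2(i, y1[i], y2[i - 1] if i else None, y2[i])
               for i in range(len(y1)))


if __name__ == "__main__":
    print(pipeline_decode(2))  # (['A', 'A'], ['X', 'X'])
    print(joint_decode(2))     # (['B', 'B'], ['X', 'X'])
```

With these toy scores the pipeline commits to `A` because it scores slightly higher on layer 1 alone, and it then pays a large layer-2 penalty. The joint decoder finds the higher-scoring `B`/`X` assignment. This is the kind of cascading error that joint approaches aim to avoid, at the cost of a larger label space.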

 
