
Skeptical Topic Modeling: Connecting Meanings, Topics, and Sentiment


Thursday, October 25, 2012 - 12:00pm


Jordan Boyd-Graber

University of Maryland


Eric Ringger, Kevin Seppi

Imagine you need to get the gist of what's going on in a large text
dataset such as all tweets that mention Obama, all e-mails sent within a
company, or all newspaper articles published by the New York Times in the 1990s.
Topic models, which automatically discover the themes that permeate a corpus,
are a popular tool for understanding what's being discussed.
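
To ground this, here is a minimal sketch (not drawn from the talk, using a
hypothetical toy corpus) of fitting latent Dirichlet allocation, the most
common topic model, with scikit-learn:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # A hypothetical toy corpus standing in for tweets, e-mails, or news articles.
    docs = [
        "the senate passed the budget bill after a long debate",
        "lawmakers debated the new tax policy in congress",
        "the team won the championship game in overtime",
        "the quarterback threw three touchdowns in the final quarter",
        "stocks fell as the market reacted to interest rate news",
        "investors worried about inflation and rising bond yields",
    ]

    # Turn raw text into a document-term count matrix.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(docs)

    # Fit LDA; the number of topics is a modeling choice, three here for the toy data.
    lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

    # Each row of components_ is a topic: a weight for every vocabulary word.
    vocab = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = weights.argsort()[::-1][:5]
        print(f"topic {k}:", ", ".join(vocab[i] for i in top))

Each topic is a distribution over the vocabulary, so printing its
top-weighted words is the usual way to read off the themes.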

Topic models aren't perfect; errors hamper adoption, degrade performance in
downstream computational tasks, and prevent users from making sense of large
datasets.  I begin by explaining that the answer is not simply to build
fancier statistical models, as the usefulness of topic models does not always
correlate with likelihood, the traditional objective function of statistical
models.  Given this understanding, how can we improve topic models?
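
To make that mismatch concrete, the toy sketch below computes two scores for
the same model: perplexity, the standard likelihood-based measure, and a
simplified UMass-style coherence, which checks whether a topic's top words
actually co-occur in documents and tracks human judgments more closely.  (The
corpus and scoring code are illustrative, not from the talk.)

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the senate passed the budget bill after a long debate",
        "lawmakers debated the new tax policy in congress",
        "the team won the championship game in overtime",
        "the quarterback threw three touchdowns in the final quarter",
    ]
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

    # Likelihood-based score (evaluated on the training data here for brevity):
    # lower perplexity means the model assigns the text higher probability.
    print("perplexity:", lda.perplexity(counts))

    # A crude UMass-style coherence: do a topic's top words co-occur in documents?
    # Higher (less negative) values roughly track more interpretable topics.
    X = (counts.toarray() > 0).astype(int)
    for k, weights in enumerate(lda.components_):
        top = weights.argsort()[::-1][:4]
        score = sum(
            np.log((X[:, top[i]] @ X[:, top[j]] + 1) / X[:, top[j]].sum())
            for i in range(1, len(top)) for j in range(i)
        )
        print(f"topic {k} coherence: {score:.2f}")

On a real corpus, a model can improve in perplexity while its topics become
less coherent to readers, which is the mismatch this talk takes as its
starting point.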

As an attempt to answer this question, I will present models that incorporate
human knowledge into topic models via ontologies, through direct interaction
with users, or by mimicking social processes.  After describing the
statistical formalisms that allow these knowledge repositories to be
seamlessly integrated and the associated computational challenges, I
demonstrate how these models can contribute to real-world natural language
processing tasks such as classifying documents, predicting sentiment,
segmenting topics, and detecting who is influential in a conversation.
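
One of the simplest forms this integration can take is an informed prior on
the topic-word distributions.  The toy sketch below, a deliberate
simplification rather than one of the models in the talk, uses gensim's
LdaModel, whose eta parameter accepts a per-topic, per-word prior, to seed
one topic with words a human or an ontology has marked as related:

    import numpy as np
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    texts = [
        "senate passed the budget bill after a long debate".split(),
        "lawmakers debated the new tax policy in congress".split(),
        "the team won the championship game in overtime".split(),
        "the quarterback threw touchdowns in the final quarter".split(),
    ]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    num_topics = 2
    # Start from a weak symmetric prior over each topic's word distribution...
    eta = np.full((num_topics, len(dictionary)), 0.01)
    # ...then encode outside knowledge by boosting seed words in topic 0,
    # nudging inference toward gathering the sports vocabulary together.
    for word in ("team", "game", "quarterback"):
        eta[0, dictionary.token2id[word]] = 1.0

    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary,
                   eta=eta, random_state=0)
    for k in range(num_topics):
        print(f"topic {k}:", lda.print_topic(k, topn=5))

Because the seed words start with more prior mass in topic 0, inference tends
to pull related vocabulary into that topic while leaving the others free to
capture the remaining themes.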


Jordan Boyd-Graber is an assistant professor in Maryland's iSchool and the Institute for Advanced Computer Studies, where he is also a member of the Cloud Computing Center and the Computational Linguistics and Information
Processing (CLIP) Lab.  He received his PhD from Princeton University in 2010 with a thesis on "Linguistic Extensions of Topic Models," written under David Blei.  Jordan's research focuses on applying machine learning and Bayesian
probabilistic models to problems that help us better understand social interaction or the human cognitive process.  This research often leads him to use tools such as large-scale inference for probabilistic methods, natural language processing, multilingual corpus understanding, and human computation.

His research applies statistical models to natural language problems in ways that interact with humans, learn from humans, or help researchers understand humans.  Jordan is an expert in the application of topic models, completely
automatic tools that can discover structure and meaning in large, multilingual datasets. He is a contributor to the Natural Language Toolkit (NLTK), a popular tool used in natural language education and research.  His 
current work is supported by NSF, IARPA, and ARL.

He received an honorable mention for the NIPS Best Student Paper award, a Computing Innovation Fellowship (declined), and the Jorgensen scholarship as an undergraduate at the California Institute of Technology.

Jordan is originally from Iowa; he received a BS in computer science and in history from Caltech in 2004.  In his spare time, Jordan enjoys competing in and writing questions for trivia competitions.