
Unambiguous + Unlimited = Unsupervised


Thursday, December 6, 2007 - 11:00am


Marti Hearst, PhD, Associate Professor, School of Information, UC Berkeley

The key to many modern computational linguistics problems is to train a machine learning algorithm over large numbers of labeled examples. However, in most cases, acquisition of labeled data is expensive, and so for years researchers have been striving to develop unsupervised algorithms that require little or no labeled data.

I will discuss one type of (nearly) unsupervised algorithm that has recently become more widely applicable, thanks to the nearly unlimited amount of searchable text now available via web search engines. The main idea is to restate the problem so that at least some unambiguous examples of it are likely to be found in the vast sea of text. I will show examples of this kind of algorithm applied to structural ambiguity resolution and semantic relation identification, and touch on the larger implications.
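As a rough illustration of the general idea (not the speakers' actual method), the sketch below resolves one kind of structural ambiguity, the bracketing of a three-word noun compound, by counting how often each adjacent word pair occurs on its own in a body of text. Everything here is an assumption made for the example: the tiny `TOY_CORPUS` stands in for web-scale text, and `bigram_counts` and `bracket` are hypothetical helper names. A real system would obtain its counts from a search engine or a large n-gram collection.

```python
from collections import Counter

# Toy stand-in for web-scale text (an assumption for this sketch);
# a real system would query a search engine or a large n-gram collection.
TOY_CORPUS = """
the liver cell is damaged . a liver cell can regenerate .
each liver cell divides . the cell antibody binds weakly .
a male rat ran . the male rat slept .
""".split()


def bigram_counts(tokens):
    """Count every adjacent word pair in the token list."""
    return Counter(zip(tokens, tokens[1:]))


def bracket(w1, w2, w3, counts):
    """Guess the bracketing of the compound 'w1 w2 w3'.

    A two-word occurrence such as 'liver cell' is treated as an
    unambiguous example of w1 and w2 belonging together. The pair seen
    more often wins; a tie (including no evidence) returns None.
    """
    left = counts[(w1, w2)]   # evidence for [[w1 w2] w3]
    right = counts[(w2, w3)]  # evidence for [w1 [w2 w3]]
    if left > right:
        return f"[[{w1} {w2}] {w3}]"
    if right > left:
        return f"[{w1} [{w2} {w3}]]"
    return None


if __name__ == "__main__":
    counts = bigram_counts(TOY_CORPUS)
    print(bracket("liver", "cell", "antibody", counts))
    print(bracket("adult", "male", "rat", counts))
```

No labeled training data is involved: the only supervision is the raw text itself, which is why the amount of searchable text matters so much for this family of approaches.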

Joint work with Preslav Nakov.  Supported in part by NSF DBI-0317510.


Dr. Marti Hearst is an associate professor in the School of Information at UC Berkeley, with an affiliate appointment in the Computer Science Division. Her primary research interests are user interfaces and visualization for search engines, computational linguistics, and empirical analysis of social media.

She received BA, MS, and PhD degrees in Computer Science from the University of California at Berkeley and was a Member of the Research Staff at Xerox PARC from 1994 to 1997. Prof. Hearst serves on the editorial boards of ACM Transactions on the Web and ACM Transactions on Computer-Human Interaction, and formerly served on the boards of Computational Linguistics, ACM Transactions on Information Systems, and IEEE Intelligent Systems. She was program co-chair of HLT-NAACL '03 and SIGIR '99. She has received an NSF CAREER award, an IBM Faculty Award, an Okawa Foundation Fellowship, and two student-initiated Excellence in Teaching awards.