Computing That Serves

Active and Proactive Machine Learning


Thursday, October 28, 2010 - 11:00am


Jaime Carbonell
Director, Language Technologies Institute
Allen Newell Professor, Computer Science
Carnegie Mellon University


Christophe Giraud-Carrier

Subtitle: From Fundamentals to Applications in Computational Biology, Machine Translation and Wind Energy

Supervised machine learning has a voracious appetite for labeled data, but such data may be scarce compared to unlabeled data. Active learning seeks to determine which unlabeled instances are most valuable to label (e.g., an expert physician providing a diagnosis given a set of symptoms and lab results) in order to reduce predictive error. We advocate a major extension to active learning that relaxes restrictive assumptions such as the existence of a single omniscient labeling oracle. Instead, we investigate more realistic settings, such as the presence of multiple potentially fallible or reluctant external information sources with variable costs and unknown reliability. Proactive learning reaches out to these sources and jointly optimizes the learning of source properties (e.g., labeler accuracy, area of expertise), the selection of sources, and the selection of maximally informative instances for the learning task at hand. The proactive sampling methods trade off cost against information value and amortized benefit against immediate reward, and they are largely agnostic to the base-level learning algorithm. We have applied these methods to synthetic data and benchmark test data, and we are now applying them to new challenges such as low-resource machine translation, inferring the human protein interactome (and host-pathogen interactomes), and large-scale wind-energy optimization. The talk presents improvements in active learning, moves to the proactive framework, and touches on these application areas.
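To make the cost-versus-information trade-off concrete, here is a minimal Python sketch built on simplifying assumptions of our own, not the speaker's formulation. The base learner's predicted class probabilities for each unlabeled instance are taken as given, uncertainty is measured by entropy, and each source's cost and accuracy are assumed already known (proactive learning as described above would instead estimate them). The utility `accuracy * entropy - cost_weight * cost` was chosen only for illustration. The names `Oracle`, `select_query`, `cost_weight`, and the toy pool are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Oracle:
    """Hypothetical labeling source with an assumed-known cost and accuracy."""
    name: str
    cost: float      # cost charged per label request
    accuracy: float  # estimated probability that the returned label is correct


def entropy(probs):
    """Shannon entropy (bits) of a predicted class distribution; higher = more uncertain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_query(pool, oracles, cost_weight=0.1):
    """Return the (instance_id, oracle_name) pair with the highest illustrative utility.

    pool maps instance ids to the base learner's class probabilities.
    Utility (an assumption for this sketch) = oracle accuracy * instance entropy
    minus cost_weight * oracle cost.
    """
    best, best_utility = None, -math.inf
    for inst_id, probs in pool.items():
        info = entropy(probs)
        for oracle in oracles:
            utility = oracle.accuracy * info - cost_weight * oracle.cost
            if utility > best_utility:
                best, best_utility = (inst_id, oracle.name), utility
    return best, best_utility


# Toy example: three unlabeled instances and two sources of differing cost/reliability.
pool = {"x1": [0.95, 0.05], "x2": [0.55, 0.45], "x3": [0.70, 0.30]}
oracles = [
    Oracle("expert", cost=5.0, accuracy=0.98),
    Oracle("novice", cost=1.0, accuracy=0.75),
]

choice, utility = select_query(pool, oracles)  # ('x2', 'novice')
free_choice, _ = select_query(pool, oracles, cost_weight=0.0)  # ('x2', 'expert')
```

Both calls pick the most uncertain instance, `x2`. With the cost penalty, the cheaper but less reliable novice wins; with `cost_weight=0.0`, the more accurate but more expensive expert is chosen instead, which is the kind of cost-versus-information trade-off the abstract describes.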


Dr. Jaime Carbonell is the Director of the Language Technologies Institute and Allen Newell Professor of Computer Science at Carnegie Mellon University. He received BS degrees in Physics and Mathematics from MIT, and MS and PhD degrees in Computer Science from Yale University. His current research spans multiple aspects of language technologies and machine learning: text mining, machine translation, active and proactive machine learning, detection of hidden trends and categories, automated summarization, question answering, and more. Dr. Carbonell's most recent work focuses on core challenges in machine learning such as proactive learning, machine translation for rare languages, and computational proteomics. He co-edited four machine learning books and served as editor-in-chief of the Machine Learning Journal. Dr. Carbonell has served on multiple governmental advisory committees, such as the Human Genome Committee of the National Institutes of Health, the Oak Ridge National Laboratory Scientific Advisory Board, the National Institute of Standards and Technology Interactive Systems Scientific Advisory Board, and the German Research Center for Artificial Intelligence (DFKI) Scientific Advisory Board.