Computing That Serves

Machine Learning in Public Health


Thursday, April 1, 2010 - 11:00am


Kristin Bennett
Departments of Mathematical Sciences and Computer Science
Rensselaer Polytechnic Institute

We examine two ongoing projects that use machine learning to tackle public health problems.  In each case, learning models are constructed to predict unobserved properties of the data that correspond to public health questions of interest.   

In the first project, we show how customized machine learning models can accelerate drug discovery by predicting key properties of a molecule associated with metabolism.  The cost of drug development has skyrocketed to over $1 billion per drug, with many drugs experiencing late-stage failures due to problems with absorption, distribution, metabolism, excretion, and toxicity (ADME-tox).  Thus, virtual screening of the ADME-tox properties of molecules could greatly decrease the time and cost necessary for drug discovery.  Our goal is to predict the group of hydrogen atoms from which a hydrogen is abstracted (removed) by an enzyme during metabolism.  A novel multiple-instance ranking model predicts the preferred hydrogen group within a molecule by ranking the groups, while accounting for the ambiguity of not knowing which hydrogen atom within the preferred group is actually abstracted.
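The abstract does not give the model's formulation, so the following is only a minimal sketch of one plausible reading of "multiple-instance ranking", not the speaker's method. It assumes each hydrogen atom is described by a numeric feature vector, scores a hydrogen group by its highest-scoring hydrogen (the "witness" standing in for the unknown abstracted atom), and trains a linear scorer by subgradient descent on a pairwise hinge loss that asks the preferred group to outrank every other group in the same molecule. The function names, features, margin, learning rate, and toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Group = List[Vector]                 # one feature vector per hydrogen atom in the group
Molecule = Tuple[List[Group], int]   # (hydrogen groups, index of the group actually attacked)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def witness(w: Sequence[float], group: Group) -> Vector:
    """Highest-scoring hydrogen in a group; stands in for the unknown abstracted atom."""
    return max(group, key=lambda x: dot(w, x))


def group_score(w: Sequence[float], group: Group) -> float:
    # Multiple-instance assumption: a group is as reactive as its best hydrogen.
    return dot(w, witness(w, group))


def train_mi_ranker(molecules: List[Molecule], dim: int,
                    lr: float = 0.1, reg: float = 0.01, epochs: int = 200) -> Vector:
    """Subgradient descent on an L2-regularized pairwise hinge ranking loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * wi for wi in w]  # gradient of (reg / 2) * ||w||^2
        for groups, pref in molecules:
            xp = witness(w, groups[pref])
            for j, group in enumerate(groups):
                if j == pref:
                    continue
                xj = witness(w, group)
                # Hinge term: the preferred group should outscore group j by a margin of 1.
                if dot(w, xp) - dot(w, xj) < 1.0:
                    grad = [g - (a - b) for g, a, b in zip(grad, xp, xj)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def predict_preferred(w: Sequence[float], groups: List[Group]) -> int:
    """Index of the top-ranked hydrogen group in a molecule."""
    scores = [group_score(w, g) for g in groups]
    return scores.index(max(scores))


# Made-up toy data with two features per hydrogen, for illustration only.
toy_train: List[Molecule] = [
    ([[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0]], [[0.0, 0.5]]], 0),
    ([[[0.0, 1.0]], [[0.9, 0.2], [0.0, 0.3]]], 1),
]
w = train_mi_ranker(toy_train, dim=2)
print(predict_preferred(w, [[[0.0, 0.8]], [[0.2, 0.1], [1.0, 0.0]]]))  # prints 1
```

Taking the maximum over a group's hydrogens is what lets the training label stay at the group level: the model never has to be told which atom inside the preferred group was removed, only that the group as a whole should rank first.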

In the second project, we examine how DNA fingerprints of Mycobacterium tuberculosis can be used to track and control the spread of tuberculosis (TB).  An estimated one-third of the world’s population is infected with TB, which causes over 2 million deaths per year.  We develop learning models to predict the genetic lineages of M. tuberculosis from DNA fingerprints.  Mining TB patient surveillance data with respect to these genetic lineages helps uncover outbreaks and improve TB control.
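The abstract does not say which learning models are used or how fingerprints are encoded. As an illustration only, the sketch below assumes a fingerprint is a binary presence/absence string (in the spirit of spoligotyping, whose real patterns have 43 spacer positions) and swaps in a Bernoulli naive Bayes classifier with Laplace smoothing as a stand-in lineage predictor. The 8-position fingerprints, the lineage labels, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], Dict[str, List[float]]]


def train_bernoulli_nb(fingerprints: List[str], labels: List[str], alpha: float = 1.0) -> Model:
    """Fit log class priors and per-position P(bit = 1 | lineage) with Laplace smoothing."""
    n_bits = len(fingerprints[0])
    class_counts = Counter(labels)
    ones = {y: [0] * n_bits for y in class_counts}
    for fp, y in zip(fingerprints, labels):
        for i, bit in enumerate(fp):
            ones[y][i] += int(bit == "1")
    total = len(labels)
    log_priors = {y: math.log(c / total) for y, c in class_counts.items()}
    probs = {
        y: [(ones[y][i] + alpha) / (class_counts[y] + 2 * alpha) for i in range(n_bits)]
        for y in class_counts
    }
    return log_priors, probs


def predict_lineage(model: Model, fp: str) -> str:
    """Return the lineage with the highest log posterior for one fingerprint."""
    log_priors, probs = model

    def log_posterior(y: str) -> float:
        return log_priors[y] + sum(
            math.log(p if bit == "1" else 1.0 - p) for bit, p in zip(fp, probs[y])
        )

    return max(log_priors, key=log_posterior)


# Hypothetical 8-position fingerprints and lineage labels (not real spoligotypes).
train_fps = ["11110000", "11100000", "11110001", "00001111", "00011111", "10001111"]
train_labels = ["lineage_A"] * 3 + ["lineage_B"] * 3
model = train_bernoulli_nb(train_fps, train_labels)
print(predict_lineage(model, "11010000"))  # prints lineage_A
```

Once each patient isolate carries a predicted lineage, surveillance records can be grouped by lineage, which is the kind of view in which unusual clusters of cases become easier to spot.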


Kristin P. Bennett is a Professor in the Mathematical Sciences and Computer Science Departments at Rensselaer Polytechnic Institute. Bennett is an active member of the machine learning, data mining, and operations research communities, and has served as an associate or guest editor for ACM Transactions on Knowledge Discovery from Data, SIAM Journal on Optimization, Naval Research Logistics, Machine Learning Journal, IEEE Transactions on Neural Networks, and Journal of Machine Learning Research. She served as program chair of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. She has a Ph.D. and M.S. in Computer Sciences from the University of Wisconsin-Madison, and a B.S. in Mathematics and Computer Science from the University of Puget Sound. She has been researching mathematical-programming approaches to machine learning, such as support vector machines, since 1989.  In addition, she has worked extensively on successful applications of machine learning to problems in chemistry, biology, epidemiology, engineering, and business.