
Unsupervised Data Visualization for Big Data Exploratory Analysis


Tuesday, March 13, 2018 - 12:00pm


Kevin Moon


Tony Martinez


Colloquium presented by Kevin Moon
Tuesday, March 13, 2018 at 12:00 P.M.
Location: 1170 TMCB


We live in an era of big data in which researchers in nearly every field are generating thousands or even millions of high-dimensional samples. Most methods in data science focus on prediction or impose restrictive assumptions that require established knowledge and understanding of the data; i.e., these methods require some level of expert supervision. In many cases, however, this knowledge is unavailable, and the goal of data analysis is scientific discovery and a better understanding of the data. There is an especially strong need for methods that perform unsupervised data visualization, which is crucial for developing intuition for and understanding of the data. In this talk, I present PHATE, an unsupervised data visualization tool based on a new information distance that excels at denoising the data while preserving both global and local structure. I demonstrate PHATE on a variety of datasets, including facial images, mass cytometry data, and new single-cell RNA-sequencing data. On the last of these, I show how PHATE can be used to discover novel surface markers for sorting cell populations.


Kevin Moon graduated from Brigham Young University with a B.S. and M.S. in electrical engineering with a focus on signal processing. He then obtained an M.S. degree in math and a Ph.D. in electrical engineering at the University of Michigan, where his research focused on accurately estimating distributional functionals such as entropy, divergence, and mutual information. He currently works on developing machine learning tools for analyzing big biological data in the Genetics department and the Applied Mathematics program at Yale University.