The Magnificent Extensible Markov Model (EMM)


Thursday, March 11, 2010 - 11:00am


Margaret (Maggie) H. Dunham
Department of Computer Science and Engineering
Southern Methodist University

For the past five years, the data mining research group at SMU has been investigating the properties and applications of a new dynamic Markov modeling technique called the Extensible Markov Model (EMM). Originally targeted at spatiotemporal prediction problems, the EMM is well suited to datasets that grow over time, such as data streams. We have previously studied its use for flood prediction and for anomaly, rare event, and intrusion detection. We are currently examining its use for hurricane intensity forecasting, stream clustering, and bioinformatics. In this talk, Professor Dunham will first introduce the EMM, then provide an overview of current research in temporal stream clustering and bioinformatics DNA/RNA sequence analysis:
•    We propose a new extension to data stream clustering, the Temporal Relationship Among Clusters for Data Streams (TRACDS). This is not a new clustering algorithm, but rather a way to capture the temporal relationships among clusters that are inherent in the ordering of observations in the data stream. We propose to capture this ordering relationship by overlaying the clusters created by any data stream clustering algorithm with an EMM.
•    Classification of sequences into a comprehensive taxonomic structure has traditionally required complex multiple sequence alignment followed by clustering. Our research proposes a more space- and time-efficient modeling method as an alternative. In particular, we propose using Extensible Markov Models to create compact signature libraries that cluster similar segments of sequence families and arrange them in a Markov model that preserves the order of segments within each sequence. The signatures represent individual organisms with multiple gene copies, or whole communities corresponding to large branches of the microbial taxonomy.
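
The overlay idea in the first item above can be illustrated with a rough sketch. This is not the TRACDS or EMM implementation from the talk; it only assumes a toy one-dimensional stream, a simple threshold-based clusterer standing in for "any data stream clustering algorithm", and a table of transition counts between consecutive cluster assignments. The names TransitionOverlay and nearest_cluster are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class TransitionOverlay:
    """Hypothetical helper: counts transitions between cluster labels."""

    def __init__(self):
        # counts[src][dst] = number of times dst directly followed src
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None  # label of the most recent observation

    def update(self, label):
        """Record the transition from the previous label to this one."""
        if self.current is not None:
            self.counts[self.current][label] += 1
        self.current = label

    def transition_prob(self, src, dst):
        """Estimate P(next = dst | current = src) from the counts."""
        row = self.counts.get(src)
        if not row:
            return 0.0
        return row.get(dst, 0) / sum(row.values())


def nearest_cluster(x, centers, threshold):
    """Assumed stand-in clusterer: return the index of the nearest
    center, creating a new cluster if x is farther than threshold."""
    best, best_d = None, None
    for i, c in enumerate(centers):
        d = abs(x - c)
        if best_d is None or d < best_d:
            best, best_d = i, d
    if best is None or best_d > threshold:
        centers.append(x)
        return len(centers) - 1
    return best


# Toy stream (made-up values, not data from the talk).
stream = [1.0, 1.1, 5.0, 5.2, 1.05, 5.1, 9.0]
centers = []
overlay = TransitionOverlay()
labels = []
for x in stream:
    label = nearest_cluster(x, centers, threshold=1.0)
    labels.append(label)
    overlay.update(label)  # the model grows as new clusters appear

print(labels)
print(overlay.transition_prob(0, 1))
```

The clustering step and the transition bookkeeping are kept separate here, so the clusterer could in principle be swapped without touching the overlay. An actual EMM also handles details such as forgetting and state removal, which this sketch omits.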


Margaret (Maggie) H. Dunham received the B.A. and M.S. degrees in mathematics from Miami University, Oxford, Ohio, and the Ph.D. degree in computer science from Southern Methodist University in 1970, 1972, and 1984, respectively. Since August 1984, she has been on the faculty of the Department of Computer Science and Engineering at Southern Methodist University in Dallas, first as an assistant professor, then as an associate professor, and now as a professor. Professor Dunham's current research interests are in the areas of data mining and bioinformatics. Her previous research has encompassed database concurrency control, main memory databases, temporal databases, and mobile computing. Dr. Dunham is the author of the popular data mining text Data Mining: Introductory and Advanced Topics, published by Prentice-Hall. She was Associate Editor of the IEEE Transactions on Knowledge and Data Engineering from 2000 to 2004 and editor of the ACM SIGMOD Record from 1986 to 1988, and has been guest editor for several special issues of IEEE and ACM journals. She served as the general conference chair for the ACM SIGMOD/PODS conference held in Dallas in May 2000, and has served on the program and organizing committees for many ACM and IEEE conferences. She has published over 100 technical papers on diverse database and data mining topics. Professor Dunham lives in Dallas with husband Jim, daughters Stephanie and Kristina, cat Ringo, and dog Sandy.