ABSTRACT:
Today's computers are capable of performing many tasks that previously only people could perform.This is partly due to recent techniques that have allowed computer systems to "learn" from experience. Machine Learning (ML) is the field of computer science that attempts to understand what it means to learn, and how such learning can be performed.
One major missing ingredient in the study of Machine Learning is a cohesive theory and framework in which a broad range of questions can be analyzed and studied. An effective "theory" of Machine Learning should be able to answer questions in areas such as: Learnability, Bias, Overfit, No-Free-Lunch, Meta Learning, Transfer Learning, Active Learning, and Sample Complexity. Other important issues such as the connection between Machine Learning and other fields such as Function Optimization, Compression, and Information Theory should also be explored.
Several paradigms or theories that attempt to formalize the learning process have been proposed including: PAC learnability, VC dimensionality, and the Extended Bayesian Framework (EBF). To date, none of the proposed frameworks have been able to answer all of these questions in a satisfactory manner. For example, in some cases it is possible to use VC dimension or PAC learnability to show that some problems are learnable and to place theoretical upper bounds on their sample complexity (the number of examples required to learn a given problem). However, in practice, such bounds are seldom if ever used, because they vastly overestimate the actual number of examples required. Furthermore, PAC learnability and VC dimensionality do not provide a framework in which the other questions can be easily answered. No single theory has yet been proposed which deal with all of the above areas.
We propose a Unified Bayesian Decision Theoretic Model (UBDTM). The UBDTM models both the classification problem, the regression problem, and the function optimization problem as a single graphical model that expresses the dependencies and relationships between observed feature vectors, observed labels, unobserved (test and validation) feature vectors, unobserved labels, the function that maps them, and the location of the extremum (or multiple extremum) of the function. There are many advantages to thinking about Machine Learning and optimization in this way. Instead of creating algorithms, the machine learning practitioner's job is now to create representations and priors, while the algorithm is fixed as simple inference. The model makes the algorithm's bias explicit in the function representation and priors. The model makes the need for a bias clear. Given the model's explicit bias, we can determine the function class over which the algorithm is expected to perform well. Meta learning can be modeled in terms of a hierarchical Bayesian model. Active Learning can be seen as the result of decision theory involving utilities and the expected value of sample information (or EVSI). Finally, a similar calculation can be used to analyze average case sample complexity.
It will be beyond the scope of a single dissertation to adequately address all of the above issues in terms of the UBDTM, however, we believe that this framework will provide a sound theoretical framework in which all of the above issues can be eventually addressed. This dissertation will focus on addressing a few specific questions in the context of the UBDTM. Specifically, we will show that the UBDTM provides another way of thinking about the bias and No-Free-Lunch problems, and that given some utility assumptions a priori distinctions between learning algorithms are possible. We will show that UBDTM provides another useful way of looking at overfit and uncertainty and the importance of explicit utility functions in Machine Learning for decision making. We will then demonstrate the ability of UBDTM to explain existing Machine Learning techniques and to guide improvements to those techniques with an example using the CMAC ANN topology. We will then focus on the insights that the model provides for active learning. We will show that the model explains the behavior of existing active learning techniques and guides the creation of new active learning techniques. The performance of these techniques will be demonstrated on several synthetic problems and on real world problems including a tagging problem which is part of a larger joint project to build an annotated Syriac corpus with the BYU Center for the Preservation of Ancient Religious Texts (CPART). Thus, tools derived from our model will be used to build a publicly available corpus of tagged Syriac tests. Finally we will show that these concepts of active learning can also be applied to function optimization since they share the same model. Thus, one advantage of the UBDTM is that it makes the connection between
supervised learning and function optimization clear. Although we believe that the remaining questions of meta learning, transfer learning, learnability, and sample complexity can also be addressed in terms of the UBDTM we will leave these problems for future work.
Intellectual Merit: The UBDTM has the potential to advance our general understanding of the supervised learning and function optimization problems, and various proposed solutions to them, by better understanding the underlying processes involved.
Broader Impact: Supervised Machine Learning problems are important in a wide variety of problems of relevance in a wide variety of fields, for example: linguistics (tagging natural language text); robotics (machine vision); homeland security (face recognition, machine translation); and medicine (disease diagnosis). Function optimization is one of the most common computational problems encountered in all of computer science. It is hoped that a better understanding of the statistical theory involved in supervised learning and optimization will lead to better algorithms and algorithm analysis. Testing will involve cross department research involving individuals from Computer Science, Linguistics, and CPART.

