Abstract:
This thesis addresses the problem of learning dynamic Bayesian network
(DBN) models to support reinforcement learning (RL). It focuses on
learning regression tree models of the
conditional probability distributions of the DBNs. Existing
algorithms presume that the
stochasticity in the domain can be modeled as a deterministic function
with additive noise. This is inappropriate for many RL domains, where
the stochasticity takes the form of a random choice over
deterministic functions. This thesis
introduces a regression tree algorithm in which each leaf node is
modeled as a finite mixture of deterministic functions. This mixture is
approximated via a greedy set cover. To combat overfitting, the algorithm
employs pruning techniques based on log likelihood and KL divergence.
Experiments on three challenging RL domains, two with stochastic variants,
show that, given small samples, this approach
finds trees that are more accurate and more likely to
correctly identify the conditional dependencies in the DBNs.
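
To make the leaf model concrete, the following is a minimal sketch of how a greedy set cover could approximate a finite mixture of deterministic functions at a single leaf. It is an illustration under stated assumptions, not the algorithm as implemented in the thesis: the function name `greedy_mixture`, the candidate set (`stay`, `inc`, `dec`), the exact-match coverage test, and the use of coverage fractions as mixture weights are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Tuple

# A sample pairs an input value with the observed outcome at this leaf.
Sample = Tuple[int, int]


def greedy_mixture(
    samples: List[Sample],
    candidates: Dict[str, Callable[[int], int]],
) -> List[Tuple[str, float]]:
    """Greedily pick deterministic functions until every sample is explained.

    Assumption: a candidate "covers" a sample when it reproduces the observed
    outcome exactly. Weights are the fraction of samples each pick newly covers.
    """
    uncovered = set(range(len(samples)))
    mixture: List[Tuple[str, float]] = []
    while uncovered:
        best_name, best_covered = None, set()
        for name, f in candidates.items():
            covered = {i for i in uncovered if f(samples[i][0]) == samples[i][1]}
            if len(covered) > len(best_covered):
                best_name, best_covered = name, covered
        if not best_covered:
            break  # no candidate explains the remaining samples
        mixture.append((best_name, len(best_covered) / len(samples)))
        uncovered -= best_covered
    return mixture


# Hypothetical leaf: the variable usually increments, but sometimes stays put.
samples = [(0, 1), (1, 2), (2, 3), (3, 3), (4, 5)]
candidates = {
    "stay": lambda x: x,
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
}
mixture = greedy_mixture(samples, candidates)
print(mixture)
```

In this toy leaf, an additive-noise model would fit a single function with a residual, whereas the mixture view recovers two exact functions with estimated selection probabilities, which is the kind of stochasticity the abstract describes.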