Bayesian methods for knowledge transfer and policy search in reinforcement learning

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/gq67jv42z

Descriptions

Abstract or Summary
  • How can an agent generalize its knowledge to new circumstances? To learn effectively, an agent acting in a sequential decision problem must make intelligent action choices based on its available knowledge. This dissertation focuses on Bayesian methods of representing learned knowledge and develops novel algorithms that exploit the represented knowledge when selecting actions. Our first contribution introduces the multi-task Reinforcement Learning setting, in which an agent solves a sequence of tasks. An agent equipped with knowledge of the relationship between tasks can transfer knowledge between them. We propose the transfer of two distinct types of knowledge: knowledge of domain models and knowledge of policies. To represent the transferable knowledge, we place hierarchical Bayesian priors on domain models and on policies, respectively. To transfer domain-model knowledge, we introduce a new algorithm for model-based Bayesian Reinforcement Learning in the multi-task setting that exploits the learned hierarchical Bayesian model to improve exploration in related tasks. To transfer policy knowledge, we introduce a new policy search algorithm that accepts a policy prior as input and uses the prior to bias the search. A specific implementation of this algorithm accepts a hierarchical policy prior; it learns the hierarchical structure and reuses components of that structure in related tasks. Our second contribution addresses the basic problem of generalizing knowledge gained from previously executed policies. Bayesian Optimization exploits a prior model of an objective function to quickly identify the point maximizing the modeled objective. Successful use of Bayesian Optimization in Reinforcement Learning requires a model relating policies to their performance. Given such a model, Bayesian Optimization can be applied to search for an optimal policy.
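The generic Bayesian Optimization loop described above — fit a probabilistic model of policy performance, then choose the next policy to evaluate by optimizing an acquisition function — can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's own models: it uses a Gaussian-process surrogate over a single policy parameter, an upper-confidence-bound acquisition rule, and a toy return function; every name here is illustrative.

```python
import numpy as np

def rbf_kernel(a, b, length=0.3):
    # Squared-exponential covariance between 1-D parameter arrays a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard zero-mean GP regression posterior (mean and std) at x_query.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_query, x_train)
    mu = Ks @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)  # prior variance is 1.0
    return mu, np.sqrt(np.maximum(var, 1e-12))

def bayes_opt_policy_search(objective, n_iter=15, seed=0):
    # objective: maps a policy parameter theta to its (toy) expected return.
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 101)
    thetas = list(rng.uniform(0.0, 1.0, size=2))     # initial random policies
    returns = [objective(t) for t in thetas]
    for _ in range(n_iter):
        mu, sigma = gp_posterior(np.array(thetas), np.array(returns), candidates)
        ucb = mu + 2.0 * sigma                        # explore/exploit trade-off
        theta = candidates[int(np.argmax(ucb))]       # next policy to execute
        thetas.append(theta)
        returns.append(objective(theta))
    return thetas[int(np.argmax(returns))]            # best policy found

# Toy return surface peaked at theta = 0.6 (purely illustrative).
best = bayes_opt_policy_search(lambda t: 1.0 - 4.0 * (t - 0.6) ** 2)
```

The acquisition step is where prior knowledge pays off: regions the surrogate already predicts to be poor are never executed, which is the exploration saving the abstract refers to.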
Early work using Bayesian Optimization in the Reinforcement Learning setting ignored the sequential nature of the underlying decision problem. The work presented in this thesis addresses this problem explicitly: we construct new Bayesian models that take advantage of sequence information to better generalize knowledge across policies. We empirically evaluate this approach on a variety of Reinforcement Learning benchmark problems; experiments show that our method significantly reduces the amount of exploration required to identify the optimal policy. Our final contribution is a new framework for learning parametric policies from queries presented to an expert. In many domains it is difficult to provide expert demonstrations of desired policies; however, it may still be simple for an expert to distinguish good performance from bad. To take advantage of this limited expert knowledge, our agent presents the expert with pairs of demonstrations and asks which of the two better represents a latent target behavior. The goal is to use a small number of queries to elicit the latent behavior from the expert. We formulate a Bayesian model of the querying process, an inference procedure that estimates the posterior distribution over the latent policy space, and an active learning procedure that selects new queries for presentation to the expert. We show, in multiple domains, that the algorithm successfully learns the target policy and that the active learning strategy generally improves the speed of learning.
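The pairwise-query framework has three pieces: a likelihood model of the expert's answers, posterior inference over the latent target, and an active rule for choosing the next query. A minimal sketch of that pattern, under assumptions that are mine rather than the thesis's: the latent target is a single policy parameter held on a grid, the expert's preference follows a Bradley–Terry model on negative squared distance to the target, and the active rule asks the candidate pair whose predicted answer is closest to 50/50. Class and parameter names (`PreferenceLearner`, `beta`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PreferenceLearner:
    """Grid posterior over a scalar latent target theta*; the simulated
    expert prefers the demonstration whose parameter is nearer theta*."""

    def __init__(self, grid, beta=20.0):
        self.grid = grid
        self.beta = beta                      # preference-noise sharpness
        self.log_post = np.zeros(len(grid))   # uniform prior over theta*

    def pref_prob(self, a, b):
        # P(expert prefers demo a over demo b | theta), for every grid theta,
        # via Bradley-Terry on negative squared-distance utilities.
        u_a = -(a - self.grid) ** 2
        u_b = -(b - self.grid) ** 2
        return sigmoid(self.beta * (u_a - u_b))

    def update(self, a, b, preferred_a):
        # Bayes rule: multiply the posterior by the answer's likelihood.
        p = self.pref_prob(a, b)
        self.log_post += np.log(p if preferred_a else 1.0 - p)
        self.log_post -= self.log_post.max()  # numerical stabilization

    def posterior(self):
        w = np.exp(self.log_post)
        return w / w.sum()

    def select_query(self, rng, n_pairs=50):
        # Active heuristic: among random candidate pairs, ask the one whose
        # marginal predicted preference is closest to 0.5 (most uncertain).
        post = self.posterior()
        best, best_gap = None, np.inf
        for _ in range(n_pairs):
            a, b = rng.uniform(0.0, 1.0, size=2)
            gap = abs(float(post @ self.pref_prob(a, b)) - 0.5)
            if gap < best_gap:
                best, best_gap = (a, b), gap
        return best

# Simulated session: the expert's latent target is 0.3.
rng = np.random.default_rng(1)
learner = PreferenceLearner(np.linspace(0.0, 1.0, 201))
target = 0.3
for _ in range(25):
    a, b = learner.select_query(rng)
    learner.update(a, b, preferred_a=abs(a - target) < abs(b - target))
estimate = float(learner.posterior() @ learner.grid)
```

Each maximally uncertain query roughly bisects the remaining posterior mass, which is why an active strategy can elicit the latent behavior in few queries compared with asking random pairs.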
Additional Information
  • description.provenance : Submitted by Aaron Wilson (wilsonaa@onid.orst.edu) on 2012-10-15T19:44:56Z. No. of bitstreams: 1 WilsonAaronC2012.pdf: 7952750 bytes, checksum: e4f8ca3747a6ad46c02b5371043c412b (MD5)
  • description.provenance : Made available in DSpace on 2012-10-22T15:59:05Z (GMT). Previous issue date: 2012-07-28
