Graduate Thesis Or Dissertation


Incorporating and Learning Behavior Constraints for Sequential Decision Making

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/sj139431s

Descriptions

Creator
  • Jervis Pinto
Abstract
  • Writing a program that performs well in a complex environment is a challenging task. In such problems, combining deterministic programming with reinforcement learning (RL) can be helpful. However, current systems either force developers to encode knowledge in very specific forms (e.g., state-action features) or assume advanced RL knowledge (e.g., ALISP). This thesis explores techniques that make it easier for developers, who may not be RL experts, to encode their knowledge in the form of behavior constraints. We begin with the framework of adaptation-based programming (ABP) for writing self-optimizing programs. Next, we show how a certain type of conditional independence called "influence information" arises naturally in ABP programs. We propose two algorithms for learning reactive policies that are capable of leveraging this knowledge. Using influence information to simplify the credit assignment problem produces significant performance improvements. Next, we turn our attention to problems in which a simulator allows us to replace reactive decision-making with time-bounded search, which often outperforms purely reactive decision-making at significant computational cost. We propose a new type of behavior constraint in the form of partial policies, which restrict behavior to a subset of good actions. Using a partial policy to prune sub-optimal actions reduces the action branching factor, thereby speeding up search. We propose three algorithms for learning partial policies offline, based on reducing the learning problem to i.i.d. supervised learning, and we give a reduction-style analysis for each one. We give concrete implementations using the popular framework of Monte-Carlo tree search. Experiments on challenging problems demonstrate large performance improvements in search-based decision-making generated by the learned partial policies.
Taken together, this thesis outlines a programming framework for injecting different forms of developer knowledge into reactive policy learning algorithms and search-based online planning algorithms. It represents a few small steps towards a programming paradigm that makes it easy to write programs that learn to perform well.
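The core idea of the second part of the abstract — pruning search with a learned partial policy — can be illustrated with a minimal sketch. This is not the thesis implementation: the toy state, action set, heuristic score, and top-k pruning rule below are all assumptions standing in for a partial policy learned offline via reduction to supervised learning and used inside Monte-Carlo tree search.

```python
# Toy illustration of partial-policy pruning (hypothetical names throughout):
# a state is an integer, the goal value is 10, and the "learned" partial
# policy keeps the k highest-scoring actions under a stand-in heuristic.

ACTIONS = [-1, 0, 1, 2]

def heuristic_score(state, action):
    # Stand-in for a learned scoring function: prefer actions that
    # move the state toward the goal value of 10.
    return -abs((state + action) - 10)

def partial_policy(state, k=2):
    # Restrict behavior to a subset of promising actions, shrinking
    # the branching factor from |ACTIONS| down to k.
    return sorted(ACTIONS, key=lambda a: heuristic_score(state, a))[-k:]

def search(state, depth, prune=True):
    # Depth-limited exhaustive search returning the best reachable score.
    # With prune=True only actions allowed by the partial policy are
    # expanded, so the tree has k**depth leaves instead of len(ACTIONS)**depth.
    if depth == 0:
        return -abs(state - 10)
    actions = partial_policy(state) if prune else ACTIONS
    return max(search(state + a, depth - 1, prune) for a in actions)

print(search(0, depth=5, prune=False))  # full branching factor
print(search(0, depth=5, prune=True))   # pruned branching factor
```

In this toy case both calls reach the same best score, but the pruned tree expands 2^5 leaves instead of 4^5 — the same speed/quality trade-off the thesis analyzes for learned partial policies in Monte-Carlo tree search.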
Additional Information
  • description.provenance : Made available in DSpace on 2015-06-16T15:59:41Z (GMT). No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: bb87e2fb4674c76d0d2e9ed07fbb9c86 (MD5) PintoJervis2015.pdf: 3582964 bytes, checksum: 48af06a5e6adeb29f7635bf03a62191d (MD5) Previous issue date: 2015-06-12
  • description.provenance : Submitted by Jervis Pinto (pintoj@onid.orst.edu) on 2015-06-12T21:32:00Z No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: bb87e2fb4674c76d0d2e9ed07fbb9c86 (MD5) PintoJervis2015.pdf: 3582964 bytes, checksum: 48af06a5e6adeb29f7635bf03a62191d (MD5)
