New learning modes for sequential decision making

http://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/1z40kx414

Descriptions

Abstract or Summary
  • This thesis considers the problem of a teacher teaching action policies to computer agents for sequential decision making. The vast majority of policy learning algorithms offer teachers little flexibility in how policies are taught. In particular, one of two learning modes is typically considered: 1) imitation learning, where the teacher demonstrates explicit action sequences to the learner, and 2) reinforcement learning, where the teacher designs a reward function for the learner to autonomously optimize via practice. This is in sharp contrast to how humans teach other humans, where many learning modes besides imitation and practice are commonly used. This thesis presents novel learning modes for teaching policies to computer agents, with the eventual aim of allowing human teachers to teach computer agents more naturally and efficiently. Our first learning mode is inspired by how humans learn: through rounds of practice followed by feedback from a teacher. We adopt this mode to create computer agents that learn from several rounds of autonomous practice followed by critique feedback from a teacher. Our results show that this mode of policy learning is more effective than pure reinforcement learning, though important usability issues arise when it is used with human teachers. Next we consider a learning mode in which the computer agent can actively ask the teacher questions, which we call active imitation learning. We provide algorithms for active imitation learning that are proven to require strictly less interaction with the teacher than passive imitation learning. We also show empirically that active imitation learning algorithms are much more efficient than traditional passive imitation learning in terms of the amount of interaction with the teacher. Lastly, we introduce a novel imitation learning mode that allows a teacher to specify shaping rewards to a computer agent in addition to demonstrations. Shaping rewards are additional rewards supplied to an agent to accelerate policy learning via reinforcement learning. We provide an algorithm that incorporates shaping rewards into imitation learning and show that it learns from fewer demonstrations than pure imitation learning. We wrap up by presenting a prototype User-Initiated Learning (UIL) system that allows an end user to demonstrate procedures containing optional steps and to instruct the system to autonomously learn to predict when the optional steps should be executed, and to remind the user if they forget. Our prototype supports user-initiated demonstration and learning via a natural interface, and has a built-in automated machine learning engine that automatically trains and installs a predictor for the requested prediction problem.
Additional Information
  • description.provenance : Submitted by Kshitij Judah (judahk@onid.orst.edu) on 2014-04-16T19:33:58Z No. of bitstreams: 1 JudahKshitij2014.pdf: 10092338 bytes, checksum: 782811f9fab0b8b63d8a17a6536cdb5a (MD5)
  • description.provenance : Rejected by Patricia Black (patricia.black@oregonstate.edu), reason: Replace file. on 2014-04-17T15:24:51Z (GMT)
  • description.provenance : Submitted by Kshitij Judah (judahk@onid.orst.edu) on 2014-04-17T19:47:28Z No. of bitstreams: 1 JudahKshitij2014.pdf: 10092444 bytes, checksum: 741e8f3c7bc6b98a88964dd6d0deb71d (MD5)
  • description.provenance : Approved for entry into archive by Julie Kurtz (julie.kurtz@oregonstate.edu) on 2014-04-18T15:56:43Z (GMT) No. of bitstreams: 1 JudahKshitij2014.pdf: 10092444 bytes, checksum: 741e8f3c7bc6b98a88964dd6d0deb71d (MD5)
  • description.provenance : Approved for entry into archive by Laura Wilson (laura.wilson@oregonstate.edu) on 2014-04-18T17:26:31Z (GMT) No. of bitstreams: 1 JudahKshitij2014.pdf: 10092444 bytes, checksum: 741e8f3c7bc6b98a88964dd6d0deb71d (MD5)
  • description.provenance : Made available in DSpace on 2014-04-18T17:26:31Z (GMT). No. of bitstreams: 1 JudahKshitij2014.pdf: 10092444 bytes, checksum: 741e8f3c7bc6b98a88964dd6d0deb71d (MD5) Previous issue date: 2014-03-21

Last modified: 08/22/2017
