Graduate Thesis Or Dissertation
 

High-Dimensional Reinforcement Learning with Human Feedback

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/gh93h377p

Abstract
  • State-of-the-art personal robots must perform complex manipulation tasks to be viable in real-world scenarios. However, many of these robots, such as the PR2, use manipulators with many degrees of freedom. High degrees of freedom are desirable from a functionality standpoint, but they make the learning task more difficult by introducing a high-dimensional state space. The problem is worse in bimanual manipulation tasks. Our proposed approach is to scale existing reinforcement learning techniques to high-dimensional robot control problems. We propose reducing the state space by using demonstrations to discover a representative low-dimensional manifold in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality-Reduced Reinforcement Learning (DRRL). However, dimensionality reduction sometimes discards important state information. We extend this work by first learning in a single dimension and then transferring that knowledge to a higher-dimensional space. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently from few demonstrations.
    We use Principal Component Analysis (PCA) for linear dimensionality reduction in DRRL and IDRRL. However, linear dimensionality reduction assumes that the underlying data can be represented by a lower-dimensional linear subspace. Robot state spaces typically include velocities and accelerations, whose equations of motion are inherently nonlinear. Standard linear dimensionality reduction techniques cannot accurately represent such complex nonlinear structures, while nonlinear dimensionality reduction techniques are too computationally expensive to use online.
To overcome these limitations, we introduce a novel approach to dimensionality reduction based on a system of cascading autoencoders (CAE), producing the new algorithm IDRRL-CAE.
Optimization is useful, but fast learning does not help if the objective function is deceptive or difficult to define mathematically. In many cases, roboticists cannot predict every scenario their robots may encounter, and thus cannot design an objective function for every case a priori. In these situations it may be helpful to incorporate human feedback. To give effective feedback, users need an interface that is intuitive, time-insensitive, and able to accept both fine-grained and coarse feedback. To incorporate human feedback in our learning, we use timeline interfaces. Timeline interfaces, which let a user move backward and forward through a video, have been used by video editors for years. They are simple and designed for both non-experts and video editing experts. These interfaces allow a user to cut, concatenate, rewind, fast-forward, and perform many other operations on videos. They speed up editing by decoupling the timescale of the editing process from the timescale of the video being edited. The same concepts can be applied to human feedback mechanisms for robot control systems. Current human feedback mechanisms require the user to respond quickly to robot actions, work only in discrete spaces, or allow only coarse or only detailed feedback. The timeline interface paradigm naturally accounts for fine-grained state spaces, does not require quick human feedback, allows the user to make both coarse and fine-grained edits to video, and decouples the speed of the video from the speed of feedback. In this dissertation, we present a proof-of-concept movie reel interface that uses this timeline interface paradigm.
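
The PCA-based reduction described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the synthetic 14-dimensional demonstration data, the helper names (fit_pca, project, reconstruct, expand_weights), and the use of a linear policy whose weights are zero-padded when a dimension is added. The sketch only shows two properties that make iterating to higher dimensions natural with PCA: the k-dimensional embedding is a prefix of the (k+1)-dimensional one, and zero-padding a linear policy leaves its output unchanged.

```python
import numpy as np


def fit_pca(demos):
    """Return the data mean and principal directions (rows, by decreasing variance)."""
    mean = demos.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(demos - mean, full_matrices=False)
    return mean, vt


def project(states, mean, components, k):
    """Map full states onto the first k principal directions."""
    return (states - mean) @ components[:k].T


def reconstruct(latent, mean, components):
    """Map k-dimensional latent points back to the full state space."""
    k = latent.shape[1]
    return latent @ components[:k] + mean


def expand_weights(weights):
    """Hypothetical transfer step: zero-pad linear policy weights by one dimension."""
    return np.append(weights, 0.0)


def reconstruction_error(demos, mean, components, k):
    """Mean squared error between demos and their k-dimensional reconstruction."""
    latent = project(demos, mean, components, k)
    return float(np.mean((demos - reconstruct(latent, mean, components)) ** 2))


# Synthetic stand-in for demonstrations: 200 states of a hypothetical
# 14-dimensional system that lie close to a 3-dimensional linear subspace.
rng = np.random.default_rng(0)
latent_true = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
mixing = rng.normal(size=(3, 14))
demos = latent_true @ mixing + 0.01 * rng.normal(size=(200, 14))

mean, components = fit_pca(demos)

# Embeddings are nested: the 1-D coordinates are the first column of the 2-D ones.
z1 = project(demos, mean, components, 1)
z2 = project(demos, mean, components, 2)

# A linear policy with made-up weights in 1-D, carried over to 2-D by zero-padding.
w1 = np.array([0.7])
w2 = expand_weights(w1)
actions_1d = z1 @ w1
actions_2d = z2 @ w2

# Information lost by the reduction shrinks as dimensions are added.
errors = [reconstruction_error(demos, mean, components, k) for k in (1, 2, 3)]
```

In this sketch the zero-padded policy starts the higher-dimensional stage with exactly the behaviour learned in the lower-dimensional one, so further learning can only refine it along the newly added direction. Whether the dissertation transfers knowledge this way is not stated in the abstract.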