Graduate Thesis Or Dissertation
 

Model-based approximation methods for reinforcement learning

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/000002283

Descriptions

Attribute Name / Values
Creator
Abstract
  • The thesis focuses on model-based approximation methods for reinforcement learning, with large-scale applications such as combinatorial optimization problems. First, the thesis proposes two new model-based methods to stabilize value-function approximation for reinforcement learning. The first is the BFBP algorithm, a batch reinforcement learning process that iterates between the exploration and exploitation stages of learning. For the exploitation stage, the thesis investigates the feasibility and performance of more efficient offline algorithms such as linear regression, regression trees, and SVMs as value-function approximators. The thesis finds that with systematic local search methods such as Limited Discrepancy Search and a good initial heuristic, the algorithm often converges faster and to a better level of performance than epsilon-greedy exploration methods. The second method combines linear programming with the kernel trick to find value-function approximators for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. The advantage of the kernel methods is that they can easily adjust the complexity of the function approximator to fit the complexity of the value function. The thesis also proposes a model-based policy gradient reinforcement learning algorithm. In this approach, the models P(s′|s, a) and R(s′|s, a) are learned from data, and dynamic programming is then used to compute the value of the policy directly from the model (an illustrative sketch of this evaluation step appears below, after the metadata fields). Unlike online sampling-based policy gradient algorithms, this approach does not suffer from high variance, and it also converges faster. In summary, the thesis proposes model-based approximation algorithms for both value-function-based and policy gradient reinforcement learning, with promising results on multiple problem domains, including job-shop scheduling benchmarks.
License
Resource Type
Date Available
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Committee Member
Academic Affiliation
Non-Academic Affiliation
Subject
Rights Statement
Publisher
Peer Reviewed
Language
File Format
File Extent
  • 1052885 bytes
Replaces
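
The abstract's model-based policy gradient contribution rests on learning the models P(s′|s, a) and R(s′|s, a) and then computing the value of a policy by dynamic programming on the learned model. The sketch below illustrates only that general idea in a small tabular setting; it is a minimal illustration, not the thesis's algorithm (which targets large-scale problems and also computes policy gradients), and the function names, the maximum-likelihood transition estimates, and the mean-reward estimates are assumptions made for the example.

from collections import defaultdict

def estimate_model(transitions):
    # Tabular maximum-likelihood model estimated from (s, a, r, s') samples.
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] = visit count
    reward_mean = defaultdict(float)                # running mean reward for (s, a, s')
    reward_n = defaultdict(int)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_n[(s, a, s_next)] += 1
        n = reward_n[(s, a, s_next)]
        reward_mean[(s, a, s_next)] += (r - reward_mean[(s, a, s_next)]) / n
    P = {sa: {s2: c / sum(dist.values()) for s2, c in dist.items()}
         for sa, dist in counts.items()}
    return P, reward_mean

def evaluate_policy(policy, P, R, states, gamma=0.95, tol=1e-8):
    # Iterative policy evaluation: repeated Bellman backups on the learned model,
    # rather than Monte Carlo estimates from sampled trajectories.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v_new = sum(p * (R[(s, a, s2)] + gamma * V.get(s2, 0.0))
                        for s2, p in P.get((s, a), {}).items())
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

Because the policy's value is computed from an explicit estimated model rather than from sampled returns, evaluations of nearby policies reuse the same learned P and R, which is the intuition behind the abstract's claim of lower variance than online sampling-based policy gradient methods.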
