A study of model-based average reward reinforcement learning

Ok, DoKyeong

Graduate Thesis Or Dissertation

Public Deposited

Citeable URL: https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/n296x195z

Descriptions

Attribute Name	Values
Creator	Ok, DoKyeong
Abstract	Reinforcement Learning (RL) is the study of learning agents that improve their performance from rewards and punishments. Most reinforcement learning methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this thesis, we introduce a model-based average reward reinforcement learning method called "H-learning" and show that it performs better than other average reward and discounted RL methods in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning which automatically explores the unexplored parts of the state space, while always choosing an apparently best action with respect to the current value function. We show that this "Auto-exploratory H-learning" performs much better than the original H-learning under many previously studied exploration strategies. To scale H-learning to large state spaces, we extend it to learn action models and reward functions in the form of Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are very effective in significantly reducing the space requirement of H-learning, and in making it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning synergistically combines with Bayesian network model learning and value function approximation by local linear regression, yielding a highly effective average reward RL algorithm. We believe that the algorithms presented here have the potential to scale to large applications in the context of average reward optimization.
Resource Type	Dissertation
Date Available	2012-10-25T19:10:44+00:00
Date Issued	1996-05-09
Degree Level	Doctoral
Degree Name	Doctor of Philosophy (Ph.D.)
Degree Field	Computer Science
Degree Grantor	Oregon State University
Commencement Year	1996
Advisor	Tadepalli, Prasad
Committee Member	Minoura, Toshimi; Bose, Bella; Saletore, Vikram; Robson, Robert
Academic Affiliation	Computer Science
Non-Academic Affiliation	Oregon State University. Graduate School
Subject	Reinforcement learning (Machine learning)
Rights Statement	Copyright Not Evaluated
Publisher	Oregon State University
Peer Reviewed	No
Language	English [eng]
Digitization Specifications	File scanned at 300 ppi (Monochrome, 8-bit Grayscale) using ScandAll PRO 1.8.1 on a Fi-6670 in PDF format. CVista PdfCompressor 4.0 was used for PDF compression and textual OCR.
Replaces	http://hdl.handle.net/1957/34698

Relationships

Parents:

This work has no parents.

In Collection:

Graduate Theses and Dissertations (GTD)

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	OkDoKyeong1996.pdf	2017-08-09	Public	Download
