H-learning : a reinforcement learning method to optimize undiscounted average reward

Technical Report

Public Deposited

Citeable URL: https://ir.library.oregonstate.edu/concern/technical_reports/2j62s6254

Descriptions

Attribute Name	Values
Creator	Oregon State University. Dept. of Computer Science; Tadepalli, Prasad; Ok, DoKyeong
Abstract	In this paper, we introduce a model-based reinforcement learning method called H-learning, which optimizes undiscounted average reward. We compare it with three other reinforcement learning methods in the domain of scheduling Automatic Guided Vehicles, transportation robots used in modern manufacturing plants and facilities. The four methods differ along two dimensions: they are either model-based or model-free, and they optimize either discounted total reward or undiscounted average reward. Our experimental results indicate that H-learning is more robust with respect to changes in the domain parameters and, in many cases, converges in fewer steps to better average reward per time step than all the other methods. An added advantage is that, unlike the other methods, it does not have any parameters to tune.
Resource Type	Research Paper
Date Available	2012-04-17T23:01:34+00:00
Date Issued	1994-05-12
Series	Technical report (Oregon State University. Department of Computer Science)
Subject	Reinforcement learning
Rights Statement	Copyright Not Evaluated
Funding Statement (additional comments about funding)	This research was supported by the National Science Foundation under grant number IRI:9111231.
Publisher	Corvallis, OR : Oregon State University, Dept. of Computer Science
Peer Reviewed	No
Language	English [eng]
Replaces	http://hdl.handle.net/1957/28790

Thumbnail	Title	Date Uploaded	Visibility	Actions
	H_learning_a_reinforcement_learning_method_for_optimizing_undiscounted_average_reward.pdf	2017-07-18	Public	Download