Two heuristics for solving POMDPs having a delayed need to observe

http://ir.library.oregonstate.edu/concern/technical_reports/sf2686345

Descriptions

Abstract or Summary
  • A common heuristic for solving Partially Observable Markov Decision Problems (POMDPs) is to first solve the underlying Markov Decision Process (MDP) and then construct a POMDP policy by performing a fixed-depth lookahead search in the POMDP, evaluating the leaf nodes with the MDP value function. A problem with this approximation is that it does not account for the need to choose actions in order to gain information about the state of the world, particularly when those observation actions are needed at some point in the future. This paper proposes two heuristics that improve on the MDP approximation in POMDPs where there is a delayed need to observe. The first approximation, introduced in [2], is the even-odd POMDP, in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converted into an equivalent MDP, the even-MDP, whose value function captures some of the sensing costs of the original POMDP. An online policy consisting of a one-step lookahead search combined with the value function of the even-MDP gives an approximation to the POMDP's value function that is at least as good as the method based on the value function of the underlying MDP. The second POMDP approximation applies to a special kind of POMDP, which we call the Cost Observable Markov Decision Problem (COMDP). In a COMDP, the actions are partitioned into those that change the state of the world and those that are pure observation actions. For such problems, we describe the chain-MDP algorithm, which in many cases captures more of the sensing costs than the even-odd POMDP approximation. We prove that both heuristics compute value functions that are upper bounded by (i.e., better than) the value function of the underlying MDP and, in the case of the even-MDP, also lower bounded by the POMDP's optimal value function. We show cases where the chain-MDP online policy is better than, equal to, or worse than the even-MDP online policy.
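  • The baseline heuristic the abstract describes, a lookahead search in belief space whose leaves are evaluated with the value function of the underlying MDP, can be sketched as follows. This is a minimal illustrative sketch, not code from the report: the two-state toy model, its transition/observation/reward numbers, and all names (`V_MDP`, `belief_update`, `q_value`, `lookahead_policy`) are assumptions, and the lookahead depth is fixed at one for brevity.

    ```python
    GAMMA = 0.9
    S = [0, 1]             # hidden states (toy assumption)
    A = ["stay", "move"]   # actions (toy assumption)
    OBS = ["lo", "hi"]     # observations (toy assumption)

    # Toy transition, observation, and reward tables (all assumptions).
    T = {"stay": [[1.0, 0.0], [0.0, 1.0]],
         "move": [[0.2, 0.8], [0.8, 0.2]]}          # T[a][s][s'] = P(s' | s, a)
    Z = {"lo": [0.7, 0.3], "hi": [0.3, 0.7]}        # Z[o][s'] = P(o | s')
    R = {"stay": [0.0, 1.0], "move": [-0.1, -0.1]}  # R[a][s]

    # Pretend this came from solving the underlying MDP (assumption).
    V_MDP = [5.0, 10.0]

    def belief_update(b, a, o):
        """Bayes update of belief b after action a, observation o.
        Also returns P(o | b, a), needed to weight the lookahead branches."""
        unnorm = [Z[o][s2] * sum(b[s] * T[a][s][s2] for s in S) for s2 in S]
        p_o = sum(unnorm)
        new_b = [p / p_o for p in unnorm] if p_o > 0 else b
        return new_b, p_o

    def q_value(b, a):
        """One-step lookahead value of action a at belief b; each leaf belief
        is scored with the MDP value function (the approximation at issue)."""
        exp_reward = sum(b[s] * R[a][s] for s in S)
        future = 0.0
        for o in OBS:
            b2, p_o = belief_update(b, a, o)
            future += p_o * sum(b2[s] * V_MDP[s] for s in S)
        return exp_reward + GAMMA * future

    def lookahead_policy(b):
        """Online policy: pick the action maximizing the lookahead Q-value."""
        return max(A, key=lambda a: q_value(b, a))
    ```

    Because the leaves are scored with the fully observable value function, this policy assigns no value to pure information-gathering; the paper's even-MDP variant would plug in the even-MDP value function at the leaves instead, capturing part of that sensing cost.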
Additional Information
  • description.provenance : Made available in DSpace on 2012-06-04T17:35:24Z (GMT). No. of bitstreams: 1 Two heuristics for solving POMDPs having a delayed need to observe.pdf: 231975 bytes, checksum: b7cfc0e0459205c52cc94b71a99ee715 (MD5) Previous issue date: 2004
