This work is inspired by problems in natural resource management centered on the challenge of invasive species. Computing optimal management policies for maintaining ecosystem sustainability is challenging. Many ecosystem management problems can be formulated as Markov Decision Process (MDP) planning problems. In a simulator-defined MDP, the Markovian dynamics and rewards...
Monte Carlo tree search (MCTS) is a class of online planning algorithms for Markov decision processes (MDPs) and related models that has found success in challenging applications. In the online planning approach, the agent makes a decision in the current state by performing a limited forward search over possible futures...
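The online forward-search idea can be illustrated with a minimal sketch: UCB1 action selection at the current (root) state combined with random rollouts to estimate values, on a hypothetical toy chain MDP. The chain layout, transition noise, discount factor, and iteration counts below are all illustrative assumptions, not taken from any particular paper; full MCTS would also expand a tree below the root, which is omitted here for brevity.

```python
import math
import random

# Hypothetical toy chain MDP: states 0..6, goal state 6 yields reward 1.
GOAL, N_STATES, ACTIONS = 6, 7, (-1, +1)

def step(state, action, rng):
    # Intended move succeeds with probability 0.8, otherwise stay put.
    nxt = state + action if rng.random() < 0.8 else state
    nxt = min(max(nxt, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def rollout(state, depth, rng):
    # Random playout from a leaf state to estimate its discounted value.
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, r, done = step(state, rng.choice(ACTIONS), rng)
        total += discount * r
        discount *= 0.95
        if done:
            break
    return total

def uct_decide(root, n_iter=2000, depth=20, c=1.4, seed=1):
    # Online planning for the current state only: UCB1 over root actions,
    # each evaluated by sampling one transition plus a random rollout.
    rng = random.Random(seed)
    counts = {a: 0 for a in ACTIONS}    # visit counts per root action
    values = {a: 0.0 for a in ACTIONS}  # running mean return per action
    for t in range(1, n_iter + 1):
        a = max(ACTIONS, key=lambda a: float('inf') if counts[a] == 0
                else values[a] + c * math.sqrt(math.log(t) / counts[a]))
        nxt, r, done = step(root, a, rng)
        ret = r if done else r + 0.95 * rollout(nxt, depth, rng)
        counts[a] += 1
        values[a] += (ret - values[a]) / counts[a]
    # Recommend the most-visited action, a common MCTS convention.
    return max(ACTIONS, key=lambda a: counts[a])
```

From a middle state, the search concentrates its simulation budget on the action leading toward the goal, which is exactly the "limited forward search over possible futures" performed anew at each decision point.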
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems under uncertainty, ranging from classical problems such as inventory control and path planning to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research...
Writing a program that performs well in a complex environment is a challenging task. In such problems, combining deterministic programming with reinforcement learning (RL) can be helpful. However, current systems either force developers to encode knowledge in very specific forms (e.g., state-action features) or assume advanced RL...
We investigate the data collection problem in sensor networks. The network consists of a number of stationary sensors deployed at different sites to sense and store data locally. A mobile element moves from site to site to collect data from the sensors periodically. There are different costs associated with the...
The Gibbs sampling method is an important tool for parameter estimation in many probabilistic models. In many scenarios, it is difficult to generate high-dimensional samples directly from a joint distribution. Gibbs sampling provides a way to draw high-dimensional samples via the conditional distributions, which are typically easier to...
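The conditional-sampling idea can be sketched with a standard textbook case: a bivariate normal with correlation rho, whose full conditionals are univariate normals and therefore trivial to sample. The choice of distribution, rho, and the sample/burn-in counts below are illustrative assumptions for the sketch.

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_samples=5000, burn_in=500, seed=0):
    # For (X, Y) ~ N(0, [[1, rho], [rho, 1]]), each full conditional is
    # univariate: X | Y=y ~ N(rho * y, 1 - rho**2), and symmetrically for Y.
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        # Alternately draw each coordinate from its conditional
        # given the current value of the other coordinate.
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        if i >= burn_in:  # discard early draws before the chain mixes
            samples.append((x, y))
    return np.array(samples)
```

After burn-in, the collected pairs approximate draws from the joint distribution: their sample means are near zero and their sample correlation is near rho, even though only one-dimensional conditional draws were ever performed.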
We consider the problem of wireless spectrum management in cognitive wireless networks that maximizes the revenue of a spectrum operator. Specifically, we study how a wireless spectrum operator can optimally allocate its limited spectrum to various classes of users/devices who pay differently for their spectrum per unit time....
For continuous-time, finite state and action Markov decision chains, optimal policies are studied; (i) a procedure for transforming the terminal reward vector is given and it is established that this transformation does not alter optimal policies, (ii) decision chains with absorbing states are studied and the results obtained are...
A composite system of a Markovian Decision Process with multiple criteria involves both discounting criteria and non-discounting criteria. The measurement of the discounting criteria is chosen to be the total sum of the payoffs over the time horizon, and the measurement of the non-discounting criteria is chosen to be the...
While the stability of time-homogeneous Markov chains has been extensively studied through the concept of mixing times, the stability of time-inhomogeneous Markov chains has not been studied in as much depth. In this manuscript we introduce special types of time-inhomogeneous Markov chains that are defined through an adiabatic transition. After...