A large number of sequential decision-making problems in uncertain environments
can be modeled as Markov Decision Processes (MDPs). In such settings, an agent
can observe at each time step the state of the environment and then executes an
action, causing a stochastic transition to a new state of the environment...