Abstract:
Monte-Carlo planning algorithms such as UCT make decisions at each step by
intelligently expanding a single search tree given the available time and then
selecting the best root action. Recent work has provided evidence that it can be
advantageous to instead construct an ensemble of search trees and make a
decision according to a weighted vote. However, these prior investigations have
only considered the application domains of Go and Solitaire and were limited in
the scope of ensemble configurations considered. In this paper, we conduct a
large-scale empirical study of ensemble Monte-Carlo planning using the UCT
algorithm in a set of five additional diverse and challenging domains. In
particular, we evaluate the advantages of a broad set of ensemble configurations
in terms of space and time efficiency in both parallel and sequential time models.
Our results show that ensembles are an effective way to improve performance
given a parallel model, can significantly reduce space requirements, and in some
cases may improve performance in a sequential model. Additionally, this work
produced an open-source planning library.