- Controllers for robotic systems can be complex and difficult to write by hand. Learning offers a way to improve a controller through direct feedback. Learning is not trivial, however, because the feedback tells the agent only how well its current actions solve the given task, not how to improve them. Sparsely rewarded domains make learning harder still, as the agent receives less feedback to learn from. This difficulty is compounded in multiagent domains, which require complex coordination to complete the global objective. Although dense rewards are typically easier to learn from, they are not always easy to define; many problems are inherently sparsely rewarded. This work presents an algorithm for learning complex coordination in sparsely rewarded multiagent domains. The algorithm is split into two steps: the first learns a set of skills, and the second learns when to use each skill. Experimental results in a modified version of the rover domain demonstrate the effectiveness of the algorithm. The algorithm also seeks to provide more explainable learned policies than traditional black-box learners.
- Key Words: Reinforcement Learning, Multiagent, Multi-Reward, Sparse Reward