Autonomous multiagent teams can be used in complex exploration tasks both to expedite exploration and to improve efficiency. However, the use of multiagent systems presents additional challenges. Specifically, in domains where the agents' actions are tightly coupled, coordinating multiple agents to achieve cooperative behavior at the group level is difficult. In this work, we demonstrate that reward shaping can greatly benefit learning in tightly coupled multiagent exploration tasks. We argue that in tightly coupled domains, effective coordination depends on rewarding stepping-stone actions: actions that would improve the system's objective but are not rewarded because other agents have not yet found their proper actions. To this end, we build on current work in the multiagent structural credit assignment literature and extend the idea of counterfactuals introduced in difference evaluation functions.
Difference evaluation functions have a number of properties that make them well suited as a learning signal, such as sensitivity to an agent's own actions and alignment with the global system objective. However, they fail to address the coordination problem in domains where the agent coupling is tight. Extending the idea of counterfactuals, we propose a novel reward structure, D₊₊. We investigate the performance of D₊₊ in two different multiagent domains. We show that while both the global team reward and the difference evaluation function fail to properly reward stepping-stone actions, the proposed reward structure successfully rewards such behaviors and provides superior performance (a 166% performance improvement and a fourfold convergence speedup) compared to policies learned using either the global reward or the difference reward.
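The failure mode described above can be made concrete with a small sketch. The toy domain here is hypothetical and is not one of the two domains studied in this work: each point of interest (POI) pays out only when at least `coupling` agents observe it at once. The function names, and the specific counterfactual that adds `n` copies of an agent and divides by `n`, are assumptions made for illustration. They show one way the counterfactual idea might be extended and should not be read as the definition of D₊₊.

```python
from collections import Counter

# Hypothetical toy domain: assignment[i] is the index of the POI that agent i
# observes, or None if agent i observes nothing.

def global_reward(assignment, values, coupling):
    """G: total value of POIs observed by at least `coupling` agents."""
    counts = Counter(p for p in assignment if p is not None)
    return sum(values[p] for p, c in counts.items() if c >= coupling)

def difference_reward(i, assignment, values, coupling):
    """D_i = G(z) - G(z with agent i removed)."""
    without_i = assignment[:i] + [None] + assignment[i + 1:]
    return (global_reward(assignment, values, coupling)
            - global_reward(without_i, values, coupling))

def added_copies_reward(i, assignment, values, coupling, n):
    """Assumed counterfactual: add n copies of agent i, normalize by n."""
    with_copies = assignment + [assignment[i]] * n
    return (global_reward(with_copies, values, coupling)
            - global_reward(assignment, values, coupling)) / n

# One POI worth 10 that needs two simultaneous observers.
values = [10.0]
coupling = 2
assignment = [0, None]  # agent 0 is at the POI, agent 1 is elsewhere

g = global_reward(assignment, values, coupling)
d0 = difference_reward(0, assignment, values, coupling)
s0 = added_copies_reward(0, assignment, values, coupling, n=1)
print(g, d0, s0)
```

In this sketch agent 0 has taken a stepping-stone action: it is in the right place, but both the global reward and the difference reward are zero because no partner has arrived. The added-copies counterfactual is nonzero, which is the kind of signal the abstract argues is needed in tightly coupled domains.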
description.provenance : Submitted by Aida Rahmattalabi (firstname.lastname@example.org) on 2016-08-05T14:57:20Z. No. of bitstreams: 1. RahmattalabiAida2017.pdf: 878022 bytes, checksum: 9e456927ec0cbac0b13f40bca46b1a5f (MD5)
description.provenance : Made available in DSpace on 2016-08-10T23:16:38Z (GMT). No. of bitstreams: 1. RahmattalabiAida2017.pdf: 878022 bytes, checksum: 9e456927ec0cbac0b13f40bca46b1a5f (MD5). Previous issue date: 2016-08-02
description.provenance : Approved for entry into archive by Julie Kurtz (email@example.com) on 2016-08-09T21:28:39Z (GMT). No. of bitstreams: 1. RahmattalabiAida2017.pdf: 878022 bytes, checksum: 9e456927ec0cbac0b13f40bca46b1a5f (MD5)
description.provenance : Approved for entry into archive by Laura Wilson (firstname.lastname@example.org) on 2016-08-10T23:16:38Z (GMT). No. of bitstreams: 1. RahmattalabiAida2017.pdf: 878022 bytes, checksum: 9e456927ec0cbac0b13f40bca46b1a5f (MD5)