Multiagent Learning via Dynamic Skill Selection

Sachdeva, Enna

Graduate Thesis Or Dissertation

Public Deposited

Citeable URL: https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/44558m99c

Descriptions

Attribute Name	Values
Creator	Sachdeva, Enna
Abstract	Multiagent coordination has many real-world applications such as self-driving cars, inventory management, search and rescue, package delivery, traffic management, warehouse management, and transportation. These tasks are generally characterized by a global team objective that is often temporally sparse, realized only upon completing an episode. The sparsity of the shared team objective often makes it an inadequate learning signal for learning effective strategies. Moreover, this reward signal does not capture the marginal contribution of each agent towards the global objective. This leads to the problem of structural credit assignment in multiagent systems. Furthermore, because the desired task behaviors are rarely understood accurately, it is often challenging to manually design agent-specific rewards that improve coordination. While learning these undefined local objectives is critical for successful coordination, it is extremely challenging because of two core challenges. Firstly, because agents interact in a shared environment, the complexity of the problem may rise exponentially with the number of agents and their behavioral sophistication. Because all agents learn concurrently, each agent perceives the environment as non-stationary and, as a result, perceives the coordination objective as extremely noisy. Secondly, the goal information required to learn coordination behavior is distributed among agents. This makes it difficult for agents to learn undefined desired behaviors that optimize a team objective. The key contribution of this work is to address the credit assignment problem in multiagent coordination using several semantically meaningful local rewards. We argue that real-world multiagent coordination tasks can be decomposed into several meaningful skills.
Further, we introduce MADyS, a framework that can optimize a global reward by learning to dynamically select the best skill from a set of semantically meaningful skills, each characterized by its local reward, without requiring any form of reward shaping. Here, each local reward describes a basic skill and is designed based on domain knowledge. MADyS combines gradient-based optimization to maximize dense local rewards and gradient-free optimization to maximize the sparse team-based reward. Each local reward is used to train a local policy learner using policy gradient (PG), while an evolutionary algorithm (EA) searches a population of policies to maximize the global objective by picking the best local reward at each time step of an episode. While these two processes occur concurrently, the experiences collected by the EA population are stored in a replay buffer and utilized by the PG-based local reward optimizers for better sample efficiency. Our experimental results show that MADyS outperforms several baselines. We also visualize the complex coordination behaviors by studying the temporal distribution shifts of the selected local rewards. By visualizing these shifts throughout an episode, we gain insight into how agents learn to (i) decompose a complex task into various sub-tasks, (ii) dynamically configure sub-teams, and (iii) assign the selected sub-tasks to the sub-teams to optimize the global objective as a team.
License	All rights reserved
Resource Type	Masters Thesis
Date Issued	2020-12-04
Degree Level	Master's
Degree Name	Master of Science (M.S.)
Degree Field	Robotics
Degree Grantor	Oregon State University
Commencement Year	2021
Advisor	Tumer, Kagan
Committee Member	Turkan, Yelda; Davidson, Joe; Hollinger, Geoffrey
Rights Statement	In Copyright
Publisher	Oregon State University
Peer Reviewed	No
Language	English [eng]
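
Illustrative sketch (not taken from the thesis): the abstract describes an evolutionary search that picks one skill at each time step to maximize a reward that is only realized at the end of an episode. The toy Python example below shows that idea in its simplest form and is not the MADyS implementation. Everything in it is an assumption made for illustration: the one-dimensional world, the two landmarks, the names go_to_a, go_to_b, episode_return and evolve, and the use of fixed hand-written skills in place of the policy-gradient-trained local policies. The selector is also simplified to an open-loop list of skill indices, one per time step, and there is only a single agent and no replay buffer.

```python
import random

# Hypothetical skills: fixed stand-ins for the PG-trained local policies.
def go_to_a(pos):
    """Step towards landmark A at x = -3 (stay put once there)."""
    return -1 if pos > -3 else 0

def go_to_b(pos):
    """Step towards landmark B at x = +3 (stay put once there)."""
    return 1 if pos < 3 else 0

SKILLS = [go_to_a, go_to_b]
HORIZON = 12  # time steps per episode (illustrative value)

def episode_return(selector):
    """Temporally sparse reward: landmarks visited, reported only at episode end."""
    pos, visited = 0, set()
    for t in range(HORIZON):
        pos += SKILLS[selector[t]](pos)  # run the skill chosen for this step
        if pos == -3:
            visited.add("A")
        if pos == 3:
            visited.add("B")
    return float(len(visited))

def evolve(pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Gradient-free search over selectors, keeping the top quarter as elites."""
    rng = random.Random(seed)
    population = [[rng.randrange(len(SKILLS)) for _ in range(HORIZON)]
                  for _ in range(pop_size)]
    history = []  # best return seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=episode_return, reverse=True)
        history.append(episode_return(ranked[0]))
        elites = ranked[: pop_size // 4]
        children = []
        while len(elites) + len(children) < pop_size:
            parent = rng.choice(elites)
            # Resample each time step's skill with probability mutation_rate.
            child = [rng.randrange(len(SKILLS)) if rng.random() < mutation_rate else s
                     for s in parent]
            children.append(child)
        population = elites + children
    best = max(population, key=episode_return)
    return best, history

if __name__ == "__main__":
    best, history = evolve()
    print("best selector:", best)
    print("return:", episode_return(best))
```

Neither skill alone can visit both landmarks, so a selector has to switch skills partway through the episode to earn the full return, which is the kind of temporal shift in selected skills that the abstract proposes to visualize.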

Relationships

Parents:

This work has no parents.

In Collection:

Graduate Theses and Dissertations (GTD)

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	SachdevaEnna2020.pdf	2020-12-14	Public	Download

ScholarsArchive@OSU
