How should reinforcement learning (RL) agents explain themselves to humans not trained
in AI? To gain insights into this question, we conducted a 124-participant, four-treatment
experiment to compare participants’ mental models of an RL agent in the context of a
simple Real-Time Strategy (RTS) game.
The four treatments isolated two types of explanations vs. neither vs. both together.
The two types of explanations were: (1) saliency maps (an “Input Intelligibility Type”
that explains the AI’s focus of attention), and (2) reward-decomposition bars (an “Output
Intelligibility Type” that explains the AI’s predictions of future types of rewards).
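To make the reward-decomposition bars concrete, the sketch below illustrates one common formulation of decomposed Q-learning, in which the agent maintains a separate value estimate per reward type and the total action value is their sum; each bar in the display is then one component's prediction of future reward. This is a minimal illustration under our own assumptions, not the study's implementation, and the reward-type names and all identifiers here are hypothetical.

```python
import numpy as np

# Hypothetical reward types for a simple RTS-like task; the study's actual
# reward components are not reproduced here.
REWARD_TYPES = ["enemy_damaged", "enemy_destroyed",
                "friendly_damaged", "friendly_destroyed"]

class DecomposedQTable:
    """Tabular Q-learning with one Q-table per reward type.

    The total action value is the sum of the per-type values, so each
    component can be shown as one bar in a reward-decomposition display.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.alpha, self.gamma = alpha, gamma
        # One table per reward type: self.q[c][s, a]
        self.q = {c: np.zeros((n_states, n_actions)) for c in REWARD_TYPES}

    def total_q(self, s):
        """Summed Q-values across reward types; one entry per action."""
        return sum(self.q[c][s] for c in REWARD_TYPES)

    def update(self, s, a, rewards, s_next):
        """One TD update per component; `rewards` maps type -> scalar.

        Every component bootstraps from the action that is greedy with
        respect to the *total* Q-value, so the components remain a valid
        decomposition of the overall action value.
        """
        a_next = int(np.argmax(self.total_q(s_next)))
        for c in REWARD_TYPES:
            target = rewards.get(c, 0.0) + self.gamma * self.q[c][s_next, a_next]
            self.q[c][s, a] += self.alpha * (target - self.q[c][s, a])

    def explain(self, s, a):
        """Per-type predicted future reward for (s, a): the bar heights."""
        return {c: float(self.q[c][s, a]) for c in REWARD_TYPES}

# Example usage with toy numbers:
agent = DecomposedQTable(n_states=10, n_actions=4)
agent.update(s=0, a=2,
             rewards={"enemy_damaged": 1.0, "friendly_damaged": -0.5},
             s_next=1)
print(agent.explain(0, 2))  # one value per reward type for state 0, action 2
```

Plotting explain(s, a) for the agent's chosen action yields one bar per reward type, which is the kind of display a reward-decomposition explanation presents to participants.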
Our results show that a combined explanation that included saliency and reward bars
was needed to achieve a statistically significant difference in participants’ mental model
scores over the no-explanation treatment. However, this combined explanation was far
from a panacea: it exacted disproportionately high cognitive loads from the participants
who received it. Further, in some situations, participants who saw both explanations
predicted the agent's next action worse than all other treatments' participants.