This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms o er teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1)...