- Dynamic bipedal locomotion is among the most difficult yet relevant problems in modern robotics. While many classical control methods for bipedal locomotion exist, they are often brittle or limited in capability. In recent years, applying reinforcement learning to robotics has led to superior performance across a range of tasks, including bipedal locomotion. However, crafting a real-world controller using reinforcement learning is difficult, and the majority of successful efforts use simple feedforward neural networks to learn a control policy. Though these efforts have demonstrated learned control of a Cassie robot, the wide range of behaviors that a memory-based architecture can learn within a single controller, including walking at varying step frequencies, sidestepping, and walking backwards, has not yet been demonstrated. Building on recent work showing that memory-based architectures improve performance on a variety of reinforcement learning problems, I demonstrate some advantages of pairing a sophisticated, memory-based neural network architecture with state-of-the-art reinforcement learning algorithms to achieve highly robust control policies on Agility Robotics' bipedal robot, Cassie. I also visualize the internal learned behavior of the memory using principal component analysis, and show that the architecture learns highly cyclic behaviors.
I show that dynamics randomization is a key tool in training robust memory-based neural networks, and that these networks can sometimes fail to transfer to hardware if trained without it. I also demonstrate that various parameters of the dynamics, such as ground friction or ground slope angle, can be reconstructed by examining the internal memory of a recurrent neural network. This opens the door to automatic disturbance observers or online system identification, which would significantly benefit problems that currently rely on hand-written, manually tuned disturbance observers.
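
One simple way to check whether a dynamics parameter can be read out of recurrent memory is to fit a linear probe from recorded hidden states to the known parameter value and score it on held-out data. The sketch below illustrates that idea only; it is not necessarily the reconstruction method used in this work. The hidden states here are synthetic stand-ins generated with NumPy, and the names `fit_linear_probe`, `predict`, `r_squared`, and the friction range are all hypothetical. In practice, `hidden` would come from policy rollouts under randomized dynamics.

```python
import numpy as np


def fit_linear_probe(hidden, target, ridge=1e-3):
    """Fit ridge-regularized least squares from hidden states to a scalar."""
    # Append a constant column so the probe can learn an offset.
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ target)


def predict(weights, hidden):
    """Apply a fitted probe to a batch of hidden states."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    return X @ weights


def r_squared(y_true, y_pred):
    """Coefficient of determination of the probe's predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (an assumption, not measurements from Cassie):
# each hidden state is a noisy linear image of a friction coefficient.
rng = np.random.default_rng(0)
n_samples, hidden_dim = 2000, 32
friction = rng.uniform(0.3, 1.2, size=n_samples)
mixing = rng.normal(size=hidden_dim)
hidden = np.outer(friction, mixing) + 0.1 * rng.normal(size=(n_samples, hidden_dim))

# Fit on the first half, evaluate on the second half.
split = n_samples // 2
weights = fit_linear_probe(hidden[:split], friction[:split])
score = r_squared(friction[split:], predict(weights, hidden[split:]))
print(f"held-out R^2: {score:.3f}")
```

A high held-out score on real rollout data would suggest the parameter is linearly decodable from memory; a low score would not rule out a nonlinear encoding, which a linear probe cannot detect.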