Simon Du (University of Washington)

Sep 10, 2021

Title and Abstract

Toward Horizon-Free Reinforcement Learning

Reinforcement learning and contextual bandits are two widely studied sequential decision-making problems. Reinforcement learning generalizes contextual bandits and is often perceived to be more difficult due to the long planning horizon and the unknown, state-dependent transitions. I will talk about recent results on (nearly) horizon-free sample complexity bounds for reinforcement learning. I will present the main techniques for episodic tabular MDPs and discuss extensions to related problems. These results imply that the long planning horizon and the unknown state-dependent transitions pose at most little additional difficulty in terms of sample complexity.
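
As a rough illustration of what "horizon-free" means (not part of the abstract; the exact exponents and logarithmic factors below are assumptions for exposition, not the talk's precise theorems), the following compilable LaTeX sketch contrasts a classical horizon-dependent bound with a (nearly) horizon-free one for an episodic tabular MDP:

% Minimal, compilable sketch. Assumptions: the poly(H) and polylog(H)
% factors are illustrative placeholders, not the talk's exact statements.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Consider an episodic tabular MDP with $S$ states, $A$ actions, horizon $H$,
and the total reward of each episode normalized to $[0,1]$; the goal is to
find an $\epsilon$-optimal policy. Classical analyses pay polynomially in
$H$, while a (nearly) horizon-free bound pays at most polylogarithmically:
\[
\underbrace{\widetilde{O}\!\left(\frac{\operatorname{poly}(H)\, S A}{\epsilon^{2}}\right)}_{\text{horizon-dependent}}
\quad\longrightarrow\quad
\underbrace{\widetilde{O}\!\left(\frac{S A \cdot \operatorname{polylog}(H)}{\epsilon^{2}}\right)}_{\text{(nearly) horizon-free}} .
\]
For comparison, a contextual bandit with $S$ contexts and $A$ actions needs
on the order of $\widetilde{O}(SA/\epsilon^{2})$ samples, so the
horizon-free rate matches the bandit rate up to logarithmic factors.
\end{document}

Under this normalization, the dependence on the horizon is confined to logarithmic factors, which is the sense in which the long planning horizon adds little to the sample complexity.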

Bio

Simon S. Du is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research interests are broadly in machine learning, including deep learning, representation learning, and reinforcement learning. Prior to starting as faculty, he was a postdoc at the Institute for Advanced Study in Princeton. He completed his Ph.D. in Machine Learning at Carnegie Mellon University. Previously, he studied EECS and EMS at UC Berkeley. He has also spent time at the Simons Institute and the research labs of Facebook, Google, and Microsoft.