Wenlong Mou (University of Toronto)
April 10, 2026

Title and Abstract

What structures make model-free RL possible? An elliptic theory for controlled Markov diffusions

Can reinforcement learning with function approximation ever be as easy as supervised learning? In general, the answer is no: the Bellman operator contracts only in the sup-norm, not in the L^2-norm induced by the data distribution. This geometric mismatch makes model-free value learning with function approximation provably harder than regression. However, real-world problems often come with additional structures that may facilitate reinforcement learning. In this talk, I will discuss recent advances in understanding the structures that enable model-free RL. Focusing on controlled Markov diffusions, a widely used class of dynamical systems, I will provide an affirmative answer to the question above. Specifically, I will identify ellipticity as a key structure that makes model-free RL with function approximation tractable. Leveraging ellipticity, I will demonstrate desirable geometric properties of Bellman operators in an appropriate Sobolev space. Based on these insights, I will introduce a new class of algorithms for model-free RL with function approximation that achieve near-optimal oracle inequalities efficiently. Finally, I will discuss an application to fine-tuning diffusion-based generative models, where the ellipticity structure is exploited to design a PDE-based algorithm that attains fast convergence rates.

Bio

Wenlong Mou is an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. He recently obtained his Ph.D. from the Department of EECS at UC Berkeley, where he was advised by Prof. Martin Wainwright and Prof. Peter Bartlett. Prior to Berkeley, he received his B.S. in Computer Science from Peking University in 2017, where he worked with Prof. Liwei Wang.
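For context (the notation below is standard and not part of the announcement): the contraction claim in the abstract refers to the fact that the Bellman optimality operator $\mathcal{T}$ of a $\gamma$-discounted MDP satisfies

\[
\|\mathcal{T}f - \mathcal{T}g\|_{\infty} \;\le\; \gamma\,\|f - g\|_{\infty}
\quad \text{for all bounded } f, g,
\]

while its Lipschitz constant in the $L^2(\mu)$-norm induced by a data distribution $\mu$ can exceed $1$ for some transition kernels and distributions $\mu$, since transitions may move mass into regions where $\mu$ places little weight; this is the geometric mismatch that separates value learning from regression. Ellipticity, in the usual sense, asks that the diffusion coefficient of the controlled dynamics be non-degenerate, e.g., for $dX_t = b(X_t, a_t)\,dt + \sigma(X_t, a_t)\,dB_t$,

\[
\sigma(x, a)\,\sigma(x, a)^{\top} \;\succeq\; \lambda I
\quad \text{for some } \lambda > 0 \text{ and all } (x, a).
\]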