Siva Theja Maguluri (Georgia Tech)

Feb 11, 2020

Title and Abstract

Finite Sample Convergence Bounds of Off-Policy Reinforcement Learning Algorithms

The focus of our work is to obtain finite-sample andor finite-time convergence bounds of various model-free Reinforcement Learning (RL) algorithms. Many RL algorithms are special cases of Stochastic Approximation (SA), which is a popular approach for solving fixed point equations when the information is corrupted by noise. We first obtain finite-sample bounds for general SA using a generalized Moreau envelope as a smooth potential Lyapunov function. We then use this result to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-Learning. We also use this result to improve the existing bound for the tabular Q-Learning algorithm from polynomial in state-space dimension to log of the dimension. We also use Lyapunov drift arguments to provide finite time error bounds of Q-learning algorithm with linear function approximation under an assumption on the sampling policy. This talk is based on the following papers: https:arxiv.orgabs2002.00874 and https:arxiv.orgabs1905.11425


Siva Theja Maguluri is an Assistant Professor in the School of Industrial and Systems Engineering at Georgia Tech. Before that, he was a Research Staff Member in the Mathematical Sciences Department at IBM T. J. Watson Research Center. He obtained his Ph.D. and MS in ECE as well as MS in Applied Math from UIUC, and B.Tech in Electrical Engineering from IIT Madras. His research interests are broadly in Applied Probability, Optimization and Reinforcement Learning, and include Scheduling, Resource Allocation and Revenue Optimization in a variety of systems including Data Centers, Cloud Computing, Wireless Networks, Block Chains, Ride hailing systems etc. He is a co-recipient of the “Best Publication in Applied Probability” award, presented by INFORMS Applied probability society every two years