Haipeng Chen (William & Mary)

May 1, 2026
Soda Hall 510

Title and Abstract

Steering LLMs with Light-Weight Auxiliary Policy Models using Reinforcement Learning

Reinforcement Learning (RL) has become a cornerstone in advancing the capability and alignment of large language models (LLMs). Existing approaches, such as Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR), focus mostly on fine-tuning the underlying LLMs, which introduces inductive bias into the LLMs and is costly in both training and inference. This talk highlights our recent work on RL for training separate, light-weight auxiliary policy models (APMs) that interact with or guide LLMs. These APMs learn to evaluate, interpret, or modulate LLM outputs, providing adaptive feedback loops that improve performance on downstream LLM tasks.

Bio

Haipeng Chen is an assistant professor and graduate program director of the Data Science Department at William & Mary. He directs the W&M Data-Driven Decision Intelligence (D3i) Lab and serves as the vice chair of the AI Blue Ribbon Collaborative at the Center for Telehealth and eHealth Law (CTeL). Previously, he was a postdoc at Harvard University and Dartmouth College. He obtained his Ph.D. from Nanyang Technological University and his B.S. in Physics from the University of Science and Technology of China. His primary research interest lies in Use-Inspired AI. On the techniques side, he focuses on reinforcement learning, generative AI, and optimization; on the applications side, he is interested in health, the environment, and the physical sciences. His research has been recognized with a Best Paper nomination at AAMAS-2021, the Innovation Demonstration Award runner-up at IJCAI-2019, and the championship of the 2017 Microsoft Malmo Collaborative AI Challenge. He has published in premier AI and data science conferences such as ICLR, NeurIPS, AAAI, IJCAI, and KDD, as well as journals (e.g., IEEE/ACM Transactions, Transportation Research).