Ahmad Beirami (Google DeepMind)
Feb 27, 2026

Title and Abstract

Theoretical Guarantees for Best-of-N Sampling

Language model post-training aims to maximize a reward (e.g., coding capability) with minimal perturbation to the reference model. A standard baseline is best-of-N sampling, which selects the highest-scoring output from N reference model responses. Best-of-N remains a robust post-training baseline against complex reinforcement learning (RL) methods that directly maximize expected reward. This talk establishes theoretical reasoning behind the strong performance of best-of-N sampling.

Bio

Ahmad Beirami is building a new company in agentic AI. Previously, he led research on post-training language models at Google DeepMind. His work on model alignment advanced the Pareto frontier of Gemini model capabilities and earned an Outstanding Paper Award at ICLR 2025.
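The best-of-N procedure described in the abstract is simple enough to sketch directly: draw N independent samples from the reference model, score each with a reward function, and keep the argmax. The sketch below is illustrative only; `generate` and `reward` are hypothetical stand-ins for a reference-model sampler and a reward model, not any specific API from the talk.

```python
import random

def best_of_n(generate, reward, n):
    """Draw n samples from the reference model and return the highest-scoring one.

    `generate` and `reward` are hypothetical callables standing in for a
    reference-model sampler and a reward model.
    """
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)

# Toy usage: the "model" samples integers and the "reward" prefers larger values.
random.seed(0)
pick = best_of_n(lambda: random.randint(0, 100), lambda x: x, n=8)
```

Because the selected output is still one of the reference model's own samples, best-of-N only reweights the reference distribution toward high-reward responses rather than generating out-of-distribution text, which is one intuition for why it perturbs the reference model so little.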