Ahmad Beirami (Google DeepMind)
Feb 27, 2026

Title and Abstract

Theoretical Guarantees for Best-of-N Sampling

Language model post-training aims to maximize a reward (e.g., coding capability) with minimal perturbation to the reference model. A standard baseline is best-of-N sampling, which selects the highest-scoring output from N reference model responses. Best-of-N remains a robust post-training baseline against complex reinforcement learning (RL) methods that directly maximize expected reward. This talk establishes theoretical reasoning behind the strong performance of best-of-N sampling.

Bio

Ahmad Beirami is building a new company in agentic AI. Previously, he led research on post-training language models at Google DeepMind. His work on model alignment advanced the Pareto frontier of Gemini model capabilities and earned an Outstanding Paper Award at ICLR 2025.
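The best-of-N procedure described in the abstract is simple enough to sketch directly: draw N independent samples from the reference model, score each with a reward function, and keep the argmax. The sketch below is illustrative only; `generate` and `reward` are hypothetical stand-ins for a reference-model sampler and a reward model, not any specific API from the talk.

```python
import random

def best_of_n(generate, reward, n):
    """Draw n samples from the reference model and return the highest-scoring one.

    `generate` and `reward` are hypothetical callables standing in for a
    reference-model sampler and a reward model.
    """
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)

# Toy usage: the "model" samples integers and the "reward" prefers larger values.
random.seed(0)
pick = best_of_n(lambda: random.randint(0, 100), lambda x: x, n=8)
```

Because the selected output is still one of the reference model's own samples, best-of-N only reweights the reference distribution toward high-reward responses rather than generating out-of-distribution text, which is one intuition for why it perturbs the reference model so little.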