This post explores the properties of Best-of-N sampling under misspecified reward models.
1 min read · April 16, 2025 · Notion Blog