Lin Gui

Hi, I’m Lin Gui (桂林) 👋 A final-year PhD student at The University of Chicago. I work at the intersection of AI and statistics, and am passionate about turning theory into useful tools.

Publications & Preprints

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

Junkai Zhang, Zihao Wang, Lin Gui, Swarnashree Mysore Sathyendra, Jaehwan Jeong, Victor Veitch, Wei Wang, Yunzhong He, Bing Liu, Lifeng Jin

The Fourteenth International Conference on Learning Representations (ICLR), 2026

We show theoretically that reward over-optimization stems from inaccuracy at the high-reward tail. Accordingly, we investigate whether rubric-based rewards can mitigate the issue by: (1) leveraging high-quality off-policy responses, and (2) designing rubrics that focus on differentiating among strong and diverse responses.

Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning

Chenghao Yang, Lin Gui, Chenxiao Yang, Victor Veitch, Lizhu Zhang, Zhuokai Zhao

We develop a temperature-annealed decoding strategy: start with a higher-than-standard temperature to encourage exploration, then gradually lower it to ensure the quality of the final output. This simple strategy enables effective exploration in reinforcement fine-tuning of LLMs.
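The schedule above can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: `annealed_temperature`, `t_high`, and `t_low` are hypothetical names, and the linear anneal is one simple choice among many.

```python
import math
import random

def annealed_temperature(step, total_steps, t_high=1.2, t_low=0.7):
    """Linearly interpolate the sampling temperature from t_high
    (early, exploratory) down to t_low (late, high-quality)."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return t_high + frac * (t_low - t_high)

def sample_token(logits, temperature):
    """Sample a token index from logits softened by the given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, cum = random.random(), 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(exps) - 1
```

At decoding step t of T, one would call `sample_token(logits, annealed_temperature(t, T))`, so early tokens are sampled more diversely than late ones.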

On the Significance of Softmax Geometry: Interpretability and Token Decoding

Yibo Jiang, Lin Gui, Sean M. Richardson, Mark Muchane, Yo Joong Choe, Victor Veitch

Recent findings show that learned embeddings of large language models do not lie in a Euclidean space, questioning the default use of cosine similarity. We adopt a more geometry-aware similarity metric and investigate it with two tasks: (1) learning interpretable features via sparse autoencoders, and (2) efficiently retrieving top-𝑘 likely next tokens.

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

Lin Gui, Cristina Gârbacea, and Victor Veitch

Advances in Neural Information Processing Systems (NeurIPS), 2024

We prove that the sampling distribution underlying best-of-n sampling is essentially the optimal model (distribution) for post-training alignment of large language models: it strikes a balance between enhancing output quality, such as helpfulness and harmlessness, and staying close to the base model. Based on this theoretical finding, we propose a fine-tuning method, BoNBoN, that explicitly approximates the best-of-n distribution.
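For reference, best-of-n sampling itself is simple to state. In the sketch below, `generate` and `reward` are hypothetical stand-ins for a language model sampler and a reward model, not the BoNBoN training procedure.

```python
def best_of_n(prompt, generate, reward, n=8):
    """Draw n candidate responses for a prompt and return the one
    with the highest reward-model score."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)
```

BoNBoN then fine-tunes the base model so that ordinary single-sample decoding approximates the distribution of this best-of-n output, avoiding the n-fold sampling cost at inference time.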

Concept Algebra for (Score-Based) Text-Controlled Generative Models

Zihao Wang, Lin Gui, Jeffrey Negrea, and Victor Veitch

Advances in Neural Information Processing Systems (NeurIPS), 2023

We link the score representations of the text-controlled diffusion models to real-world concepts, enabling controlled concept shifts in generated images through direct manipulation of these score representations.

Causal Estimation for Text Data with (Apparent) Overlap Violations

Lin Gui and Victor Veitch

The Eleventh International Conference on Learning Representations (ICLR), 2023

We formulate a causal estimand tailored to causal inference for the text-attribute question and establish its identifiability under minimal conditions. We also provide a computationally efficient estimator with uncertainty quantification, supported by theoretical guarantees.

Validity and Power of Heavy-Tailed Combination Tests under Asymptotic Dependence

Lin Gui, Tiantian Mao, Jingshu Wang, Ruodu Wang

Major Revision at The Annals of Statistics

We establish the asymptotic validity of the heavy-tailed combination test under a broad class of dependence structures, including both asymptotic independence and asymptotic dependence. We further show that its power advantage over the Bonferroni test is pronounced under asymptotic dependence, with the gain increasing as the dependence strengthens.

Statistical Inference for Cell Type Deconvolution

Dongyue Xie, Lin Gui, Jingshu Wang

Minor Revision at Journal of the Royal Statistical Society, Series B

We study the cell type deconvolution problem, which seeks to estimate the proportions of different cell types in a mixed sample from gene expression data. We identify key conditions required for the identifiability of these proportions. To address this problem, we propose MEAD, a comprehensive statistical framework for both estimating and conducting inference on cell type proportions. MEAD further enables the comparison of deconvolved proportions across individuals, accounting for both gene–gene correlations and biological variability.

Aggregating Dependent Signals with Heavy-Tailed Combination Tests

Lin Gui, Yuchao Jiang, Jingshu Wang

Biometrika, 2025

We introduce the heavy-tailed combination test, which encompasses the widely used Cauchy combination test and the harmonic mean p-value as special cases. Under asymptotic independence, we establish its validity but show that it is equivalent to the Bonferroni test in this setting. In contrast, under stronger dependence, empirical results suggest that the heavy-tailed combination test can substantially outperform Bonferroni.
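The Cauchy combination test mentioned above as a special case can be sketched as follows; this is the standard formulation (transform each p-value to a Cauchy statistic, take a weighted average, map back), with `weights` defaulting to uniform.

```python
import math

def cauchy_combination(pvals, weights=None):
    """Cauchy combination test: map each p-value to a standard Cauchy
    statistic via tan((0.5 - p) * pi), average with the given weights,
    and convert the result back to a combined p-value using the
    standard Cauchy survival function."""
    k = len(pvals)
    if weights is None:
        weights = [1.0 / k] * k
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(stat) / math.pi
```

Note that with a single p-value the transform is the identity, and the heavy tail of the Cauchy distribution is what lets one small p-value dominate the average.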

Detecting Multiple Replicating Signals Using Adaptive Filtering Procedures

Jingshu Wang, Lin Gui, Weijie J. Su, Chiara Sabatti, and Art B. Owen

The Annals of Statistics, 2022

We introduce a multiple testing procedure that enhances detection power by adaptively filtering out unlikely candidates for partial conjunction (PC) nulls, and we establish control of both the family-wise error rate (FWER) and the false discovery rate (FDR) for this method.

Software

EAD

An implementation of exploratory annealed decoding (EAD) for reinforcement learning fine-tuning of large language models.

View on GitHub

Rubric construction

Rubric design and construction for rubric-based reward modeling of large language models.

View on GitHub

BoNBoN

A contrastive method for post-training alignment of large language models.

View on GitHub

TI-estimator

Causal inference for text data with apparent overlap violations.

View on GitHub

heavytailcombtest

An R package for heavy-tailed combination tests.

View on GitHub