Hi, I’m Lin Gui (桂林) 👋 I am a final-year PhD student at The University of Chicago. I work at the intersection of AI and statistics, and I am passionate about turning theory into useful tools.
Publications & Preprints
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
Junkai Zhang, Zihao Wang, Lin Gui, Swarnashree Mysore Sathyendra, Jaehwan Jeong, Victor Veitch, Wei Wang, Yunzhong He, Bing Liu, Lifeng Jin
The Fourteenth International Conference on Learning Representations (ICLR), 2026
We show theoretically that reward over-optimization stems from inaccuracy at the high-reward tail. Accordingly, we investigate whether rubric-based rewards can mitigate this issue by: (1) leveraging high-quality off-policy responses, and (2) designing rubrics that differentiate among strong and diverse responses.
Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning
Chenghao Yang, Lin Gui, Chenxiao Yang, Victor Veitch, Lizhu Zhang, Zhuokai Zhao
We develop a temperature-annealing decoding strategy: start with a higher-than-standard temperature to encourage exploration, then gradually anneal to a lower temperature to ensure the quality of the final output. This simple strategy enables effective exploration in reinforcement fine-tuning of LLMs.
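A minimal sketch of the idea (not the released implementation), assuming a linear temperature schedule and a generic `logits_fn` that returns next-token logits for the current prefix; the start/end temperatures are illustrative placeholders, not values from the paper:

```python
import numpy as np

def annealed_decode(logits_fn, max_new_tokens, t_start=1.2, t_end=0.3, seed=0):
    """Sample tokens while linearly annealing temperature from t_start down to t_end."""
    rng = np.random.default_rng(seed)
    tokens = []
    for step in range(max_new_tokens):
        # Exploratory (hot) early in the sequence, conservative (cold) near the end.
        frac = step / max(max_new_tokens - 1, 1)
        temperature = t_start + (t_end - t_start) * frac
        logits = np.asarray(logits_fn(tokens), dtype=float)
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens
```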
On the Significance of Softmax Geometry: Interpretability and Token Decoding
Yibo Jiang, Lin Gui, Sean M. Richardson, Mark Muchane, Yo Joong Choe, Victor Veitch
Recent findings show that learned embeddings of large language models do not lie in a Euclidean space, questioning the default use of cosine similarity. We adopt a more geometry-aware similarity metric and investigate it with two tasks: (1) learning interpretable features via sparse autoencoders, and (2) efficiently retrieving top-𝑘 likely next tokens.
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Lin Gui, Cristina Gârbacea, and Victor Veitch
Advances in Neural Information Processing Systems (NeurIPS), 2024
We prove that the sampling distribution induced by best-of-n sampling is essentially the optimal model (distribution) for post-training alignment of large language models: it strikes a balance between enhancing output quality, such as helpfulness and harmlessness, and staying close to the base model. Based on this theoretical finding, we propose a fine-tuning method, BoNBoN, that explicitly approximates the best-of-n distribution.
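For reference, best-of-n sampling itself is simple to state; a sketch with hypothetical `sample_fn` (base model) and `reward_fn` (reward model) callables:

```python
import numpy as np

def best_of_n(prompt, sample_fn, reward_fn, n=8):
    """Draw n candidate responses from the base model and return the highest-reward one."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    rewards = np.array([reward_fn(prompt, c) for c in candidates])
    return candidates[int(np.argmax(rewards))]
```

BoNBoN fine-tunes the model to approximate this distribution directly, rather than paying the n-fold sampling cost at inference time.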
Concept Algebra for (Score-Based) Text-Controlled Generative Models
Zihao Wang, Lin Gui, Jeffrey Negrea, and Victor Veitch
Advances in Neural Information Processing Systems (NeurIPS), 2023
We link the score representations of the text-controlled diffusion models to real-world concepts, enabling controlled concept shifts in generated images through direct manipulation of these score representations.
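Schematically, the kind of edit described can be viewed as a projection swap; the sketch below is only an illustration of projection-based editing, assuming the concept subspace is given by an orthonormal basis, and is not the paper's precise construction:

```python
import numpy as np

def swap_concept_component(score, target_score, basis):
    """Replace the component of `score` in a concept subspace with that of `target_score`.

    score, target_score: (d,) score vectors
    basis:               (d, k) orthonormal basis of the concept subspace
    """
    projector = basis @ basis.T  # orthogonal projector onto the concept subspace
    return score - projector @ score + projector @ target_score
```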
Causal Estimation for Text Data with (Apparent) Overlap Violations
Lin Gui and Victor Veitch
The Eleventh International Conference on Learning Representations (ICLR), 2023
We formulate a formal causal estimand tailored to causal inference for text attributes and establish its identifiability under minimal conditions. We then provide a computationally efficient estimator with uncertainty quantification for this estimand, supported by theoretical guarantees.
Validity and Power of Heavy-Tailed Combination Tests under Asymptotic Dependence
Lin Gui, Tiantian Mao, Jingshu Wang, Ruodu Wang
Major Revision at The Annals of Statistics
We establish the asymptotic validity of the heavy-tailed combination test under a broad class of dependence structures, including both asymptotic independence and asymptotic dependence. We further show that its power advantage over the Bonferroni test is pronounced under asymptotic dependence, with the gain increasing as the dependence strengthens.
Statistical Inference for Cell Type Deconvolution
Dongyue Xie, Lin Gui, Jingshu Wang
Minor Revision at Journal of the Royal Statistical Society, Series B
We study the cell type deconvolution problem, which seeks to estimate the proportions of different cell types in a mixed sample from gene expression data. We identify key conditions required for the identifiability of these proportions. To address this problem, we propose MEAD, a comprehensive statistical framework for both estimating and conducting inference on cell type proportions. MEAD further enables the comparison of deconvolved proportions across individuals, accounting for both gene–gene correlations and biological variability.
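To make the estimand concrete: in its simplest form, deconvolution regresses the bulk expression vector on a reference signature matrix under non-negativity constraints. The sketch below is a naive baseline for illustration, not MEAD's estimator or its inference procedure:

```python
import numpy as np
from scipy.optimize import nnls

def naive_deconvolve(bulk, signature):
    """Estimate cell-type proportions by non-negative least squares, normalized to sum to one.

    bulk:      (genes,) bulk expression for one sample
    signature: (genes, cell_types) reference expression profiles
    """
    coef, _ = nnls(signature, bulk)
    return coef / coef.sum()
```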
Aggregating Dependent Signals with Heavy-Tailed Combination Tests
Lin Gui, Yuchao Jiang, Jingshu Wang
Biometrika, 2025
We introduce the heavy-tailed combination test, which encompasses the widely used Cauchy combination test and the harmonic mean p-value as special cases. Under asymptotic independence, we establish its validity but show that it is equivalent to the Bonferroni test in this theoretical setting. In contrast, under stronger dependence, empirical results suggest that the heavy-tailed combination test can substantially outperform Bonferroni.
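As a concrete special case, the Cauchy combination test maps each p-value to a standard Cauchy statistic and takes a weighted average; a sketch with equal weights (assumed to sum to one), using scipy:

```python
import numpy as np
from scipy import stats

def cauchy_combination(pvals, weights=None):
    """Combine p-values via the Cauchy combination test (a heavy-tailed combination special case)."""
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    weights = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    # Heavy-tailed transform: each p-value becomes a standard Cauchy statistic.
    stat = np.sum(weights * np.tan(np.pi * (0.5 - pvals)))
    return stats.cauchy.sf(stat)  # combined p-value

# e.g. cauchy_combination([0.01, 0.20, 0.70]) returns a single combined p-value
```

Replacing the Cauchy transform with other heavy-tailed distributions (e.g. the one underlying the harmonic mean p-value) gives other members of the family.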
Detecting Multiple Replicating Signals Using Adaptive Filtering Procedures
Jingshu Wang, Lin Gui, Weijie J. Su, Chiara Sabatti, and Art B. Owen
The Annals of Statistics, 2022
We introduce a multiple testing procedure that enhances detection power by adaptively filtering out unlikely candidates of partial conjunction (PC) nulls, and we establish theoretical control of both the family-wise error rate (FWER) and the false discovery rate (FDR) for this method.
Software
EAD
An implementation of exploratory annealed decoding (EAD) for reinforcement learning fine-tuning of large language models.
View on Github
Rubric construction
Rubric design and construction for rubric-based reward modeling of large language models.
View on Github