Lin Gui

Hi, I’m Lin Gui (桂林) 👋 A final-year PhD student at The University of Chicago. I work at the intersection of AI and statistics, and am passionate about turning theory into useful tools.

Publications & Preprints

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

Junkai Zhang, Zihao Wang, Lin Gui, Swarnashree Mysore Sathyendra, Jaehwan Jeong, Victor Veitch, Wei Wang, Yunzhong He, Bing Liu, Lifeng Jin

The Fourteenth International Conference on Learning Representations (ICLR), 2026

We show theoretically that reward over-optimization stems from inaccuracy at the high-reward tail. Accordingly, we investigate whether rubric-based rewards can mitigate the issue by: (1) leveraging high-quality off-policy responses, and (2) designing rubrics that focus on differentiating among strong and diverse responses.

Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning

Chenghao Yang, Lin Gui, Chenxiao Yang, Victor Veitch, Lizhu Zhang, Zhuokai Zhao

We develop a temperature-annealed decoding strategy: start with a higher-than-standard temperature to encourage exploration, then gradually lower it to ensure the quality of the final output. This simple strategy enables effective exploration in reinforcement fine-tuning of LLMs.
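The schedule above can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: `annealed_temperature`, `t_high`, and `t_low` are hypothetical names, and the linear anneal is one simple choice among many.

```python
import math
import random

def annealed_temperature(step, total_steps, t_high=1.2, t_low=0.7):
    """Linearly interpolate the sampling temperature from t_high
    (early, exploratory) down to t_low (late, high-quality)."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return t_high + frac * (t_low - t_high)

def sample_token(logits, temperature):
    """Sample a token index from logits softened by the given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, cum = random.random(), 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(exps) - 1
```

At decoding step t of T, one would call `sample_token(logits, annealed_temperature(t, T))`, so early tokens are sampled more diversely than late ones.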

On the Significance of Softmax Geometry: Interpretability and Token Decoding

Yibo Jiang, Lin Gui, Sean M. Richardson, Mark Muchane, Yo Joong Choe, Victor Veitch

Recent findings show that learned embeddings of large language models do not lie in a Euclidean space, questioning the default use of cosine similarity. We adopt a more geometry-aware similarity metric and investigate it with two tasks: (1) learning interpretable features via sparse autoencoders, and (2) efficiently retrieving top-𝑘 likely next tokens.

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

Lin Gui, Cristina Gârbacea, and Victor Veitch

Advances in Neural Information Processing Systems (NeurIPS), 2024

We prove that the sampling distribution underlying best-of-n sampling is essentially the optimal model (distribution) for post-training alignment of large language models: it strikes a balance between enhancing output quality, such as helpfulness and harmlessness, and staying close to the base model. Based on this theoretical finding, we propose a fine-tuning method, BoNBoN, that explicitly approximates the best-of-n distribution.
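For reference, best-of-n sampling itself is simple to state. In the sketch below, `generate` and `reward` are hypothetical stand-ins for a language model sampler and a reward model, not the BoNBoN training procedure.

```python
def best_of_n(prompt, generate, reward, n=8):
    """Draw n candidate responses for a prompt and return the one
    with the highest reward-model score."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)
```

BoNBoN then fine-tunes the base model so that ordinary single-sample decoding approximates the distribution of this best-of-n output, avoiding the n-fold sampling cost at inference time.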

Concept Algebra for (Score-Based) Text-Controlled Generative Models

Zihao Wang, Lin Gui, Jeffrey Negrea, and Victor Veitch

Advances in Neural Information Processing Systems (NeurIPS), 2023

We link the score representations of the text-controlled diffusion models to real-world concepts, enabling controlled concept shifts in generated images through direct manipulation of these score representations.

Causal Estimation for Text Data with (Apparent) Overlap Violations

Lin Gui and Victor Veitch

The Eleventh International Conference on Learning Representations (ICLR), 2023

We formulate a causal estimand tailored to causal inference for the text-attribute question and establish its identifiability under minimal conditions. We also provide a computationally efficient estimator with uncertainty quantification, supported by theoretical guarantees.

Validity and Power of Heavy-Tailed Combination Tests under Asymptotic Dependence

Lin Gui, Tiantian Mao, Jingshu Wang, Ruodu Wang

Major Revision at The Annals of Statistics

We establish the asymptotic validity of the heavy-tailed combination test under a broad class of dependence structures, including both asymptotic independence and asymptotic dependence. We further show that its power advantage over the Bonferroni test is pronounced under asymptotic dependence, with the gain increasing as the dependence strengthens.

Statistical Inference for Cell Type Deconvolution

Dongyue Xie, Lin Gui, Jingshu Wang

Minor Revision at Journal of the Royal Statistical Society, Series B

We study the cell type deconvolution problem, which seeks to estimate the proportions of different cell types in a mixed sample from gene expression data. We identify key conditions required for the identifiability of these proportions. To address this problem, we propose MEAD, a comprehensive statistical framework for both estimating and conducting inference on cell type proportions. MEAD further enables the comparison of deconvolved proportions across individuals, accounting for both gene–gene correlations and biological variability.

Aggregating Dependent Signals with Heavy-Tailed Combination Tests

Lin Gui, Yuchao Jiang, Jingshu Wang

Biometrika, 2025

We introduce the heavy-tailed combination test, which encompasses the widely used Cauchy combination test and the harmonic mean p-value as special cases. Under asymptotic independence, we establish its validity but show that it is equivalent to the Bonferroni test in this setting. In contrast, under stronger dependence, empirical results suggest that the heavy-tailed combination test can substantially outperform Bonferroni.
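The Cauchy combination test mentioned above as a special case can be sketched as follows; this is the standard formulation (transform each p-value to a Cauchy statistic, take a weighted average, map back), with `weights` defaulting to uniform.

```python
import math

def cauchy_combination(pvals, weights=None):
    """Cauchy combination test: map each p-value to a standard Cauchy
    statistic via tan((0.5 - p) * pi), average with the given weights,
    and convert the result back to a combined p-value using the
    standard Cauchy survival function."""
    k = len(pvals)
    if weights is None:
        weights = [1.0 / k] * k
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(stat) / math.pi
```

Note that with a single p-value the transform is the identity, and the heavy tail of the Cauchy distribution is what lets one small p-value dominate the average.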

Detecting Multiple Replicating Signals Using Adaptive Filtering Procedures

Jingshu Wang, Lin Gui, Weijie J. Su, Chiara Sabatti, and Art B. Owen

The Annals of Statistics, 2022

We introduce a multiple testing procedure that enhances detection power by adaptively filtering out unlikely candidates for partial conjunction (PC) nulls, and we establish control of both the family-wise error rate (FWER) and the false discovery rate (FDR) for this method.

Software

EAD

An implementation of exploratory annealed decoding (EAD) for reinforcement learning fine-tuning of large language models.

View on GitHub

Rubric construction

Rubric design and construction for rubric-based reward modeling of large language models.

View on GitHub

BoNBoN

A contrastive method for post-training alignment of large language models.

View on GitHub

TI-estimator

Causal inference for text data with apparent overlap violations.

View on GitHub

heavytailcombtest

An R package for heavy-tailed combination tests.

View on GitHub