Primary Submission Category: Machine Learning and Causal Inference
Leveraging Large Language Models to Improve Precision in Randomized Controlled Trials
Authors: Jaylin Lowe, Adam Sales, Johann Gagnon-Bartsch
Presenting Author: Jaylin Lowe*
Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for producing unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of randomized controlled trials (RCTs) in a safe and rigorous way. Following similar work on leveraging observational data, we incorporate LLM predictions of the outcome into the RCT analysis as covariates. While this approach to improving precision is not new, the value of LLM predictions in this role remains an open question. We assess how useful LLM predictions are and how the choice of dataset and prompt affects their usefulness.
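As an illustration of how an LLM prediction can enter an RCT analysis as a covariate, here is a minimal sketch assuming a regression-adjusted difference in means with a separate slope in each arm (a standard interacted adjustment). This estimator is swapped in for illustration and is not necessarily the one the authors use. The names `adjusted_ate`, `unadjusted_ate`, and `llm_pred` and the simulated data are hypothetical. The sketch also assumes the LLM prediction is produced from pretreatment information only, without access to treatment assignment or outcomes.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _slope(x, y):
    # OLS slope of y on x (requires x to vary within the arm)
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def unadjusted_ate(y, z):
    # Plain difference in means between treated (z == 1) and control (z == 0)
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return _mean(treated) - _mean(control)


def adjusted_ate(y, z, x):
    """Regression-adjusted difference in means with a separate slope per arm.

    y: outcomes, z: 0/1 treatment indicators,
    x: a pretreatment covariate (assumed here to be an LLM prediction).
    """
    x_bar = _mean(x)  # pooled covariate mean over both arms
    arm_means = {}
    for arm in (0, 1):
        ys = [yi for yi, zi in zip(y, z) if zi == arm]
        xs = [xi for xi, zi in zip(x, z) if zi == arm]
        # Shift the arm's outcome mean to the value predicted at the pooled covariate mean
        arm_means[arm] = _mean(ys) - _slope(xs, ys) * (_mean(xs) - x_bar)
    return arm_means[1] - arm_means[0]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 200
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical LLM prediction: a noisy read of each unit's latent outcome level
    llm_pred = [u + rng.gauss(0, 0.5) for u in latent]
    z = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(z)  # complete randomization
    y = [u + 0.5 * zi + rng.gauss(0, 0.5) for u, zi in zip(latent, z)]
    print("unadjusted:", unadjusted_ate(y, z))
    print("adjusted:  ", adjusted_ate(y, z, llm_pred))
```

Both estimators target the same average treatment effect; when the prediction is correlated with the outcome, the adjusted estimate typically varies less across repeated randomizations, which is the precision gain at issue. If the prediction is uninformative, the adjustment buys little, consistent with the observation that LLM predictions add less when strong covariates are already available.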
LLM predictions add little value when the RCT already includes highly predictive covariates. However, they become more beneficial when few such covariates exist or when the data are well suited to LLMs, such as text. Familiar, easy-to-predict outcome variables also help. Our basic approach asks the LLM to predict the outcome for each observation, but these predictions are often too similar to one another across observations. Instead, we ask the LLM to compare pairs of observations and predict which will have the higher outcome, and we use the frequency with which each observation is selected as a covariate. We can also extract additional covariates from the LLM, such as ratings of writing quality or creativity in text-based RCTs. Finally, we combine all covariates to generate a final prediction for each observation, achieving greater precision than either adjusting for the single LLM prediction alone or standard covariate adjustment without LLM predictions.
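The pairwise strategy can be sketched as follows, assuming every pair of observations is compared once and each observation's covariate is the share of its comparisons in which it was selected as having the higher outcome. The function `selection_frequency` and the `judge` callable are hypothetical: in practice `judge` would wrap an LLM prompt, and here a toy rule comparing text length stands in for it. Randomizing which item is shown first is an added assumption, meant to guard against a judge that favors one position.

```python
import itertools
import random


def selection_frequency(items, judge, seed=0):
    """Share of pairwise comparisons in which each item is selected.

    judge(a, b) returns True if it predicts that a has the higher outcome
    than b (hypothetical stand-in for an LLM call).
    """
    rng = random.Random(seed)
    n = len(items)
    wins = [0] * n
    for i, j in itertools.combinations(range(n), 2):
        # Randomize presentation order in case the judge prefers a position
        first, second = (i, j) if rng.random() < 0.5 else (j, i)
        winner = first if judge(items[first], items[second]) else second
        wins[winner] += 1
    # With all pairs compared, each item takes part in n - 1 comparisons
    return [w / (n - 1) for w in wins]


if __name__ == "__main__":
    essays = ["short", "a bit longer", "the longest essay of all here"]
    # Toy judge: predicts that the longer text has the higher outcome
    toy_judge = lambda a, b: len(a) > len(b)
    print(selection_frequency(essays, toy_judge))  # [0.0, 0.5, 1.0]
```

Comparing all pairs needs n(n - 1)/2 judge calls, so a practical version might compare only a random subset of pairs per observation. The resulting shares can then be treated like any other pretreatment covariate and combined with the remaining covariates in the adjusted analysis.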
