Primary Submission Category: Machine Learning and Causal Inference

Causal Inference on Outcomes Learned from Text

Authors: Amar Venugopal, Iman Modarressi, Jann Spiess

Presenting Author: Amar Venugopal*

We propose a machine-learning tool that yields causal inference on text in randomized trials.
Based on a simple econometric framework in which text may capture outcomes of interest, our
procedure addresses three questions: First, is the text affected by the treatment? Second,
which outcomes does the treatment affect? And third, how complete is our description of causal effects?
To answer all three questions, our approach uses large language models to suggest systematic
differences across documents that reflect the effect of the intervention, and then provides
valid inference based on costly validation. Specifically, we highlight the need for sample splitting
to allow for statistical validation of LLM outputs, as well as the need for human labelling to
validate substantive claims about which outcomes are affected. We illustrate the tool in a
proof-of-concept application using abstracts of academic manuscripts.
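
The role of sample splitting can be illustrated with a minimal sketch. It is not the authors' procedure: a simple keyword-frequency search stands in for the LLM step that suggests differences between treated and control documents, and a permutation test on a held-out half stands in for the validation step. All function names (`stratified_split`, `pick_feature`, `permutation_p_value`, `split_sample_test`) and the candidate `vocabulary` are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean


def stratified_split(treated, seed=0):
    """Split indices into two halves within each treatment arm.

    Assumes each arm has at least two documents, so both halves
    contain treated and control units.
    """
    rng = random.Random(seed)
    discover, holdout = [], []
    for arm in (True, False):
        idx = [i for i, d in enumerate(treated) if bool(d) == arm]
        rng.shuffle(idx)
        half = len(idx) // 2
        discover += idx[:half]
        holdout += idx[half:]
    return discover, holdout


def has_word(doc, word):
    """Binary outcome: 1.0 if the word appears in the document, else 0.0."""
    return float(word in doc.lower().split())


def diff_in_means(values, treated):
    """Difference in mean outcome between treated and control units."""
    t = [v for v, d in zip(values, treated) if d]
    c = [v for v, d in zip(values, treated) if not d]
    return mean(t) - mean(c)


def pick_feature(docs, treated, vocabulary):
    """Stand-in for hypothesis generation: pick the candidate word whose
    frequency differs most between arms on the discovery half."""
    def score(word):
        return abs(diff_in_means([has_word(d, word) for d in docs], treated))
    return max(vocabulary, key=score)


def permutation_p_value(values, treated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(values, treated))
    labels = list(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # re-assign treatment labels at random
        if abs(diff_in_means(values, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def split_sample_test(docs, treated, vocabulary, seed=0):
    """Choose a feature on one half, then test it on the other half."""
    discover, holdout = stratified_split(treated, seed)
    word = pick_feature(
        [docs[i] for i in discover], [treated[i] for i in discover], vocabulary
    )
    values = [has_word(docs[i], word) for i in holdout]
    labels = [treated[i] for i in holdout]
    return word, permutation_p_value(values, labels, seed=seed)
```

Because the feature is chosen without looking at the held-out documents, the permutation test on that half is not contaminated by the search over candidate features. This is the sense in which sample splitting permits statistical validation of a data-driven hypothesis. Whether the chosen feature corresponds to a substantively meaningful outcome is a separate question that this sketch does not address.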