Primary Submission Category: Design of Experiments
Your LLM Is Too Big for Causal Inference with Text
Authors: Srikar Katta, Graham Tierney, Chris Bail, Sunshine Hillygus, Alexander Volfovsky
Presenting Author: Srikar Katta*
Many modern social science questions ask how linguistic properties causally affect human behavior. Because text properties are often interlinked (e.g., angry reviews often use profane language), analysts must control for latent confounding to isolate causal effects. Recent literature proposes adaptations of large language models, transformers, and other deep learning techniques to learn latent representations of text that predict both treatment status and the outcome. However, because the treatment is encoded in the text, these deep learning methods risk learning representations that encode the treatment itself, inducing an overlap bias. Rather than relying on post-hoc adjustment of the text, we introduce a new experimental design that allows scientists to handle latent confounding, avoid the overlap issue, and estimate treatment effects without bias. We apply this design in an experiment studying how expressing humility in political statements affects readers’ beliefs. We leverage the ground-truth information in our experimental data to demonstrate the failures of current language-model approaches to causal inference with text. We then return to our study and uncover novel relationships between expressed humility and the perceived persuasiveness of political statements, offering important insights for social media platforms and social scientists.