

Primary Submission Category: Machine Learning and Causal Inference

Isolated Causal Effects of Natural Language

Authors: Victoria Lin, Louis-Philippe Morency, Eli Ben-Michael

Presenting Author: Victoria Lin

Recent advances in natural language processing have dramatically increased the availability of language data and models for everyday users. As language technologies become widespread, it is important to understand how changes in language affect reader perceptions and behaviors. For instance, as machine-generated text—including undesirable text like fake news and propaganda—proliferates in public spaces, we may wish to know whether the misinformation propagated in these texts affects readers' behaviors.

In this work, we formalize these impacts as the *isolated causal effect* of some *focal* language-encoded intervention on an external outcome. We show that a core challenge of estimating isolated effects is the need to approximate all *non-focal* language outside of the intervention. To address this challenge, we introduce a formal estimation framework for isolated causal effects of language and explore how different approximations of non-focal language influence effect estimates. Drawing on the principle of *omitted variable bias*, we present metrics for evaluating the quality of isolated effect estimation and non-focal language approximation along the axes of *fidelity* and *overlap*. In experiments on semi-synthetic and real-world data, we validate the ability of our framework to recover ground truth isolated effects, and we demonstrate the utility of our proposed metrics as measures of quality for both isolated effect estimates and non-focal language approximations.
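The core idea above—that estimating the effect of a focal attribute requires accounting for the non-focal language surrounding it—can be sketched in a small simulation. This is a hypothetical, minimal illustration and not the paper's estimator: we assume a binary focal attribute `T`, synthetic non-focal features `Z` that influence both `T` and the outcome `Y`, and a simple OLS adjustment standing in for a learned approximation of non-focal language.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process: Z represents non-focal language
# features that affect both the focal attribute T (e.g., whether a text
# contains misinformation) and the reader outcome Y.
Z = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-Z @ np.array([0.8, -0.5, 0.3])))
T = rng.binomial(1, propensity)
tau = 2.0  # ground-truth isolated effect of the focal attribute
Y = tau * T + Z @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

def ols_effect(T, Y, controls=None):
    """Coefficient on T from an OLS regression of Y on [1, T, controls]."""
    cols = [np.ones_like(Y), T] + ([controls] if controls is not None else [])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[1]

naive = ols_effect(T, Y)                 # ignores non-focal language
adjusted = ols_effect(T, Y, controls=Z)  # adjusts for non-focal features
```

Here the naive estimate absorbs bias from the omitted non-focal features (an instance of omitted variable bias), while the adjusted estimate recovers the ground-truth effect; in the paper's setting, `Z` would be replaced by a learned approximation of the non-focal language, whose quality the proposed fidelity and overlap metrics are designed to assess.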