
Primary Submission Category: Heterogeneous Treatment Effects

Detecting Treatment Effect Disparities at Scale

Authors: Winston Chou, Nathan Kallus, Danielle Rich, William Nelson

Presenting Author: Winston Chou*

Experimentation and causal inference increasingly drive innovation on digital platforms. Causal analyses often focus on Average Treatment Effects (ATEs), which summarize the impact of a new product innovation across the user population. Yet there is growing recognition that such averages do not capture the full picture: ATEs can be dragged up or driven down by small user segments that react disproportionately to the innovation; a positive ATE does not imply that a majority of members benefit from the treatment; and a null or statistically insignificant ATE estimate is consistent with polarizing effects that lift metrics for some members and depress them for others.

In this paper, we describe a methodology, integrated into Netflix’s scaled experimentation platform, for estimating the range of Conditional Average Treatment Effects (CATEs) in an experiment. The bounds of this range correspond substantively to the treatment effects on the best- and worst-affected user segments in the experiment. We term the difference between these bounds the treatment effect disparity. By surfacing this disparity, our method identifies when product innovations have distinct and even polarizing effects on users and highlights opportunities to make product wins more equally distributed.
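The abstract does not spell out the estimation procedure, but the core idea can be illustrated with a minimal sketch on simulated data: within each user segment, estimate a CATE by a difference in means between treated and control units, then take the gap between the largest and smallest segment-level CATEs as the treatment effect disparity. The segment definitions, effect sizes, and difference-in-means estimator here are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
seg = rng.integers(0, 4, size=n)   # 4 hypothetical user segments
t = rng.integers(0, 2, size=n)     # randomized treatment assignment
# Simulated outcome with a polarizing, segment-dependent effect:
true_tau = np.array([-1.0, -0.2, 0.3, 1.2])
y = 0.5 * seg + t * true_tau[seg] + rng.normal(0, 1, size=n)

# CATE per segment: difference in mean outcomes between treated
# and control units within that segment.
cate = np.array([
    y[(seg == s) & (t == 1)].mean() - y[(seg == s) & (t == 0)].mean()
    for s in range(4)
])

# Treatment effect disparity: gap between the best- and
# worst-affected segments' estimated CATEs.
disparity = cate.max() - cate.min()
ate = y[t == 1].mean() - y[t == 0].mean()
print(f"ATE={ate:.2f}, disparity={disparity:.2f}")
```

In this toy setting the overall ATE is near zero while the disparity is large, which is exactly the kind of polarizing effect that an averages-only analysis would miss.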