Primary Submission Category: Bayesian Causal Inference

Estimating the Returns from an Experimentation Program

Authors: Simon Ejdemyr, Martin Tingley, Yian Shang, Travis Brooks

Presenting Author: Simon Ejdemyr*

We describe the development, validation and implementation of a Bayesian hierarchical model used by Netflix for estimating the returns of innovation areas that leverage A/B testing. The model provides a trusted source for estimates of the cumulative returns from experiment launches, and integration into Netflix’s flagship experimentation UI has facilitated a better understanding of how different testing areas improve business outcomes at different rates. This understanding can help company leaders prioritize the most promising innovation programs, or pivot innovation strategies in areas that show diminishing returns. In fact, surfacing these views at Netflix has already streamlined previously time-consuming annual review processes and goal tracking.

The primary statistical challenge the model addresses is the overestimation of cumulative treatment effects caused by selection on winners: because only treatments with favorable estimates are launched, summing those estimates overstates the true return. The model corrects this through hierarchical shrinkage. The model’s first level facilitates information borrowing across tests within the same testing area, while the second level models the distribution of within-test effects. The model imposes weak parametric structure on the assumed distribution of the true treatment effects, allowing for Gaussian or fat-tailed structures. We show that this model validates well against holdback tests (large retests of launched treatments): across two critical testing areas at Netflix, the model removed upward bias, consistent with the holdbacks.
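
To illustrate why selection on winners inflates cumulative estimates and how shrinkage counteracts it, the following is a minimal simulation sketch. It is not the model described in this abstract: it assumes a single testing area, a one-level normal-normal model with a method-of-moments estimate of the between-test variance, and a simple z-score launch rule. The function names (`shrink_normal_normal`, `simulate_area`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def shrink_normal_normal(estimates, std_errors):
    """Empirical-Bayes posterior means, assuming
    theta_i ~ N(mu, tau^2) and estimate_i ~ N(theta_i, se_i^2)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(std_errors, dtype=float) ** 2
    mu = estimates.mean()
    # Method-of-moments estimate of the between-test variance, floored at zero.
    tau2 = max(estimates.var(ddof=1) - variances.mean(), 0.0)
    # Fraction of each estimate's deviation from mu that is retained.
    weight = tau2 / (tau2 + variances)
    return mu + weight * (estimates - mu)


def simulate_area(n_tests=2000, mu=0.0, tau=0.5, se=1.0, z_crit=1.96, seed=0):
    """Simulate one testing area and compare cumulative effects of launches."""
    rng = np.random.default_rng(seed)
    true_effects = rng.normal(mu, tau, size=n_tests)
    estimates = true_effects + rng.normal(0.0, se, size=n_tests)
    std_errors = np.full(n_tests, se)

    # Shrink using all tests in the area, not only the winners.
    shrunk = shrink_normal_normal(estimates, std_errors)

    # Assumed launch rule: ship a treatment when its z-score exceeds z_crit.
    launched = estimates / std_errors > z_crit

    return {
        "n_launched": int(launched.sum()),
        "true_total": float(true_effects[launched].sum()),
        "naive_total": float(estimates[launched].sum()),
        "shrunk_total": float(shrunk[launched].sum()),
    }


if __name__ == "__main__":
    for name, value in simulate_area().items():
        print(f"{name}: {value:.2f}")
```

In this sketch the naive sum over launched tests exceeds the sum of their true effects, while the shrunken sum lands much closer to it. Shrinkage is fit on every test in the area, including non-winners, which is what lets the launched estimates borrow information from the rest. A fat-tailed prior or a second hierarchical level, as the abstract describes, would require a fuller Bayesian fit than this closed-form approximation.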