Primary Submission Category: Sequential Testing and Anytime-Valid Inference
Anytime-Valid F-Tests for Faster Sequential Experimentation Through Covariate Adjustment
Authors: Michael Lindon, Dae Woong Ham, Iavor Bojinov, Martin Tingley
Presenting Author: Michael Lindon*
Multivariate linear regression models are commonly used to perform inference about average treatment effects. The experimentation platform at Netflix relies heavily on such models. We demonstrate that performing sequential “anytime-valid” inference is no harder than classical fixed-n inference. The confidence sequences and sequential p-values we provide depend on the same set of statistics as classical confidence intervals and p-values; that is, they are drop-in replacements whose guarantees hold uniformly across time. This enables Netflix to run sequential covariate-adjusted A/B tests: the time-uniform guarantees permit peeking and optional stopping, and the variance reduction from covariate adjustment yields tighter confidence sequences and faster stopping times. Formally, we introduce sequential F-tests and confidence sequences for subsets of coefficients of a linear model. In addition to treatment effect estimation, we present applications to sequential tests of treatment effect heterogeneity and to model selection. Our approach is based on an invariant mixture martingale, which exploits group invariance properties of the linear model to provide time-uniform Type I error and coverage guarantees regardless of the values of the nuisance parameters. Our test statistic is a group-invariant Bayes factor obtained by placing a right-Haar prior on the nuisance parameters, which bridges the frequentist and Bayesian paradigms.
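As a rough illustration of how a mixture martingale gives a sequential p-value, the sketch below uses a deliberately simplified setting that is not the F-test of this abstract: i.i.d. Gaussian observations with known standard deviation and a Gaussian mixing distribution over the mean under the alternative. The function name `sequential_p_values` and the parameters `sigma` and `tau` are hypothetical, chosen only for this example. The running minimum of the reciprocal martingale is a valid p-value at every sample size by Ville's inequality.

```python
import math


def sequential_p_values(xs, sigma=1.0, tau=1.0):
    """Sequential p-values for H0: mean = 0, known-variance Gaussian data.

    Simplified stand-in (assumption): known sigma and a N(0, tau^2) mixture
    over the mean, not the invariant right-Haar construction of the abstract.
    """
    running_sum = 0.0
    p = 1.0
    p_values = []
    for n, x in enumerate(xs, start=1):
        running_sum += x
        denom = sigma**2 + n * tau**2
        # Log of the mixture likelihood ratio M_n against the null mean 0.
        log_m = 0.5 * math.log(sigma**2 / denom) + (
            tau**2 * running_sum**2
        ) / (2.0 * sigma**2 * denom)
        # Ville's inequality: P(sup_n M_n >= 1/alpha) <= alpha under H0,
        # so the running minimum of 1/M_n is an anytime-valid p-value.
        p = min(p, math.exp(-log_m))
        p_values.append(p)
    return p_values


if __name__ == "__main__":
    shifted = [2.0] * 20  # toy data with a clearly nonzero mean
    print(sequential_p_values(shifted)[-1])
```

Because the p-value sequence is nonincreasing and valid at every sample size, an experiment can be stopped the first time it falls below the chosen level without inflating the Type I error.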