Primary Submission Category: Machine Learning and Causal Inference
Adapting Predictive Models to Distribution Shifts with Causal Structure and Rich Data
Authors: Alexander D’Amour, Ibrahim Alabdulmohsin, Nicole Chiou, Arthur Gretton, Sanmi Koyejo, Matt Kusner, Stephen Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai, Qingyao Sun, Sayna Ebrahimi, Kevin Murphy
Presenting Author: Alexander D’Amour*
Transportability is a central challenge in applying predictive machine learning in the real world: we often need a model to make optimal predictions in populations that differ from its training population. This is the domain adaptation problem. Several domain adaptation strategies already exist (including some that mirror standard confounder adjustment), but many real-world distribution shifts are too complex for these methods to handle. In this work, we describe new domain adaptation strategies that adapt to changes in (1) so-called spurious correlations and (2) the distributions of unobserved confounders. We highlight how this problem mirrors, and generalizes, causal identification. In both cases, the key idea is to train submodels on richer data than will be available when the model is deployed; at prediction time, these submodels are plugged into adjustment formulas that identify the optimal target predictor. Causal structure plays a key role in deriving these adjustment formulas. We demonstrate how these methods can be applied to modern machine learning pipelines, using examples of distribution shifts in chest X-ray and text data.
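To make the "plug submodels into an adjustment formula" idea concrete, the following is a minimal sketch of one possible adjustment for a shift in spurious correlations. It is an illustration under stated assumptions, not necessarily the formula derived in this work. It assumes a discrete label y and a discrete spurious attribute z that is observed only at training time, a source-trained submodel that outputs p_s(y, z | x), a shift that changes only the joint prior p(y, z) while p(x | y, z) stays fixed, and a target prior p_t(y, z) that is already known or estimated. The function name and array layout are hypothetical.

```python
import numpy as np

def adapt_joint_posterior(p_src_yz_given_x, prior_src_yz, prior_tgt_yz):
    """Sketch of a prior-ratio adjustment (assumes p(x | y, z) is shared across domains).

    p_src_yz_given_x: array of shape (N, Y, Z), source submodel output p_s(y, z | x).
    prior_src_yz:     array of shape (Y, Z), source prior p_s(y, z), all entries > 0.
    prior_tgt_yz:     array of shape (Y, Z), target prior p_t(y, z), assumed given.
    Returns an array of shape (N, Y) holding p_t(y | x).
    """
    # Under the assumption above, p_t(y, z | x) is proportional to
    # p_s(y, z | x) * p_t(y, z) / p_s(y, z).
    weights = prior_tgt_yz / prior_src_yz
    unnormalized = p_src_yz_given_x * weights  # broadcasts over the N examples
    joint = unnormalized / unnormalized.sum(axis=(1, 2), keepdims=True)
    # Marginalize out z, which is not observed at prediction time.
    return joint.sum(axis=2)

# Toy usage: y and z are strongly correlated in the source, independent in the target.
p_src = np.array([[[0.6, 0.1],
                   [0.1, 0.2]]])            # one example, p_s(y, z | x)
prior_src = np.array([[0.45, 0.05],
                      [0.05, 0.45]])
prior_tgt = np.full((2, 2), 0.25)
p_tgt_y = adapt_joint_posterior(p_src, prior_src, prior_tgt)
```

The richer training-time data here is the attribute z: the deployed predictor takes only x, but the submodel was fit with z available, which is what allows the reweighting step.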