Primary Submission Category: Generalizability/Transportability
Robust and causal-oriented prediction models from heterogeneous data
Authors: Armeen Taeb, Xinwei Shen, Peter Buehlmann
Presenting Author: Xinwei Shen*
Although heterogeneous data is challenging for iid-based statistical learning, it provides opportunities for causal inference and for learning prediction models that generalize to unseen environments. Indeed, existing methods such as anchor regression exploit heterogeneous data arising from mean shifts in the features and the response variable of interest to learn distributionally robust prediction models. In many real-world settings, however, perturbations may affect the variances of the relevant variables as well as their means, and previous techniques cannot handle this richer perturbation class. We propose Distributionally Robust predictions via Invariant Gradients (DRIG), a method that leverages perturbations in the form of both mean and variance shifts for robust predictions. In a linear setting, we prove that DRIG produces prediction models that are robust against perturbations in strictly (and often much) more ‘directions’ than those protected by anchor regression, highlighting the additional gains from exploiting heterogeneity beyond mean shifts. Viewing causality as an extreme case of distributional robustness, we investigate the causal identifiability of DRIG under various scenarios. Moreover, we extend DRIG to the semi-supervised domain adaptation setting, where a few labeled samples from the target domain are available and are used to further improve robustness. Finally, we illustrate the utility of our methods through numerical experiments.
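For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of linear anchor regression, not of DRIG itself. It assumes the commonly used objective ||(I - P_A)(Y - Xb)||^2 + gamma * ||P_A(Y - Xb)||^2, where P_A projects onto the anchor (environment) variables, and solves it by ordinary least squares on transformed data. The function name `anchor_regression`, the synthetic data, and all parameter values are illustrative assumptions and do not come from the abstract.

```python
import numpy as np


def anchor_regression(X, Y, A, gamma):
    """Hypothetical helper: linear anchor regression coefficients.

    Minimizes ||(I - P_A)(Y - Xb)||^2 + gamma * ||P_A (Y - Xb)||^2,
    where P_A is the projection onto the column space of the anchors A.
    gamma = 1 recovers ordinary least squares; larger gamma protects
    against stronger mean shifts generated through the anchors.
    """
    # Center all variables so no intercept is needed.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean()
    A = A - A.mean(axis=0)

    def project(V):
        # P_A V, computed as the fitted values of regressing V on A.
        coef, *_ = np.linalg.lstsq(A, V, rcond=None)
        return A @ coef

    # Apply W = I - (1 - sqrt(gamma)) P_A, so ||W r||^2 equals the objective.
    s = np.sqrt(gamma)
    X_t = X - (1.0 - s) * project(X)
    Y_t = Y - (1.0 - s) * project(Y)

    b, *_ = np.linalg.lstsq(X_t, Y_t, rcond=None)
    return b


# Illustrative synthetic data (an assumption): one anchor, one feature,
# and a hidden confounder H affecting both the feature and the response.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 1))
H = rng.normal(size=n)
X = (A[:, 0] + H + rng.normal(size=n))[:, None]
Y = 2.0 * X[:, 0] + H + rng.normal(size=n)

b_ols = anchor_regression(X, Y, A, gamma=1.0)     # same as plain OLS
b_robust = anchor_regression(X, Y, A, gamma=5.0)  # more weight on anchor-driven shifts
```

In this sketch the single tuning parameter gamma interpolates between ordinary least squares (gamma = 1) and an instrumental-variable-type solution (gamma growing large). It only reflects heterogeneity that acts through mean shifts, which is the limitation the abstract says DRIG addresses by also using variance shifts.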