Primary Submission Category: Machine Learning and Causal Inference
Doubly robust conformal prediction for missing data
Authors: Manit Paul, Arun Kumar Kuchibhotla, Eric J. Tchetgen Tchetgen
Presenting Author: Manit Paul*
Conformal prediction (CP) has seen growing attention in recent years, providing new tools for tackling missing data problems. However, most of these applications of CP lack robustness because they remain largely disconnected from modern semi-parametric efficiency theory. In this paper, we consider the general problem of obtaining distribution-free valid prediction regions for the outcome based on a coarsened version of the complete data. We do this by deriving the efficient influence function of the quantile of the outcome under a given semi-parametric model and then performing a conformal risk control procedure. We employ modern non-parametric methods, such as random forests, to learn the underlying nuisance functions of the semi-parametric model. This general theory has several consequences. (i) Covariate shift: it provides the required coverage guarantee (without any O(root-n) slack) if at least one of the nuisance functions (the propensity score and the conditional distribution of the outcome) is estimated exactly, an improvement over the earlier work of Yang et al. [2022]. (ii) Monotone missingness: it provides multiply robust prediction sets for the outcome under the Missing at Random (MAR) assumption; to our knowledge, this is one of the first results of this kind. Our theory also enables the construction of robust prediction regions for non-monotone missing data under the MAR assumption. We further illustrate the performance of our methods through various simulation studies.
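To make the double-robustness idea concrete, the following is a minimal sketch, not the authors' procedure. It uses the textbook augmented inverse-probability-weighted (AIPW) estimating equation for a quantile of a conformity score when outcomes are missing at random, and it carries no finite-sample coverage guarantee. The function name `dr_score_quantile`, its arguments, and the callable `cond_cdf` are hypothetical names introduced for illustration. The nuisance estimates (propensity and conditional CDF) are assumed to be supplied by the user, for example from a random forest.

```python
import numpy as np

def dr_score_quantile(scores, observed, propensity, cond_cdf, alpha):
    """Smallest observed score t whose AIPW-estimated coverage reaches 1 - alpha.

    scores     : conformity scores; entries with observed == 0 are ignored (may be NaN)
    observed   : 1 if the outcome (and hence the score) is observed, else 0
    propensity : estimated P(observed = 1 | X_i), one value per unit
    cond_cdf   : hypothetical callable, t -> array of estimated P(score <= t | X_i)
    alpha      : target miscoverage level
    """
    scores = np.asarray(scores, dtype=float)
    observed = np.asarray(observed, dtype=float)
    propensity = np.asarray(propensity, dtype=float)

    # Replace unobserved scores by +inf so their indicator is always 0.
    safe_scores = np.where(observed == 1, scores, np.inf)
    candidates = np.sort(scores[observed == 1])

    for t in candidates:
        m = np.asarray(cond_cdf(t), dtype=float)
        indicator = (safe_scores <= t).astype(float)
        # AIPW estimate of P(score <= t): inverse-weighted residual plus the
        # outcome-model prediction. It is consistent if either nuisance is correct.
        coverage = np.mean(observed / propensity * (indicator - m) + m)
        # The estimate need not be monotone in t; this sketch takes the first crossing.
        if coverage >= 1 - alpha:
            return float(t)
    return float("inf")  # no candidate reaches the target level


# Usage: fully observed data with a zero outcome model reduces to an empirical quantile.
full = dr_score_quantile(
    scores=np.arange(1, 11),
    observed=np.ones(10),
    propensity=np.ones(10),
    cond_cdf=lambda t: np.zeros(10),
    alpha=0.25,
)

# Usage: half the scores are missing, with a known propensity of 0.5.
partial = dr_score_quantile(
    scores=[1, 2, 3, 4, np.nan, np.nan, np.nan, np.nan],
    observed=[1, 1, 1, 1, 0, 0, 0, 0],
    propensity=np.full(8, 0.5),
    cond_cdf=lambda t: np.zeros(8),
    alpha=0.3,
)
```

In both usage calls the outcome model is deliberately uninformative (identically zero), so the estimate relies entirely on the propensity weights. Supplying a well-estimated `cond_cdf` instead would protect against a misspecified propensity, which is the double-robustness property the abstract refers to.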