Primary Submission Category: Machine Learning and Causal Inference

Debiasing in missing data models with inaccurate estimates of outcome and missingness parameters

Authors: Michael Celentano

Presenting Author: Michael Celentano*

We consider the problems of (i) estimating linear model coefficients with data missing at random (MAR) and (ii) estimating average treatment effects with linear outcome models under strong ignorability. We study these problems in a high-dimensional regime in which the number of confounders $p$ is proportional to the sample size $n$ and the outcome and propensity/missingness models cannot be estimated consistently. A series of recent works (Jiang et al., 2022; Yadlowsky, 2022) studied the behavior of the classical IPW, AIPW, and TMLE estimators in this regime and revealed several departures from the predictions of the classical theory, including, for example, variance inflation of the AIPW estimator. Their analyses, however, are restricted to cases in which $n > p$, which allows for unbiased estimation of the outcome model. In this paper, we instead study the case in which $n < p$ and regularization is used in estimating the outcome and propensity/missingness models. In this case, the classical estimators of linear coefficients and average treatment effects fail to be unbiased or even consistent. We propose a debiased estimator that is provably consistent and provide confidence intervals for the estimated linear coefficients. As with the classical AIPW estimator, our proposed estimator requires estimation of both the outcome and propensity/missingness models, but we combine these estimates in a non-standard way.
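
For readers less familiar with the classical baseline, the following is a minimal sketch of the textbook AIPW estimator of the average treatment effect, which combines fitted outcome values with inverse-propensity-weighted residuals. It is not the debiased estimator proposed in this work, whose combination rule is not specified in the abstract. The function name `aipw_ate` and its argument names are hypothetical, and the fitted values are assumed to come from some separately estimated outcome and propensity models.

```python
import numpy as np


def aipw_ate(y, a, mu1_hat, mu0_hat, pi_hat):
    """Classical AIPW estimate of the average treatment effect.

    y        observed outcomes
    a        binary treatment indicators (1 = treated, 0 = control)
    mu1_hat  fitted outcome under treatment, one value per unit (assumed given)
    mu0_hat  fitted outcome under control, one value per unit (assumed given)
    pi_hat   fitted propensity scores, strictly between 0 and 1 (assumed given)
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    mu0_hat = np.asarray(mu0_hat, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)

    # Outcome-model prediction plus an inverse-propensity-weighted residual
    # correction, computed separately for the treated and control arms.
    treated_term = mu1_hat + a * (y - mu1_hat) / pi_hat
    control_term = mu0_hat + (1.0 - a) * (y - mu0_hat) / (1.0 - pi_hat)
    return float(np.mean(treated_term - control_term))
```

When the fitted outcome values are set to zero, this expression reduces to the plain IPW estimator, which is one way to see how the two classical estimators are related.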