Primary Submission Category: Missing Data

A Complete Multiple Imputation Algorithm for Missing Data Graphs

Authors: Trung Phung, Ilya Shpitser, Rohit Bhattacharya

Presenting Author: Trung Phung*

Imputation is one of the most popular methods for analyzing data with missing values. However, the most widely used methods, such as Multiple Imputation with Chained Equations (MICE), operate under the Missing At Random (MAR) assumption, which may not hold in many real-world settings. Recently, much progress has been made in identification theory for Missing Not At Random (MNAR) models that can be represented graphically: missing data graphs provide an intuitive causal interpretation of missingness mechanisms and a concise representation of the statistical model. These results, however, have seen limited use in practice, in part because the identifying functionals for the propensity score are complex and only a few bespoke estimation strategies exist. We address this gap by proposing a new imputation method that can be applied to any missing data graphical model whose full data law is identified. The algorithm is recursive: imputation for a data row uses all other rows whose sets of missing variables are subsets of the set missing in that row, which is a direct consequence of the sound and complete identification theory for the full law. In contrast, because of its MAR assumption, MICE treats all rows equally while performing Gibbs sampling. We further show how exploiting graph sparsity improves the computational and statistical efficiency of our method. We evaluate our method against MICE, showing comparable results under MAR and superior, less biased results under MNAR.
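
The row-selection rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it covers only the bookkeeping of which rows may inform which, not the imputation model or the identification argument. The function names (`missing_set`, `donor_rows`, `processing_order`), the use of `None` to mark a missing entry, and the choice to treat rows with an identical missingness pattern as eligible are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Optional, Sequence

# Assumption: a row is a sequence of values, with None marking a missing entry.
Row = Sequence[Optional[float]]


def missing_set(row: Row) -> FrozenSet[int]:
    """Return the indices of the variables that are missing in this row."""
    return frozenset(j for j, value in enumerate(row) if value is None)


def donor_rows(data: List[Row]) -> List[List[int]]:
    """For each row i, list the other rows whose missing variables are a
    subset of the variables missing in row i.

    Assumption: the subset test is non-strict, so rows sharing the same
    missingness pattern count as donors for each other.
    """
    patterns = [missing_set(row) for row in data]
    return [
        [k for k, p in enumerate(patterns) if k != i and p <= patterns[i]]
        for i in range(len(data))
    ]


def processing_order(data: List[Row]) -> List[int]:
    """Order rows by number of missing entries (fewest first), so that every
    row with a strictly smaller missingness pattern is visited earlier."""
    return sorted(range(len(data)), key=lambda i: len(missing_set(data[i])))


# Hypothetical toy data with three variables.
data = [
    [1.0, 2.0, 3.0],    # nothing missing
    [None, 2.5, 3.1],   # variable 0 missing
    [None, None, 2.9],  # variables 0 and 1 missing
    [0.8, None, 3.3],   # variable 1 missing
]

donors = donor_rows(data)
order = processing_order(data)
```

In this toy example the row missing both variables 0 and 1 can draw on all three other rows, while the row missing only variable 0 can draw only on the complete row. Under MICE, by contrast, every row would contribute to every conditional model regardless of its missingness pattern.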