Primary Submission Category: missing data

Zero Inflation as a Missing Data Problem: a Proxy-based Approach

Authors: Trung Phung, Jaron J.R. Lee, Opeyemi Oladapo-Shittu, Eili Y. Klein, Ayse P. Gurses, Susan M. Hannum, Kimberly Weems, Jill Marsteller, Sara E. Cosgrove, Sara C. Keller, Ilya Shpitser

Presenting Author: Trung Phung*

Zero-inflated data contain values that are incorrectly recorded as zeros. This happens either because of data-recording conventions (e.g., rare outcomes are assumed to be absent) or because of details of the recording equipment (e.g., artificial zeros in genomic data).
Common statistical models for zero-inflated data are parametric and generally assume that the data are at most missing at random (MAR). Graphical missing data models, by contrast, are nonparametric and can handle data that are missing not at random (MNAR). However, they require censored values to be marked by a special symbol such as “?”, whereas in zero-inflated data “0” denotes both a true value and a missing one.
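To make the encoding problem concrete, here is a small hypothetical sketch (the function names are illustrative and not taken from the paper). With a reserved symbol such as “?”, the missingness indicator R can be read directly off the record. With zero as the censoring value, a recorded 0 leaves R undetermined.

```python
# Hypothetical helpers: x is the true value, r = 1 means the value was censored.
def recorded_value(x, r, censor_symbol):
    """What ends up in the dataset: x if uncensored, otherwise the symbol."""
    return censor_symbol if r == 1 else x

def indicator_from_record(value, censor_symbol, symbol_is_reserved):
    """Recover R from a recorded value; None means R cannot be determined."""
    if value != censor_symbol:
        return 0  # anything other than the censoring symbol is uncensored
    # A reserved symbol (e.g. "?") can only mean censoring; a 0 may be a true zero.
    return 1 if symbol_is_reserved else None

def compare_encodings(rows):
    """For each (x, r), return (R recovered under "?", R recovered under 0)."""
    return [
        (indicator_from_record(recorded_value(x, r, "?"), "?", True),
         indicator_from_record(recorded_value(x, r, 0), 0, False))
        for x, r in rows
    ]

print(compare_encodings([(2, 0), (0, 0), (1, 1)]))
# prints [(0, 0), (0, None), (1, None)]
```

Under the zero encoding, the second row (a true zero) and the third row (a censored value) produce the same record, so R is unobserved exactly when a zero is recorded.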
This paper treats zero-inflated data as a harder type of missing data, in which the missingness indicator is unobserved whenever a zero is recorded. We show that, in most cases, target parameters involving a zero-inflated variable are nonparametrically unidentified. However, if a proxy for the censoring indicator is observed, a modification of Kuroki and Pearl’s effect restoration enables identification and estimation, provided that the proxy-indicator relationship is known.
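As a rough illustration of the restoration idea, the following is a minimal sketch in the simplest binary setting. It is not the authors' modified procedure. The assumptions are hypothetical: X and R are binary, R = 1 zeroes the value out, and the recorded value is X* = X(1 − R). The proxy W depends on (X, R) only through R, with q_r = p(W = 1 | R = r) known. Within the X* = 0 stratum, the known 2×2 matrix p(W | R) is inverted to recover p(X* = 0, R = r), from which p(R = 1) and p(X = 1 | R = 0) follow. The demo additionally assumes X is independent of R, so p(X = 1 | R = 0) equals p(X = 1).

```python
from itertools import product

# Hypothetical binary model (not the authors' specification):
#   X in {0, 1}: true value;  R in {0, 1}: R = 1 means the value was zeroed out
#   X* = X * (1 - R): what is recorded;  W in {0, 1}: proxy for R
# Assumptions: W depends on (X, R) only through R; q_r = p(W=1 | R=r) is known.

def observed_joint(p_x1, p_r1, q0, q1):
    """Population distribution of (X*, W); X independent of R in this demo."""
    joint = {key: 0.0 for key in product((0, 1), repeat=2)}
    for x, r, w in product((0, 1), repeat=3):
        px = p_x1 if x == 1 else 1 - p_x1
        pr = p_r1 if r == 1 else 1 - p_r1
        qw = q1 if r == 1 else q0
        pw = qw if w == 1 else 1 - qw
        joint[(x * (1 - r), w)] += px * pr * pw
    return joint

def restore(joint, q0, q1):
    """Invert the known 2x2 matrix p(W | R) within the X* = 0 stratum.

    p(X*=0, W=w) = sum_r p(W=w | R=r) * p(X*=0, R=r)
    """
    m = [[1 - q0, 1 - q1],   # row w = 0, columns r = 0, 1
         [q0, q1]]           # row w = 1
    v = [joint[(0, 0)], joint[(0, 1)]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]   # equals q1 - q0
    if abs(det) < 1e-12:
        raise ValueError("proxy is uninformative about R (q0 == q1)")
    p_x0_r0 = (m[1][1] * v[0] - m[0][1] * v[1]) / det   # p(X*=0, R=0) = p(X=0, R=0)
    p_r1 = (-m[1][0] * v[0] + m[0][0] * v[1]) / det     # p(X*=0, R=1) = p(R=1)
    p_x1_r0 = joint[(1, 0)] + joint[(1, 1)]             # p(X*=1) = p(X=1, R=0)
    return {"p_r1": p_r1, "p_x0_r0": p_x0_r0,
            "p_x1_given_r0": p_x1_r0 / (1 - p_r1)}

joint = observed_joint(p_x1=0.3, p_r1=0.2, q0=0.1, q1=0.8)
est = restore(joint, q0=0.1, q1=0.8)
naive = joint[(1, 0)] + joint[(1, 1)]   # treats every recorded 0 as a true 0
print(round(naive, 3), round(est["p_r1"], 3), round(est["p_x1_given_r0"], 3))
# prints 0.24 0.2 0.3
```

The naive estimate of p(X = 1), which takes every recorded zero at face value, gives 0.24. The restored value recovers the 0.3 used to generate the toy distribution. The inversion fails when q0 = q1, that is, when the proxy carries no information about censoring.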
If this relationship is unknown, our approach yields partial identification that supports sensitivity analysis: only certain proxy-indicator conditional distributions are compatible with the observed data distribution. We give an analytical bound for binary cases. For more complex cases, the numerical bound of Duarte et al. (2023) should be computed instead.
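The compatibility idea can be visualized with a brute-force grid scan. This is neither the analytical binary bound mentioned above nor the Duarte et al. (2023) procedure, only a hypothetical sketch. It assumes binary X*, W, R with X* = X(1 − R), and a proxy W that depends on (X, R) only through R. Because X* = 1 implies R = 0, q0 = p(W = 1 | R = 0) is then fixed by the X* = 1 stratum. Each candidate q1 = p(W = 1 | R = 1) implies a value of p(R = 1), and the candidate is kept only if that value is a valid probability. The toy distribution below corresponds to q0 = 0.1, q1 = 0.8, p(R = 1) = 0.2, and p(X = 1) = 0.3 with X independent of R. It is made-up input, not data.

```python
# Hypothetical observed law of (X*, W) for binary X*, W (toy numbers, not data).
# Keys are (x_star, w); values are probabilities.
JOINT = {(0, 0): 0.544, (0, 1): 0.216, (1, 0): 0.216, (1, 1): 0.024}

def compatible_ranges(joint, grid_size=1000):
    """Brute-force scan of q1 = p(W=1 | R=1) over a grid on [0, 1].

    Assumes W depends on (X, R) only through R, so q0 = p(W=1 | R=0) is
    identified from the X* = 1 stratum (where R = 0 for certain).
    Returns the range of compatible q1 and the implied range of p(R=1).
    """
    p0 = joint[(0, 0)] + joint[(0, 1)]   # p(X* = 0)
    p1 = joint[(1, 0)] + joint[(1, 1)]   # p(X* = 1)
    q0 = joint[(1, 1)] / p1              # p(W=1 | X*=1) = p(W=1 | R=0)
    kept = []
    for i in range(grid_size + 1):
        q1 = i / grid_size
        if abs(q1 - q0) < 1e-12:
            continue                     # proxy would carry no information
        # Solve p(X*=0, W=1) = q0 * (p0 - b) + q1 * b for b = p(R=1).
        b = (joint[(0, 1)] - q0 * p0) / (q1 - q0)
        if 0.0 <= b <= p0:               # p(R=1) >= 0 and p(X=0, R=0) = p0 - b >= 0
            kept.append((q1, b))
    if not kept:
        raise ValueError("no q1 is compatible; the assumed model is rejected")
    q1s = [q for q, _ in kept]
    bs = [b for _, b in kept]
    return (min(q1s), max(q1s)), (min(bs), max(bs))

q1_range, p_r1_range = compatible_ranges(JOINT)
print(q1_range, p_r1_range)
```

On this toy input, the scan keeps q1 from about 0.285 up to 1 and gives p(R = 1) between about 0.156 and 0.757. Both intervals contain the generating values (q1 = 0.8, p(R = 1) = 0.2). Every candidate q1 below roughly 0.284 is ruled out by the observed distribution alone. A closed-form bound would replace the grid, which only approximates the interval endpoints.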
We illustrate our method via simulation studies and a data application on Central Line-Associated Bloodstream Infections.