Primary Submission Category: Applications in Health and Biology
The Self-Masking Model for Imputing Missing Electronic Health Record Data
Authors: Yidan Zhang, Eric Slud, Razieh Nabi, Daniel Scharfstein
Presenting Author: Yidan Zhang*
In the early hours of an emergency department encounter, laboratory tests are selectively ordered for patients who are suspected of having abnormal findings. As a result, laboratory measurements recorded in the electronic health record (EHR) are often subject to informative missingness, since the absence of a test may itself convey clinical information. This missing not at random (MNAR) process poses substantial challenges for downstream analyses that require complete laboratory profiles. To address this problem, we focus on binary laboratory variables coded as normal/abnormal and develop a missing data imputation scheme under the following assumptions: (1) the probability that a laboratory value is missing depends only on the underlying (possibly unobserved) value of that variable, and (2) the joint distribution of laboratory results arises from a latent multivariate probit model, which captures dependence across laboratories through correlated latent Gaussian variables. We estimate the model parameters using an EM algorithm applied to a pairwise composite likelihood and then impute the missing laboratory values via MCMC sampling with adaptive tuning. We illustrate the proposed method using EHR data from a cohort of patients with suspected sepsis presenting to emergency departments within the Intermountain Healthcare system.
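
The two modeling assumptions can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not perform the EM or MCMC steps; it only generates data from a latent multivariate probit model with self-masking missingness, to show why complete-case summaries are biased. The function name `simulate_self_masked_labs`, the thresholds, the correlation matrix, and the observation probabilities are all hypothetical values chosen for illustration.

```python
import numpy as np

def simulate_self_masked_labs(n, thresholds, corr, p_obs_given_value, seed=0):
    """Simulate binary labs under assumptions (1) and (2) of the abstract.

    thresholds        : length-J array; lab j is abnormal (1) when Z_j > thresholds[j]
    corr              : J x J correlation matrix of the latent Gaussian vector Z
    p_obs_given_value : J x 2 array; entry [j, y] = P(lab j is observed | Y_j = y)
    Returns (y, observed): true binary values and a boolean observation mask.
    All argument values used below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    thresholds = np.asarray(thresholds, dtype=float)
    p_obs = np.asarray(p_obs_given_value, dtype=float)
    n_labs = thresholds.size

    # Assumption (2): latent multivariate probit model.
    z = rng.multivariate_normal(np.zeros(n_labs), corr, size=n)
    y = (z > thresholds).astype(int)

    # Assumption (1): self-masking. Whether lab j is observed depends
    # only on that lab's own (possibly unobserved) value y[:, j].
    p = p_obs[np.arange(n_labs), y]          # shape (n, n_labs)
    observed = rng.random((n, n_labs)) < p
    return y, observed

# Hypothetical setup: two correlated labs, abnormal results observed more often.
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
p_obs = np.array([[0.3, 0.9],   # lab 0: P(observed | normal), P(observed | abnormal)
                  [0.3, 0.9]])  # lab 1
y, observed = simulate_self_masked_labs(20000, [0.5, 0.5], corr, p_obs, seed=1)

true_rate = y.mean(axis=0)
# Complete-case abnormal rate: average over observed entries only.
naive_rate = (y * observed).sum(axis=0) / observed.sum(axis=0)
print("true abnormal rate:      ", true_rate)
print("observed-only abnormal rate:", naive_rate)
```

In this setup the observed-only abnormal rate overstates the true rate, because abnormal values are more likely to be recorded. This is the kind of bias that an imputation scheme built on the self-masking assumption is intended to correct.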
