Primary Submission Category: Generalizability/Transportability
Efficient Estimation of Causal Effects Under Two-Phase Sampling with Error-Prone Outcome and Treatment Measurements
Authors: Keith Barnatchez, Kevin Josey, Nima Hejazi, Bryan Shepherd, Giovanni Giovanni, Rachel Nethery
Presenting Author: Keith Barnatchez*
In causal inference studies using electronic health record (EHR) data, clinical outcomes and treatments are commonly recorded with substantial error. In practice, researchers can often validate error-prone measurements for a small, randomly selected subset of the full EHR dataset. This is a special case of two-phase sampling, in which easily measured variables are collected for all subjects and expensive-to-measure variables are collected only for a random subset. To improve efficiency, researchers frequently implement biased sampling designs, in which validation probabilities depend on patients’ initial error-prone measurements. In this work, we address both the specific challenge of causal inference with error-prone outcome and treatment measurements under biased validation sampling designs and the broader problem of causal inference under two-phase sampling. We highlight two asymptotically equivalent approaches to constructing nonparametric doubly-robust estimators of counterfactual means under general two-phase sampling designs, and we argue that these approaches can yield estimators with meaningfully different finite-sample behavior. For our specific measurement error problem, we construct novel doubly-robust estimators through each approach and propose modifications that improve one approach’s finite-sample efficiency. Through simulation studies and data from the Vanderbilt Comprehensive Care Clinic, we demonstrate the efficiency gains our proposed methods can provide over current leading methods.
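To make the biased validation design concrete, the following is a minimal Python sketch under simplifying assumptions, not the estimators proposed in this work. It drops the treatment and confounders entirely and considers only a binary outcome whose error-prone version is observed for everyone. The data-generating process, the validation probabilities, and all variable names are hypothetical. It contrasts a naive mean over the validated subset with an inverse-probability-weighted mean and a simple augmented mean that also uses the error-prone measurement from the full sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Phase 1 (everyone): error-prone outcome y_star (hypothetical process).
y_true = rng.binomial(1, 0.3, size=n)          # true outcome, unobserved in general
flip = rng.binomial(1, 0.15, size=n)           # 15% misclassification, assumed
y_star = np.where(flip == 1, 1 - y_true, y_true)

# Phase 2 (subset): validation probability depends on y_star (biased design).
pi = np.where(y_star == 1, 0.5, 0.1)           # known by design, assumed values
s = rng.binomial(1, pi)                        # 1 if the record is validated

# Naive estimate: ignores that validated records over-represent y_star == 1.
naive = y_true[s == 1].mean()

# Inverse-probability-weighted (Hajek) estimate using the known design weights.
y_obs = np.where(s == 1, y_true, 0.0)          # true outcome only used where validated
ipw = np.sum(s * y_obs / pi) / np.sum(s / pi)

# Augmented estimate: predict the outcome from y_star for everyone, then
# correct the predictions with weighted residuals from the validated subset.
m_by_stratum = np.array(
    [y_true[(s == 1) & (y_star == k)].mean() for k in (0, 1)]
)
m_hat = m_by_stratum[y_star]
aug = np.mean(m_hat + s / pi * (y_obs - m_hat))

truth = y_true.mean()
print(f"truth={truth:.3f} naive={naive:.3f} ipw={ipw:.3f} aug={aug:.3f}")
```

In this toy setting the naive mean is pulled well above the truth because records with a positive error-prone outcome are validated more often, while both weighted estimates correct for the design. The augmented form is only a simplified analogue of the doubly-robust constructions discussed above.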