Primary Submission Category: Generalizability/Transportability

Data Fusion for Prospective and Retrospective Studies

Authors: Ellen Graham, Andrea Rotnitzky, Marco Carone

Presenting Author: Ellen Graham*

Previous work on data fusion has primarily focused on estimating parameters by leveraging data sources that align with variation-independent factors of the target population likelihood. In contrast, we introduce a general framework for debiased machine learning of smooth parameters by fusing a pair of data sources that align with variation-dependent components of the likelihood. Specifically, we consider the problem of data fusion when the distribution of the outcome given covariates (but not the covariate distribution) can be learned from a prospective cohort study, and the distribution of the covariates given the outcome (but not the outcome distribution) can be learned from a retrospective case-control study. Our procedure allows for the identification of estimands that cannot be identified from either a prospective or a retrospective study alone. We demonstrate how the dependence between these two conditional distributions restricts the joint model, allowing for a reduction in the semiparametric efficiency bound. We characterize when estimators that achieve these bounds exist and provide a means to construct them. Finally, we provide examples of our proposed procedure for estimands of practical importance, such as the average treatment effect and the average treatment effect on the treated.
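
As a brief illustration of why the two conditional distributions jointly identify what neither study identifies alone, the following sketch uses the standard factorization of a joint density. The notation (outcome Y, covariates X, densities p) and the positivity assumption p(y | x) > 0 on the support of X are assumptions introduced here for illustration, not necessarily the authors' own setup or derivation.

```latex
% Two factorizations of the same joint density:
%   p(x, y) = p(y \mid x)\, p(x) = p(x \mid y)\, p(y).
% Dividing gives a ratio that depends only on the two conditionals:
\[
  \frac{p(x \mid y)}{p(y \mid x)} = \frac{p(x)}{p(y)} .
\]
% Integrating over x for a fixed y recovers the outcome marginal:
\[
  \int \frac{p(x \mid y)}{p(y \mid x)} \, dx = \frac{1}{p(y)}
  \quad\Longrightarrow\quad
  p(y) = \left( \int \frac{p(x \mid y)}{p(y \mid x)} \, dx \right)^{-1},
\]
% and the covariate marginal then follows for any such y:
\[
  p(x) = p(y)\, \frac{p(x \mid y)}{p(y \mid x)} .
\]
```

The same ratio also shows the variation dependence: a pair of conditionals is compatible with some joint distribution only if p(x | y) / p(y | x) factors into a function of x times a function of y, so the two components cannot be varied freely of one another.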