Primary Submission Category: Generalizability/Transportability
Further Results On Selected Data Models For Data Fusion: Identification And Estimation
Authors: Jaron Lee, AmirEmad Ghassami, Ilya Shpitser
Presenting Author: Jaron Lee*
Recently, there has been significant interest in graphical models for causal data fusion, whereby graphical causal inference is conducted from a collection of interventional and observational data sources. Despite recent advances, there continues to be a significant gap in extending this framework to real-world problems due to a lack of efficient estimation methodology for graphical data fusion models.
Previously, we proposed a new graphical model, the labelled conditional acyclic directed mixed graph (L-CADMG), suited to reasoning about such problems. The L-CADMG generalizes existing data fusion diagrams by explicitly introducing a selection variable that indicates the selection mechanism among the different domains, and it allows for cases in which selection does not happen completely at random.
Building on this work, we offer two improvements.
Our first main contribution is an efficient estimation methodology:
We define a model through a parameterization property, using the Möbius function which relates a set of parameters to a probability distribution. We also define a model through a factorization property, which relates the L-CADMG and its reachable subgraphs (appropriately defined) to a collection of probability distributions.
We then prove the equivalence of the two models, based on factorization and parameterization, by showing that these two sets of distributions are equal, a result which will lead directly to a well-specified likelihood (at least for discre