Primary Submission Category: Machine Learning and Causal Inference
Cross-Validated Decision Trees with Targeted Maximum Likelihood Estimation for Nonparametric Causal Mixtures Analysis
Authors: David McCoy, Alan Hubbard, Alejandro Schuler, Mark van der Laan
Presenting Author: David McCoy*
People are often exposed to several agents simultaneously (e.g., multiple drugs or pollutants). Policymakers want to set safe limits, interdictions, or recommended dosage combinations based on a set of thresholds, one per exposure. Setting these thresholds is difficult because all relevant interactions between exposures must be accounted for. Previous statistical methods have used parametric estimators that rely on unrealistic assumptions and do not directly address whether effects in a mixture are superadditive or subadditive. Here we present an estimator that a) automatically identifies the thresholds that maximize the differential effect of self-selecting exposure within the thresholded exposure region versus outside of it, and b) provides an unbiased and efficient estimate of the magnitude of that differential effect. It does so by combining a tree-based search algorithm with a targeted maximum likelihood estimator, using cross-validation to keep threshold selection separate from effect estimation. We provide open-source software (CVtreeMLE) that implements the method.
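The sample-splitting idea behind the estimator can be illustrated with a minimal sketch. This is not the CVtreeMLE interface or the authors' implementation: all function names and the simulated data are hypothetical, a single-split search stands in for the decision tree, and a plain difference in means stands in for targeted maximum likelihood estimation (so there is no adjustment for confounders). The only point it shows is that the threshold is chosen on training folds and the effect is estimated on the held-out fold.

```python
import random
from statistics import mean


def simulate(n=300, seed=1):
    """Hypothetical data: two exposures in [0, 1]; outcome jumps by 2 when exposure 0 >= 0.6."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = (2.0 if x[0] >= 0.6 else 0.0) + rng.gauss(0.0, 0.5)
        rows.append((x, y))
    return rows


def best_threshold(rows, min_leaf=20):
    """One-split search (a stand-in for a tree): return the (exposure index, cut)
    that maximizes the absolute gap in mean outcome above vs. below the cut."""
    best = None
    for j in range(len(rows[0][0])):
        for cut in sorted({x[j] for x, _ in rows}):
            inside = [y for x, y in rows if x[j] >= cut]
            outside = [y for x, y in rows if x[j] < cut]
            if len(inside) < min_leaf or len(outside) < min_leaf:
                continue  # skip regions too small to compare
            gap = abs(mean(inside) - mean(outside))
            if best is None or gap > best[0]:
                best = (gap, j, cut)
    return best[1], best[2]


def cross_fit_effect(rows, k=5, seed=0):
    """For each fold: choose the threshold on the other folds, then estimate the
    inside-vs-outside difference in mean outcome on the held-out fold only."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    estimates = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        valid = [rows[i] for i in fold]
        j, cut = best_threshold(train)
        inside = [y for x, y in valid if x[j] >= cut]
        outside = [y for x, y in valid if x[j] < cut]
        if inside and outside:
            estimates.append((j, cut, mean(inside) - mean(outside)))
    return estimates


rows = simulate()
estimates = cross_fit_effect(rows)
for j, cut, effect in estimates:
    print(f"exposure {j}: cut at {cut:.2f}, held-out difference {effect:.2f}")
```

Because the held-out fold plays no part in choosing the cut, the per-fold differences are not inflated by the search itself; a full analysis would replace the difference in means with a targeted estimator that adjusts for covariates.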