Primary Submission Category: Machine Learning and Causal Inference

Performance of Cross-Validated Targeted Maximum Likelihood Estimation

Authors: Matthew Smith, Camille Maringe, Miguel Angel Luque Fernandez

Presenting Author: Matthew Smith*

Background: Estimating causal relationships is often of interest in public health. Evidence shows that targeted maximum likelihood estimation (TMLE) often performs better than other estimators. However, TMLE can underestimate the variance when the outcome model is overfitted and the Donsker class condition does not hold. In such cases, cross-validated TMLE (CV-TMLE) is considered a suitable alternative that prevents overfitting and improves variance estimation, and it could be beneficial under near-positivity violations.

Methods: Using simulations, we compared CV-TMLE strategies that cross-validate the outcome model only or both the outcome and exposure models. We updated the user-friendly ‘eltmle’ Stata package with options for CV-TMLE, choice of the number of folds, retention of nuisance variables, and reporting of covariate balance tables.
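
The outcome-model cross-validation mentioned above can be sketched as follows. This is a minimal illustration in Python, not the ‘eltmle’ implementation: the function names (make_folds, fit_outcome_model, cross_fit_outcome) are hypothetical, the data are assumed to be (A, W, Y) tuples with a binary exposure A and a single discrete covariate W, and a toy cell-mean model stands in for whatever learner is actually used. It covers only the step in which each observation's initial outcome predictions come from a model fitted without that observation's fold; the targeting (fluctuation) step and the variance estimate are omitted.

```python
import random


def make_folds(n, k, seed=0):
    """Randomly assign indices 0..n-1 to k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_outcome_model(rows):
    """Toy outcome model (an assumption): mean of Y in each (A, W) cell,
    falling back to the overall mean for cells unseen in training."""
    sums, counts = {}, {}
    for a, w, y in rows:
        key = (a, w)
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    overall = sum(y for _, _, y in rows) / len(rows)
    means = {key: sums[key] / counts[key] for key in sums}
    return lambda a, w: means.get((a, w), overall)


def cross_fit_outcome(data, k=5, seed=0):
    """Return out-of-fold predictions of E[Y | A=1, W] and E[Y | A=0, W]
    for every observation in data."""
    n = len(data)
    q1 = [None] * n
    q0 = [None] * n
    for fold in make_folds(n, k, seed):
        held_out = set(fold)
        # Fit on every fold except the held-out one.
        train = [data[i] for i in range(n) if i not in held_out]
        model = fit_outcome_model(train)
        # Predict both counterfactual outcomes for the held-out fold only.
        for i in fold:
            _, w, _ = data[i]
            q1[i] = model(1, w)
            q0[i] = model(0, w)
    return q1, q0
```

Calling cross_fit_outcome(data, k=5) on a list of (A, W, Y) tuples returns two lists of out-of-fold predictions, one per exposure level. Cross-validating the exposure model as well would follow the same pattern, with the propensity score fitted on the training folds and predicted on the held-out fold.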

Results and conclusion: Our results show that CV-TMLE is a valid, and preferable, alternative to TMLE for variance estimation in the presence of near-positivity violations or data sparsity. We illustrate the benefits of CV-TMLE with an example from cancer epidemiology.