Primary Submission Category: Heterogeneous Treatment Effects
Evaluating Finite-Sample Properties of Machine Learning Approaches for Assessing Heterogeneity of Treatment Effect in Clinical Trials
Authors: Lisa Levoir, Bryan Blette, Andrew Spieker
Presenting Author: Lisa Levoir*
Inferring heterogeneity of treatment effect is a popular secondary aim of clinical trials. Recently, many trial analyses have moved from traditional subgroup analyses to more modern assessments of heterogeneity using machine learning. Several such methods are available to estimate conditional average treatment effects (CATEs) in clinical trials, but they are often applied to trials with smaller sample sizes than those considered in the simulations of the corresponding seminal methodological work, making the validity of inference in these settings unclear. To provide guidance to practitioners, we conducted a simulation study to evaluate the performance of different regression and machine learning estimators for the CATE, including ordinary least squares (OLS) and causal forests, in a variety of settings across a range of sample sizes. We evaluated 95% confidence interval (CI) coverage, bias, and variance under linear and nonlinear data-generating mechanisms (DGMs) in the presence of 0 to 100 nuisance covariates and 0 to 16 effect-modifying covariates. We found that while tree-based ensembles like causal forests can adapt flexibly to both linear and nonlinear settings, they can have meaningfully impaired coverage in many settings at the sample sizes typical of most trial applications. As expected, OLS performs best under linear DGMs but poorly under nonlinear DGMs. We conclude with recommendations for practitioners.
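As a rough illustration of how coverage, bias, and variance can be computed in a simulation of this kind, the following is a minimal sketch for the OLS case only. It is not the authors' code or DGM: the outcome model, the choice to fit OLS with treatment-by-covariate interactions, the evaluation point `x0`, the normal-approximation interval, and all function names (`true_cate`, `ols_cate_ci`, `run_simulation`) are assumptions made for illustration.

```python
import numpy as np

def true_cate(x_mod, nonlinear):
    # Hypothetical CATE: linear or quadratic in the effect modifiers.
    x_mod = np.atleast_2d(x_mod)
    if nonlinear:
        return 1.0 + (x_mod ** 2).sum(axis=1)
    return 1.0 + 0.5 * x_mod.sum(axis=1)

def ols_cate_ci(X, A, Y, x0, z=1.96):
    """Fit Y ~ 1 + X + A + A:X by OLS; return (estimate, lower, upper) for the CATE at x0."""
    n, p = X.shape
    D = np.column_stack([np.ones(n), X, A, A[:, None] * X])
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ beta
    sigma2 = resid @ resid / (n - D.shape[1])      # homoskedastic variance estimate
    V = sigma2 * np.linalg.inv(D.T @ D)
    # Contrast selecting the treatment main effect and the interaction terms at x0.
    c = np.concatenate([np.zeros(1 + p), [1.0], x0])
    est = c @ beta
    se = np.sqrt(c @ V @ c)
    return est, est - z * se, est + z * se         # normal-approximation 95% CI

def run_simulation(n=200, n_mod=1, n_nuis=2, nonlinear=False, reps=500, seed=0):
    """Monte Carlo bias, variance, and CI coverage of the OLS CATE estimate at x0."""
    rng = np.random.default_rng(seed)
    p = n_mod + n_nuis                             # assumes p >= 1
    x0 = np.full(p, 0.5)                           # hypothetical evaluation point
    truth = true_cate(x0[:n_mod], nonlinear)[0]
    ests = np.empty(reps)
    covered = np.empty(reps, dtype=bool)
    for r in range(reps):
        X = rng.normal(size=(n, p))
        A = rng.integers(0, 2, size=n).astype(float)   # 1:1 randomization
        tau = true_cate(X[:, :n_mod], nonlinear)
        Y = 0.5 * X[:, 0] + A * tau + rng.normal(size=n)
        est, lo, hi = ols_cate_ci(X, A, Y, x0)
        ests[r] = est
        covered[r] = lo <= truth <= hi
    return {
        "bias": ests.mean() - truth,
        "variance": ests.var(ddof=1),
        "coverage": covered.mean(),
    }

if __name__ == "__main__":
    print("linear DGM:   ", run_simulation(nonlinear=False))
    print("nonlinear DGM:", run_simulation(nonlinear=True))
```

Under this assumed setup, the nuisance covariates enter the fitted model without modifying the treatment effect, so varying `n`, `n_mod`, and `n_nuis` mimics the kind of grid described above. A causal forest or other machine learning estimator could be substituted for `ols_cate_ci`, provided it returns a point estimate and interval at `x0`.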