Primary Submission Category: Machine Learning and Causal Inference
A Comparison of Missing Data Imputation Methods for Covariates in Propensity Score Analysis Using Random Forests
Authors: Yongseok Lee, Walter Leite
Presenting Author: Yongseok Lee*
Propensity score analysis (PSA) is a prominent method for reducing selection bias in observational studies, but missing data in covariates are common and must be dealt with during propensity score estimation. Through Monte Carlo simulations, this study evaluates imputation methods based on several random forest (RF) algorithms for handling missing covariate data: MICE-RF (CALIBER), Proximity Imputation (PI), and missForest. The results indicated that PI and missForest outperformed the other methods with respect to bias in the estimated average treatment effect (ATE), regardless of sample size and missing data mechanism. A demonstration of these five methods with PSA, evaluating the effect of participation in center-based care on children’s reading ability, is provided using data from the Early Childhood Longitudinal Study (ECLS-K: 2011).
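To make the outcome measure concrete, the following minimal sketch shows how propensity scores can be used to estimate an ATE and why an unadjusted comparison is biased. It is an illustration only: the abstract does not say which PSA technique was used, so the inverse probability weighting estimator here is an assumption, and the toy data, the known propensity scores, and the function names `naive_difference` and `ipw_ate` are hypothetical. In the study's setting, the propensity scores would instead be estimated from the covariates after imputation.

```python
# Toy data (hypothetical): covariate x, treatment t, outcome y = x + 2*t,
# so the true ATE is 2. Units with x = 1 are more likely to be treated.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [xi + 2 * ti for xi, ti in zip(x, t)]

# Propensity scores, assumed known here for illustration:
# P(T = 1 | x = 0) = 0.25 and P(T = 1 | x = 1) = 0.75.
e = [0.25 if xi == 0 else 0.75 for xi in x]


def naive_difference(y, t):
    """Unadjusted difference in mean outcomes between treated and control units."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(y, t, e):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    # Treated units are weighted by 1/e, control units by 1/(1 - e).
    w1 = [ti / ei for ti, ei in zip(t, e)]
    w0 = [(1 - ti) / (1 - ei) for ti, ei in zip(t, e)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


naive = naive_difference(y, t)
ipw = ipw_ate(y, t, e)
print(f"naive difference: {naive:.3f}")
print(f"IPW estimate:     {ipw:.3f}")
```

In this toy example the unadjusted difference overstates the effect (2.5 against a true ATE of 2), while the weighted estimate recovers approximately 2. In a Monte Carlo study, bias would be summarized as the average of such estimates across replications minus the true ATE.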