Primary Submission Category: Propensity Scores

Reporting practices and guidelines for machine learning to estimate propensity scores

Authors: Walter Leite, Huibin Zhang, Zachary Collier, Kamal Chawala, Lingchen Kong, YongSeok Lee, Jia Quan, Olushola Soyoye

Presenting Author: Walter Leite*

Non-parametric machine learning (ML) methods, such as generalized boosted models and random forests, have been used extensively in propensity score analysis (PSA). Their key advantage over parametric methods such as logistic regression is their ability to automatically detect complex relationships between a large number of covariates and the treatment assignment mechanism. ML methods can also avoid the multicollinearity problems that arise when strongly correlated sets of covariates are used. However, no detailed guidelines exist for reporting the use of ML for propensity score estimation in academic papers. This study developed such guidelines based on a systematic review of over 150 peer-reviewed papers, dissertations, and theses published from 1983 to 2023 across the social sciences, health sciences, and education. The guidelines are aligned with best practices in open science and research reproducibility and are organized by the following six steps of PSA: 1) data preparation, 2) propensity score estimation, 3) propensity score method implementation, 4) covariate balance evaluation, 5) treatment effect estimation, and 6) sensitivity analysis. The systematic review shows that few published papers report enough detail about their use of ML in PSA to allow their analyses to be replicated.
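As a hedged illustration of steps 2 (propensity score estimation) and 4 (covariate balance evaluation), the sketch below estimates propensity scores with gradient boosting on depth-1 trees (stumps), converts them to inverse probability of treatment weights for the average treatment effect, and compares standardized mean differences (SMDs) before and after weighting. It is a minimal sketch under stated assumptions, not the authors' method or any guideline from the study: the data are simulated, the hand-rolled standard-library booster stands in for established packages such as the R package twang or scikit-learn's `GradientBoostingClassifier`, the SMD uses the pooled unweighted standard deviation (one common convention), and the `SETTINGS` record and its key names are hypothetical. Keeping every tuning choice and the random seed in one record is the kind of detail a replication would need.

```python
import math
import random
import statistics

# Hypothetical settings record: every tuning choice plus the seed, in one
# place, so it can be reported verbatim. max_depth is fixed at 1 by the
# stump learner below and is recorded only for reporting.
SETTINGS = {"n_trees": 100, "shrinkage": 0.1, "max_depth": 1, "seed": 2023}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def boost_propensity(X, t, n_trees, shrinkage):
    """Gradient boosting with regression stumps on the logistic loss."""
    n, k = len(X), len(X[0])
    base = sum(t) / n
    F = [math.log(base / (1 - base))] * n  # start at the marginal log-odds
    order = [sorted(range(n), key=lambda i, col=j: X[i][col]) for j in range(k)]
    for _ in range(n_trees):
        p = [sigmoid(f) for f in F]
        r = [t[i] - p[i] for i in range(n)]  # negative gradient of log-loss
        total, best = sum(r), None  # best = (gain, feature, threshold)
        for j in range(k):
            idx, left = order[j], 0.0
            for pos in range(1, n):
                left += r[idx[pos - 1]]
                if X[idx[pos - 1]][j] == X[idx[pos]][j]:
                    continue  # no split between tied values
                # Least-squares split gain (constant term dropped).
                gain = left ** 2 / pos + (total - left) ** 2 / (n - pos)
                if best is None or gain > best[0]:
                    thr = (X[idx[pos - 1]][j] + X[idx[pos]][j]) / 2
                    best = (gain, j, thr)
        if best is None:
            break
        _, j, thr = best
        for side in (True, False):  # Newton step in each leaf
            leaf = [i for i in range(n) if (X[i][j] <= thr) == side]
            den = sum(p[i] * (1 - p[i]) for i in leaf)
            gamma = sum(r[i] for i in leaf) / den if den > 1e-12 else 0.0
            for i in leaf:
                F[i] += shrinkage * gamma
    return [sigmoid(f) for f in F]


def ipw_ate_weights(t, p):
    """ATE weights: 1/p for treated units, 1/(1-p) for controls."""
    return [1 / pi if ti == 1 else 1 / (1 - pi) for ti, pi in zip(t, p)]


def smd(x, t, w=None):
    """(Weighted) mean difference over the pooled unweighted SD."""
    if w is None:
        w = [1.0] * len(x)

    def wmean(g):
        num = sum(wi * xi for xi, ti, wi in zip(x, t, w) if ti == g)
        return num / sum(wi for ti, wi in zip(t, w) if ti == g)

    sd_t = statistics.stdev([xi for xi, ti in zip(x, t) if ti == 1])
    sd_c = statistics.stdev([xi for xi, ti in zip(x, t) if ti == 0])
    return (wmean(1) - wmean(0)) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)


def simulate(n, rng):
    """Simulated data; treatment depends nonlinearly on x2."""
    X, t = [], []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.random()
        logit = 1.0 * x1 + 0.8 * (x2 > 0) - 0.5 * x3
        X.append([x1, x2, x3])
        t.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, t


rng = random.Random(SETTINGS["seed"])
X, t = simulate(500, rng)
p = boost_propensity(X, t, SETTINGS["n_trees"], SETTINGS["shrinkage"])
w = ipw_ate_weights(t, p)
cols = [list(c) for c in zip(*X)]
before = [abs(smd(c, t)) for c in cols]
after = [abs(smd(c, t, w)) for c in cols]
print("settings:", SETTINGS)
print("max |SMD| before:", round(max(before), 3), "after:", round(max(after), 3))
```

Fixing the seed and printing the settings alongside the balance summary means a reader can rerun the same estimation and check the same diagnostics; with real data, the covariate list, missing-data handling, and software versions would belong in the same record.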