Primary Submission Category: Machine Learning and Causal Inference

Gaussian Processes for Social Scientists: A powerful tool for addressing model-dependency and uncertainty

Authors: Soonhong Cho, Doeun Kim, Chad Hazlett

Presenting Author: Soonhong Cho*

The Gaussian Process (GP) is a highly flexible but easy-to-understand tool for non-linear regression that handles uncertainty rigorously. Unlike a conventional parametric model, it incorporates uncertainty over model choice into its predictions as we move away from the data, making it ideal for inference where poor overlap or model dependency is an issue. GPs nevertheless remain underutilized in social science, perhaps because (i) few resources have explained them accessibly for social scientists, and (ii) many existing software implementations of GPs are ill-suited to social science applications and require setting numerous hyperparameters. We begin by offering a simple but rigorous explanation of GPs rooted in a natural assumption: observations that are closer in $X$ will be closer in $Y$. We also provide a new implementation that improves interpretability and performance while avoiding most user-chosen hyperparameters. Next, although GPs are not the best tool for every purpose, we describe and illustrate their advantages in contexts of high model dependency or extrapolation, showing that they yield more appropriate confidence intervals for conditional estimates (or, in the limit, pointwise estimates). We also illustrate their performance in simulation and empirical studies of (i) imputation-based treatment effect estimation where parametric models perform poorly, and (ii) regression discontinuity designs (RDD) under different causal assumptions.
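
To make the "closer in $X$, closer in $Y$" assumption concrete, the following is a minimal sketch of GP regression in plain NumPy. It is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameter values (`length_scale`, `variance`, `noise_var`), the simulated data, and the function names `rbf_kernel` and `gp_posterior` are all assumptions chosen for illustration. It compares the posterior standard deviation at a point inside the data with one far outside it.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: covariance in Y decays with distance in X."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test,
                 noise_var=0.1, length_scale=1.0, variance=1.0):
    """Posterior mean and standard deviation of the latent function at x_test.

    Hyperparameters are fixed by hand here (an assumption of this sketch);
    a real analysis would estimate or otherwise justify them.
    """
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K = K + noise_var * np.eye(len(x_train))            # add observation noise
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    K_ss_diag = np.full(len(x_test), variance)          # k(x, x) for the RBF kernel

    L = np.linalg.cholesky(K)                           # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = K_ss_diag - np.sum(v**2, axis=0)              # prior variance minus what the data explain
    return mean, np.sqrt(np.maximum(var, 0.0))

# Simulated data observed only on [-2, 2] (illustrative, not from the paper).
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=30)
y_train = np.sin(x_train) + rng.normal(scale=0.3, size=30)

# One test point inside the data range, one far outside it.
x_test = np.array([0.0, 6.0])
mean, sd = gp_posterior(x_train, y_train, x_test)

print("posterior sd inside the data (x = 0):", sd[0])
print("posterior sd far from the data (x = 6):", sd[1])
```

Because the kernel covariance with every training point is nearly zero at $x = 6$, the posterior standard deviation there reverts to roughly the prior value, whereas at $x = 0$ nearby observations shrink it. A parametric fit would instead extrapolate with intervals that assume the functional form is correct.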