Primary Submission Category: Instrumental Variables
Two-stage least squares with clustered data
Authors: Anqi Zhao, Peng Ding, Fan Li
Presenting Author: Anqi Zhao*
Clustered data are common in empirical research. To estimate the causal effect of a possibly endogenous treatment, a common approach, which we call the canonical two-stage least squares (2sls), is to fit a 2sls regression of the outcome on treatment status with instrumental variables (IVs) for point estimation and to apply cluster-robust standard errors for inference. When both the treatment and the IVs vary within clusters, a natural alternative, which we call the two-stage fixed effects (2sfe), is to include cluster indicators in the 2sls specification, thereby incorporating cluster information in point estimation as well. This paper clarifies the trade-off between the canonical 2sls and 2sfe within the local average treatment effect (LATE) framework and makes three contributions. First, we establish the validity and relative efficiency of the canonical 2sls and 2sfe for large-sample Wald-type inference on the LATE when clusters are homogeneous. In particular, we show that, when the true outcome model includes a cluster fixed effect, 2sfe is more efficient than the canonical 2sls if the variation in cluster fixed effects dominates that in unit-level errors. Second, we show that with heterogeneous clusters, 2sfe recovers a weighted average of cluster-specific LATEs, whereas the canonical 2sls does not. Third, we develop a joint asymptotic analysis of the canonical 2sls and 2sfe under homogeneous clusters and propose a Wald-type test for detecting cluster heterogeneity.
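The two point estimators contrasted above can be illustrated with a minimal sketch for the simplest case of one binary treatment and one binary IV. This is not the authors' code: the function names (`canonical_2sls`, `demean_within`, `two_stage_fe`), the simulated data-generating process, and all parameter values are assumptions made for illustration. The sketch computes point estimates only and omits the cluster-robust standard errors. It also computes 2sfe by within-cluster demeaning, which gives the same treatment coefficient as including cluster indicators by the Frisch-Waugh-Lovell theorem.

```python
import numpy as np


def canonical_2sls(y, d, z):
    """Just-identified 2SLS of y on (1, d) with instruments (1, z).

    With one treatment and one IV this reduces to cov(z, y) / cov(z, d).
    """
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (d - d.mean())))


def demean_within(x, cluster):
    """Subtract each unit's cluster mean from x."""
    _, idx = np.unique(cluster, return_inverse=True)
    means = np.bincount(idx, weights=x) / np.bincount(idx)
    return x - means[idx]


def two_stage_fe(y, d, z, cluster):
    """2SLS with cluster indicators, computed via within-cluster demeaning."""
    yt, dt, zt = (demean_within(v, cluster) for v in (y, d, z))
    return float(zt @ yt / (zt @ dt))


# Hypothetical simulation: homogeneous clusters, constant effect tau.
rng = np.random.default_rng(0)
G, n = 300, 20                                # clusters, units per cluster
N = G * n
cluster = np.repeat(np.arange(G), n)
alpha = rng.normal(0.0, 3.0, G)[cluster]      # cluster fixed effects
z = rng.integers(0, 2, N).astype(float)       # IV varies within clusters
u = rng.normal(size=N)                        # unobserved confounder
complier = rng.random(N) < 0.6
# Compliers take the treatment iff z = 1; the rest select on u.
d = np.where(complier, z, (u > 0).astype(float))
tau = 2.0                                     # assumed true effect
y = alpha + tau * d + u + rng.normal(size=N)

est_2sls = canonical_2sls(y, d, z)
est_2sfe = two_stage_fe(y, d, z, cluster)
print(f"canonical 2sls: {est_2sls:.3f}")
print(f"2sfe:           {est_2sfe:.3f}")
```

In this assumed design the cluster fixed effects have a much larger variance than the unit-level errors, which is the regime the abstract describes as favoring 2sfe. Rerunning with different seeds gives a rough sense of the sampling variability of each estimator.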
