Primary Submission Category: Design-Based Causal Inference

Selective inference for data-driven subgroups based on biomarkers

Authors: Zijun Gao

Presenting Author: Zijun Gao*

In randomized experiments with heterogeneous treatment effects, subgroup analysis offers substantial benefits, such as personalized treatment recommendations, but poses challenges for inference when the subgroups are learned from the data. Motivated by the German Breast Cancer Study (GBCS), where subgroups are defined by a biomarker threshold (a common practice in clinical trials), we develop a design-based inference procedure tailored to this type of subgroup selection. The validity of our method relies solely on knowledge of the randomization mechanism and requires no assumptions about the underlying model, which makes it particularly suitable for complex datasets. Compared with inference based on sample splitting, our approach is deterministic and avoids the power loss caused by the reduced sample size available for inference. The computation required by our method is often similar to that of a standard randomization test without selection, and it requires no intricate sampling procedures to approximate conditional distributions. Furthermore, when predefined biomarkers are unavailable, we extend the method to incorporate a data-driven biomarker while preserving these desirable properties. We demonstrate the validity and efficiency of our methods through an analysis of the GBCS dataset and simulated data.