Primary Submission Category: Design of Experiments
Flexible inference with split samples via data turnover
Authors: William Bekerman, Dylan Small
Presenting Author: William Bekerman*
We introduce data turnover, a general framework enabling a single group of statisticians and domain experts to assess the strength of evidence gathered across multiple data splits, effectively integrating both qualitative and quantitative findings from data exploration. Data turnover can accommodate a wide range of statistical tasks, including inference and estimation, while ensuring the validity of certain data-driven decisions and providing a straightforward approach to evaluating replicability. As a motivating example, we study the effects of growing up with a father with an alcohol use disorder on later-life outcomes. Data turnover allows us to augment our analysis with exploratory insights while leveraging the full dataset for confirmatory testing, avoiding the stringent post-selection inference adjustments that can erode power. We also apply our technique to evaluate variable importance in a clinical prediction model of mortality in premature babies. We prove the theoretical validity of our procedure and examine its power in extensive simulation studies.
