Primary Submission Category: Generalizability/Transportability
Beyond the Experiment Window: Prospective Impacts Under Long-Term Ranking Dynamics
Authors: Lei Shi, Lo-hua Yuan, Peng Ding, Navin Sivanandam,
Presenting Author: Lo-hua Yuan*
Short A/B tests for ranking systems can be myopic when seasonality, user evolution, and feedback loops drive outcomes beyond the experiment window. We target the prospective long-term average treatment effect (PLATE): the cumulative effect of sustaining a new ranker, rather than the incumbent, over a future horizon for the experimental population. Estimating PLATE from short experiments requires three things: adjusting for time-varying post-treatment covariates, imputing long-run outcomes when the new ranker is not fully represented in historical logs, and transporting information between experimental and observational data under covariate shift. We propose BSTAR (Blip Surrogate TrAnsfeR), which combines structural nested mean model blips, surrogate-index identification that treats the displayed result set as a mediator, and causal transfer learning for generalization across data sources. Under sequential randomization, surrogacy, and transferability assumptions, BSTAR identifies PLATE and yields a practical estimation pipeline with bootstrap inference. Simulations calibrated to marketplace ranking experiments show reduced bias and MSE relative to inverse-propensity weighting and surrogate-index baselines, enabling earlier and more reliable long-term impact estimates.
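For orientation, the sketch below shows the generic surrogate-index baseline that the abstract compares against, not BSTAR itself. It assumes a linear surrogate index fit on observational data in which the long-term outcome is observed, applies that index to a short randomized experiment where only surrogates are observed, and forms a percentile bootstrap interval. All function names, the simulated data, and the effect size are hypothetical illustrations. The baseline relies on the relationship between surrogates and outcome being the same in both data sources; covariate shift between sources is exactly where it can break, which is the gap BSTAR's transfer-learning component targets.

```python
import numpy as np

def fit_surrogate_index(S_obs, Y_obs):
    """Regress the long-term outcome Y on short-term surrogates S (linear index, an assumption)."""
    X = np.column_stack([np.ones(len(S_obs)), S_obs])
    coef, *_ = np.linalg.lstsq(X, Y_obs, rcond=None)
    return coef

def predict_index(coef, S):
    """Predicted long-term outcome for each unit given its surrogates."""
    X = np.column_stack([np.ones(len(S)), S])
    return X @ coef

def surrogate_index_ate(S_exp, A_exp, coef):
    """Difference in mean predicted long-term outcome between treated (A=1) and control (A=0) arms."""
    y_hat = predict_index(coef, S_exp)
    return y_hat[A_exp == 1].mean() - y_hat[A_exp == 0].mean()

def bootstrap_ci(S_obs, Y_obs, S_exp, A_exp, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample both data sources, refit the index, re-estimate the effect."""
    rng = np.random.default_rng(seed)
    idx1, idx0 = np.flatnonzero(A_exp == 1), np.flatnonzero(A_exp == 0)
    estimates = []
    for _ in range(n_boot):
        i_obs = rng.integers(0, len(Y_obs), len(Y_obs))
        # Resample within each arm so both arms are always present.
        i_exp = np.concatenate([rng.choice(idx1, len(idx1)), rng.choice(idx0, len(idx0))])
        coef = fit_surrogate_index(S_obs[i_obs], Y_obs[i_obs])
        estimates.append(surrogate_index_ate(S_exp[i_exp], A_exp[i_exp], coef))
    return tuple(np.quantile(estimates, [alpha / 2, 1 - alpha / 2]))

# Hypothetical simulated data.
rng = np.random.default_rng(1)
n_obs = 5000  # historical logs: surrogates and observed long-term outcome
S_obs = rng.normal(size=(n_obs, 2))
Y_obs = 1.0 + 2.0 * S_obs[:, 0] + 0.5 * S_obs[:, 1] + rng.normal(size=n_obs)

n_exp = 4000  # short experiment: long-term outcome not yet observed
A_exp = rng.integers(0, 2, n_exp)
S_exp = rng.normal(size=(n_exp, 2))
S_exp[:, 0] += 0.3 * A_exp  # treatment shifts the first surrogate
# True long-term effect in this simulation: 2.0 * 0.3 = 0.6

est = surrogate_index_ate(S_exp, A_exp, fit_surrogate_index(S_obs, Y_obs))
ci_lo, ci_hi = bootstrap_ci(S_obs, Y_obs, S_exp, A_exp)
print(f"surrogate-index estimate: {est:.3f}, 95% CI: ({ci_lo:.3f}, {ci_hi:.3f})")
```

Because the simulation holds the surrogate-outcome relationship fixed across sources, this estimate should land near 0.6. If Y_obs were instead generated under a different surrogate-outcome relationship than the experimental population, the same code would be biased.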
