Primary Submission Category: Machine Learning and Causal Inference
Tradeoffs in Using Surrogate Variables for Decision Making with Delayed Outcomes
Authors: Steve Yadlowsky, Alexander D’Amour, Avi Feller
Presenting Author: Steve Yadlowsky*
In many decision-making problems, delays in observing the outcome variable hinder an agent's ability to adjust its policy based on new information. To overcome this challenge, practitioners often rely on surrogate variables that they believe are related to the outcome but are observed more quickly. However, this approach only works if the surrogates satisfy certain causal assumptions about their joint relationship with the agent’s actions and the outcome. In this work, we investigate the tradeoffs in using surrogates to update policies in a multi-armed bandit problem with delayed feedback, where the goal is to minimize cumulative regret, which requires learning both quickly and accurately. We parameterize both the degree to which the surrogates violate these assumptions and the length of the temporal delay. Across this parameterization, we compare two broad strategies: updating based on possibly invalid surrogates observed after only a short delay, and updating based on the true outcomes observed after a long delay. We characterize the range of parameters under which using the surrogates can still be beneficial in minimizing cumulative regret. Our findings provide guidance for practitioners on when and how to use surrogate variables in decision-making problems with long delays in observing the outcome variable.
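The comparison described above can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the formulation analyzed in this work: it uses a standard UCB1 rule as a stand-in learner, Bernoulli feedback, and a fixed delay, and it models surrogate invalidity simply as a shift between the feedback means and the true outcome means. The function name `run_delayed_ucb` and all parameter values are hypothetical.

```python
from collections import deque

import numpy as np


def run_delayed_ucb(true_means, feedback_means, delay, horizon, seed=0):
    """UCB1 where feedback for round t only arrives at round t + delay.

    Regret is always measured against true_means. feedback_means is what the
    learner observes: equal to true_means when updating on the true outcome,
    and shifted when updating on a (possibly invalid) surrogate.
    This is an illustrative assumption, not the authors' model.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    feedback_means = np.asarray(feedback_means, dtype=float)
    k = len(true_means)
    counts = np.zeros(k)  # number of feedback values received per arm
    sums = np.zeros(k)    # sum of feedback values received per arm
    pending = deque()     # (arrival_round, arm, feedback), in arrival order
    regret = 0.0
    for t in range(horizon):
        # Incorporate any feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, fb = pending.popleft()
            counts[arm] += 1
            sums[arm] += fb
        if np.any(counts == 0):
            # Cycle through arms until every arm has some observed feedback.
            arm = t % k
        else:
            n = counts.sum()
            ucb = sums / counts + np.sqrt(2.0 * np.log(n) / counts)
            arm = int(np.argmax(ucb))
        # Draw Bernoulli feedback; it becomes visible only after the delay.
        fb = float(rng.random() < feedback_means[arm])
        pending.append((t + delay, arm, fb))
        # Expected (pseudo-)regret with respect to the true outcome.
        regret += true_means.max() - true_means[arm]
    return regret


if __name__ == "__main__":
    true_means = [0.5, 0.6]
    for bias in (0.0, 0.05, 0.2):
        # The surrogate overstates the worse arm by `bias` (hypothetical choice).
        surrogate_means = [0.5 + bias, 0.6]
        fast = run_delayed_ucb(true_means, surrogate_means, delay=5, horizon=5000)
        slow = run_delayed_ucb(true_means, true_means, delay=1000, horizon=5000)
        print(f"bias={bias:.2f}  surrogate regret={fast:.1f}  outcome regret={slow:.1f}")
```

Varying `bias` and the two delays in the loop gives a crude picture of the tradeoff: a short-delay surrogate lets the learner start adapting early, but once the bias is large enough to reorder the arms, the learner converges on the wrong arm and its regret grows linearly in the horizon.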