Primary Submission Category: Design of Experiments
Logging Policy Design for Efficient Off-Policy Evaluation
Authors: Connor Douglas, Joel Persson, Foster Provost
Presenting Author: Connor Douglas*
Off-policy evaluation (OPE) estimates the value of a candidate “target” policy, such as a recommender system, using data logged by a different “logging” policy, enabling safe assessment without deploying changes live. While prior work emphasizes estimator guarantees under strong assumptions, in practice the logging policy is a first-order driver of OPE quality. We demonstrate this and study how to design logging policies that evaluate target policies efficiently. We characterize a reward–coverage tradeoff in choosing which actions to log and provide a sufficient condition for when an item should enter the logging policy’s support. We introduce a unifying framework that organizes logging-design settings by (i) what is known about the target policy and (ii) what is known about the reward distributions. Within this space, we derive optimal logging policies at the extremes where target policies and rewards are fully known or fully unknown, and show that OPE can actually yield better policy-value estimates than on-policy estimation. We then extend these results to intermediate cases with probabilistic knowledge of target policies and noisy reward estimates, obtaining optimal designs under each information regime. Our results provide actionable guidance for firms that must collect data to compare multiple candidate recommendation policies when the optimal logging policy may be infeasible.
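To make the OPE setting concrete, the sketch below simulates the standard inverse propensity scoring (IPS) estimator on synthetic data. The action set, reward probabilities, and the specific logging and target policies are hypothetical illustrations, not taken from the paper; the example only shows why the logging policy must cover the target policy's support for the reweighting to be valid.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 5
# Hypothetical true expected rewards per action (unknown to the evaluator in practice).
true_rewards = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# Logging policy: uniform over all actions, so it covers the target's support.
logging_probs = np.full(n_actions, 1.0 / n_actions)
# Target policy: concentrates probability on the highest-reward actions.
target_probs = np.array([0.0, 0.0, 0.1, 0.3, 0.6])

# Simulate logged interactions under the logging policy.
n = 100_000
actions = rng.choice(n_actions, size=n, p=logging_probs)
rewards = rng.binomial(1, true_rewards[actions])

# IPS: reweight each logged reward by the target/logging probability
# ratio of the logged action, which corrects the distribution mismatch.
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = float(np.mean(weights * rewards))

# Ground-truth value of the target policy, available here only by simulation.
true_value = float(target_probs @ true_rewards)
print(ips_estimate, true_value)
```

The estimate concentrates near the true target-policy value as the log grows, but its variance depends on the probability ratios, which is one face of the reward–coverage tradeoff: a logging policy tilted toward high-reward actions earns more during logging, while broader coverage keeps the weights, and hence the estimator's variance, small.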
