Primary Submission Category: Generalizability/Transportability
Invariant Risk Minimization for Large Language Models
Authors: Marko Veljanovski, Zach Wood-Doughty
Presenting Author: Marko Veljanovski*
Invariant Risk Minimization (IRM) is a leading approach for out-of-distribution (OOD) generalization, framing the learning problem as finding a data representation such that the optimal classifier on top of that representation is the same across all training environments. While IRM has been extensively tested on image data, it remains underexplored for text, despite OOD generalization being crucial for large language model (LLM) performance. Accordingly, we evaluate the effectiveness of directly applying IRM to text-based datasets and explore optimization adjustments that improve IRM's compatibility with LLM predictors. In particular, we adapt a data-generating process (DGP) from Wood-Doughty et al. (2021), originally developed for evaluating causal inference methods, to generate synthetic text on which we benchmark IRM against Empirical Risk Minimization (ERM). Our DGP uses two parameters: tau, which controls the ordering correlation, and delta, which controls the ordering preference strength within a modified Zipfian distribution. We compare IRM and ERM across varying values of tau and delta, revealing specific environments in which the otherwise optimal invariant predictor fails to achieve strong performance.
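For concreteness, the sketch below shows the commonly used IRMv1 surrogate of the IRM objective: each environment contributes its empirical risk plus a gradient penalty measuring how far a fixed dummy classifier scale is from being optimal for that environment. This is a minimal illustration of the general technique, not our exact implementation; the names model, env_batches, and penalty_weight are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    """IRMv1 gradient penalty: squared gradient of the per-environment risk
    with respect to a fixed dummy classifier scale w = 1.0."""
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objective(model, env_batches, penalty_weight):
    """Sum of per-environment ERM risks plus the weighted IRMv1 penalty.
    env_batches yields one (inputs, labels) batch per environment;
    labels are float tensors for binary classification."""
    total_risk, total_penalty = 0.0, 0.0
    for inputs, labels in env_batches:
        logits = model(inputs).squeeze(-1)
        total_risk += F.binary_cross_entropy_with_logits(logits, labels)
        total_penalty += irmv1_penalty(logits, labels)
    return total_risk + penalty_weight * total_penalty
```

Setting penalty_weight to zero recovers plain ERM pooled over environments, which is the baseline IRM is compared against.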
Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2021. Generating synthetic text data to evaluate causal inference methods. arXiv preprint arXiv:2102.05638.