Primary Submission Category: Machine Learning and Causal Inference
Causal Inference with High-Dimensional Unstructured Treatments
Authors: Kevin Christian Wibisono, Yixin Wang
Presenting Author: Kevin Christian Wibisono*
Causal inference with high-dimensional treatments, such as texts, images, or medical treatment sequences, poses unique challenges: standard causal estimands like the average treatment effect (ATE) are often ill-defined because of overlap violations. Existing approaches typically assume that the treatment of interest is known a priori and specified through pre-defined attributes such as topic or sentiment. In contrast, we propose a data-driven framework that learns the treatment itself. Specifically, we introduce the maximally influential feature (MIF), a latent binary treatment that maximizes the causal effect on the outcome while satisfying overlap. To ensure that interventions are meaningful, we decompose each treatment into an immutable content component and a mutable style component, and intervene only on the latter. We establish identifiability of the learned causal estimand, propose a flexible estimator, and introduce a treatment budget that enables the discovery of multiple causal dimensions. Our approach also allows us to nudge or modify treatments in the direction of increased MIF, providing a principled way to causally improve outcomes. Finally, we demonstrate the effectiveness of our framework on text, image, and treatment-sequence applications.
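To make the MIF objective concrete, here is a minimal toy sketch, not the authors' estimator. It assumes the content component is already observed as a discrete stratum label, and it replaces the learned latent feature with a search over a small, hypothetical set of candidate binary style features. Each candidate's effect is a content-stratified difference in mean outcomes, and a candidate is admissible only if overlap holds, i.e. its share within every content stratum lies in [eps, 1 - eps]. The function names `stratified_effect` and `most_influential_feature` and the threshold `eps` are illustrative assumptions.

```python
import numpy as np


def stratified_effect(feature, content, outcome, eps=0.05):
    """Content-stratified difference in mean outcome for a binary feature.

    Returns None if overlap fails in any content stratum (hypothetical
    overlap rule: feature share must lie in [eps, 1 - eps]).
    """
    n = len(outcome)
    effect = 0.0
    for c in np.unique(content):
        mask = content == c
        share = feature[mask].mean()
        # Overlap check: within this stratum, the feature must be neither
        # (almost) always on nor (almost) always off.
        if share < eps or share > 1 - eps:
            return None
        treated = outcome[mask & (feature == 1)].mean()
        control = outcome[mask & (feature == 0)].mean()
        # Weight each stratum's contrast by its share of the sample.
        effect += (treated - control) * mask.sum() / n
    return effect


def most_influential_feature(candidates, content, outcome, eps=0.05):
    """Pick the admissible candidate with the largest absolute effect.

    `candidates` maps a (hypothetical) feature name to a 0/1 array.
    """
    best_name, best_effect = None, 0.0
    for name, feature in candidates.items():
        eff = stratified_effect(feature, content, outcome, eps)
        if eff is not None and abs(eff) > abs(best_effect):
            best_name, best_effect = name, eff
    return best_name, best_effect


# Usage on synthetic data: style_a has the largest effect, and the
# always-on feature is excluded because it violates overlap.
rng = np.random.default_rng(0)
n = 2000
content = rng.integers(0, 3, n)
style_a = rng.integers(0, 2, n)
style_b = rng.integers(0, 2, n)
always_on = np.ones(n, dtype=int)
outcome = 2.0 * style_a + 0.5 * style_b + content + rng.normal(0.0, 1.0, n)

name, effect = most_influential_feature(
    {"style_a": style_a, "style_b": style_b, "always_on": always_on},
    content,
    outcome,
)
```

In this toy, overlap is enforced as a hard admissibility filter. A learned MIF would instead optimize over a flexible feature map with overlap imposed as a constraint or penalty.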
