Primary Submission Category: Federated Learning
Federated Targeted Learning
Authors: Rachael Phillips, Mark van der Laan, Maya Petersen
Presenting Author: Rachael Phillips*
In many industries, including government, health care, and social media, data reside in isolated islands, with limited capacity for sharing between organizations. Policies that prevent sensitive data from crossing established boundaries may treat individual-level data as fundamentally different from aggregated data, so that information deemed non-identifying may be shared across institutions. Federated learning (FL) is well suited to learning across many sites whose data are subject to such restrictions. It is a statistical estimation paradigm that uses aggregate-level information to collaboratively estimate a pooled parameter without transferring individual-level data to a central location. In this work, we contribute to the rapidly growing field of FL by connecting it with statistical theory for semiparametric efficient estimation and causal inference. In particular, we introduce a framework for federated super learning (SL) and federated targeted minimum loss-based estimation (TMLE). The class of available federated machine learning algorithms, including federated maximum likelihood estimation for parametric models, provides a powerful library of candidates for the federated SL. We show that federated TMLE can attain performance similar to that of the centralized TMLE, which is not subject to such restrictions. Our results motivate the use of flexible federated estimators that are able to adapt to underlying similarity across sites and other factors.
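As a purely illustrative aside, the idea of estimating a pooled parameter from aggregate-level information can be sketched with a toy example of federated maximum likelihood estimation for a Gaussian parametric model. This is not the federated SL or TMLE procedure described above. It assumes each site is permitted to share only its sample size, sum, and sum of squares, and the names `SiteSummary`, `summarize_site`, and `federated_gaussian_mle`, as well as the toy data, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SiteSummary:
    """Aggregate-level information a site is assumed to be allowed to share."""
    n: int            # number of observations at the site
    total: float      # sum of the outcome values
    total_sq: float   # sum of squared outcome values


def summarize_site(y: Sequence[float]) -> SiteSummary:
    """Runs locally at a site; only the three aggregates are returned."""
    return SiteSummary(n=len(y), total=sum(y), total_sq=sum(v * v for v in y))


def federated_gaussian_mle(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Pooled Gaussian MLE (mean, variance) computed from site summaries only."""
    n = sum(s.n for s in summaries)
    mean = sum(s.total for s in summaries) / n
    # The Gaussian MLE of the variance uses divisor n, not n - 1.
    var = sum(s.total_sq for s in summaries) / n - mean ** 2
    return mean, var


# Hypothetical toy data held at three separate sites.
site_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Each site computes its summary locally; only summaries are pooled.
summaries = [summarize_site(y) for y in site_data]
fed_mean, fed_var = federated_gaussian_mle(summaries)

# Centralized estimate, shown only for comparison; it requires the
# individual-level data that a federated setting would not transfer.
pooled = [v for y in site_data for v in y]
cen_mean = sum(pooled) / len(pooled)
cen_var = sum((v - cen_mean) ** 2 for v in pooled) / len(pooled)

print(fed_mean, cen_mean)
print(fed_var, cen_var)
```

In this toy setting the federated and centralized estimates agree up to floating-point error, because the shared aggregates are sufficient statistics for the Gaussian model. More flexible estimators generally do not admit such an exact reduction.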