Primary Submission Category: Applications in Health and Biology
Identifying Misreporting Rates in the Absence of Ground Truth Data
Authors: Dylan Zapzalka, Muskaan Mittal, Jenna Wiens, Maggie Makar
Presenting Author: Dylan Zapzalka*
Strategic agents are often incentivized to misreport their features to obtain favorable outcomes from machine learning models. Prior research uses causal inference to estimate misreporting rates for binary features, but these methods rely on the restrictive assumption of access to a ground-truth dataset. In this work, we relax this requirement by leveraging two datasets with directional misreporting: one in which agents misreport features in only one direction, and another in which they misreport only in the opposite direction. By integrating these two complementary data sources, we give the conditions under which the misreporting rate is identifiable using causal effect estimation. For scenarios where exact identification is not possible, we provide sensitivity analysis bounds on the misreporting rate. Finally, we empirically validate our theoretical findings using both semi-synthetic data and a real-world Medicare dataset, demonstrating the practical utility of our method.
