Primary Submission Category: Machine Learning and Causal Inference
Landscape Analysis of the Causal Inference Literature: A Topic Modeling and Bibliometric Study
Authors: Gabrielle Gauthier-Gagné, Tibor Schuster
Presenting Author: Gabrielle Gauthier-Gagné*
Background. The causal inference literature is growing so quickly that reviews of its methodological approaches and applications are difficult to conduct and quickly become outdated. To address this, we used topic modeling and bibliometric analysis to synthesize the causal inference literature.
Methods. We retrieved 349,466 deduplicated records from OpenAlex using causal inference-related search terms. We applied BERTopic, which uses deep learning embeddings to cluster documents by semantic similarity, to uncover latent topics in article titles and abstracts. We used citation network centrality to estimate topic influence and calculated yearly topic trends.
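As an illustration of the last step only, the sketch below shows one way topic influence and yearly trends could be computed once each record has a topic label. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy records, the topic labels, the function names, and the choice of in-degree centrality averaged per topic are all hypothetical, since the abstract does not specify which centrality measure or trend statistic was used.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (doc_id, topic, year). In practice the topic
# labels would come from a topic model such as BERTopic.
docs = [
    ("d1", "mendelian_randomization", 2019),
    ("d2", "mendelian_randomization", 2020),
    ("d3", "gut_microbiome", 2020),
    ("d4", "gut_microbiome", 2021),
    ("d5", "gut_microbiome", 2021),
]

# Hypothetical citation edges: (citing_doc, cited_doc).
citations = [("d2", "d1"), ("d3", "d1"), ("d4", "d1"), ("d4", "d3"), ("d5", "d3")]


def topic_influence(docs, citations):
    """Mean in-degree centrality of the documents in each topic.

    Assumption: in-degree centrality (citations received, divided by n - 1)
    stands in for whatever centrality measure the study actually used.
    """
    denom = max(len(docs) - 1, 1)
    in_degree = Counter(cited for _, cited in citations)
    scores = defaultdict(list)
    for doc_id, topic, _ in docs:
        scores[topic].append(in_degree[doc_id] / denom)
    return {topic: sum(vals) / len(vals) for topic, vals in scores.items()}


def yearly_topic_share(docs):
    """Share of each year's documents that belong to each topic."""
    per_year = Counter(year for _, _, year in docs)
    per_topic_year = Counter((topic, year) for _, topic, year in docs)
    return {key: count / per_year[key[1]] for key, count in per_topic_year.items()}


influence = topic_influence(docs, citations)
share = yearly_topic_share(docs)
```

A topic whose share rises across consecutive years would be read as growing and one whose share falls as declining; ranking topics by `influence` gives a rough ordering of centrality.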
Results. Topic modeling uncovered 335 topics. The most central topics were methodological, including econometric causality, structural equation modeling, Mendelian randomization, and regression discontinuity. Trends emerged in application areas: blockchain technology, gut microbiome, and education technology are growing, while agriculture, brain connectivity, and tuberculosis show declining prevalence.
Conclusion. Topic modeling enabled quick, transparent, and updateable synthesis of hundreds of thousands of causal inference articles, revealing core methodological topics and fluctuating application domains.
