Primary Submission Category: Causal Discovery

TarGene: Dispensing with unnecessary assumptions in population genetics analysis

Authors: Olivier Labayle, Kelsey Tetley-Campbell,

Presenting Author: Olivier Labayle*

Population genetics analyses often rely on parametric assumptions without a principled argument for their validity. We present TarGene, a unified statistical workflow based on Targeted Learning that dispenses with these unnecessary assumptions. TarGene estimates effect sizes, as well as two-point and higher-order epistatic interactions, of genomic variants on polygenic traits. We validate the method by reproducing previously verified effect sizes on UK Biobank data, whilst also discovering non-linear effects of additional allelic copies on trait or disease. For the effect of the FTO variant rs1421085 on weight, we show that adding one copy of the C allele is associated with a 0.77 kg increase (95% CI: 0.68 – 0.85), while the second C copy adds 1.31 kg (95% CI: 1.19 – 1.43), indicating a non-linear effect. We further find three pairs of epistatic loci associated with skin colour that have previously been reported to be associated with hair colour. Finally, we illustrate how TarGene can be used to investigate higher-order interactions, using three variants linked to the vitamin D receptor complex. TarGene thus extends the reach of current genome-wide association studies by enriching the set of parameters that can be estimated, whilst data-adaptively incorporating complex non-linear relations between phenotype, genotype, and confounders and accounting for strong population dependence, such as that found in island cohorts.
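
To make the non-linearity in the FTO example concrete, the short sketch below compares the reported point estimates with what a purely additive model would imply. It is illustrative arithmetic only, not part of TarGene. It assumes the 1.31 kg figure is the incremental effect of going from one to two C copies, and it uses the first-copy effect as the hypothetical per-copy slope of the additive model. All variable names are made up for this example, and no confidence interval is derived for the combined two-copy effect because the abstract does not give one.

```python
# Point estimates and 95% CIs quoted in the abstract (kg), FTO variant rs1421085.
first_copy_kg = 0.77            # 0 -> 1 copy of the C allele
first_copy_ci = (0.68, 0.85)
second_copy_kg = 1.31           # 1 -> 2 copies (ASSUMED incremental, not cumulative)
second_copy_ci = (1.19, 1.43)

# Hypothetical additive model: every copy contributes the first-copy effect.
additive_two_copy_kg = 2 * first_copy_kg

# Two-copy effect implied by summing the reported increments (point estimates only).
implied_two_copy_kg = first_copy_kg + second_copy_kg

# Departure from additivity, and relative size of the second-copy effect.
excess_kg = implied_two_copy_kg - additive_two_copy_kg
ratio = second_copy_kg / first_copy_kg

# The two reported intervals do not overlap if the upper bound of the first
# lies below the lower bound of the second.
intervals_disjoint = first_copy_ci[1] < second_copy_ci[0]

print(f"additive two-copy effect: {additive_two_copy_kg:.2f} kg")
print(f"implied two-copy effect:  {implied_two_copy_kg:.2f} kg")
print(f"excess over additive:     {excess_kg:.2f} kg")
print(f"second/first copy ratio:  {ratio:.2f}")
print(f"CIs disjoint:             {intervals_disjoint}")
```

Under these assumptions, the second copy's reported interval lies entirely above the first copy's, which is the pattern that a single per-allele slope cannot represent.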