Genomes are composed of a mosaic of segments inherited from different ancestors,
each separated by past recombination events.
Consequently, genealogical relationships vary spatially across genomes.
The multispecies coalescent (MSC) describes the expected distribution of unlinked genealogies, as a function of demographic model parameters (N$_e$, $\tau$, topology).
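As a minimal illustration of how MSC parameters shape the distribution of unlinked genealogies, the sketch below computes the classic three-taxon gene-tree topology probabilities for a species tree ((A,B),C). It assumes diploid populations. The internal branch, meaning the difference between the two divergence times $\tau$ in generations, is converted to coalescent units by dividing by 2N$_e$ of that branch. The function name is hypothetical.

```python
import math


def msc_triplet_probabilities(internal_branch_gens, ne):
    """Gene-tree topology probabilities for species tree ((A,B),C) under the MSC.

    Returns (P[((A,B),C)], P[((A,C),B)], P[((B,C),A)]).
    Assumes a diploid internal-branch population of size ne (coalescent unit = 2*ne gens).
    """
    t = internal_branch_gens / (2 * ne)  # internal branch length in coalescent units
    p_no_coal = math.exp(-t)  # A and B fail to coalesce before the root population
    # If they fail, the three lineages coalesce in random order in the root population,
    # so each of the three rooted topologies gets one third of that probability.
    p_discordant = p_no_coal / 3
    return (1 - 2 * p_discordant, p_discordant, p_discordant)


# Example: internal branch of 20,000 generations with Ne = 10,000 (t = 1 coalescent unit)
print(msc_triplet_probabilities(20_000, 10_000))
```

Shorter internal branches or larger N$_e$ push all three probabilities toward 1/3.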
The expected distribution of linked genealogical variation is poorly characterized.
(Martin & Van Belleghem 2017)
An approximation of the coalescent with recombination
Given a starting genealogy, the change to the next genealogy along the sequence is modeled as a Markov process (a single transition), which enables a tractable likelihood framework.
Process: recombination occurs with uniform probability anywhere on a tree (at time t$_{1}$), creating a detached subtree that re-coalesces with an ancestral lineage above t$_{1}$.
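The transition described above can be sketched as a single SMC'-style step in one panmictic population. This is a simplified illustration under stated assumptions, not the implementation used by any cited tool. The tree is summarized only by its coalescent times, and time is in generations. Populations are assumed diploid, so each tree lineage absorbs the floating lineage at rate 1/(2N$_e$). Branch identities are not tracked, so the sketch returns only t$_1$ and the re-coalescence time. All function names are hypothetical.

```python
import random


def lineage_intervals(coal_times, n):
    """(start, end, k) intervals of a single-population genealogy with n samples."""
    bounds = [0.0, *sorted(coal_times)]
    return [(bounds[i], bounds[i + 1], n - i) for i in range(len(bounds) - 1)]


def smc_prime_step(intervals, ne, rng):
    """Return (t1, t_new): recombination time and re-coalescence time of the detached lineage."""
    # 1. Recombination falls uniformly on the tree: pick an interval in proportion to its
    #    total branch length k * (end - start), then a time uniformly within it.
    weights = [k * (end - start) for start, end, k in intervals]
    start, end, _ = rng.choices(intervals, weights=weights)[0]
    t1 = rng.uniform(start, end)
    # 2. The detached lineage floats upward from t1; with k tree lineages present it
    #    re-coalesces at rate k / (2 * ne), possibly with its own former branch (SMC').
    t = t1
    for start, end, k in intervals:
        if end <= t:
            continue
        wait = rng.expovariate(k / (2 * ne))
        if t + wait < end:
            return t1, t + wait
        t = end  # memoryless: restart the clock at the next interval boundary
    # 3. Above the root only a single lineage remains.
    return t1, t + rng.expovariate(1 / (2 * ne))


rng = random.Random(0)
tree = lineage_intervals([1_000.0, 3_000.0], n=3)  # coalescences at 1,000 and 3,000 generations
print(smc_prime_step(tree, ne=10_000, rng=rng))
```

Drawing exponential waits interval by interval is valid because the coalescence rate is piecewise constant between coalescent events.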
PSMC (Li & Durbin 2011) and MSMC (Schiffels & Durbin 2014) use pairwise coalescent times between sequential genealogies to infer changes in N$_e$ through time.
ARGweaver (Rasmussen et al. 2014) and ARGweaver-D (Hubisz & Siepel 2020) use an SMC'-based conditional sampling method to infer ARGs from sequence data.
(a) no-change; (b-c) tree-change; and (d) topology-change.
(Deng et al. 2021)
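The three outcomes in the figure can be distinguished by comparing consecutive genealogies. In this hypothetical sketch, a rooted genealogy is stored as a dict mapping each internal clade (a frozenset of sample names) to its node height. The same clade set with the same heights is a no-change. The same clade set with different heights is a tree-change. A different clade set is a topology-change.

```python
def classify_transition(tree_a, tree_b, tol=1e-9):
    """Label the change between two consecutive genealogies.

    Each genealogy is a dict {clade: height}, where a clade is a frozenset of the
    sample names below an internal node and height is that node's age.
    """
    if set(tree_a) != set(tree_b):
        return "topology-change"  # different clades, so a different rooted topology
    if all(abs(tree_a[c] - tree_b[c]) <= tol for c in tree_a):
        return "no-change"
    return "tree-change"  # same topology, different node heights


before = {frozenset("AB"): 100.0, frozenset("ABC"): 300.0}
after = {frozenset("AB"): 150.0, frozenset("ABC"): 300.0}
print(classify_transition(before, after))  # → tree-change
```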
Expected Tree and Topology Distances represent new spatial genetic information.
Barriers to coalescence and variable N$_e$ among species tree intervals.
Patrick McKenzie
PhD student
Genealogy embedding table with piecewise-constant coalescent rates in
all intervals delimited by coalescent events or species-tree population boundaries.
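A minimal sketch of such a table for one species-tree population, assuming diploid N$_e$ and times in generations. Each row covers an interval between consecutive events, either a coalescence or the population's start or end, and records the number of gene-tree lineages k. The `rate` column is an assumed per-generation rate, k/(2N$_e$), at which a single detached lineage would re-coalesce in that interval. The function and column layout are hypothetical. A full MSC table would stack such rows for every population, each with its own N$_e$.

```python
import math


def embedding_table(pop_start, pop_end, ne, k_in, coal_times):
    """Rows (start, end, k, ne, rate) for one species-tree population.

    k_in gene-tree lineages enter at pop_start; each time in coal_times (inside
    [pop_start, pop_end)) merges two lineages. rate = k / (2 * ne) is the assumed
    re-coalescence rate for one detached lineage while k lineages are present.
    """
    times = sorted(coal_times)
    if any(not pop_start <= t < pop_end for t in times):
        raise ValueError("coalescent times must fall inside the population interval")
    if len(times) >= k_in:
        raise ValueError("at most k_in - 1 coalescences are possible")
    bounds = [pop_start, *times, pop_end]
    rows = []
    k = k_in
    for start, end in zip(bounds[:-1], bounds[1:]):
        rows.append((start, end, k, ne, k / (2 * ne)))
        k -= 1  # a coalescence closing this interval removes one lineage
    return rows


# Root population (no upper boundary) with 3 entering lineages
for row in embedding_table(0.0, math.inf, 1e4, 3, [5_000.0, 12_000.0]):
    print(row)
```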
Expected number of sites until a recombination event is observed.
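A minimal sketch of the thinning argument behind this quantity, under stated assumptions. The recombination rate is per site per generation, and the tree length is the genealogy's total branch length in generations. `p_change` is the probability that a recombination on this tree changes it, either as a tree-change or as a topology-change, depending on which distance is wanted. It is supplied here rather than derived. The function name is hypothetical.

```python
def expected_waiting_distance(recomb_rate, tree_length, p_change):
    """Expected number of sites until a recombination that changes the genealogy."""
    # Recombinations hit the genealogy at recomb_rate * tree_length per site; each one
    # changes it with probability p_change, so changes form a thinned Poisson process
    # and the distance to the first change is exponential with this rate.
    rate = recomb_rate * tree_length * p_change
    return 1.0 / rate


# r = 2e-9 per site per generation, total branch length 100,000 generations,
# and half of all recombinations change the tree
print(expected_waiting_distance(2e-9, 1e5, 0.5))
```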
Analytical results match expectation of stochastic coalescent simulations.
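A small check of this kind, for the simplest case only: two samples in one diploid population under SMC', with time in generations. A recombination falls uniformly at t$_1$ on either branch of a tree of height T. Below T the floating lineage re-coalesces at total rate 1/N$_e$ with the two branches, and joining its own branch (probability 1/2) leaves the tree unchanged. Averaging over t$_1$ gives P(no-change) = 1/2 - (N$_e$/2T)(1 - e$^{-T/N_e}$). The sketch compares that expression with a Monte Carlo estimate. It is not the poster's general multi-sample calculation, and the function names are hypothetical.

```python
import math
import random


def p_no_change_analytic(t_root, ne):
    """SMC' P(no-change) for a 2-sample genealogy of height t_root (diploid ne)."""
    x = t_root / ne
    return 0.5 - 0.5 * (1.0 - math.exp(-x)) / x


def p_no_change_simulated(t_root, ne, n_reps=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        t1 = rng.uniform(0.0, t_root)  # recombination point, uniform on either branch
        wait = rng.expovariate(2 / (2 * ne))  # two tree lineages exist below the root
        # Re-joining its own branch below the root (prob. 1/2) leaves the tree unchanged;
        # joining the sister branch, or the root lineage above t_root, is a tree-change.
        if t1 + wait < t_root and rng.random() < 0.5:
            hits += 1
    return hits / n_reps


t_root, ne = 20_000.0, 10_000.0
print(p_no_change_analytic(t_root, ne), p_no_change_simulated(t_root, ne))
```

With two samples every change is a tree-change; topology changes need at least three samples.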
In a single-population model (Deng et al. 2021), N$_e$ affects only edge lengths.
In an MSC model, N$_e$ affects the probability of a tree or topology change as well as edge lengths.
Given an observed or proposed ARG (genealogies and interval lengths),
compute the expected waiting distance ($\lambda_i$) for each genealogy...
...and calculate the likelihood of the MSC model ($\mathcal{S}$) from exponential probability densities.
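A minimal sketch of the final step. Here $\lambda_i$ is treated as the exponential rate for genealogy i, the reciprocal of its expected waiting distance in sites. That reading of the symbol is an assumption. Each observed interval length d$_i$ then contributes log $\lambda_i$ - $\lambda_i$ d$_i$. Mapping the MSC parameters to each $\lambda_i$ is assumed to have been done already, and the function name is hypothetical.

```python
import math


def msc_log_likelihood(rates, distances):
    """Sum of exponential log-densities, log(lambda_i) - lambda_i * d_i.

    rates: per-site rate lambda_i for each genealogy in the ARG (assumed already
           computed from the MSC model parameters)
    distances: observed length d_i (in sites) of the interval spanned by genealogy i
    """
    if len(rates) != len(distances):
        raise ValueError("need exactly one rate per observed interval")
    return sum(math.log(lam) - lam * d for lam, d in zip(rates, distances))


print(msc_log_likelihood([1e-4, 2e-4], [8_000.0, 3_500.0]))
```

An interval truncated by the end of a locus would instead contribute the survival term, -$\lambda_i$ d$_i$, since no change was observed within it.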
Topology changes are more informative than tree changes; likelihood optima fall at the true simulated values.
Example: loci=50, length=0.1Mb, recomb=2e-9, samples-per-lineage=4.
Metropolis-Hastings MCMC converges on the correct parameter values with increasing data.
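A toy stand-in for this behavior: a random-walk Metropolis-Hastings sampler that infers a single exponential rate from simulated waiting distances. It does not infer MSC parameters. In the full model each proposed set of N$_e$ and $\tau$ values would be mapped to per-genealogy rates before evaluating the likelihood. The flat prior on log(rate), step size, burn-in and names are all assumptions for illustration.

```python
import math
import random


def exp_log_likelihood(rate, n, total):
    """Log-likelihood of n i.i.d. exponential distances summing to total."""
    return n * math.log(rate) - rate * total


def metropolis_hastings(distances, n_iter=5000, step=0.05, init=1.0, seed=0):
    """Random-walk Metropolis-Hastings on log(rate) with a flat prior on log(rate)."""
    rng = random.Random(seed)
    n, total = len(distances), sum(distances)
    log_rate = math.log(init)
    cur_ll = exp_log_likelihood(init, n, total)
    samples = []
    for _ in range(n_iter):
        prop = log_rate + rng.gauss(0.0, step)
        prop_ll = exp_log_likelihood(math.exp(prop), n, total)
        delta = prop_ll - cur_ll
        # Symmetric proposal and flat prior: accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            log_rate, cur_ll = prop, prop_ll
        samples.append(math.exp(log_rate))
    return samples


# The posterior mean approaches the true rate as more intervals are observed.
true_rate = 0.01
data_rng = random.Random(42)
for n in (20, 200, 2000):
    d = [data_rng.expovariate(true_rate) for _ in range(n)]
    post = metropolis_hastings(d)[1000:]  # discard burn-in
    print(n, sum(post) / len(post))
```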
Example: loci=50, length=0.1Mb, recomb=2e-9, samples-per-lineage=4.