and Genomic Scales

be used to reconstruct historical

Global research program contributing thousands of specimens to herbaria

Development of software tools for research **and education**

- 1. Biodiversity research using low-cost genomic genotyping.
- 2. Methods and software development: *'spatial' phylogenomics.*
- 3. Reproductive and genomic diversity in a biodiversity hotspot.
- 4. Future directions of the Eaton lab in Texas.

Decreasing costs have made it relatively easy to generate large genomic datasets

Characterize whole genomes from *a subset* of sequenced markers.

It is important to examine evolutionary history across the entire genome.

Introgression is common throughout the history of many lineages.

Assemble and analyze RAD-seq type data for phylogenetic datasets.

- Eaton & Ree (2013) *SysBio*
- Wang, Zhao, Eaton, Li & Guo (2013) *Mol. Ecol. Res.*
- Escudero, Eaton, Hahn & Hipp (2014) *MPE*
- Cavender-Bares, Gonzalez-Rodriguez, Eaton & Hipp (2015) *Mol. Ecol.*
- Eaton, Hipp, Gonzalez-Rodriguez & Cavender-Bares (2015) *Evolution*
- Eaton, Spriggs, Park & Donoghue (2017) *SysBio*
- Forsman, Knapp, Tisthammer, Eaton, Belcaid & Toonen (2017) *MPE*
- Federman, Donoghue, Daly & Eaton (2018) *PLoS One*
- Miller Quinzin, Edwards, Eaton, ... & Caccone (2018) *Heredity*
- Park, Sinnott-Armstrong, Schlutius, ... Eaton & Donoghue (2019) *Ann. Bot.*
- Spriggs, Eaton, Sweeney, Schlutius, Edwards & Donoghue (2019) *SysBio*
- Spriggs, Schlutius, Eaton, Park, Sweeney, Edwards & Donoghue (2019) *SysBio*
- Paetzold, Wood, Eaton, Wagner & Appelhans (2019) *Front. Plant Sci.*
- Satler, Herre, Jander, Eaton, Machado, Heath & Nason (2019) *Evolution*
- Bombonato, Amaral, Silva, ... Eaton, ... & Franco (2020) *MPE*
- Landis, Eaton, Clement, Spriggs, Sweeney & Donoghue (2021) *SysBio*
- Zuluaga, van der Werff, Park, Eaton, ... & Donoghue (2021) *AJB*
- Guo, Ma, Yang, Ye, Guo, Liu, Eaton & Li (2021) *SysBio*
- Amaral, Yano, Oliveira, Brito, Bonatelli, ... Eaton & Franco (2021) *J. BioGeog.*
- [Donoghue, Eaton], Maya-Lastra, Landis, ... & Edwards (2022) *Nat. Ecol. Evol.*
- Satler, Herre, Heath, Machado, ... Eaton & Nason (2023) *Ecol. & Evol.*
- Stubbs, Theodoridis, Carrera, ... Eaton ... & Conti (2023) *New Phyt.*

- We run a 4-day (2-part) workshop to generate data and analyze it.
- RAD-seq as an introduction to genomics (wet lab and bioinformatics).
- By sharing/multiplexing, costs are very low (<$10/sample).
- COMPLETELY FREE for participants.
- Prioritize women and students from URM backgrounds.
- Funding from NSF, SSB, AGA, and SSE.

- 1. Biodiversity research using low-cost genomic genotyping.
- 2. Methods and software development: *'spatial' phylogenomics.*
- 3. Reproductive and genomic diversity in a biodiversity hotspot.
- 4. Future directions of the Eaton lab in Texas.

I teach Programming for Biology ("Hack the Planet"), which gives students basic coding skills while guiding them through the process of developing and distributing a software tool.

*Shadie*: A Python wrapper to perform SLiM simulations of plant life cycles.

*Traversome*: Hybrid PanGenome Assembler from Mixed Samples

*superMCC*: Iteratively applies BPP to calibrate node ages on large trees

*ipcoal*: integrates msprime coalescent simulations with species tree & network inference.

*toytree*: Python-based Tree object, manipulation, visualization, and evol. analysis library.

evolutionary history of organisms from *whole* genomes?

Genomes are composed of a mosaic of segments inherited from different ancestors,

each separated by past recombination events.

Consequently, genealogical relationships vary spatially across genomes.

The multispecies coalescent (MSC) describes the expected distribution
of *unlinked* genealogies,
as a function of demographic model parameters (N$_e$, $\tau$, topology).

The expected distribution of *linked*
genealogical variation is poorly characterized.

How does it relate to demographic model parameters?

- Subsampling unlinked loci effectively discards >99% of genomic information.
- Ignoring linkage introduces bias (*concatalescence*; Gatesy 2013).
- Local ancestry is informative about selection and introgression.
- We lack a null expectation for spatial genealogical variation.

(Martin & Belleghem 2017)

- Background: *SMC' and waiting distances.*
- Our new extension: *MS-SMC waiting distances.*
- Validation of analytical results against simulations.
- New framework for MSC model likelihoods from linked genealogies.
- Future of 'spatial-genomic' phylogenetics.

*An approximation of the coalescent with recombination*

Given a starting genealogy, the change to the next genealogy is modeled as a single Markov transition, which enables a tractable likelihood framework.

Process: recombination occurs with uniform probability anywhere on the tree (at time t$_{1}$), creating a detached subtree that re-coalesces with an ancestral lineage above t$_{1}$.
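The first step of this process, placing a recombination event uniformly over the branch lengths of a genealogy, can be sketched as below. This is a minimal illustration only: the dictionary of branch time bounds, the function name, and the three-tip example tree are all hypothetical and are not the data model of any of the tools named in this talk.

```python
import random


def sample_recombination_point(branches, rng):
    """Pick (branch_name, time) uniformly over the total branch length.

    branches: dict mapping a branch name to its (lower, upper) time bounds.
    """
    names = list(branches)
    lengths = [branches[name][1] - branches[name][0] for name in names]
    # a branch is chosen with probability proportional to its length ...
    name = rng.choices(names, weights=lengths, k=1)[0]
    lower, upper = branches[name]
    # ... and the event time is uniform along the chosen branch
    return name, rng.uniform(lower, upper)


# Hypothetical genealogy ((a,b),c) with times in generations (assumed values).
BRANCHES = {"a": (0, 1000), "b": (0, 1000), "ab": (1000, 3000), "c": (0, 3000)}
rng = random.Random(123)
DRAWS = [sample_recombination_point(BRANCHES, rng) for _ in range(20000)]
FRAC_C = sum(1 for name, _ in DRAWS if name == "c") / len(DRAWS)
print(FRAC_C)  # should be close to 3000 / 7000, the length share of branch c
```

The detached lineage would then re-coalesce at some time above the sampled point; that second step is not simulated here.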

*PSMC* (Li & Durbin 2011), *MSMC* (Schiffels & Durbin 2014),
use pairwise coalescent times between sequential genealogies to infer
changes in N$_e$ through time.

*ARGweaver* (Rasmussen et al. 2014) and *ARGweaver-D* (Hubisz & Siepel 2020)
use an SMC'-based conditional sampling method to infer ARGs from sequence data.

spatial information from genomes.

(a) no-change; (b-c) tree-change; and (d) topology-change.

(Deng et al. 2021)

*Expected Tree and Topology Distances represent new spatial genetic information.*

*Barriers to coalescence and variable N$_e$ among species tree intervals.*

Patrick McKenzie

PhD student

*Genealogy embedding table with piecewise constant coal rates in
all intervals between coal events or population intervals.*

\[
\mathbb{P}(\text{tree-unchanged} | \mathcal{S}, \mathcal{G}, b, t_r) =
\int_{t_r}^{t^u_b} \frac{1}{2N(\tau)} e^{-\int_{t_r}^\tau \frac{A(s)}{2N(s)}ds} d\tau
\]

\[
\mathbb{P}(\text{tree-unchanged} | \mathcal{S},\mathcal{G},b) =
\frac{1}{t^u_b-t^l_b} \int_{t^l_b}^{t^u_b}
\mathbb{P}(\text{tree-unchanged} | \mathcal{S},\mathcal{G},b,t_r)\,dt_r
\]

\[
\mathbb{P}(\textrm{tree-unchanged} | \mathcal{S},\mathcal{G}) =
\sum_{b \in \mathcal{G}}
\left[\frac{t^u_b - t^l_b}{L(\mathcal{G})}\right]
\mathbb{P}(\textrm{tree-unchanged} | \mathcal{S},\mathcal{G},b)
\]
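The three equations above can be evaluated numerically when $A$ and $N$ are piecewise constant along a branch, which is the situation the genealogy embedding table describes. The sketch below is an assumption-laden illustration, not the ipcoal implementation: the `(start, stop, A, N)` tuple layout, the function names, and the example values are all hypothetical. Within one interval the inner integral has the closed form $\frac{1}{A}\left(1 - e^{-A\,\Delta/2N}\right)$, scaled by the probability of not having coalesced earlier.

```python
import math


def prob_unchanged_given_time(t_r, intervals):
    """Evaluate the first integral when A and N are piecewise constant.

    intervals: time-ordered list of (start, stop, A, N) tuples that tile
    branch b from its lower to its upper end.
    """
    total = 0.0
    hazard = 0.0  # integral of A(s) / 2N(s) from t_r up to the current interval
    for start, stop, nlineages, popsize in intervals:
        if stop <= t_r:
            continue  # interval lies entirely below the recombination time
        lower = max(start, t_r)
        rate = nlineages / (2.0 * popsize)
        width = stop - lower
        # closed form of the integral of (1/2N) * exp(-hazard - rate * (tau - lower))
        total += math.exp(-hazard) * (1.0 - math.exp(-rate * width)) / nlineages
        hazard += rate * width
    return total


def prob_unchanged_on_branch(intervals, nsteps=2000):
    """Average over a uniform recombination time on the branch (midpoint rule)."""
    lower, upper = intervals[0][0], intervals[-1][1]
    step = (upper - lower) / nsteps
    times = (lower + (i + 0.5) * step for i in range(nsteps))
    return sum(prob_unchanged_given_time(t, intervals) for t in times) / nsteps


def prob_unchanged_on_tree(branch_tables):
    """Weight each branch by its share of the total branch length."""
    lengths = [table[-1][1] - table[0][0] for table in branch_tables]
    total_length = sum(lengths)
    return sum(
        (length / total_length) * prob_unchanged_on_branch(table)
        for length, table in zip(lengths, branch_tables)
    )


# Hypothetical two-tip genealogy in one population: T = N = 1000 generations,
# with two lineages available for re-coalescence along each tip branch.
TWO_TIP = [[(0.0, 1000.0, 2, 1000.0)], [(0.0, 1000.0, 2, 1000.0)]]
P_TWO_TIP = prob_unchanged_on_tree(TWO_TIP)

# Hypothetical branch that crosses into an ancestral population interval
# where both the lineage count and N change.
MSC_BRANCH = [(0.0, 500.0, 2, 1000.0), (500.0, 1000.0, 3, 5000.0)]
P_MSC_BRANCH = prob_unchanged_on_branch(MSC_BRANCH)
print(P_TWO_TIP, P_MSC_BRANCH)
```

For the two-tip case the result can be checked by hand: averaging $\frac{1}{2}\left(1 - e^{-(T-t_r)/N}\right)$ over $t_r \in [0, T]$ with $T = N$ gives $e^{-1}/2$.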

Unlike single-population models, in which these probabilities change monotonically along the length of a branch, MSC models exhibit variable rates along a branch (both the number of lineages $k$ and N$_e$ can change).

*Expected number of sites until a recombination event is observed.*

\[ \lambda_{r} = L(\mathcal{G}) \times r \]

\[
\lambda_{n} = L(\mathcal{G}) \times r \times
\mathbb{P}(\text{tree-unchanged} | \mathcal{S},\mathcal{G})
\]

\[
\lambda_{g} = L(\mathcal{G}) \times r \times
\mathbb{P}(\text{tree-changed} | \mathcal{S},\mathcal{G})
\]

\[
\lambda_{t} = L(\mathcal{G}) \times r \times
\mathbb{P}(\text{topology-changed} | \mathcal{S},\mathcal{G})
\]
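A small worked example of the four rates above, reading each $\lambda$ as the per-site rate of an exponential waiting distance so that the expected distance is $1/\lambda$. The function name and all input values are hypothetical, and the two probabilities are simply assumed rather than computed from a species tree; the only identity used beyond the equations is that a recombination event either changes the tree or does not.

```python
import math


def waiting_distance_rates(sum_branch_lengths, recomb_rate, p_unchanged, p_topology_changed):
    """Return per-site event rates for each category of recombination outcome."""
    lam_r = sum_branch_lengths * recomb_rate
    return {
        "recombination": lam_r,
        "no_change": lam_r * p_unchanged,
        # complement of tree-unchanged, given that an event occurred
        "tree_change": lam_r * (1.0 - p_unchanged),
        "topology_change": lam_r * p_topology_changed,
    }


# Assumed values: L(G) = 500,000 generations of total branch length,
# r = 2e-9 per site per generation, and illustrative event probabilities.
RATES = waiting_distance_rates(500_000, 2e-9, p_unchanged=0.25, p_topology_changed=0.4)
EXPECTED_SITES = {event: 1.0 / rate for event, rate in RATES.items()}
for event, sites in EXPECTED_SITES.items():
    print(f"{event}: {sites:.1f} sites")
```

Because topology changes are a subset of tree changes, the expected distance to the next topology change is always at least as long as the distance to the next tree change.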

*Analytical results match expectation of stochastic coalescent simulations.*
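The kind of check this refers to can be illustrated on the simplest possible case. The sketch below is not the validation reported in the talk: it assumes a hypothetical two-tip genealogy in a single population of constant size, where the detached lineage faces two lineages below the root (total rate $2/2N$) and leaves the tree unchanged only if it rejoins its own branch before the root.

```python
import math
import random


def simulate_p_unchanged(root_time, popsize, nreps=200_000, seed=1):
    """Monte Carlo estimate of P(tree-unchanged) for a two-tip genealogy."""
    rng = random.Random(seed)
    unchanged = 0
    for _ in range(nreps):
        # recombination time is uniform along a tip branch
        t_r = rng.uniform(0.0, root_time)
        # two lineages are available below the root: rate 2 / (2N) = 1 / N
        wait = rng.expovariate(1.0 / popsize)
        # unchanged only if it re-coalesces below the root AND picks its own
        # branch, which happens with probability 1/2 among the two lineages
        if t_r + wait < root_time and rng.random() < 0.5:
            unchanged += 1
    return unchanged / nreps


def analytical_p_unchanged(root_time, popsize):
    """Average of (1/2) * (1 - exp(-(T - t_r) / N)) over t_r in [0, T]."""
    ratio = root_time / popsize
    return 0.5 - (1.0 - math.exp(-ratio)) / (2.0 * ratio)


SIMULATED = simulate_p_unchanged(1000.0, 1000.0)
ANALYTICAL = analytical_p_unchanged(1000.0, 1000.0)
print(SIMULATED, ANALYTICAL)
```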

*Calculate the likelihood of an ARG given a species tree (S).*

*Topology-changes are more informative than tree-changes; optima at true sim. values.
Example: loci=50, length=0.1Mb, recomb=2e-9, samples-per-lineage=4.*
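One way to picture such a likelihood is sketched here under a stated assumption: each observed waiting distance is treated as an exponential draw whose rate $\lambda$ was computed for the genealogy it starts from. The function name and the toy distances are hypothetical, and this is not a description of how ipcoal computes its likelihoods.

```python
import math


def waiting_distance_loglik(distances, rates):
    """Sum of exponential log-densities, one rate per observed distance."""
    return sum(math.log(rate) - rate * dist for dist, rate in zip(distances, rates))


# Toy data (assumed): three waiting distances in sites, sharing one rate.
DISTANCES = [800.0, 1200.0, 1000.0]
CANDIDATE_RATES = [5e-4, 1e-3, 2e-3]
SCORES = {rate: waiting_distance_loglik(DISTANCES, [rate] * 3) for rate in CANDIDATE_RATES}
BEST_RATE = max(SCORES, key=SCORES.get)
print(BEST_RATE)
```

In a real analysis the rates would differ among genealogies and would depend on the species tree parameters, which is what makes the waiting distances informative about the model.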

*Metropolis-Hastings MCMC converges on the correct values with increasing data.*

Example: loci=50, length=0.1Mb, recomb=2e-9, samples-per-lineage=4.
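The general behavior, a posterior that tightens around the true value as data accumulate, can be reproduced with a toy sampler. This sketch is a deliberately reduced stand-in: it estimates a single exponential rate from simulated waiting distances under a flat prior, whereas the analysis in the talk fits MSC model parameters. All names, the proposal scale, and the seeds are assumptions.

```python
import math
import random
import statistics


def log_likelihood(rate, distances):
    """Exponential log-likelihood of waiting distances under one shared rate."""
    return len(distances) * math.log(rate) - rate * sum(distances)


def metropolis_hastings(distances, nsteps=6000, start=0.01, step_sd=0.1, seed=7):
    """Sample the rate under a flat prior using a log-scale random walk."""
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, distances)
    samples = []
    for _ in range(nsteps):
        proposal = current * math.exp(rng.gauss(0.0, step_sd))
        proposal_ll = log_likelihood(proposal, distances)
        # the multiplicative proposal is asymmetric in the rate itself, so the
        # acceptance ratio carries the Hastings correction proposal / current
        log_ratio = proposal_ll - current_ll + math.log(proposal / current)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples[nsteps // 4:]  # drop the first quarter as burn-in


TRUE_RATE = 1e-3
data_rng = random.Random(42)
SMALL = [data_rng.expovariate(TRUE_RATE) for _ in range(50)]
LARGE = [data_rng.expovariate(TRUE_RATE) for _ in range(2000)]
POST_SMALL = metropolis_hastings(SMALL)
POST_LARGE = metropolis_hastings(LARGE)
print(statistics.mean(POST_SMALL), statistics.stdev(POST_SMALL))
print(statistics.mean(POST_LARGE), statistics.stdev(POST_LARGE))
```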

- We extended the method of Deng et al. (2021) to MSC models.
- New likelihood framework to fit MSC models from waiting distances!
- Enables new continuous recombination-aware phylo. inference.
- Manuscript in review: McKenzie & Eaton (2023) *bioRxiv*
- Implemented at https://github.com/eaton-lab/ipcoal/
- Now writing NSF proposal to develop joint MSC-ARG-inference tool and to extend theory to networks.

- 1. Biodiversity research using low-cost genomic genotyping.
- 2. Methods and software development: *'spatial' phylogenomics.*
- 3. Reproductive and genomic diversity in a biodiversity hotspot.
- 4. Future directions of the Eaton lab in Texas.

- >600 species globally, >300 endemic to Hengduan Mountains.
- Spectacular floral diversity; convergent evolution (Ree 2005)
- Occur in species rich assemblages (5-12)
- High potential for reproductive conflicts
- Very rarely form hybrids

Negative fitness consequences imposed by one organism on another by disrupting successful reproduction.

- We showed evidence of RI: phenotypic overdispersion in >200 assemblages.
- i.e., co-occurring species have more dissimilar flowers than expected.
- Eaton & Ree (2012) *Ecology*

Have evolved multiple times independently (Ree 2005) and facilitate pollen tube competition (Tong and Huang 2016).

Transcriptional response in styles and pollen tubes during con- and heterospecific crosses in natural communities at RMBL in Colorado.

Linking Phylogenetic Inference at Genome-wide and Genealogical Scales

- 7 chromosome-scale genome assemblies (also for MS-SMC case study)
- Massive 3RAD sequencing of species and population diversity (117 species, >1,000 specimens).
- Flower and leaf transcriptomes for ~80 species representing convergent phenotypes: beak length, tube length, and color for PhyloGWAS.
- 3D morphometric models of flowers constructed by photogrammetry.

- 1. Biodiversity research using low-cost genomic genotyping.
- 2. Methods and software development: *'spatial' phylogenomics.*
- 3. Reproductive and genomic diversity in a biodiversity hotspot.
- 4. Future directions of the Eaton lab in Texas.

- To model the rate of fixation of genetic incompatibilities, we should model a "time-to-fixation", during which purifying selection can potentially remove alleles causing Dobzhansky-Muller incompatibilities (DMIs) within species.
- We show that this leads to a strong correlation between rates of DMI evolution and N$_e$.
- Our demographic model can be tested by examining the distribution of shared versus unique DMIs across multiple species on a phylogeny (of a highly hybridizing clade).

Divergent selection is stronger between populations in sympatry than in allopatry (e.g., benthic/limnetic sticklebacks), which reduces competition for limited resources.

The challenge/opportunity in *Pedicularis* is that there are *many* interacting species, and many have convergent phenotypes. We need a community model of character displacement.

Hypothesis: Differences among populations (within species) are a result of interspecific interactions driving character displacement in local communities.

- 110 individuals from 15 targeted locations.
- RAD-seq (original) with PstI enzyme, ~5M reads per sample.
- ipyrad min50 assembly: 20K loci, 21% missing, 286K SNPs.

Lande (1976):

Selection pulls the mean phenotype towards a local optimum, while Gene Flow homogenizes phenotypes among populations, and they evolve by stochastic Drift.
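The interplay of these three forces can be illustrated with a deliberately simplified discrete-time caricature. It is not Lande's (1976) formulation or the model fitted in this study: the update rule, the parameter names (`selection`, `migration`, `drift_sd`), and the example optima are all assumptions chosen for illustration.

```python
import random


def simulate_mean_phenotypes(optima, nsteps, selection=0.1, migration=0.0,
                             drift_sd=0.0, seed=0):
    """Iterate population mean phenotypes under three simple forces."""
    rng = random.Random(seed)
    means = [0.0] * len(optima)
    for _ in range(nsteps):
        grand_mean = sum(means) / len(means)
        means = [
            z
            + selection * (theta - z)        # pull towards the local optimum
            + migration * (grand_mean - z)   # gene flow pulls towards the average
            + rng.gauss(0.0, drift_sd)       # stochastic drift
            for z, theta in zip(means, optima)
        ]
    return means


OPTIMA = [0.0, 10.0]  # hypothetical local optima for two populations
ISOLATED = simulate_mean_phenotypes(OPTIMA, nsteps=200)
CONNECTED = simulate_mean_phenotypes(OPTIMA, nsteps=200, migration=0.1)
DRIFTING = simulate_mean_phenotypes(OPTIMA, nsteps=200, migration=0.1, drift_sd=0.5)
print(ISOLATED, CONNECTED, DRIFTING)
```

Without gene flow each population settles on its own optimum; with equal selection and migration strengths the two means settle halfway between their optima and the shared average, which is the homogenizing effect described above.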

Phenotypic model is a poor fit compared to *phylogenetic nearest neighbor.*

*P. cranolopha* tends to have a longer style when co-occurring with a close relative.

The *P. cranolopha* species complex is taxonomically challenging. It is split into species/subspecies based on style length, pubescence, and presence of a "forked beak", but variation is relatively continuous.

Hybrid zones: contact between populations with *"forked beak"* and without.