1. Genetic drift
2. Wright-Fisher process
3. Genealogies
4. The coalescent
5. Structured coalescent
6. Neutral theory (debate)
The Hardy–Weinberg principle (HW) provides the solution to one of Darwin's biggest mysteries: how variation is maintained in a population. Under Mendelian inheritance (segregation of alleles) and random mating, the frequencies of alleles (variants of a gene) remain constant in the absence of selection, mutation, migration, and genetic drift.
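As a brief worked check of this claim, consider one gene with two alleles at frequencies $p$ and $q = 1 - p$ (symbols introduced here for illustration). Random mating in an infinitely large population gives genotype frequencies $p^2$, $2pq$, and $q^2$, so the allele frequency in the next generation is:

$$ p' = p^2 + \tfrac{1}{2}(2pq) = p(p + q) = p $$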
When we relax the assumption of an infinitely large population, inheritance becomes a random sampling process. Diploid genotypes are resampled each generation by drawing two alleles from the previous generation according to their frequencies. The resulting change in allele frequency from one generation to the next is genetic drift (sampling error!).
By incorporating a finite effective population size (N) into our
sampling probabilities we can estimate how much allele frequencies
are expected to change over time (generations).
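A minimal sketch of this sampling process, assuming a single gene with two alleles and a constant population of N diploids. The function names `drift_trajectory` and `drift_variance` are hypothetical and chosen only for illustration. Each generation, every one of the 2N gene copies carries the focal allele with probability equal to its current frequency.

```python
import random

def drift_trajectory(p0, n_diploid, generations, seed=None):
    """Simulate allele frequency change by resampling 2N gene copies each generation."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each new gene copy carries the focal allele with probability p
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory

def drift_variance(p, n_diploid):
    """Variance of the one-generation change in frequency under binomial sampling."""
    return p * (1 - p) / (2 * n_diploid)

if __name__ == "__main__":
    # Smaller populations should show larger fluctuations
    for n in (10, 100, 1000):
        print(n, drift_trajectory(0.5, n, 10, seed=1)[-1], drift_variance(0.5, n))
```

Under this scheme the mean change in frequency per generation is zero, while its variance, $p(1-p)/2N$, shrinks as N grows. That is why drift is strongest in small populations.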
In addition to randomly sampling alleles from the previous generation, we
can *also keep track of which parental copies were passed down* (i.e., keep
track of the genealogy). This process is called the Wright-Fisher model.
The Wright-Fisher model is a discrete-time model in which each generation is composed of 2N copies of each gene. In each subsequent generation, 2N new copies are drawn at random, with replacement, from the previous generation.
https://en.wikipedia.org/wiki/Genetic_drift#Wright.E2.80.93Fisher_model
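The idea of tracking which parental copy each gene copy came from can be sketched as follows. This is an illustrative sketch, not a reference implementation: the names `wright_fisher_parents` and `trace_lineage` are hypothetical, and the population is assumed to have a constant number of gene copies (2N) with no selection.

```python
import random

def wright_fisher_parents(n_copies, generations, seed=None):
    """Return parents[g][j]: the index in generation g of the parent of copy j in generation g + 1."""
    rng = random.Random(seed)
    # Every copy picks its parent uniformly at random, with replacement
    return [[rng.randrange(n_copies) for _ in range(n_copies)]
            for _ in range(generations)]

def trace_lineage(parents, copy_index):
    """Follow one gene copy from the most recent generation back to the oldest one."""
    path = [copy_index]
    for generation in reversed(parents):
        copy_index = generation[copy_index]
        path.append(copy_index)
    return path

if __name__ == "__main__":
    parents = wright_fisher_parents(n_copies=10, generations=8, seed=42)
    # Two lineages have coalesced once their paths reach the same index
    print(trace_lineage(parents, 0))
    print(trace_lineage(parents, 1))
```

Because a lineage's path depends only on its current index, two lineages that reach the same index in some generation share the same path from that point back.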
A neutral evolutionary process (no selection) can be modeled using the WF model in which allele frequencies change over time by genetic drift.
Source: Alexei Drummond
Genealogies are a representation of ancestor-descendant relationships. Individuals have genealogies and genes have genealogies, but their patterns are different. Whereas the number of ancestors of an individual increases in each generation back in time, the number of ancestors of a gene copy in a previous generation is always one.
Different genes have different genealogies tracing back through different ancestors. If you continue this process further back in time, to your great-great-great-great-grandparents and beyond, you will find that some of your ancestors have left no copies at any of the genes in your genome. This is the basis for the concept that one's genealogical ancestry does not match one's genetic ancestry.
Genes.
A gene (or locus) in this context refers to any non-recombined
region of the genome.
Gene copies.
A gene copy refers to a single haploid copy of a gene. Diploid individuals contain two copies of every gene. A gene copy in one generation may be replicated to leave many copies in the next generation.
We focus on haploids because of the assumptions of our model.
When mating is random within populations and no selection can act
on diploid genotypes, the diploid phase becomes irrelevant to our
model. A population of N diploids contains 2N gene copies.
Coalescence: the merging of multiple gene copies into their common ancestor looking backwards in time. Because the probability that two diploid samples share a common ancestor increases rapidly backwards in time, the probability that two gene copies are descended from the same ancestral copy also increases rapidly backwards in time.
Going back one generation, these two gene copies either came from the same parental copy ($\frac{1}{2N}$)
or they came from different parental copies ($1 - \frac{1}{2N}$).
The probability that these two gene copies coalesced exactly $t$ generations ago can be calculated from these two probability statements: they must pick different parental copies for $t - 1$ generations and then the same parental copy once:
$$ \left(1 - \frac{1}{2N}\right)^{t - 1} \frac{1}{2N} $$
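The expression above can be checked numerically with a short sketch. The function names are hypothetical, and the mean is approximated by a truncated sum, so `max_t` needs to be large relative to 2N for the result to be accurate.

```python
def coalescence_pmf(t, n_diploid):
    """Probability that two gene copies first share a parental copy exactly t generations back."""
    p = 1 / (2 * n_diploid)
    return (1 - p) ** (t - 1) * p

def mean_coalescence_time(n_diploid, max_t=100_000):
    """Approximate the expected pairwise coalescence time by a truncated sum."""
    return sum(t * coalescence_pmf(t, n_diploid) for t in range(1, max_t + 1))

if __name__ == "__main__":
    n = 10
    print(coalescence_pmf(1, n))     # chance of coalescing in the previous generation
    print(mean_coalescence_time(n))  # should be close to 2N
```

For a pair of gene copies the expected waiting time comes out close to $2N$ generations, the mean of a geometric distribution with success probability $\frac{1}{2N}$.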
$$\mathrm{Pr}(\mathrm{coal}) = \binom{i}{2} \frac{1}{2N} = \frac{i(i-1)}{4N}$$
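There are $\binom{i}{2}$ pairs of lineages that could pick the same parental copy, so the probability of a coalescence in any one generation scales quadratically with the number of lineages $i$. This approximation ignores more than one coalescence in the same generation, which is rare when $i \ll 2N$.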
$$\mathrm{E}[T_i] = \frac{4N}{i(i-1)}$$
The waiting time $T_i$ until the next coalescence among $i$ lineages follows a geometric distribution.
If each generation there is a $\frac{1}{x}$ probability of an event occurring, we expect to
wait $x$ generations for the event to occur.
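As a small illustration of these expectations, the sketch below tabulates $\mathrm{E}[T_i]$ for each number of lineages and sums the values to get the expected time to the most recent common ancestor of the whole sample. The function names are hypothetical, and the sample size is assumed to be much smaller than 2N.

```python
def expected_coalescent_times(n_samples, n_diploid):
    """Expected generations spent with i lineages, for i = n_samples down to 2."""
    return {i: 4 * n_diploid / (i * (i - 1)) for i in range(n_samples, 1, -1)}

def expected_tmrca(n_samples, n_diploid):
    """Expected time until all sampled lineages have merged into one ancestor."""
    return sum(expected_coalescent_times(n_samples, n_diploid).values())

if __name__ == "__main__":
    for i, t in expected_coalescent_times(5, 100).items():
        print(i, round(t, 2))
    print(expected_tmrca(5, 100))
```

The sum telescopes to $4N\left(1 - \frac{1}{n}\right)$ for $n$ samples. The final interval, with only two lineages remaining, contributes $2N$ generations and therefore accounts for more than half of the expected tree height.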
So far we have been working under the assumption that all samples come from a single panmictic population. We can also model the genealogy of samples drawn from multiple distinct populations.
This can be modeled as multiple single-population coalescent models combined. Starting from the tips, we ask whether the n samples in a population have coalesced within a given time period; any lineages that have not coalesced are carried into the next section of the model further up the tree.
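One way to sketch this chunk-by-chunk idea is shown below, under several loud assumptions: two populations of equal size that split from an ancestral population `split_time` generations ago, no migration between them, and exponentially distributed waiting times with rate $\frac{i(i-1)}{4N}$ as a continuous-time stand-in for the geometric waiting times. The function names and parameters are hypothetical.

```python
import random

def lineages_remaining(n_lineages, n_diploid, duration, rng):
    """Coalesce lineages within one population for `duration` generations; return how many remain."""
    elapsed = 0.0
    k = n_lineages
    while k > 1:
        # Rate of coalescence among k lineages (continuous-time approximation)
        rate = k * (k - 1) / (4 * n_diploid)
        elapsed += rng.expovariate(rate)
        if elapsed > duration:
            break  # the next coalescence would fall outside this time period
        k -= 1
    return k

def two_population_sketch(n_a, n_b, n_diploid, split_time, seed=None):
    """Return lineages left in each population at the split, and the total entering the ancestor."""
    rng = random.Random(seed)
    left_a = lineages_remaining(n_a, n_diploid, split_time, rng)
    left_b = lineages_remaining(n_b, n_diploid, split_time, rng)
    return left_a, left_b, left_a + left_b

if __name__ == "__main__":
    print(two_population_sketch(n_a=5, n_b=5, n_diploid=100, split_time=50, seed=1))
```

Lineages that have not coalesced by the split are pooled and continue coalescing in the ancestral population, which could be simulated by calling `lineages_remaining` again with the pooled count.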