Microproject - The Wright-Fisher Model of genetic drift
A simple Wright-Fisher simulator¶
The Wright-Fisher (WF) model is a simple model that describes the change in allele frequencies in populations through time. WF is a cornerstone of population genetics, yet it is simple enough that we can already implement a version of it in python using the tools that we know. First, to see an example of the behavior of genetic drift under the WF model, open this link in a new window:
An online WF simulator from phytools.org
In our model we will observe the following contstraints:
- The population size (N) is fixed and unchanging for the duration of the simulation
- Generations are non-overlapping (all individuals die and are replaced once per generation)
- All individuals have an equal probability of producing offspring
- Within this population we model a single bi-allelic locus
- There is no selection, e.g. ancestral (A) and derived (a) alleles have equal fitness
Assignment instructions:¶
- Launch a jupyter server on your local computer
- Navigate to
~/hack-5-python/notebooks
directory and create a new jupyter notebook calledwf-sim.ipynb
- For each section of this tutorial you may create several code blocks for testing, but the goal
will be to have one code block that contains both a function definition and a represetative function call, e.g.:
def my_cool_function(arg1, arg2): arg1 += 'bar' arg2 += 'buzz' return arg1 + arg2 my_cool_function('foo', 'fizz')
Define a function to initialize the population¶
In a new code cell define a function called init
which takes 2 arguments. The first argument
should be the size of the population in number of haploid individuals (N), and the second argument
should be the frequency of derived alleles (f).
We will model our population as a list of 0s and 1s, with 0 representing (A) and 1 representing (a).
Therefore, the init
function should return a list of length N composed of 0s and 1s (in any order),
with the proportion of 1s approximately equal to f
# Example call and desired result
init(N=10, f=0.5)
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Hint for init
You might use the fact that lists can be 'multiplied' to compose your population, and then use list concatenation (`+`) to join ancestral and derived groups.[0] * 5
[0, 0, 0, 0, 0]
f = 10/3
print(f)
round(f)
3.333333333
3
Define a function to perform one WF generation¶
In a new code cell define a function called step
, which takes 1 argument. This argument
(which we will call the ancestral population) should be a 'WF population' as we
have defined it above (a list of length N of 0s and 1s).
This function should return a new population of N individuals (the descendant population) with frequencies of 0s and 1s determined probabailisticly by the frequency of 0s and 1s in the ancestral pop.
- The order of individuals in the population doesn't matter (i.e. this model is 'spatially implicit'), so don't worry at all about preserving order.
- This is a stochastic process so you should use at least one fuction from the
random
module in the standard library.
# Example call and desired result
pop = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
step(pop=pop)
[1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
Hint for step()
A very simple way to do this is using `random.choices` from the standard library. Look at the [random.choices() documentation](https://docs.python.org/3/library/random.html#random.choices).Define a wf() function to bring it all together¶
Now we will create a full WF simulation by combining the init() and step()
functions from above. In a new code cell define a function called wf
which
takes 3 arguments. The first argument is N (population size), the second argument is f
(derived allele frequency), and the third argument is # of generations to run.
The result should print to the screen the number of generations run and the final frequency of the drived allele.
# Example call and desired result
wf(N=100, f=0.2, ngens=10)
generations: 10; freq(a): 0.4565
Hint for wf()
Inside your function call `init` with the N and f arguments and save the result to a new variable called 'pop'. Create a new for loop using `range()` to determine the number of iterations (number of wf generations), and inside the for loop call the `step` function.* To calculate the frequency of the derived allele you will need to count the number of derived alleles. Consider using the `sum()` function from the standard library.
* After the for loop use the print function and a format string to return the results (e.g. `print(f"steps{nsteps}")`)
Further challenges if time remains¶
- Modify your
wf()
function to test for fixation or loss of allele 1. If allele 1 is fixed or lost,break
the for loop early and modify the print statement to indicate this result in some way. - Track allele frequency through time. Create a new list inside the wf() function and use this
list to accumulate allele frequency after each call to
step()
inside the for loop. - Advanced: Wrap the
wf()
call inside aniterate_wf()
function that callswf
many times. The goal is to test the expectation that the probability of fixation of the derived allele in WF equals the initial allele frequency (e.g. runwf(N=100, f=0.2, gens=1000)
100 times and record how many times allele 1 is fixed, then divide by 100. It should be ~0.2 in this example).
Wrap-up¶
- Add, commit, and push changes back to your
origin main
copy of the repo.