Microproject - The Wright-Fisher Model of genetic drift

A simple Wright-Fisher simulator¶

The Wright-Fisher (WF) model is a simple model that describes the change in allele frequencies in populations through time. WF is a cornerstone of population genetics, yet it is simple enough that we can already implement a version of it in python using the tools that we know. First, to see an example of the behavior of genetic drift under the WF model, open this link in a new window:

An online WF simulator from phytools.org

In our model we will observe the following contstraints:

The population size (N) is fixed and unchanging for the duration of the simulation
Generations are non-overlapping (all individuals die and are replaced once per generation)
All individuals have an equal probability of producing offspring
Within this population we model a single bi-allelic locus
There is no selection, e.g. ancestral (A) and derived (a) alleles have equal fitness

Assignment instructions:¶

Launch a jupyter server on your local computer
Navigate to ~/hack-5-python/notebooks directory and create a new jupyter notebook called wf-sim.ipynb
For each section of this tutorial you may create several code blocks for testing, but the goal will be to have one code block that contains both a function definition and a represetative function call, e.g.:
```
def my_cool_function(arg1, arg2):
    arg1 += 'bar'
    arg2 += 'buzz'
    return arg1 + arg2

my_cool_function('foo', 'fizz')
```

Define a function to initialize the population¶

In a new code cell define a function called init which takes 2 arguments. The first argument should be the size of the population in number of haploid individuals (N), and the second argument should be the frequency of derived alleles (f).

We will model our population as a list of 0s and 1s, with 0 representing (A) and 1 representing (a). Therefore, the init function should return a list of length N composed of 0s and 1s (in any order), with the proportion of 1s approximately equal to f

# Example call and desired result
init(N=10, f=0.5)

[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

Hint for init

You might use the fact that lists can be 'multiplied' to compose your population, and then use list concatenation (`+`) to join ancestral and derived groups.

[0] * 5

[0, 0, 0, 0, 0]

You will also almost certainly need to use the `round` function from the standard library, to convert fractional numbers to integers.

f = 10/3
print(f)
round(f)

3.333333333
3

Define a function to perform one WF generation¶

In a new code cell define a function called step, which takes 1 argument. This argument (which we will call the ancestral population) should be a 'WF population' as we have defined it above (a list of length N of 0s and 1s).

This function should return a new population of N individuals (the descendant population) with frequencies of 0s and 1s determined probabailisticly by the frequency of 0s and 1s in the ancestral pop.

The order of individuals in the population doesn't matter (i.e. this model is 'spatially implicit'), so don't worry at all about preserving order.
This is a stochastic process so you should use at least one fuction from the random module in the standard library.

# Example call and desired result
pop = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
step(pop=pop)

[1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

Hint for step()

A very simple way to do this is using `random.choices` from the standard library. Look at the [random.choices() documentation](https://docs.python.org/3/library/random.html#random.choices).

Define a wf() function to bring it all together¶

Now we will create a full WF simulation by combining the init() and step() functions from above. In a new code cell define a function called wf which takes 3 arguments. The first argument is N (population size), the second argument is f (derived allele frequency), and the third argument is # of generations to run.

The result should print to the screen the number of generations run and the final frequency of the drived allele.

# Example call and desired result
wf(N=100, f=0.2, ngens=10)

generations: 10; freq(a): 0.4565

Hint for wf()

Inside your function call `init` with the N and f arguments and save the result to a new variable called 'pop'. Create a new for loop using `range()` to determine the number of iterations (number of wf generations), and inside the for loop call the `step` function.
* To calculate the frequency of the derived allele you will need to count the number of derived alleles. Consider using the `sum()` function from the standard library.
* After the for loop use the print function and a format string to return the results (e.g. `print(f"steps{nsteps}")`)

Further challenges if time remains¶

Modify your wf() function to test for fixation or loss of allele 1. If allele 1 is fixed or lost, break the for loop early and modify the print statement to indicate this result in some way.
Track allele frequency through time. Create a new list inside the wf() function and use this list to accumulate allele frequency after each call to step() inside the for loop.
Advanced: Wrap the wf() call inside an iterate_wf() function that calls wf many times. The goal is to test the expectation that the probability of fixation of the derived allele in WF equals the initial allele frequency (e.g. run wf(N=100, f=0.2, gens=1000) 100 times and record how many times allele 1 is fixed, then divide by 100. It should be ~0.2 in this example).

Wrap-up¶

Add, commit, and push changes back to your origin main copy of the repo.