Implementing the Wright-Fisher Model as a Class
Wrapping our Wright-Fisher simulator in a Class¶
In previous sessions we implemented the Wright-Fisher (WF) model using a few simple python functions. This works great for running a few simulations by hand, but for projects of even modest complexity this 'flat' structure of unorganized function calls can quickly become very difficult to understand or operate. In this session we will convert the WF model into a python 'Class' which is a structured way to organize data and code.
A word about Object Oriented programming¶
Object Oriented programming (OOP) is based on the idea that we can structure our computer programs in a more natural way that aligns with how we interact with everyday 'things'. In OOP we define 'Classes' of objects, and these classes encapsulate attributes and functions that define how we interact with them. Here is a cartoon example of how Classes can usefully encapsulate functions and help us organize our interactions.
Let us assume that we are interested in "dressing" things, so we will define a function called dress()
.
If we make a salad it's easy to see how we could define a procedure to dress(salad)
, you just pour
it on. However, after you eat the salad and your baby wakes up from a nap, it's time to dress()
the
baby. We probably would want an entirely different procedure for this! Without OOP we would need to do
something like this:
def dress(thing_to_dress):
if thing_to_dress == "Caesar":
# Pour the dressing on
elif thing_to_dress == "Cupie": # <- The baby's name
# Put on the bonnet and onesie
else:
print("I don't know how to dress that.")
With OOP we can encapsulate the logic of how to dress()
something inside a 'class method'.
This has numerous benefits, most importantly that you won't accidentally cover your
baby with ranch.
Alert
For now ignore the self
argument to the dress()
methods. This is a formal requirement of python class methods that we will explain in a moment.
class Baby:
def dress(self):
print("Dress the baby")
class Salad:
def dress(self):
print("Dress the salad")
b = Baby()
b.dress()
Dress the baby
Assignment instructions¶
- Launch a jupyter server on your local computer
- Navigate to
~/hacks/hack-5-python/notebooks
directory and open yourwf-sim.ipynb
notebook - At the top of your notebook make sure that you have a cell that calls
import random
, we will need this for the future. - In this exercise we will define a new class to implement the WF simulation using the functions we defined previously.
Class definition and the class
keyword¶
The class
keyword indicates to python your intention to define a new class. Choosing the name
of classes is important for reducing the cognitive burden of understanding and manipulating your code.
It might be tempting to call our new class WF
because this is the name of the model, but this
conflates the 'object' that we want to manipulate (which is the population), with the process that
we want to implement (the discrete time, birth/death process with non-overlapping generations). Let's
call our class 'Population' because this more precisely captures the object-ness of the concept we
want to represent.
class Population:
pass
Population
object,
but it doesn't do much yet because pass
is a python keyword that means "don't do anything".
p = Population()
Initializing class attributes with the __init__()
method¶
Recall that when we defined the init()
function for our WF model we specified two required
arguments: N
as the size of the population in number of individuals, and f
as the frequency
of the derived allele. Let's maintain this requirement in our Population
class, and we can
do this by defining the __init__()
method, which is a special class method called upon
class instantiation (when you create an instance of a new class).
class Population:
def __init__(self, N, f):
self.N = N
self.f = f
derived_count = round(N*f)
self.pop = [0] * (N - derived_count) + [1] * derived_count
__init__()
method
is declared with an explicit first argument representing the object which is canonically and universally
called self
. This argument is provided implicitly by the call to a given class method. It is
confusing at first, but you will get the hang of it.
In this __init__()
method we do a few things. First we define the self.N
and self.f
attributes
and set them equal to the passed in values. Then we copy in the code from our previous init()
function
and define another class attribute called self.pop
to contain our list of ancestral (0) and derived (1)
alleles.
What happens when you try to instantiate a new instance of this Population()
class without passing
arguments? In this case you do get a helpful error message:
p = Population()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[10], line 5
3 self.N = N
4 self.f = f
----> 5 p = Population()
TypeError: Population.__init__() missing 2 required positional arguments: 'N' and 'f'
Because we specified two arguments without defaults to the __init__()
method, the Population
class knows that these are required and forbids creating new instances without passing these
arguments. We have two options here: 1) Pass in values for N
and f
; or 2) Set 'reasonable'
default values. Reasonable default values are helpful in many cases, so let's do it that way.
class Population:
def __init__(self, N=10, f=0.2):
self.N = N
self.f = f
derived_count = round(N*f)
self.pop = [0] * (N - derived_count) + [1] * derived_count
p = Population()
print(f"N={p.N} f={p.f}")
N=10 f=0.2
Now we can create a new Population
instance without passing arguments, and it will
take the default values, if we don't specify alternate values.
Creating printable representations of an object with __repr__()
¶
It's often useful (and good practice) to allow objects to be printed to the
screen in a meaningful way. In the previous code we had to format our own string
representation of the Population
object (which we assigned to the variable p
) and
this is annoying if you have to do it many times. Python classes have another special
method called __repr__()
which you may define to provide functionality to print objects
in a human readable way. If you define __repr__()
then this method is invoked when an
object is evaluated or when it is passed to print()
.
class Population:
def __init__(self, N=10, f=0.2):
self.N = N
self.f = f
derived_count = round(N*f)
self.pop = [0] * (N - derived_count) + [1] * derived_count
def __repr__(self):
return f"Population(N={self.N}, f={self.f})"
p = Population()
p
Population(N=10, f=0.2)
Defining a class method to implement the WF step()
function¶
Recall that our previous implementation of the WF function took several arguments
in the form of wf(N, f, ngens)
and that it internally called the init()
and
step()
functions. Because our Population
class is aware of N
and f
we can simplify things by condensing the argument list for the Population.step()
method to include only the ngens
parameter. Notice that we gain additional benefits of
simplification here because of the encapsulation of the N
, f
, and pop
member
attributes of the Population
object.
class Population:
def __init__(self, N=10, f=0.2):
self.N = N
self.f = f
derived_count = round(N*f)
self.pop = [0] * (N - derived_count) + [1] * derived_count
def __repr__(self):
return f"Population(N={self.N}, f={self.f})"
def step(self, ngens=1):
for i in range(ngens):
self.pop = random.choices(self.pop, k=len(self.pop))
Now with our Population
class defined, we can access all the code that we had previously
called using many different functions in a very simple way:
p = Population()
p.step(ngens=10)
p.pop
[0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
Searching for efficiencies in code¶
When we originally defined the step()
function we only passed in the pop
argument
which was the list of ancestral and derived alleles. Because our original step()
function
didn't know anything about this pop we had to calculate len(pop)
to set the value for
the k
argument for random.choices()
. This is wildly inefficient, as we know the size
of the population is fixed and constant for a given simulation, so we are needlessly recalculating
len(self.pop)
over and over again. We can take advantage of the N
attribute of the Population
object to release ourselves from having to do this pointless recalculation.
class Population:
def __init__(self, N=10, f=0.2):
self.N = N
self.f = f
derived_count = round(N*f)
self.pop = [0] * (N - derived_count) + [1] * derived_count
def __repr__(self):
return f"Population(N={self.N}, f={self.f})"
def step(self, ngens=1):
for i in range(ngens):
self.pop = random.choices(self.pop, k=self.N)
And we can use a special jupyter 'magic' command to time the difference between class attribute
access (p.N
) and the length calculation (len(p.pop)
). The %%timeit
magic command is placed
at the beginning of a jupyter notebook cell to evaluate performance of the code inside a cell.
By default it will run several iterations of the code to calculate runtime (but this can be
modified by including information about how many iterations you want to run, e.g. %%timeit -n 2
).
# Make the population large to evaluate performance for large populations
p = Population(1000000000)
%%timeit
p.N
27.9 ns ± 0.0758 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%%timeit
len(p.pop)
60.7 ns ± 1.32 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
More than twice as fast with this small modification!
Further challenges if time remains¶
- Create a new
isMonomorphic()
method for yourPopulation()
class to test whether the population is fixed for either the ancestral or derived state. This method should returnTrue
if the population is monomorphic for either the ancestral or derived state, andFalse
otherwise. - Modify your
step()
method to test for fixation with theisMonomorphic()
method and terminate further execution if the test passes. You will need to use conditional branching and thebreak
keyword in the case that fixation has happened. - Track allele frequencies through time. Create a new attribute of the
Population
class calledfreqs
inside the__init__()
method. This attribute should be an empty list to begin with (e.g.[]
), and it should be updated within the for loop of thestep()
method to append the current frequency of the derived allele.
Wrap-up¶
- Add, commit, and push changes back to your
origin main
copy of the repo.