Skip to content

Implementing the Wright-Fisher Model as a Class

Wrapping our Wright-Fisher simulator in a Class

In previous sessions we implemented the Wright-Fisher (WF) model using a few simple python functions. This works great for running a few simulations by hand, but for projects of even modest complexity this 'flat' structure of unorganized function calls can quickly become very difficult to understand or operate. In this session we will convert the WF model into a python 'Class' which is a structured way to organize data and code.

A word about Object Oriented programming

Object Oriented programming (OOP) is based on the idea that we can structure our computer programs in a more natural way that aligns with how we interact with everyday 'things'. In OOP we define 'Classes' of objects, and these classes encapsulate attributes and functions that define how we interact with them. Here is a cartoon example of how Classes can usefully encapsulate functions and help us organize our interactions.

Let us assume that we are interested in "dressing" things, so we will define a function called dress(). If we make a salad it's easy to see how we could define a procedure to dress(salad), you just pour it on. However, after you eat the salad and your baby wakes up from a nap, it's time to dress() the baby. We probably would want an entirely different procedure for this! Without OOP we would need to do something like this:

def dress(thing_to_dress):
    if thing_to_dress == "Caesar":
        # Pour the dressing on
    elif thing_to_dress == "Cupie": # <- The baby's name
        # Put on the bonnet and onesie
    else:
        print("I don't know how to dress that.")
Hopefully it is obvious that as the number of things you want to dress increases, the size of this function will grow, and also the logic for how to interact with these very different things will get all tangled up.

With OOP we can encapsulate the logic of how to dress() something inside a 'class method'. This has numerous benefits, most importantly that you won't accidentally cover your baby with ranch.

Alert

For now ignore the self argument to the dress() methods. This is a formal requirement of python class methods that we will explain in a moment.

class Baby:
    def dress(self):
        print("Dress the baby")

class Salad:
    def dress(self):
        print("Dress the salad")

b = Baby()
b.dress()
Dress the baby

Assignment instructions

  1. Launch a jupyter server on your local computer
  2. Navigate to ~/hacks/hack-5-python/notebooks directory and open your wf-sim.ipynb notebook
  3. At the top of your notebook make sure that you have a cell that calls import random, we will need this for the future.
  4. In this exercise we will define a new class to implement the WF simulation using the functions we defined previously.

Class definition and the class keyword

The class keyword indicates to python your intention to define a new class. Choosing the name of classes is important for reducing the cognitive burden of understanding and manipulating your code. It might be tempting to call our new class WF because this is the name of the model, but this conflates the 'object' that we want to manipulate (which is the population), with the process that we want to implement (the discrete time, birth/death process with non-overlapping generations). Let's call our class 'Population' because this more precisely captures the object-ness of the concept we want to represent.

class Population:
    pass
That's it, you have defined your first python class. You can instantiate a new Population object, but it doesn't do much yet because pass is a python keyword that means "don't do anything".

p = Population()

Initializing class attributes with the __init__() method

Recall that when we defined the init() function for our WF model we specified two required arguments: N as the size of the population in number of individuals, and f as the frequency of the derived allele. Let's maintain this requirement in our Population class, and we can do this by defining the __init__() method, which is a special class method called upon class instantiation (when you create an instance of a new class).

class Population:
    def __init__(self, N, f):
        self.N = N
        self.f = f

        derived_count = round(N*f)
        self.pop = [0] * (N - derived_count) + [1] * derived_count
Like all class methods (functions that reside within and operate on classes), the __init__() method is declared with an explicit first argument representing the object which is canonically and universally called self. This argument is provided implicitly by the call to a given class method. It is confusing at first, but you will get the hang of it.

In this __init__() method we do a few things. First we define the self.N and self.f attributes and set them equal to the passed in values. Then we copy in the code from our previous init() function and define another class attribute called self.pop to contain our list of ancestral (0) and derived (1) alleles.

What happens when you try to instantiate a new instance of this Population() class without passing arguments? In this case you do get a helpful error message:

p = Population()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 5
      3         self.N = N
      4         self.f = f
----> 5 p = Population()

TypeError: Population.__init__() missing 2 required positional arguments: 'N' and 'f'

Because we specified two arguments without defaults to the __init__() method, the Population class knows that these are required and forbids creating new instances without passing these arguments. We have two options here: 1) Pass in values for N and f; or 2) Set 'reasonable' default values. Reasonable default values are helpful in many cases, so let's do it that way.

class Population:
    def __init__(self, N=10, f=0.2):
        self.N = N
        self.f = f

        derived_count = round(N*f)
        self.pop = [0] * (N - derived_count) + [1] * derived_count

p = Population()
print(f"N={p.N} f={p.f}")
N=10 f=0.2

Now we can create a new Population instance without passing arguments, and it will take the default values, if we don't specify alternate values.

Creating printable representations of an object with __repr__()

It's often useful (and good practice) to allow objects to be printed to the screen in a meaningful way. In the previous code we had to format our own string representation of the Population object (which we assigned to the variable p) and this is annoying if you have to do it many times. Python classes have another special method called __repr__() which you may define to provide functionality to print objects in a human readable way. If you define __repr__() then this method is invoked when an object is evaluated or when it is passed to print().

class Population:
    def __init__(self, N=10, f=0.2):
        self.N = N
        self.f = f

        derived_count = round(N*f)
        self.pop = [0] * (N - derived_count) + [1] * derived_count

    def __repr__(self):
        return f"Population(N={self.N}, f={self.f})"

p = Population()
p
Population(N=10, f=0.2)

Defining a class method to implement the WF step() function

Recall that our previous implementation of the WF function took several arguments in the form of wf(N, f, ngens) and that it internally called the init() and step() functions. Because our Population class is aware of N and f we can simplify things by condensing the argument list for the Population.step() method to include only the ngens parameter. Notice that we gain additional benefits of simplification here because of the encapsulation of the N, f, and pop member attributes of the Population object.

class Population:
    def __init__(self, N=10, f=0.2):
        self.N = N
        self.f = f

        derived_count = round(N*f)
        self.pop = [0] * (N - derived_count) + [1] * derived_count

    def __repr__(self):
        return f"Population(N={self.N}, f={self.f})"

    def step(self, ngens=1):
        for i in range(ngens):
            self.pop = random.choices(self.pop, k=len(self.pop))

Now with our Population class defined, we can access all the code that we had previously called using many different functions in a very simple way:

p = Population()
p.step(ngens=10)
p.pop
[0, 0, 1, 0, 0, 1, 0, 1, 0, 0]

Searching for efficiencies in code

When we originally defined the step() function we only passed in the pop argument which was the list of ancestral and derived alleles. Because our original step() function didn't know anything about this pop we had to calculate len(pop) to set the value for the k argument for random.choices(). This is wildly inefficient, as we know the size of the population is fixed and constant for a given simulation, so we are needlessly recalculating len(self.pop) over and over again. We can take advantage of the N attribute of the Population object to release ourselves from having to do this pointless recalculation.

class Population:
    def __init__(self, N=10, f=0.2):
        self.N = N
        self.f = f

        derived_count = round(N*f)
        self.pop = [0] * (N - derived_count) + [1] * derived_count

    def __repr__(self):
        return f"Population(N={self.N}, f={self.f})"

    def step(self, ngens=1):
        for i in range(ngens):
            self.pop = random.choices(self.pop, k=self.N)

And we can use a special jupyter 'magic' command to time the difference between class attribute access (p.N) and the length calculation (len(p.pop)). The %%timeit magic command is placed at the beginning of a jupyter notebook cell to evaluate performance of the code inside a cell. By default it will run several iterations of the code to calculate runtime (but this can be modified by including information about how many iterations you want to run, e.g. %%timeit -n 2).

# Make the population large to evaluate performance for large populations
p = Population(1000000000)

%%timeit
p.N
27.9 ns ± 0.0758 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

%%timeit
len(p.pop)
60.7 ns ± 1.32 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

More than twice as fast with this small modification!

Further challenges if time remains

  1. Create a new isMonomorphic() method for your Population() class to test whether the population is fixed for either the ancestral or derived state. This method should return True if the population is monomorphic for either the ancestral or derived state, and False otherwise.
  2. Modify your step() method to test for fixation with the isMonomorphic() method and terminate further execution if the test passes. You will need to use conditional branching and the break keyword in the case that fixation has happened.
  3. Track allele frequencies through time. Create a new attribute of the Population class called freqs inside the __init__() method. This attribute should be an empty list to begin with (e.g. []), and it should be updated within the for loop of the step() method to append the current frequency of the derived allele.

Wrap-up

  • Add, commit, and push changes back to your origin main copy of the repo.