Notebook 2.2: Python functions

This notebook corresponds to chapter 4 of the official Python tutorial: https://docs.python.org/3/tutorial/.

Learning objectives:

By the end of this exercise you should:

  1. Be able to write conditional Python clauses.
  2. Become familiar with Python functions.
  3. Understand the use of indentation in structuring Python code.

What are functions?

A function performs a task based on a particular input. Functions are the bread and butter of any programming language. We have already used many functions that are built in to the objects we have interacted with (functions attached to an object are called methods). For example, we saw that string objects have methods to capitalize letters or add spacing, and that the built-in len() function returns their length. Similarly, list objects have methods to search for elements in them or to sort them.

The next step in our journey is to begin writing our own functions. This is only an introduction, as we will continue over time to learn many new ways to write more advanced functions.

The basic structure of a function

In Python, functions are defined using the keyword def. Optionally, we can have the function return a result by ending it with a return statement. This is not required (a function without one returns None), but it is usually desirable if we want to assign the result of the function to a variable.

In [1]:
## a simple function to add 100 to the input object
def myfunc(x):
    return x + 100
In [2]:
## let's run our function on an integer
myfunc(200)
Out[2]:
300
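
As a small sketch of what happens when the return statement is left out: the function still runs, but calling it yields None, so there is nothing useful to assign to a variable. The name no_return_func is hypothetical and used only for illustration.

```python
# hypothetical example: a function without a return statement
def no_return_func(x):
    x + 100  # the sum is computed and then discarded


# the function from the cell above, which does return its result
def myfunc(x):
    return x + 100


result_without = no_return_func(200)
result_with = myfunc(200)

print(result_without)  # None
print(result_with)     # 300
```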

More structure: doc string

So the basic elements of a function include an input variable and a return value. The next important thing is to add some documentation to our function. This explains what the function is for and lets other users know how to use it. A documentation string, or docstring, should be entered as a string on the first line of the function body, directly below the def line.

In [3]:
def myfunc2(x):
    "This function adds 100 to an int or float and returns the result"
    return x + 100
In [4]:
myfunc2(300.3)
Out[4]:
400.3
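
One way to see why the docstring is useful, sketched here with the same myfunc2() function: Python stores the string on the function object as the __doc__ attribute, and the built-in help() function displays it together with the function signature.

```python
def myfunc2(x):
    "This function adds 100 to an int or float and returns the result"
    return x + 100


# the docstring is stored on the function object itself
print(myfunc2.__doc__)

# help() prints the function signature together with its docstring
help(myfunc2)
```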

Multiple inputs

Of course, we often want to write functions that take multiple inputs. This is easy: list the argument names inside the parentheses, separated by commas.

In [5]:
def sumfunc1(arg1, arg2):
    "returns the sum of two input args"
    return arg1 + arg2
In [6]:
sumfunc1(10, 20)
Out[6]:
30

Writing a useful function

Let's write a function that will calculate the frequency of each base in a DNA string or genome. In addition to the docstring of a function, which is intended for the user to see, you can still add comments to the function code to remind yourself what each element of the code is doing. The function below contains many comments describing each of its steps in detail.

In [7]:
def base_frequency(string):
    "returns the frequency of A, C, G, and T as a list"
    
    # create an empty list to store results
    freqs = []
    
    # get the total length of the input string
    slen = len(string)
    
    # iterate over each letter in A,C,G,T
    for base in "ACGT":
        
        # count the letter's occurrence in the input string
        # divided by the total length of the input string
        frequency = string.count(base) / slen
        
        # store the measured frequency in the result list
        freqs.append(frequency)
        
    # return the result list
    return freqs
In [8]:
# test the function
base_frequency("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
    
Out[8]:
[0.28125, 0.28125, 0.25, 0.1875]
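
A quick sanity check on the result, as a sketch: for a string that contains only A, C, G, and T, the four frequencies should add up to 1. The function is repeated here without comments so the example runs on its own; the empty-string case is an extra illustration that is not part of the original exercise.

```python
import math


def base_frequency(string):
    "returns the frequency of A, C, G, and T as a list"
    freqs = []
    slen = len(string)
    for base in "ACGT":
        freqs.append(string.count(base) / slen)
    return freqs


freqs = base_frequency("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")

# the four frequencies should sum to 1 (allowing for floating point error)
sums_to_one = math.isclose(sum(freqs), 1.0)
print(sums_to_one)

# an empty string has length 0, so the division raises an error
try:
    base_frequency("")
except ZeroDivisionError:
    print("cannot compute frequencies of an empty string")
```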

Many ways to accomplish the same task

The task above can be accomplished in many possible ways. There is no single way to count the frequency of a character in a string. Among the many ways to accomplish a task, some might be faster than others, but a good rule of thumb is to make your code as readable and comprehensible as possible. This is the best way to avoid mistakes.

Below is an alternative implementation of our base_frequency() function, which I name base_frequency2(). It returns the same result, though the code runs in a slightly different way.

In [9]:
def base_frequency2(string):
    "returns the frequency of A, C, G, and T in order"
    slen = len(string)
    freqA = string.count("A") / slen
    freqC = string.count("C") / slen
    freqG = string.count("G") / slen
    freqT = string.count("T") / slen
    return [freqA, freqC, freqG, freqT]
In [10]:
# test the function
base_frequency2("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")

Out[10]:
[0.28125, 0.28125, 0.25, 0.1875]
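
As a further illustration that one task has many solutions, here is a third sketch that relies on collections.Counter from the standard library. The name base_frequency3() is hypothetical and not part of the original notebook. Counter tallies every character in a single pass and returns 0 for any base that never appears.

```python
from collections import Counter


def base_frequency3(string):
    "returns the frequency of A, C, G, and T as a list"
    # tally how many times each character occurs in the string
    counts = Counter(string)
    slen = len(string)
    # look up each base in order; missing bases count as 0
    return [counts[base] / slen for base in "ACGT"]


result3 = base_frequency3("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
print(result3)  # [0.28125, 0.28125, 0.25, 0.1875]
```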

Reading and understanding functions

It can be a very useful exercise to read code and functions written by others, both to learn common and useful techniques and to understand what the code is trying to accomplish and how it goes about it. As an example, try to understand the function below and answer the question that follows the demonstration of the function.

In [11]:
def mystery_function(string):
    "no hint on this one"
    
    # code block 1
    ag = 0
    ct = 0
    
    # code block 2
    for element in string:
        if element in ["A", "G"]:
            ag += 1
        elif element in ["C", "T"]:
            ct += 1
            
    # code block 3
    freq_ag = ag / len(string)
    freq_ct = ct / len(string)
    
    return [freq_ag, freq_ct]
In [12]:
# test the function
mystery_function("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
Out[12]:
[0.53125, 0.46875]
Question: Describe the mystery_function() function above by explaining the purpose of each of the commented code blocks. What is the purpose of the function, and what does it return as a result? Be explicit about object types in your answer.

Response:

  1. Create two integer variables, ag and ct, both equal to 0, that will be used as counters.
  2. Iterate over each character in the input string object. If the character is an A or a G, increase the ag counter by 1; otherwise, if it is a C or a T, increase the ct counter by 1. Any other character is not counted.
  3. Calculate the frequency of A/G and of C/T in the input string by dividing each integer counter by the total length of the string, which gives two floats.

The purpose of the function is to measure the frequency of purines (A and G) and pyrimidines (C and T) in a string of DNA. It returns the two frequencies as a list of two floats.
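
To see why the elif clause matters, the sketch below reruns mystery_function() on a made-up input that also contains the ambiguous base N. The input string is an assumption chosen for illustration. Because N matches neither condition, it is skipped by both counters but still counts toward the length, so the two frequencies no longer add up to 1.

```python
def mystery_function(string):
    "no hint on this one"
    ag = 0
    ct = 0
    for element in string:
        if element in ["A", "G"]:
            ag += 1
        elif element in ["C", "T"]:
            ct += 1
    freq_ag = ag / len(string)
    freq_ct = ct / len(string)
    return [freq_ag, freq_ct]


# hypothetical input: 2 purines, 2 pyrimidines, and 4 uncounted Ns
with_n = mystery_function("AGCTNNNN")
print(with_n)       # [0.25, 0.25]
print(sum(with_n))  # 0.5
```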

The standard library

You can optionally read chapter 6.2 if you wish, but otherwise we will just discuss it here, because I think it covers too many irrelevant details. This chapter introduces the Python standard library and what it means to import a library. The take-home message is that Python includes a large library of packages that can be accessed by importing them. We will learn about several common packages in the next few weeks. Let's learn about one of these packages now by using it: the random library.

In [13]:
import random
In [14]:
# draw a random integer between 0 and 3 (both endpoints included)
random.randint(0, 3)
Out[14]:
1
In [15]:
# draw 10 random integers between 0 and 3 (both endpoints included)
[random.randint(0, 3) for i in range(10)]
Out[15]:
[1, 2, 0, 0, 0, 0, 1, 1, 1, 3]
In [16]:
# draw a random element from a sequence (here, a string)
random.choice("Columbia University")
Out[16]:
'u'
In [17]:
# draw 10 random elements from a sequence (here, a string)
[random.choice("Columbia University") for i in range(10)]
Out[17]:
['l', 'b', 'n', 'r', 'a', 'n', 'U', 'U', 't', 'e']
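
Because these draws are random, the outputs shown above will differ each time the cells are run. If you want repeatable results, one option is to seed the random number generator first with random.seed(). The seed value 12345 below is arbitrary and chosen only for illustration.

```python
import random

# seeding the generator makes the following draws repeatable
random.seed(12345)
first = [random.randint(0, 3) for i in range(10)]

# resetting the same seed reproduces the same sequence of draws
random.seed(12345)
second = [random.randint(0, 3) for i in range(10)]

print(first == second)  # True
```
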
Action: Write a function that uses the random package to randomly draw values to generate a random sequence of DNA (As, Cs, Gs, and Ts) of a length that is supplied as an argument. **It should return the results as a string object**. Demonstrate the function by generating a 20 base pair long sequence of DNA.
In [19]:
# one way
def random_dna1(length):
    return "".join(random.choice("ACTG") for i in range(length))


# another way
def random_dna2(length):
    random_string = ""
    for i in range(length):
        random_string += random.choice("ACTG")
    return random_string
In [20]:
random_dna1(20)
Out[20]:
'CCCTGACGCAGGTCTGCATC'
In [21]:
random_dna2(20)
Out[21]:
'CAGCCACCTGTTGTCAACCG'
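
A third possible way, sketched here under the assumption that you are using Python 3.6 or later: random.choices() draws several elements at once through its k argument, so no explicit loop is needed. The name random_dna3() is hypothetical and simply continues the numbering above.

```python
import random


def random_dna3(length):
    "returns a random DNA sequence of the given length as a string"
    # random.choices returns a list of k elements drawn with replacement
    return "".join(random.choices("ACGT", k=length))


seq = random_dna3(20)
print(seq)
print(len(seq))  # 20
```
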
Action: Save this notebook and download as HTML to upload to courseworks.