This notebook corresponds to chapter 4 of the official Python tutorial: https://docs.python.org/3/tutorial/.
By the end of this exercise you should:
A function performs a task based on a particular input. Functions are the bread and butter of any programming language. We have already used many functions that are built into the objects we have interacted with. For example, we saw that string objects have functions to capitalize letters, add spacing, or query their length. Similarly, list objects have functions to search for elements or to sort them.
The next step in our journey is to begin writing our own functions. This is only an introduction, as we will continue over time to learn many new ways to write more advanced functions.
In Python, functions are defined using the keyword def. Optionally, we can have the function return a result by ending it with a return statement. This is not required, but it is usually desirable if we want to assign the result of the function to a variable.
## a simple function to add 100 to the input object
def myfunc(x):
    return x + 100

## let's run our function on an integer
myfunc(200)
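To make the point about the optional return statement concrete, here is a minimal sketch comparing a function that returns its result with one that does not. The names add_with_return and add_without_return are hypothetical and chosen only for this illustration.

```python
def add_with_return(x):
    "adds 100 to x and returns the result"
    return x + 100


def add_without_return(x):
    "adds 100 to x but never returns the result"
    x + 100  # the sum is computed and then discarded


# the first result can be stored in a variable and reused
kept = add_with_return(200)

# a function with no return statement gives back None
lost = add_without_return(200)

print(kept)  # 300
print(lost)  # None
```

Without a return statement the work done inside the function is lost once the function finishes, which is why most functions end by returning a value.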
So the basic elements of a function include an input variable and a return value. The next important thing is to add some documentation to our function. This explains what the function is for and lets other users know how to use it. A documentation string, or docstring, should be entered as a string on the first line of the function body, directly below the def line.
def myfunc2(x):
    "This function adds 100 to an int or float and returns the result"
    return x + 100

def sumfunc1(arg1, arg2):
    "returns the sum of two input args"
    return arg1 + arg2
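As a small sketch of why the docstring is useful, Python stores it on the function object in the __doc__ attribute, where users and tools can read it. The function below repeats myfunc2 from above so that the example runs on its own.

```python
def myfunc2(x):
    "This function adds 100 to an int or float and returns the result"
    return x + 100


# the docstring is stored on the function object itself
doc = myfunc2.__doc__
print(doc)

# the function still behaves as before
print(myfunc2(200))  # 300
```

In an interactive session, calling help(myfunc2) displays the same text along with the function signature.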
Let's write a function that will calculate the frequency of each base in a DNA string or genome. In addition to the docstring of a function, which is intended for the user to see, you can also add comments to the function code to remind yourself what each element of the code is doing. The function below includes many comments describing its detailed actions.
def base_frequency(string):
    "returns the frequency of A, C, G, and T as a list"
    # create an empty list to store results
    freqs = []

    # get the total length of the input string
    slen = len(string)

    # iterate over each letter in A,C,G,T
    for base in "ACGT":

        # count the letter's occurrence in the input string
        # divided by the total length of the input string
        frequency = string.count(base) / slen

        # store the measured frequency in the result list
        freqs.append(frequency)

    # return the result list
    return freqs

# test the function
base_frequency("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
[0.28125, 0.28125, 0.25, 0.1875]
The task above can be accomplished in many ways; there is no single way to count the frequency of an element in a sequence. Some approaches may be faster than others, but a good rule of thumb is to make your code as readable and understandable as possible. This is the best way to avoid mistakes.
Below is an alternative implementation of our base_frequency() function, which I name base_frequency2(). It returns the same result, though the code runs in a slightly different way.
def base_frequency2(string):
    "returns the frequency of A, C, G, and T in order"
    slen = len(string)
    freqA = string.count("A") / slen
    freqC = string.count("C") / slen
    freqG = string.count("G") / slen
    freqT = string.count("T") / slen
    return [freqA, freqC, freqG, freqT]

# test the function
base_frequency2("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
[0.28125, 0.28125, 0.25, 0.1875]
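As one more illustration that the same task can be written in several ways, here is a sketch that uses collections.Counter from the standard library. This version is not part of the original notebook, and the name base_frequency3 is hypothetical.

```python
from collections import Counter


def base_frequency3(string):
    "returns the frequency of A, C, G, and T as a list"
    # count every character in the string in a single pass
    counts = Counter(string)

    # get the total length of the input string
    slen = len(string)

    # a Counter returns 0 for any letter that never occurs
    return [counts[base] / slen for base in "ACGT"]


# test the function
result = base_frequency3("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
print(result)  # [0.28125, 0.28125, 0.25, 0.1875]
```

Because the loop runs over the letters "ACGT" rather than four hand-written lines, there is less room for a copy-and-paste slip in which one base is counted twice.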
It can be a very useful exercise to look at code and functions that are written by others to try to learn common and useful techniques, and to try to understand what they are trying to accomplish and how they go about it. As an example, try to understand the function below and answer the questions following the demonstrated example of the function.
def mystery_function(string):
    "no hint on this one"
    # code block 1
    ag = 0
    ct = 0

    # code block 2
    for element in string:
        if element in ["A", "G"]:
            ag += 1
        elif element in ["C", "T"]:
            ct += 1

    # code block 3
    freq_ag = ag / len(string)
    freq_ct = ct / len(string)
    return [freq_ag, freq_ct]

# test the function
mystery_function("ACACTGATCGACGAGCTAGCTAGCTAGCTGAC")
You can optionally read chapter 6.2 if you wish, but otherwise we will just discuss it here, because I think it covers too many irrelevant details. This chapter introduces the Python standard library and what it means to import a library. The take-home message is that Python includes a large library of packages that can be accessed by importing them. We will learn about several common packages in the next few weeks. Let's learn about one of these packages now by using it: the random package.
# import the random package from the standard library
import random

# draw a random integer between 0 and 3 (both endpoints included)
random.randint(0, 3)

# draw 10 random numbers between 0 and 3
[random.randint(0, 3) for i in range(10)]
[1, 2, 0, 0, 0, 0, 1, 1, 1, 3]
# draw a random element from an iterable
random.choice("Columbia University")

# draw 10 random elements from an iterable
[random.choice("Columbia University") for i in range(10)]
['l', 'b', 'n', 'r', 'a', 'n', 'U', 'U', 't', 'e']
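The outputs shown above come from random draws, so you will usually see different values when you run the same cells. If you want repeatable results, one option is to set a seed with random.seed() before drawing. The sketch below assumes an arbitrary seed value of 123, which is not from the original notebook.

```python
import random

# set the seed, then draw 10 random integers between 0 and 3
random.seed(123)
first = [random.randint(0, 3) for i in range(10)]

# resetting the same seed replays the same sequence of draws
random.seed(123)
second = [random.randint(0, 3) for i in range(10)]

print(first == second)  # True
```

Seeding is mainly useful when you want others to be able to reproduce your results exactly.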
# one way
def random_dna1(length):
    return "".join(random.choice("ACTG") for i in range(length))

# another way
def random_dna2(length):
    random_string = ""
    for i in range(length):
        random_string += random.choice("ACTG")
    return random_string
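As a usage sketch, the two ideas in this notebook can be combined: generate a random DNA string and then measure its base frequencies. The example below redefines a generator and base_frequency() so that it runs on its own. The name random_dna and the length of 1000 are assumptions made for illustration.

```python
import random


def random_dna(length):
    "returns a random DNA string of the given length"
    return "".join(random.choice("ACGT") for i in range(length))


def base_frequency(string):
    "returns the frequency of A, C, G, and T as a list"
    slen = len(string)
    return [string.count(base) / slen for base in "ACGT"]


# generate a random sequence and measure its composition
seq = random_dna(1000)
freqs = base_frequency(seq)

print(len(seq))  # 1000
print(freqs)     # four values that vary from run to run
```

Since each base is drawn with equal probability, the four frequencies should each fall near 0.25, and they should get closer to it as the sequence gets longer.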