Programming and Data Science for Biology
(EEEB G4050)

Lecture 6: Python dictionaries, if/else, and functions

Expanding your toolkit: Python built-in functions

We've already learned several built-in functions, such as `range()`, `print()`, and `len()`, but there is a whole suite of other useful Python built-in functions.


# Other example built-in functions
# Mathematical operations on lists: sum()/min()/max()
max([1,3,6,7,19])
# Truth-value operations on lists: all()/any()
all([1,1,1,True,True])
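
The other built-ins named in the comments work the same way. Here is a small sketch using a made-up list of read depths (the values are invented for illustration):

```python
# Hypothetical per-site read depths, invented for illustration
depths = [12, 0, 7, 30, 5]

total = sum(depths)      # add up every value
lowest = min(depths)     # smallest value
highest = max(depths)    # largest value

# all() is True only if every item is truthy; 0 counts as False
every_site_covered = all(depths)
# any() is True if at least one item is truthy
some_site_covered = any(depths)

print(total, lowest, highest)
print(every_site_covered, some_site_covered)
```

Because one site has a depth of 0, `all()` returns False here while `any()` returns True.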
                            

Expanding your toolkit: The Python Standard Library

Python standard library reference

A collection of libraries that are part of base Python but are not imported by default. Call `import` at the top of your script or notebook to load a library into the namespace. For example, the `random` module is a very useful way to sample random values from various distributions.



import random
# random() samples a value Uniform[0,1)
random.random()
# choices() takes 'k' items from a collection with replacement
random.choices([0, 1, 2, 3, 4, 5], k=5)
            

0.4044054358860246
[5, 5, 1, 1, 1]
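
Two related tools in the same module are worth knowing. This is a minimal sketch: `random.seed()` makes a run repeatable, and `random.sample()` draws without replacement. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import random

# Seeding makes the sequence of "random" values repeatable
random.seed(42)
first = random.choices(["A", "C", "G", "T"], k=10)
random.seed(42)
second = random.choices(["A", "C", "G", "T"], k=10)
print(first == second)

# sample() draws without replacement, so no item can repeat
unique_draw = random.sample(range(6), k=3)
print(len(set(unique_draw)) == 3)
```

Setting a seed is helpful when you want someone else to reproduce exactly the same random draws from your notebook.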
            

More Standard Library: itertools module

Another example is the `itertools` module, which provides tools for fast and efficient iteration over collections. My favorite itertools function is `combinations()`.


import itertools
combs = itertools.combinations(range(4), 2)
for pair in combs:
    print(pair)
              

(0, 1)
(0, 2)
(0, 3)
(1, 2)
(1, 3)
(2, 3)
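
The same call works on any collection, not just numbers. Here is a sketch with hypothetical population names, listing every pairwise comparison you might want to run:

```python
import itertools

# Hypothetical population names, used only for illustration
pops = ["pop1", "pop2", "pop3"]

# Every unordered pair, with no repeats and no self-pairs.
# combinations() returns an iterator that can only be consumed once,
# so it is wrapped in list() to keep the pairs around.
pairs = list(itertools.combinations(pops, 2))
for a, b in pairs:
    print(f"compare {a} vs {b}")
```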
                            

Dictionaries: Mutable key/value pair collections

Dictionaries are collections of items organized into key/value pairs. Unlike lists, which you index or slice by position, dictionaries are accessed by key. Since Python 3.7, dictionaries also preserve the order in which keys were inserted.


# Dictionary with data values
my_dict = {'key1':'value1', 'key2':'value2',
            'key3':'value3', 'key4':'value4'}

# Adding a new key/value pair
my_dict['key5'] = 'value5'

# Modifying a value for a key
my_dict['key1'] = 33

# Dictionary values can store anything, even lists or other dictionaries!
my_dict['key2'] = ['list', 'of', 'items']

print(my_dict)
                                

{'key1': 33, 'key2': ['list', 'of', 'items'], 'key3': 'value3', 'key4': 'value4', 'key5': 'value5'}
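
Reading values back out is just as simple. This is a minimal sketch of the most common lookups, using a smaller dictionary than the one above:

```python
my_dict = {'key1': 33, 'key3': 'value3'}

# Look up a value by its key
print(my_dict['key1'])

# Test for a key with `in` before indexing; a missing key raises KeyError
print('key9' in my_dict)

# get() returns a fallback (None by default) instead of raising
missing = my_dict.get('key9', 'not found')
print(missing)

# items() yields (key, value) pairs for looping
for key, value in my_dict.items():
    print(key, value)
```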
                                

Dictionary use case: Mapping species IDs to lists of samples

Let's say we have a metadata file with sample IDs in the first column and species (population) IDs in the second column. A common question is "What are all the samples from a given population?". Answering it from the tabular format is somewhat tedious, but once we transform the table into a dictionary it becomes trivial.


# Contents of the pops.txt file
1A_0    pop1
1B_0    pop1
1C_0    pop1
2E_0    pop2
2F_0    pop2
2G_0    pop2
3I_0    pop3
3J_0    pop3
3K_0    pop3
        

Dictionary use case: Mapping species IDs to lists of samples

This function takes a file with sample IDs in the first column and species IDs in the second column, and returns a dictionary mapping each species ID to a list of its samples.


def get_populations(pops_file):
    pops = {}
    with open(pops_file, 'r') as popsfile:
        for line in popsfile:
            ## Ignore blank lines
            if not line.strip():
                continue
            ## First column is the sample ID, second is the population
            ind, pop = line.split()[:2]
            ## Create an empty list for a new population, then add the sample
            pops.setdefault(pop, []).append(ind)
    return pops
        

# Calling our new function on the pops.txt file returns a nice dictionary
get_populations("pops.txt")
                            

{'pop1': ['1A_0', '1B_0', '1C_0'],
'pop2': ['2E_0', '2F_0', '2G_0'],
'pop3': ['3I_0', '3J_0', '3K_0']}
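
Once the data is in this form, the original question is a one-line lookup. The sketch below writes the returned dictionary out literally so it runs on its own. The names `sizes` and `sample_to_pop` are illustrative.

```python
# The dictionary returned above, written out literally
pops = {'pop1': ['1A_0', '1B_0', '1C_0'],
        'pop2': ['2E_0', '2F_0', '2G_0'],
        'pop3': ['3I_0', '3J_0', '3K_0']}

# All samples from one population
print(pops['pop2'])

# Number of samples per population
sizes = {pop: len(samples) for pop, samples in pops.items()}
print(sizes)

# Reverse lookup: which population does a sample belong to?
sample_to_pop = {ind: pop for pop, inds in pops.items() for ind in inds}
print(sample_to_pop['3J_0'])
```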
                            

if/else: Conditional branching for control flow

Normally, Python code is executed sequentially from top to bottom. Python, like most programming languages, also allows conditional execution: a block of one or more statements runs only if a certain expression is true, and something else can happen if the expression is false.


count = 0
seq = "ACTGGCGAGAA"
for base in seq:
    if base == "A":
        count += 1
print(count)
                            

4
                            

else statements: explicit alternatives to failed 'if' condition

In the above, we only add 1 to the counter if the base in question is A; otherwise we do nothing. We can do more than nothing when the `if` condition fails by defining an `else` block. If an `else` is defined, it runs whenever the `if` test is false.


count = 0
seq = "ACTGGCGAGAA"
for base in seq:
    if base == "A" or base == "T":
        count += 1
    else:
        print("Not A or T")
print(f"AT count: {count}")
            

Not A or T
Not A or T
Not A or T
Not A or T
Not A or T
Not A or T
AT count: 5
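
The `else` block can do real work too, not just print. As a sketch, the loop below keeps two counters, so every base is tallied in exactly one of them. It assumes the sequence contains only A, C, G, and T.

```python
seq = "ACTGGCGAGAA"
at_count = 0
gc_count = 0
for base in seq:
    # `in` tests whether the single character appears in the string "AT"
    if base in "AT":
        at_count += 1
    else:
        # Assumes only A, C, G, T occur, so anything else is G or C
        gc_count += 1
print(at_count, gc_count)
```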

                            

elif statements: Multiple conditional branching

If a condition has more than two outcomes, we can use `elif` to introduce additional explicit conditions. The final `else` is optional: if it is omitted, nothing happens when none of the `if`/`elif` tests is true.


gc_count = 0
seq = "ACTGGCGAGAA"
for base in seq:
    if base == "G":
        gc_count += 1
    elif base == "C":
        gc_count += 1
    else:
        print("Not GC")
print(f"GC content: {gc_count/len(seq)}")
            

Not GC
Not GC
Not GC
Not GC
Not GC
GC content: 0.5454545454545454
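
An `if`/`elif`/`else` chain is most useful when each branch does something different. Here is a sketch that sorts bases into three categories. The sequence, with its two ambiguous `N` bases, is made up for illustration.

```python
seq = "ACTGNNCGAGAA"  # hypothetical sequence with two ambiguous 'N' bases
counts = {"purine": 0, "pyrimidine": 0, "other": 0}
for base in seq:
    if base == "A" or base == "G":
        counts["purine"] += 1
    elif base == "C" or base == "T":
        counts["pyrimidine"] += 1
    else:
        # Anything that is not A, G, C, or T ends up here
        counts["other"] += 1
print(counts)
```

Only the first branch whose test is true runs, so each base is counted exactly once.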

                            

Expanding your toolkit: Defining your own functions

A function is a reusable block of code or programming statements designed to perform a certain task. Python provides the `def` keyword for defining functions.

Functions can take arguments, they can execute code, and they can return values to the code that called them. Here is a simple function that takes no arguments, prints a string, and returns `33`.


# Declaring a function
def my_function():
    print("Inside my function")
    return 33
# Calling a function
my_function()
          

Inside my function
33
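
A returned value becomes useful once you store it in a variable. As a sketch, the GC counting loop from earlier can be wrapped in a function that takes a sequence and returns the GC fraction. The name `gc_content` is my own choice for illustration.

```python
def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    gc_count = 0
    for base in seq:
        if base == "G" or base == "C":
            gc_count += 1
    return gc_count / len(seq)

# Store the returned value so it can be reused later
result = gc_content("ACTGGCGAGAA")
print(round(result, 3))
```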
                            

Function arguments with and without default values

Functions can take zero or more parameters. A parameter can have a default value (when a sensible default exists), which makes it optional, or no default value, in which case it must be specified by the caller or an exception is raised.


def my_function(name="wat"):
    return f"Hello {name}"
def my_function2(name):
    return f"Hello {name}"

print(my_function())
print(my_function(name="Isaac"))
print(my_function2(name="Isaac"))
# 'name' has no default, so this call raises a TypeError
print(my_function2())
                            

Hello wat
Hello Isaac
Hello Isaac
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[154], line 10
---> 10 print(my_function2())
TypeError: my_function2() missing 1 required positional argument: 'name'
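
The two kinds of parameters can be mixed in one function, as long as the ones without defaults come first. Here is a sketch using a hypothetical `greet()` function. It also shows how to catch the error rather than letting it stop the program.

```python
# Hypothetical function: 'name' is required, 'greeting' is optional
def greet(name, greeting="Hello"):
    return f"{greeting} {name}"

a = greet("Isaac")                      # positional argument, default greeting
b = greet("Isaac", greeting="Howdy")    # override the default by keyword
c = greet(greeting="Hi", name="Isaac")  # keyword arguments in any order
print(a, b, c, sep="\n")

# Catch the exception raised when the required argument is missing
try:
    greet()
except TypeError as err:
    message = str(err)
    print(message)
```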

                            

Prototyping functions with Jupyter notebooks

The interactivity and fast feedback of Jupyter notebooks make them a great environment for prototyping your own functions. I usually begin by writing code that performs the task with small parameter values, so I can easily see what is happening. Once that works, I replace the values I want to vary with variables, keep developing and testing, and finally copy the working code into a new function definition.


# prototype using small values to get it working
import numpy as np
dat = np.random.lognormal(10, size=10)
dat = np.concatenate([dat, np.zeros(10)])
print(dat)
                            

# Function definition which accepts 2 arguments which I can now specify
# as I choose when calling the function
def samp(size=10, nzeros=10):
    dat = np.random.lognormal(10, size=size)
    dat = np.concatenate([dat, np.zeros(nzeros)])
    return dat
result = samp(size=500, nzeros=100)
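
After wrapping a prototype in a function, a quick sanity check is to call it and confirm the properties you can predict in advance. The sketch below uses a standard-library stand-in, `samp_stdlib` (a hypothetical name), so it runs without NumPy. It assumes a lognormal with mu=10 and sigma=1, which matches NumPy's default sigma of 1.0.

```python
import random

# Standard-library stand-in for samp(); returns a list, not a NumPy array
def samp_stdlib(size=10, nzeros=10):
    # 'size' draws from a lognormal (mu=10, sigma=1), then 'nzeros' zeros
    dat = [random.lognormvariate(10, 1) for _ in range(size)]
    dat = dat + [0.0] * nzeros
    return dat

result = samp_stdlib(size=500, nzeros=100)

# Predictable properties: total length and number of zeros
print(len(result))
print(result.count(0.0))
```

If either number differs from what you passed in, the bug is in the function body, not in the code that calls it.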