This notebook corresponds with content from chapters 3 and 4 of the official Python tutorial: https://docs.python.org/3/tutorial/. Please read those chapters before completing this notebook.
By the end of this exercise you should:
In the first notebook we learned about several Python objects:
list. Of these, the
list types differ in that they are iterable. This means that the objects have a built-in structure that allows us to easily look at each of their individual data points in order.
The for statement in Python can be used to iterate over iterable objects. In each iteration it assigns the next value from the iterable object to a loop variable. Below, we use the variable name element to store the value in each iteration, but this variable can be named whatever you want.
# create a list (a type of iterable object)
mylist = list('lists-are-iterable')

# iterate over the elements in mylist, printing each one
for element in mylist:
    print(element)
l i s t s - a r e - i t e r a b l e
If you're familiar with other languages like R, you may be asking: where are the curly brackets or other characters that show where the for loop starts and ends? Python does not use brackets for this. Instead, it uses indentation. The lines indented under the for clause (by convention four spaces, which is what notebooks and most editors insert when you press the tab key) form the body of the loop and are repeated in each iteration.
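To make the role of indentation concrete, here is a minimal sketch. The names base and n_iterations are chosen only for this illustration. The indented lines run once per element, while the unindented line after the loop runs a single time.

```python
# Count how many times the indented loop body runs.
n_iterations = 0

for base in "ACG":
    # indented: this block is repeated once for each character
    n_iterations += 1
    print("inside the loop:", base)

# not indented: this line runs once, after the loop has finished
print("after the loop, iterations:", n_iterations)
```

If the last print call were indented to match the loop body, it would be repeated in every iteration instead.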
# create a dna string (also iterable)
dna = "AACTCGCTAAAG"

# iterate over the string, operating on each element
for element in dna:
    print(element)
A A C T C G C T A A A G
Iteration is a natural place to insert a conditional statement so that you operate on only a subset of the elements. This can be done using if statements. Again, indentation shows that the lines following the if statement should be executed only if its condition is True.
# create a dna string
dna = "AACTCGCTAAAG"

# iterate over the string, operating on each element
for element in dna:
    # apply the conditional
    if element == "A":
        # the code only reaches here if the conditional returned True
        print(element)
A A A A A
You can select the elements of an iterable by iterating over the iterable itself, as we did above. Another useful approach is to iterate over a range of integers and use each integer to index the iterable object. Remember that indexing is used in Python to select a subset of elements from a string or list (e.g.,
Below is an example of iterating through an iterable, and iterating through an index.
# an iterable string
dna = "AACCTTGG"

# iterating through the iterable itself
for letter in dna:
    print(letter)
A A C C T T G G
# iterating through an index
for i in [0, 1, 2, 3, 4, 5]:
    print(dna[i])
A A C C T T
The built-in sequence type range is a highly efficient way to iterate over numeric values. It has the form range(start, stop, step) and returns an object that generates numbers on the fly as they are requested; the stop value itself is not included. This makes it efficient: if you ask for a billion numbers, it does not build them all ahead of time but produces each one only when it is needed.
range is important because it is often used together with sequence type objects to step through their index.
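The three arguments of range can be sketched as follows. The variable names are made up for this illustration, and list() is used only to display the values a range would produce.

```python
# range(stop): start defaults to 0 and step defaults to 1
first_five = list(range(5))
print(first_five)        # [0, 1, 2, 3, 4]

# range(start, stop): the stop value is excluded
two_to_five = list(range(2, 6))
print(two_to_five)       # [2, 3, 4, 5]

# range(start, stop, step): take every second value
evens = list(range(0, 10, 2))
print(evens)             # [0, 2, 4, 6, 8]

# A very large range is cheap to create because its values are
# computed on request instead of being stored in memory.
big = range(1_000_000_000)
print(big[10], big[-1])  # 10 999999999
```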
Another useful function that is often used in tandem with range is len, which stands for length. It measures the length of an object such as a string or list, meaning it tells you how many elements are inside it. One way these two functions are combined is to ask range to return a sequence of integers that is the same length as an iterable object. This allows you to iterate over the entire index of that object.
In the example below we iterate over every element in dna by calling len(dna), which returns an integer, and wrapping that inside range, which returns a sequence of indices for the entire length.
# example calling len() function on dna object
len(dna)
# example calling the range() function
for idx in range(5):
    print(idx)
0 1 2 3 4
# iterate over the full range of the dna string
for i in range(len(dna)):
    print(i, dna[i])
0 A 1 A 2 C 3 C 4 T 5 T 6 G 7 G
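# convert the string to a list so that individual elements can be changed
dnalist = list("GCATCGATCGACTAGCATCGAT")

# replace every 'A' with its lower case form, using the index to assign
for idx in range(len(dnalist)):
    if dnalist[idx] == 'A':
        dnalist[idx] = dnalist[idx].lower()
print(dnalist)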
['G', 'C', 'a', 'T', 'C', 'G', 'a', 'T', 'C', 'G', 'a', 'C', 'T', 'a', 'G', 'C', 'a', 'T', 'C', 'G', 'a', 'T']
A natural progression from if statements is to also run some code when the condition is False. This can be done using else. To write more complex statements that test further conditions, we can also add elif, which means else if.
mylist = ['a', 'b', 'c', 'd', 'e']
for letter in mylist:
    if letter == 'a':
        print('lower case a', letter)
    elif letter == 'b':
        print('upper case b', letter.upper())
    else:
        print("some other letter")
lower case a a upper case b B some other letter some other letter some other letter
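As a further sketch, the same if / elif / else pattern can be used to tally elements while iterating. The counter names n_a, n_c, and n_other are chosen only for this illustration. The branches are checked in order and only the first one whose condition is True is executed.

```python
dna = "AACTCGCTAAAG"

# counters for each category
n_a = 0
n_c = 0
n_other = 0

for base in dna:
    if base == "A":
        n_a += 1
    elif base == "C":
        # only checked when the first condition was False
        n_c += 1
    else:
        # reached when none of the conditions above were True
        n_other += 1

print(n_a, n_c, n_other)  # 5 3 4
```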
A common use of lists is to store objects that have been passed through some type of filter process. Lists are nice for this because you can start with an empty list and sequentially add objects to it to build it up. Example below.
# start with an empty list
vowels = []

# append each letter that passes the filter
for item in "abcdefghijklmnopqrstuvwxyz":
    if item in "aeiou":
        vowels.append(item)
print(vowels)
['a', 'e', 'i', 'o', 'u']
A more compact way to build a list while iterating, optionally with a conditional statement, is to use a syntax called a list comprehension. This is essentially a way of rewriting a multi-line for loop as a single line. The point of a list comprehension is to make your code more compact and easier to read. It may look a little funny at first, but once you become familiar with the syntax it can be very elegant.
vowels = [i for i in "abcdefghi" if i in "aeiou"]
vowels
['a', 'e', 'i']
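To show how the two forms relate, the sketch below writes the same filter twice, once as a for loop and once as a list comprehension. The names lower_loop and lower_comp are made up for this illustration. The comprehension reads as: the expression to store, then the for clause, then the optional if filter.

```python
dna = "AACTCGCTAAAG"

# multi-line version: build the list with a for loop and an if statement
lower_loop = []
for base in dna:
    if base != "A":
        lower_loop.append(base.lower())

# single-line version: the same logic as a list comprehension
lower_comp = [base.lower() for base in dna if base != "A"]

print(lower_comp)                # ['c', 't', 'c', 'g', 'c', 't', 'g']
print(lower_loop == lower_comp)  # True
```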
dna1 = "AACTCGCTAAAGCCTCGCGGATCGATAAGCTAG"
dna2 = "AAGTCGCTAAAGCAACGCGGAACGATAACCTGG"
# count the positions at which the two sequences differ
count = 0
for i in range(len(dna1)):
    if dna1[i] != dna2[i]:
        count += 1
print(count)
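The loop above compares the two sequences position by position using an index. As an alternative sketch, and not necessarily the approach intended for this exercise, the built-in zip function pairs up the elements of two iterables so that no index is needed. The names count_by_index and count_by_zip are made up for this illustration, and both versions assume the two sequences have the same length.

```python
dna1 = "AACTCGCTAAAGCCTCGCGGATCGATAAGCTAG"
dna2 = "AAGTCGCTAAAGCAACGCGGAACGATAACCTGG"

# index-based version, as in the cell above
count_by_index = 0
for i in range(len(dna1)):
    if dna1[i] != dna2[i]:
        count_by_index += 1

# zip-based version: each iteration yields one base from each string
count_by_zip = 0
for base1, base2 in zip(dna1, dna2):
    if base1 != base2:
        count_by_zip += 1

print(count_by_index, count_by_zip)
```

Note that zip stops at the end of the shorter input, so for sequences of unequal length it would silently ignore the extra bases, whereas the index-based loop would raise an IndexError if dna2 were shorter than dna1.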