This notebook corresponds to chapters 3 and 4 of the official Python tutorial (https://docs.python.org/3/tutorial/). Please read those chapters before completing this notebook.
By the end of this exercise you should:
In the first notebook we learned about several Python objects: int, float, bool, str, and list. Of these, the str and list types differ in that they are iterable. This means that the objects have a built-in structure that allows us to easily look at each of their individual data points in order.
The for statement in Python can be used to iterate over iterable objects. Here, it creates a new variable in each iteration and assigns it the next value from the iterable object. Below, we use the variable name element to store the value in each iteration, but this variable can be named whatever you want.
# create a list (a type of iterable object)
mylist = list('lists-are-iterable')
# iterate over elements in mylist printing each one
for element in mylist:
    print(element)
If you're familiar with other languages like R, you may be asking: where are the curly brackets or other characters that show where the for loop starts and ends? In Python there are no brackets! Instead, it uses indentation. Indenting the line under the for clause by one level (by convention, four spaces) tells Python that this line should be repeated in each iteration of the loop.
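As a minimal sketch of how indentation delimits a loop body (the names letters and visited are illustrative and not part of the original notebook), the indented lines below run once per element, while the unindented line runs only once, after the loop has finished:

```python
# hypothetical example data, used only for illustration
letters = ["a", "b", "c"]
visited = []

for letter in letters:
    # both indented lines belong to the loop body, so they run for every element
    visited.append(letter)
    print("inside the loop:", letter)

# this line is not indented, so it runs once after the loop ends
print("loop finished after visiting", len(visited), "elements")
```

Removing the indentation from a line moves it out of the loop, which changes how many times it runs.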
# create a dna string (also iterable)
dna = "AACTCGCTAAAG"
# iterate over the string, operating on each element
for element in dna:
    print(element)
Iterating over elements is a very natural place to insert a conditional statement in order to operate on only a subset of the elements. This can be done using if statements. Again, indentation is used to show that the lines following the if statement should only be executed if its condition is True.
Example:
# create a dna string
dna = "AACTCGCTAAAG"
# iterate over the string, operating on each element
for element in dna:
    # apply the conditional
    if element == "A":
        # the code only reaches here if the conditional returned True
        print(element)
You can select elements from an iterable by iterating over the iterable itself, as we did above. Another useful approach is to iterate over a range of integers and use each integer to index the iterable object. Remember that indexing selects a single element from a string or list (e.g., dnalist[5]), and slicing selects a subset of elements (e.g., dnalist[5:15]).
Below is an example of iterating through an iterable, and iterating through an index.
# an iterable string
dna = "AACCTTGG"
# iterating through the iterable itself
for letter in dna:
    print(letter)
# iterating through an index (here, only the first six positions)
for i in [0, 1, 2, 3, 4, 5]:
    print(dna[i])
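To make the reminder about indexing concrete, here is a small sketch (the variable names seq, first, middle, and last are illustrative assumptions). A single integer selects one element, and a start:stop slice selects a run of elements. Positions are counted from 0, and the stop position is excluded.

```python
seq = "AACCTTGG"  # hypothetical example string

first = seq[0]      # the single element at index 0
middle = seq[2:6]   # the elements at indices 2, 3, 4, and 5 (index 6 is excluded)
last = seq[-1]      # negative indices count backwards from the end

print(first, middle, last)
```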
range and len functions
The range object is a special, highly efficient sequence type for iterating over integer values. It has the form range(start, stop, step), where start defaults to 0, step defaults to 1, and stop is excluded. It returns an object that generates numbers on the fly as they are requested. This makes it highly efficient: if you ask it for a billion numbers, it does not generate them ahead of time but produces each one only when it is needed. range is important because it is often used together with sequence objects to iterate over their indices.
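The three forms of range can be sketched as follows. The variable names evens and big are illustrative, and list() is used only to display the values that a range would otherwise produce on demand.

```python
# range(stop): counts from 0 up to, but not including, stop
print(list(range(5)))        # [0, 1, 2, 3, 4]

# range(start, stop): counts from start up to, but not including, stop
print(list(range(2, 6)))     # [2, 3, 4, 5]

# range(start, stop, step): counts from start in increments of step
evens = list(range(0, 10, 2))
print(evens)                 # [0, 2, 4, 6, 8]

# a very large range is cheap to create because its values are
# computed only when they are requested
big = range(1_000_000_000)
print(len(big), big[10])
```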
Another useful function that is often used in tandem with range is len, which stands for length. It returns the length of an object such as a string or list, that is, how many elements it contains. One way these two functions are combined is to ask range to return a sequence of integers that is the same length as an iterable object. This allows you to iterate over every index of that object.
In the example below we iterate over every element in dna by calling len(dna), which returns an integer, and wrapping that inside range, which returns a sequence of indices covering that entire length.
# example calling len() function on dna object
len(dna)
# example calling the range() function
for idx in range(5):
    print(idx)
# iterate over the full range of indices of the dna string
for i in range(len(dna)):
    print(i, dna[i])
dnalist = list("GCATCGATCGACTAGCATCGAT")
for idx in range(len(dnalist)):
    if dnalist[idx] == 'A':
        dnalist[idx] = dnalist[idx].lower()
print(dnalist)
A natural progression from writing if statements is to also add an operation for when the condition is False. This can be done using else, or, to write more complex statements, we can also add elif, which means else if.
mylist = ['a', 'b', 'c', 'd', 'e']
for letter in mylist:
    if letter == 'a':
        print('lower case a', letter)
    elif letter == 'b':
        print('upper case b', letter.upper())
    else:
        print("some other letter")
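As a further sketch of if, elif, and else working together, the loop below labels each character of a short string. The string "ACGTN" and the list name labels are illustrative assumptions. The branches are tested in order, and only the first one whose condition is True is executed.

```python
labels = []
for base in "ACGTN":  # hypothetical input; N stands in for an unrecognized character
    if base == "A" or base == "G":
        # A and G are purines
        labels.append("purine")
    elif base == "C" or base == "T":
        # C and T are pyrimidines
        labels.append("pyrimidine")
    else:
        # reached only when neither condition above was True
        labels.append("unknown")

print(labels)
```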
A common use of lists is to store objects that have been passed through some type of filter process. Lists are nice for this because you can start with an empty list and sequentially add objects to it to build it up. Example below.
vowels = []
for item in "abcdefghijklmnopqrstuvwxyz":
    if item in "aeiou":
        vowels.append(item)
print(vowels)
A more compact way to build a list from a for-loop, optionally with a conditional statement, is to use a technique called list comprehension. This is essentially a way of rewriting a multi-line for-loop into a single line. The point of list comprehension is to make your code more compact and easier to read. It may look a little funny at first, but once you become familiar with the list-comprehension syntax it can be very elegant.
vowels = [i for i in "abcdefghi" if i in "aeiou"]
vowels
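To show how the two styles correspond, here is a side-by-side sketch of the same filter written both ways. The names letters, vowels_loop, and vowels_comp are illustrative. The comprehension follows the pattern [expression for item in iterable if condition].

```python
letters = "abcdefghi"

# multi-line version: start with an empty list and append each match
vowels_loop = []
for i in letters:
    if i in "aeiou":
        vowels_loop.append(i)

# single-line version: the same loop and condition inside square brackets
vowels_comp = [i for i in letters if i in "aeiou"]

# both approaches build the same list
print(vowels_loop == vowels_comp)
```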
dna1 = "AACTCGCTAAAGCCTCGCGGATCGATAAGCTAG"
dna2 = "AAGTCGCTAAAGCAACGCGGAACGATAACCTGG"
count = 0
for i in range(len(dna1)):
    if dna1[i] != dna2[i]:
        count += 1
print(count)
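The same comparison can also be sketched with a list comprehension that records where the two sequences differ, not only how many differences there are. This is one possible variation, not part of the original exercise. It assumes both strings have the same length, and the name diff_positions is illustrative.

```python
dna1 = "AACTCGCTAAAGCCTCGCGGATCGATAAGCTAG"
dna2 = "AAGTCGCTAAAGCAACGCGGAACGATAACCTGG"

# collect every index at which the two sequences differ
# (assumes len(dna1) == len(dna2); otherwise dna2[i] could raise an IndexError)
diff_positions = [i for i in range(len(dna1)) if dna1[i] != dna2[i]]

# the length of this list equals the count produced by the loop version
print(len(diff_positions), diff_positions)
```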