Notebook 2.0: Python objects

This notebook corresponds to chapters 1 and 3 of the official Python tutorial: https://docs.python.org/3/tutorial/. You are welcome to read chapter 2 as well, but it is mostly about how to install, open, and run Python. Since we have Python running interactively in Jupyter, the details of starting a Python interpreter from chapter 2 are not so important. The challenges in this notebook are meant to reinforce the material from the readings. Feel free to also use this notebook as a scratch pad in which to write and test code from the readings.

Learning objectives:

By the end of this exercise you should:

  1. Understand the use of variables in Python to store values.
  2. Comprehend the difference between returning and printing variables.
  3. Become familiar with int, float, str, and list objects.

Python as a calculator

One of the simplest and most common uses of Python is performing mathematical operations on numeric values. Addition, subtraction, multiplication, division, and most other operators act as you would expect on any calculator, and parentheses can be used to group operations and control their order. Note that ** is Python's exponent operator, so 3 ** 2 is 9.

This is our first execution of Python code, and so we will also spend some time learning about how the results of executed code are dealt with. In particular, we will introduce the concept of variables.

A variable is a name that refers to a stored value. In the first code block below we perform mathematical operations on integer values without using any variables. In the second code block we create a new variable called x and assign it the value 3. In the third code block we substitute the variable x into the same expression as in the first code block. Because the value of x is 3, the third code block returns 25.0, just like the first. (The result is 25.0 rather than 25 because the / operator always returns a decimal value in Python 3.)

In [ ]:
# you can perform math operations in Python
(3 / 3) + (3 * 5) + (3 ** 2)
In [ ]:
# create a new variable named x with the integer value 3
x = 3
In [ ]:
# substitute named variables to represent a value or object
(x / 3) + (x * 5) + (x ** 2)
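The cells above use /, *, and **. As a small supplementary sketch (not taken from the tutorial text), the cell below repeats that expression and shows a few more of Python's arithmetic operators: // (floor division, which discards the remainder) and % (which returns the remainder). The numbers used are arbitrary examples.

```python
x = 3

# true division (/) always returns a float in Python 3
a = (x / 3) + (x * 5) + (x ** 2)
print(a)            # 25.0

# floor division discards the remainder; % returns the remainder
b = 17 // 5
c = 17 % 5
print(b, c)         # 3 2

# subtraction and parentheses work as on a calculator
d = (10 - 4) * 2
print(d)            # 12
```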

Creating and accessing variables

Notice that in the cells above, the first and third blocks returned a result that was shown below the cell, while the second block did not return anything. This is because in the second code block we stored the value to a variable using the = operator. By default, a value that is not stored to a variable is returned, meaning that it is simply shown in the output of the code cell.

To reiterate: the object on the last line of a code cell is returned when the cell is executed if it is not stored to a variable. This behavior may be preferred in some situations, such as in the first code cell above, where we simply wanted to see the result of the mathematical operation. In many cases, though, it is useful to store values to variables so that they can be reused in other operations.

In [ ]:
# the value of x will be returned (shown)
x
In [ ]:
# the value in the cell will not be shown because it is stored to p
p = x - 3
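One detail worth knowing about storing values: a variable holds the value that was computed when the assignment ran, not the expression that produced it. The illustrative sketch below reuses the names x and p from above, reassigns x, and shows that p does not update automatically.

```python
x = 3
p = x - 3      # p stores the result 0, not the expression "x - 3"

x = 10         # reassign x to a new value
print(x)       # 10
print(p)       # 0: p keeps the value it was given earlier

p = x - 3      # re-run the assignment to update p
print(p)       # 7
```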

The print() function

Notice that when a value is returned, it is shown in the output area next to the red label Out[N]:. This label indicates that the value was returned (not stored to a variable).

The print() function is a more explicit and broadly useful way to view the value of an object or variable. It writes the value to stdout (the "standard output" stream), which is the standard destination for text that you want a user to see. In a Jupyter notebook, text printed to stdout shows up in the output area below an executed code cell.

A key difference between viewing a returned value and a printed value is that a code cell will only return a value that is on the final executed line in a code block. By contrast, the print() function can be called on any line within a block of code to print the value of an object at any time. This makes it very useful as a tool for checking the value of variables during your code execution (a process called debugging).

In [ ]:
# the print() function sends the value to the stdout of a cell
print(x)

The next example shows that only the final line of a code cell is returned; the values of the two lines before it are not shown. In the cell after it, the print() function shows a value from each of its three print() calls.

In [ ]:
# a returned value is only the final line
x
x + 3
x + 6
In [ ]:
print(x)
print(x + 3)
y = x + 10
print(y)
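Another way to see the difference between printing and returning: print() displays a value, but the function itself returns nothing (the special value None). The short sketch below stores the result of a print() call to a variable to show this; the variable names result and stored are just examples.

```python
x = 3

# print() writes 6 to stdout, but the value it returns is None
result = print(x + 3)

# storing a value instead keeps it available for later use
stored = x + 3

print(result)   # None
print(stored)   # 6
```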
Action: In a code cell below write three lines of Python code. On line 1 create a new variable called y with the value 30. On line 2 create another new variable z with the value 5.5. On line 3 use the print function to print the value of y / z. (See Chapter 3 if you need help).
In [1]:
y = 30
z = 5.5
print(y / z)
5.454545454545454

Objects and Types

In the example above you already used two different types of objects in Python: an integer (int) and a float (a decimal value). We'll now learn about object types in more detail.

Integers and Floats

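Integers are whole numbers, and floats store decimal values in floating point format. The two types can be compared or combined in mathematical operations, and in Python 3 they mix freely: for example, dividing two integers with / returns a float (in Python 2 it returned an integer, rounded down). The main practical difference is precision: integers are exact and can be arbitrarily large, whereas floats have limited precision, so some decimal results are slightly rounded.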

In [ ]:
# assigns an integer value
y0 = 3

# assigns a float value
y1 = float(5.0)

# assigns an integer value
y2 = int(5)

# return whether the two variables are equal (a float and an int can be equal)
y1 == y2
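The built-in type() function reports which type an object has. The sketch below redefines y1 and y2 as in the cell above and also illustrates the precision difference between the two types: floats are stored in binary with limited precision, so some decimal arithmetic is not exact, whereas integers stay exact no matter how large they get.

```python
y1 = float(5.0)
y2 = int(5)

print(type(y1))          # <class 'float'>
print(type(y2))          # <class 'int'>
print(y1 == y2)          # True: equal values despite different types

# floats have limited precision, so this sum is not exactly 0.3
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# integers can grow as large as memory allows and stay exact
big = 2 ** 100
print(big)
```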

Boolean type

The boolean type (bool) represents a simple True or False value. For example, the comparison we performed above returned the value True. That's a boolean. Booleans are the result of comparing objects or values, and binary values like these are very common in programming, so expect to see them often.

In [ ]:
## True compares equal to the integer 1
x = True
y = 1
x == y
In [ ]:
## False compares equal to the integer 0
x = False
y = 0
x == y
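The comparisons above evaluate to True because in Python the bool type is a subclass of int: True and False behave like 1 and 0 in arithmetic and comparisons, but they are still a separate type. The sketch below illustrates this with the built-in type(), isinstance(), and bool() functions.

```python
x = True
y = 1

print(x == y)               # True: the values compare equal
print(type(x), type(y))     # <class 'bool'> <class 'int'>: the types differ
print(isinstance(x, int))   # True: bool is a subclass of int

# because of this, booleans can be used in arithmetic
print(True + True)          # 2

# bool() converts other values: 0 is False, other numbers are True
print(bool(0), bool(5))     # False True
```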

Comparisons

As you saw above, we used the = operator to assign a value to a variable and the == operator to ask whether two values are equal. Python has several other comparison operators in addition to ==.

In [ ]:
x = 10
y = 3
z = "orange"
In [ ]:
print(x > y)
print(x >= y)
print(y < x)
print(x == z)
print(z != y)
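Because each comparison returns a boolean, its result can be stored to a variable like any other value. Python also allows comparisons to be chained. The sketch below reuses x, y, and z from above; the name is_bigger and the extra values 5 and "apple" are just examples.

```python
x = 10
y = 3
z = "orange"

# store the boolean result of a comparison
is_bigger = x > y
print(is_bigger)          # True

# chained comparison: equivalent to (y < 5) and (5 < x)
print(y < 5 < x)          # True

# strings can be compared with == and != like numbers
print(z == "orange")      # True
print(z != "apple")       # True
```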

Not everything can be compared, though. For example, asking whether "orange" is greater than 3 does not make sense, and if you try, Python raises an error (a TypeError). It is important to be aware of the type of each of your variables. We expect the code below to raise an error, so go ahead and run it anyway. We will describe how to interpret and deal with error messages later.

In [ ]:
print(z > y)

Strings

A "string" is the name used in Python for a sequence of text characters, such as a word, a sentence, or a paragraph. It is one of the most basic data types and one that Python is very good at handling. In fact, the ease with which Python can be used to manipulate text is one of the primary reasons it has become such a popular language for both scientific programming and web development.

Strings as variables

Let's work with a string representation of a sequence of DNA. A string is created by wrapping any text in single or double quotes.

In [2]:
# note that here we use double quotes
dna = "ACGCAGACGATTTGATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA"
In [3]:
# note that here we use single quotes. You can use either one.
dna = 'ACGCAGACGATTTGATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA'
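Being able to choose between quote characters is handy when the text itself contains a quote. The sketch below also shows two common string operations: joining strings with + and measuring length with the built-in len() function. The variable name note is just an example, and the dna string is rebuilt here from two pieces to show that + produces the same value as above.

```python
# double quotes let the text contain an apostrophe
note = "this sequence isn't real"

# + concatenates (joins) strings into a new string
dna = "ACGCAGACGATTTGATGATG" + "AGCATCGACTAGCTACACAAAGACTCAGGGCATATA"

# len() returns the number of characters
print(len(dna))   # 56
print(note)
```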

Another difference between printing a string with print() and returning it is that print() renders special characters in the text. This is particularly apparent for newline characters, which represent line breaks, and for other special characters such as tabs. In the example below the string includes the special characters "\t", which represents a tab, and "\n", which represents a line break.

In [4]:
# return the string 
mystring = "hello\tworld\nhello world"
mystring
Out[4]:
'hello\tworld\nhello world'
In [5]:
# print the string
print(mystring)
hello	world
hello world
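Although a special character such as \t is written with two characters, Python stores it as a single character. The sketch below checks this with len() and uses the built-in repr() function, which produces the same escaped form that a returned string displays.

```python
mystring = "hello\tworld\nhello world"

# "\t" and "\n" are each stored as one character
print(len("\t"), len("\n"))   # 1 1
print(len(mystring))          # 23

# repr() shows the escaped form; print() renders the characters
print(repr(mystring))         # 'hello\tworld\nhello world'
print(mystring)
```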

Indexing and slicing

A string is an indexed data type that is immutable. This means that we can select portions of the text using index numbers, but we cannot change (mutate) individual characters in it.

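The example below selects the characters starting at index 5 and stopping just before index 15. Because Python counts positions from 0, index 5 is the 6th character, so this slice returns the 6th through 15th characters (10 characters in total).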

In [6]:
# return an indexed portion of the dna string
dna[5:15]
Out[6]:
'GACGATTTGA'
Action: Use indexing to return only the first 10 characters of the string variable named 'dna'. See Chapter 3.1.2 if you need help.
In [7]:
dna[:10]
Out[7]:
'ACGCAGACGA'
Action: Use indexing to return only the last 5 characters of the string variable 'dna'. See Chapter 3.1.2 if you need help.
In [8]:
dna[-5:]
Out[8]:
'ATATA'
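A single number inside the brackets selects just one character, negative numbers count back from the end, and a slice accepts an optional third number, the step size. A sketch on the same dna string (the name reversed_dna is just an example):

```python
dna = "ACGCAGACGATTTGATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA"

print(dna[0])          # prints A, the first character (index 0)
print(dna[-1])         # prints A, the last character
print(dna[5:15])       # prints GACGATTTGA, indices 5 through 14
print(len(dna[5:15]))  # 10

# a step of 2 takes every other character
print(dna[0:10:2])     # prints AGAAG

# a step of -1 walks backwards, reversing the string
reversed_dna = dna[::-1]
```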

Object-oriented Python

Python is called an object-oriented programming language, which refers to the fact that everything in Python is an object. What does that mean? It means that everything you interact with has a hidden structure that it uses to store its values, and in addition, objects typically have built-in functions associated with them that are designed to interact with their data. We'll learn more about functions soon.

This is one of the most exciting things about using Python in Jupyter, which runs an interactive version of Python called IPython. In this framework it is easy to access and view all of the attributes and functions associated with an object.

This can be done by typing a variable name followed by a dot, then, while your cursor is still sitting after the dot, press the <tab> key on your keyboard. The animated GIF below shows an example of this. A pop-up shows a list of functions associated with the string object dna.

A function call always ends with a set of parentheses. By placing your cursor inside the parentheses at the end of a function, holding shift, and then pressing tab, you can pull up a help menu with instructions on how to use the function. This is also shown in the GIF below. This is very useful, and we will revisit it later.

https://eaton-lab.org/slides/genomics-2019/session-2/data/jupyter-tab-complete.gif

Strings as Objects

Action: Below, try interactively viewing the functions associated with the variable `dna`, which is a string object. You should see a temporary popup that displays the names of many functions. Select the one called `lower` and type parentheses after it to execute it as a function: `dna.lower()`. If done correctly it should return the value of dna in all lower case letters.
In [10]:
dna.lower()
Out[10]:
'acgcagacgatttgatgatgagcatcgactagctacacaaagactcagggcatata'

Many functions are associated with object types

It is interesting that a string has a function called .lower() that returns a lower-case copy of its value. But doesn't that seem a little idiosyncratic? How could you have known that this function exists, out of the thousands of functions in Python?

The answer is in two parts. First, it is something that you learn over time. As you read more Python code and see this function used repeatedly you will eventually memorize that this and other functions exist. But second, Python actually helps you to learn about these functions by way of its object-oriented design. The .lower() function is attached to a string object because it is meant to be used on a string object. The association of functions with the object types that they are meant to be used on is a way in which the language itself provides tips to you while you use it.

If you look back at one of the integer variables that we created earlier, like x, you will notice that it has a different set of functions associated with it (which you can see by putting a dot after the variable and using tab-completion). The .lower() function is not among them because it doesn't make sense to convert an integer to lower case; that can only be done to strings.
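The list of names that tab-completion shows can also be inspected in code. The built-in dir() function returns the names of an object's attributes and functions, and hasattr() checks for a single name. The sketch below, an illustrative check rather than something from the tutorial, confirms that a string has a lower function while an integer does not.

```python
dna = "ACGCAGACGATTTGATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA"
x = 3

# dir() lists attribute and function names (including internal __names__)
print("lower" in dir(dna))    # True
print("lower" in dir(x))      # False

# hasattr() checks for one name at a time
print(hasattr(dna, "lower"))  # True
print(hasattr(x, "lower"))    # False

# upper() is the string function that undoes lower() for this all-letter string
print(dna.lower().upper() == dna)   # True
```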

Learning to use functions

Again, using the interactivity of Python is useful here. As mentioned earlier, functions always have a set of parentheses at the end. This is because some functions require additional arguments about how they should be executed, and these arguments are passed to the function by entering them in the parentheses.

As mentioned earlier, you can find more information about a function by placing your cursor inside the parentheses, holding the shift key down, and then pressing tab.

In the example below we use a function that takes an argument. When we provide a string as an argument to the .split() function, it splits the object's string value into smaller strings wherever the argument string (the delimiter) occurs, and removes the delimiter itself. For example, the string stored in dna can be split on the substring "TTT" to yield two strings: the part before TTT and the part after it. Try it below.

In [11]:
dna.split("TTT")
Out[11]:
['ACGCAGACGA', 'GATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA']
Action: Please read the instructions here carefully. I am asking you to complete several distinct actions. To get full points you must complete all of them as directed. (1) Use the split() function to split the dna variable on the characters "CG". (2) Store the returned result of step 1 to a new variable called dnalist. (3) Then use the print function on the dnalist variable to show its contents.
In [14]:
dnalist = dna.split("CG")
print(dnalist)
['A', 'CAGA', 'ATTTGATGATGAGCAT', 'ACTAGCTACACAAAGACTCAGGGCATATA']
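Splitting removes the delimiter, so none of the pieces contain "CG". The string function join() does the reverse: called on a delimiter string, it glues a list of strings back together with that delimiter between them. A sketch confirming that joining undoes the split above, plus split() with no argument (which splits on whitespace); the names rejoined and words are just examples.

```python
dna = "ACGCAGACGATTTGATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA"

dnalist = dna.split("CG")
print(len(dnalist))              # 4 pieces

# join the pieces back together, putting "CG" between each pair
rejoined = "CG".join(dnalist)
print(rejoined == dna)           # True

# with no argument, split() separates on runs of whitespace
words = "hello world  again".split()
print(words)                     # ['hello', 'world', 'again']
```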

List objects

One of the most flexible and useful data objects in Python is the list. Lists are containers that can store any other type of data object; they can even store other lists. Lists are written as comma-separated values inside square brackets. Seem familiar? That's right, we created a list above when we split the string into multiple pieces. The returned value was a list containing multiple strings.

In [15]:
# create a list 
letters1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# another way to create a list
letters2 = list("abcdefg")
In [16]:
print(letters1)
['a', 'b', 'c', 'd', 'e', 'f', 'g']
In [17]:
print(letters2)
['a', 'b', 'c', 'd', 'e', 'f', 'g']
In [18]:
# test that the two lists are identical
letters2 == letters1
Out[18]:
True
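To illustrate that a list can hold objects of different types, including another list, here is a short sketch; the contents of the list are arbitrary examples. Indexing a nested list takes one pair of square brackets per level.

```python
# a list holding an int, a float, a str, and another list
mixed = [3, 5.5, "orange", ["a", "b"]]

print(len(mixed))       # 4 items; the inner list counts as one
print(type(mixed[2]))   # <class 'str'>
print(mixed[3][1])      # b: item 1 of the list stored at index 3

# list() splits a string into a list of its characters
print(list("abc") == ["a", "b", "c"])   # True
```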

Indexing a list

A list can be indexed just like a string. A big difference, however, is that lists are mutable, meaning that we can replace individual elements of a list in place, without creating a new object. This is shown below (we expect one of the examples to raise an error).

In [19]:
# index a string
dna[5:15]
Out[19]:
'GACGATTTGA'
In [20]:
# *try* to mutate part of a string (this won't work)
dna[5] = "T"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-3a1d80b5d241> in <module>
      1 # *try* to mutate part of a string (this won't work)
----> 2 dna[5] = "T"

TypeError: 'str' object does not support item assignment
In [21]:
# make a list of DNA
dnalist = list(dna)
In [22]:
# index the dna list from 5 to 15
dnalist[5:15]
Out[22]:
['G', 'A', 'C', 'G', 'A', 'T', 'T', 'T', 'G', 'A']
In [23]:
# mutate part of the list 
dnalist[5] = "T"

# print the list from 5 to 15 to show it changed relative to above
dnalist[5:15]
Out[23]:
['T', 'A', 'C', 'G', 'A', 'T', 'T', 'T', 'G', 'A']
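Because strings cannot be changed in place, the usual way to "edit" one is to build a new string from slices. Alternatively, a mutated list of characters can be turned back into a string with "".join(). The sketch below shows that both routes give the same result as the list edit above; the names new_dna and joined are just examples.

```python
dna = "ACGCAGACGATTTGATGATGAGCATCGACTAGCTACACAAAGACTCAGGGCATATA"

# route 1: build a new string around position 5
new_dna = dna[:5] + "T" + dna[6:]

# route 2: mutate a list of the characters, then join them back together
dnalist = list(dna)
dnalist[5] = "T"
joined = "".join(dnalist)

print(new_dna == joined)   # True
print(dna[5], new_dna[5])  # G T: the original string is unchanged
```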

List functions

Like strings, lists are objects in Python, and as such they have functions that can be used to operate on them. You can see all of the functions associated with a list by using tab-completion after the object, as described earlier.

In [24]:
# example: count how many "A" are in the list
dnalist.count("A")
Out[24]:
20
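A few other list functions that tab-completion will show are sketched below on a small example list (the values are arbitrary). Note that functions such as append() and sort() change the list in place and return None, unlike string functions such as lower(), which return a new string.

```python
bases = ["G", "A", "T"]

bases.append("C")          # add an item to the end
print(bases)               # ['G', 'A', 'T', 'C']

print(bases.index("T"))    # 2: position of the first "T"
print(bases.count("A"))    # 1

result = bases.sort()      # sorts the list in place...
print(bases)               # ['A', 'C', 'G', 'T']
print(result)              # ...and returns None
```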
Action: In the cell below create two new variables: one called fiveprime that contains the first 10 elements in dnalist, and another called threeprime that contains the last 10 elements in dnalist.
In [25]:
# first 10 elements of the list (not the string)
fiveprime = dnalist[:10]
# last 10 elements of the list
threeprime = dnalist[-10:]
Action: Save this notebook and download as HTML to upload to courseworks when all of your notebooks are finished.