
Understanding imports and modules¶

Learning objectives¶

By the end of this tutorial you should:

Understand that Python packages are just folders full of Python scripts.
Understand that sys.path lists the locations of importable packages.
Be able to import custom Python packages.
Be familiar with how to organize Python scripts into a package.

The `import` statement¶

The import statement is one of the first things located in any Python script. It is used to load Python code from other files located on your system. But what is it actually importing? What do those files and folders look like?

In your last tutorial you learned how to write a single Python script that contains code that can be imported. This is often referred to as a module. Here we will learn about writing a collection of modules together in a folder, which is called a package. Modules and packages are imported in much the same way: in both cases the import statement loads code from Python files and makes it accessible in other places.
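To make the difference between a module and a package concrete, here is a minimal sketch using two standard library names, random and json. They are chosen only for illustration, and the sketch assumes a standard CPython install where random is a single file and json is a folder.

```python
import json    # a standard library package (a folder named json/)
import random  # a standard library module (a single file, random.py)

# Every imported module records the file it was loaded from.
print(random.__file__)
print(json.__file__)

# Only packages carry a __path__ attribute listing their folder(s).
is_package = {
    "json": hasattr(json, "__path__"),
    "random": hasattr(random, "__path__"),
}
print(is_package)
```

The printed file paths will differ from system to system, but the package should point at an __init__.py file inside a folder, while the module points at a single .py file.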

Organizing a package¶

Python packages are only useful when they are organized in a way that makes it easy to understand how they should be used. Because GitHub has become a standard place to store code, we will discuss the organization of our code more broadly in terms of how it should be organized in a git repository. It is useful to follow a similar set of conventions whether the repo is intended as a Python software package, or if it is simply an archive of a research project.

In either case, we will usually have the following:

a README file in the top level directory describing the project.
a code directory (that can take different names) containing Python code.
a notebooks directory containing demonstrations/analyses of the code.
a data directory containing example data to be analyzed in the notebooks.

Here we will focus on the structure of your code directory. You have already cloned the repo hack-7-python which currently contains only a README file and notebooks directory. Let's create an additional folder to contain our scripts, called mypackage, and add an empty file to this folder called mymodule.py. You can do this from your terminal by following the code block below.

# make sure we are located in the repo dir
cd ~/hacks/hack-7-python

# make a new subdirectory 
mkdir -p mypackage/

# make an empty file 
touch ./mypackage/mymodule.py

File trees¶

As an aside, let's install and use an interesting tool for visualizing the file structure of our repository. This will make it easier to keep track of and understand how our files are organized, especially as we continue to make more complex packages with many files. Use conda from your terminal to install the program tree:

conda install tree -c conda-forge

We can now use the tree command from within our repo to view the file structure in a nicely formatted "file tree" design. In the next sections our goal will be to put Python code into the mymodule.py file located in the mypackage folder, and to be able to import that code into a notebook located in the notebooks directory.

tree .

.
├── mypackage
│   └── mymodule.py
├── notebooks
│   └── nb-7.0-subprocess.ipynb
└── README.md

2 directories, 3 files
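If installing tree is not an option, a similar listing can be sketched in Python with pathlib. This is only an approximation of the tree program, not a replacement for it: the function name format_tree is hypothetical, hidden entries are skipped, and names are sorted case-insensitively. The demo builds a throwaway copy of the layout in a temporary directory so that it does not depend on where you run it.

```python
import tempfile
from pathlib import Path

def format_tree(root, prefix=""):
    """Return a list of lines drawing the contents of `root`, tree-style."""
    lines = []
    # Skip hidden entries such as .git and sort names case-insensitively.
    entries = sorted(
        (p for p in Path(root).iterdir() if not p.name.startswith(".")),
        key=lambda p: p.name.lower(),
    )
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + entry.name)
        if entry.is_dir():
            # Children are indented under their parent folder.
            lines.extend(format_tree(entry, prefix + ("    " if is_last else "│   ")))
    return lines

# Demo on a throwaway copy of the repo layout (temporary directory).
demo = Path(tempfile.mkdtemp())
(demo / "mypackage").mkdir()
(demo / "mypackage" / "mymodule.py").touch()
(demo / "notebooks").mkdir()
(demo / "README.md").touch()

tree_lines = format_tree(demo)
print("\n".join(["."] + tree_lines))
```

To list your actual repo instead, call format_tree(".") from inside the repo directory.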

TL;DR: a video demonstration¶

Watch the video below for a visual demonstration of what we plan to accomplish, and then follow along with the rest of the tutorial for a slower paced explanation.

Write a Python module¶

Let's add a simple function to the mymodule.py file. In the video example above I wrote a short py script. Here you can just copy the code below and paste it into the file using nano or another text editor.

# open the mymodule.py file in the nano text editor
nano ./mypackage/mymodule.py

Copy and paste the code below into the mymodule.py file, then save and close it.

#!/usr/bin/env python
"magic eight ball function to tell your future."

import random

def magic_eight_ball():
    """
    Returns a random statement from a magic eight ball containing
    an 11 sided die (I was too lazy to write all 20 typical answers)
    https://en.wikipedia.org/wiki/Magic_8-Ball
    """
    RESPONSES = {
        0: "It is certain.",
        1: "It is decidedly so.",
        2: "Without a doubt.",
        3: "Yes – definitely.",
        4: "You may rely on it.",
        5: "Reply hazy, try again.",
        6: "Better not tell you now.",
        7: "Cannot predict now.",
        8: "My reply is no.",
        9: "Outlook not so good.",
        10: "Very doubtful.",
    }
    # draw a key from the full range so that every response can be returned
    return RESPONSES[random.randrange(len(RESPONSES))]

Importing design¶

So far you have learned how to use the import statement to import code from Python packages and/or modules that are part of the standard library. These are collections of Python scripts organized into folders, similar to what you will be creating here. As an example, we learned about the os.path module in an earlier tutorial, which is used to format file path strings. This module is accessed through the os package. (Strictly speaking, os ships as a single file, os.py, rather than a folder, but it exposes path as a submodule, so it behaves like a package for our purposes.) The functions located in the path module can be accessed in several ways from the os package:

import os
os.path.join

from os import path
path.join

from os import *
path.join

from os.path import join
join

Our goal here will be to design our package in the same way, so that you can import the function magic_eight_ball() from mypackage.mymodule in all of these same ways.
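As a quick check that these import styles are only different routes to the same code, the sketch below compares the objects they produce. The star-import form is left out because it is only allowed at the top level of a file and is generally discouraged.

```python
import os
from os import path
from os.path import join

# All three names refer to the very same function object.
same_function = (os.path.join is path.join) and (path.join is join)
print(same_function)

# The separator in the result depends on your operating system.
print(join("mypackage", "mymodule.py"))
```

Once mypackage is set up the same way, the equivalent comparison should hold for magic_eight_ball().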

Packages and modules¶

A module is a script that can be imported. A package is a folder full of modules. The dot format in the example above, where the file or function names are nested within another name, is meant to recapitulate the file structure in which these files or functions are written.
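The correspondence between dotted names and files can be checked directly. The sketch below uses importlib.util.find_spec on json.decoder, a standard library module chosen only as an example, and assumes a standard CPython install where the standard library is stored as .py files on disk.

```python
import importlib.util
import os

dotted_name = "json.decoder"

# Ask Python where it would load this module from, without using it yet.
spec = importlib.util.find_spec(dotted_name)
print(spec.origin)

# Swapping dots for path separators gives the tail of the file path.
expected_tail = dotted_name.replace(".", os.sep) + ".py"
matches = spec.origin.endswith(expected_tail)
print(expected_tail, matches)
```

By the same logic, mypackage.mymodule corresponds to the file mypackage/mymodule.py.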

So far we have a folder (package) and a file (module) and within it a function. Does this mean that we can now import this code from any other Python file? No. We need to do a few more steps to make it possible for Python to know that this package can be imported. For this, we need to learn about sys.path: the location where Python looks for modules. This is simply a list of filepaths represented as strings.

The import statement can only import packages or modules from folders listed in sys.path. By default this typically includes the location of standard library packages in ~/miniconda3/lib/python3.8/, of other installed packages (e.g., by conda or pip) in ~/miniconda3/lib/python3.8/site-packages, as well as your current directory (./). Here we will learn how to add additional paths to the sys.path variable so that we can import any code. This is particularly useful (1) during code development; or (2) for importing a small number of scripts that do not compose a full library. Later we will learn to design packages that are installable, meaning that they will be copied into the site-packages dir (~/miniconda3/lib/python3.8/site-packages) where other packages are located.

Open a new notebook from within the notebooks dir, rename it import-test, and follow along. From inside the notebook, import the sys module from the standard library and examine the sys.path variable. (See the video tutorial above to make sure you are following as intended.)

# import sys from the standard lib
import sys
print(sys.path)

To add new locations for Python to find packages you can append a new string to the sys.path list. This new string should point to the parent directory of your package (the directory containing the package directory). The path can be written as either a full path or a relative path. Because we are currently located within a notebook in the notebooks/ dir, the mypackage dir is located up one directory (in the parent directory of our current dir). See the tree output above to confirm this. Therefore we can add our current parent dir to sys.path to make the mypackage/ folder importable. (Using a relative path as opposed to a full path is actually preferred here: if someone else cloned our repo and ran the code in this notebook, it would find and import the code from the parent dir (..) without requiring them to change the path, which they would otherwise need to do if we had written a full path. Keep in mind that a relative path is interpreted relative to the directory the notebook is running in.)

import sys

# append your current parent dir to the sys.path list
sys.path.append("../")

# show the updated sys.path 
print(sys.path)

['/home/deren/miniconda3/envs/dev/bin',
 '/home/deren/miniconda3/envs/dev/lib/python38.zip',
 '/home/deren/miniconda3/envs/dev/lib/python3.8',
 '/home/deren/miniconda3/envs/dev/lib/python3.8/lib-dynload',
 '/home/deren/miniconda3/envs/dev/lib/python3.8/site-packages',
 '/home/deren/miniconda3/envs/dev/lib/python3.8/site-packages/IPython/extensions',
 '/home/deren/.ipython',
 '../']
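Re-running the cell above appends the same entry again each time. As an optional variation (not the approach used in the rest of this tutorial), the sketch below resolves the relative path when the cell runs and skips the append if the entry is already present. The helper name add_to_sys_path is hypothetical. Because the path is still written as "..", the notebook stays portable across machines.

```python
import sys
from pathlib import Path

def add_to_sys_path(directory):
    """Append `directory` to sys.path as an absolute path, only once."""
    # resolve() interprets a relative path against the current working dir
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved

parent = add_to_sys_path("..")
add_to_sys_path("..")  # calling it a second time adds nothing new
print(parent)
```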

Why does this make `mypackage` importable?¶

Any folder that is located inside of one of the folders listed above can be imported. The mypackage folder is located in the filepath that we appended to the end of the list (../). This makes the following import statements available, each of which gives us access to the magic_eight_ball() function in the mymodule module. You can test this from your notebook, and you can also explore what is accessible to import from each object by using tab-completion.

import mypackage.mymodule
mypackage.mymodule.magic_eight_ball()

from mypackage import mymodule
mymodule.magic_eight_ball()

from mypackage.mymodule import magic_eight_ball
magic_eight_ball()

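The only import style that does not yet work is importing the package name alone and then reaching the objects nested within it. In a fresh session, where none of the imports above have been run, the code below raises an AttributeError, because importing mypackage by itself does not load mymodule. This is slightly different from the first example above, and would look like this: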

# this workflow is not yet supported
import mypackage
mypackage.mymodule.magic_eight_ball()
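To see this behavior without touching your repo, here is a minimal sketch that rebuilds the same layout in a temporary directory. The names demo_pkg and demo_mod are hypothetical stand-ins for mypackage and mymodule. A folder without an __init__.py file can still be imported by name, but its modules are not loaded until they are imported explicitly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway layout: <tmp>/demo_pkg/demo_mod.py (hypothetical names)
root = Path(tempfile.mkdtemp())
(root / "demo_pkg").mkdir()
(root / "demo_pkg" / "demo_mod.py").write_text("def hello():\n    return 'hello'\n")

sys.path.append(str(root))
importlib.invalidate_caches()

# Importing the bare folder name succeeds, but nothing inside it is loaded.
import demo_pkg
visible_before = hasattr(demo_pkg, "demo_mod")

# Importing the module explicitly loads it and attaches it to the package.
import demo_pkg.demo_mod
visible_after = hasattr(demo_pkg, "demo_mod")
greeting = demo_pkg.demo_mod.hello()

print(visible_before, visible_after, greeting)
```

This also explains why the unsupported workflow can appear to work in a notebook where one of the earlier import statements has already been run: the module was attached to the package by that earlier import.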

This last method is particularly convenient, since it allows the user to explore the entire package themselves to find any objects that might be useful. So how do we support this last method?

The `__init__.py` script¶

To support this last mode for import we need to learn about a special file called __init__.py. You can tell it is special because it uses the dunder naming convention. An init file is a file that is automatically run when a package is imported. It is placed inside of a folder and used to import other files or folders that are nested within this folder. By using an __init__.py file to only select some of the subfolders or files in a folder you can limit or expand the scope of what the user will see when using tab-completion to search for possible importable modules. This is a useful design feature that can be used to organize your code so that all of the most useful class and function objects are accessible from the top level package name, or from particular modules. Let's create an init file and edit its contents:

# add an __init__.py file to the mypackage dir
cd ~/hacks/hack-7-python/
touch mypackage/__init__.py
nano mypackage/__init__.py

In the __init__.py file we will add a shebang and docstring, and then add an import statement. In this case the import makes the package automatically import the module. In other words, mypackage.mymodule will be automatically imported. Here we use the convention from . to tell it where the mymodule module is located, where . means the current directory (the package itself).

#!/usr/bin/env python
"""
The mypackage package is used to learn about package filestructure.
"""
from . import mymodule

Now if we restart the notebook and update our sys.path variable as before, we should be able to access all contents of the mypackage folder from the top level name. In addition, we can view a docstring for the package which we defined in the init file.

# add our local package scope to sys.path
import sys
sys.path.append("..")

# access our module function from the package-level import
import mypackage
mypackage.mymodule.magic_eight_ball()

# show the package-level docstring
mypackage?
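The same result can be reproduced in isolation with the sketch below, which again uses a temporary directory and hypothetical names (demo_pkg2, demo_mod, hello). Its __init__.py also contains a second, optional import that lifts a function to the top level of the package, as one illustration of how an init file can shape what users see.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that includes an __init__.py (hypothetical names)
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg2"
pkg.mkdir()
(pkg / "demo_mod.py").write_text("def hello():\n    return 'hello'\n")
(pkg / "__init__.py").write_text(
    '"""Demo package docstring."""\n'
    "from . import demo_mod\n"         # load the module with the package
    "from .demo_mod import hello\n"    # optional: expose hello at top level
)

sys.path.append(str(root))
importlib.invalidate_caches()

import demo_pkg2

nested_result = demo_pkg2.demo_mod.hello()  # reached through the module
top_level_result = demo_pkg2.hello()        # reached from the package itself
package_doc = demo_pkg2.__doc__             # the docstring from __init__.py

print(nested_result, top_level_result, package_doc)
```

Outside of a notebook, where the ? syntax is not available, the __doc__ attribute or the built-in help() function shows the same package docstring.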

Summary¶

Modules are Python scripts that can be imported.
Packages are folders containing one or more modules.
Both of these things can be imported, allowing you to access folders, files, or code objects within them.
An __init__.py file can be used to make objects nested within a package (modules or their contents) accessible from the higher-level imported object (package).
Packages or modules can be imported if the directory that contains them is listed in your sys.path variable.
You can edit sys.path to add new paths to it to make your code importable.
We will learn later how to make packages 'installable', such that they will automatically be added to your sys.path.

Assessment¶

Ensure that your code is working and can successfully import and run the code in the block above. If it is working then save and close your notebook. Use git to add, commit, and push your mypackage/ dir and your test notebook to your forked git repo for grading.