importing
Understanding imports and modules¶
Learning objectives¶
By the end of this tutorial you should:
- Understand that Python packages are just folders full of Python scripts.
- Understand that
sys.path
lists the locations of importable packages. - Be able to import custom Python packages.
- Be familiar with how to organize Python scripts into a package.
The import
statement¶
The import
statement is one of the first things located in
any Python script. It is used to load Python code from other
files located on your system. But what is it actually importing?
What do those files and folders look like?
In your last tutorial you learned how to write a single
Python script that contains code that can be imported. This
is often referred to as a module. Here we will learn about writing
a collection of modules together in a folder, which is called a
package. Both modules and packages are very similar in the
way that import
statements are used to access code from
Python files to make it accessible in other places.
Organizing a package¶
Python packages are only useful when they are organized in a way that makes it easy to understand how they should be used. Because GitHub has become a standard place to store code, we will discuss the organization of our code more broadly in terms of how it should be organized in a git repository. It is useful to follow a similar set of conventions whether the repo is intended as a Python software package, or if it is simply an archive of a research project.
In either case, we will usually have the following:
- a README file in the top level directory describing the project.
- a code directory (that can take different names) containing Python code.
- a notebooks directory containing demonstrations/analyses of the code.
- a data directory containing example data to be analyzed in the notebooks.
Here we will focus on the structure of your code directory.
You have already cloned the repo hack-7-python
which currently contains
only a README file and notebooks directory. Let's create an additional folder
to contain our scripts, called mypackage
, and add an empty file
to this folder called mymodule.py
. You can do this from your terminal
by following the code block below.
# make sure we are located in the repo dir
cd ~/hacks/hack-7-python
# make a new subdirectory
mkdir -p mypackage/
# make an empty file
touch ./mypackage/mymodule.py
File trees¶
As an aside, let's install and use an interesting tool for visualizing the
filestructure of our repository. This will make it easier to keep track of
and understand how our files are organized, especially as we continue to
make more complex modules with many files. Use conda from your terminal
to install the program tree
:
conda install tree -c conda-forge
We can now use the tree
command from within our repo to view the
file structure in a nicely formatted "file tree" design. In the
next sections our goal will be to put Python code into the mymodule.py
file located in the mypackage folder, and to be able to import that
code into a notebook located in the notebooks directory.
tree .
. ├── mypackage │ └── mymodule.py ├── notebooks │ └── nb-7.0-subprocess.ipynb └── README.md 2 directories, 7 files
TLDR; a video demonstration¶
Write a Python module¶
Let's add a simple function to the mymodule.py file. In the video example
above I wrote a short py script. Here you can just copy the code
below and paste it into the file using nano
or another text editor.
# open the myscript.py file in the nano text editor
nano ./mypackage/mymodule.py
#!/usr/bin/env python
"magic eight ball function to tell your future."
import random
def magic_eight_ball():
"""
Returns a random statement from a magic eight ball containing
a 10 sided die (I was too lazy to write all 20 typical answers)
https://en.wikipedia.org/wiki/Magic_8-Ball
"""
RESPONSES = {
0: "It is certain.",
1: "It is decidedly so.",
2: "Without a doubt.",
3: "Yes – definitely.",
4: "You may rely on it.",
5: "Reply hazy, try again.",
6: "Better not tell you now.",
7: "Cannot predict now.",
8: "My reply is no",
9: "Outlook not so good",
10: "Very doubtful",
}
return RESPONSES[random.choice(range(10))]
Importing design¶
So far you have learned how to use the import
statement to import code
from Python packages and/or modules that are part of the standard library.
These are a collection of Python scripts organized into folders,
similar to what you will be creating here. As an example, we learned about
the os.path
module in an earlier tutorial, which is used to format file
path strings. This module is part of the os
package. The functions located
in the path
module can be accessed in several ways from the os
package:
import os
os.path.join
from os import path
path.join
from os import *
path.join
from os.path import join
join
Our goal here will be to design our package in the same way, so that you
can import the function magic_eight_ball()
from mypackage.mymodule
in all of these same ways.
Packages and modules¶
A module is a script that can be imported. A package is a folder full of modules. The dot format in the example above, where the file or function names are nested within another name, is meant to recapitulate the file structure in which these files or functions are written.
So far we have a folder (package) and a file (module) and within it
a function. Does this mean that we can now import this code from
any other Python file? No. We need to do a few more steps to make it
possible for Python to know that this package can be imported.
For this, we need to learn about sys.path
: the location where
Python looks for modules. This is simply a list of filepaths
represented as strings.
import
can only import packages or modules from folders
listed in sys.path
. By default this will include only the
location of standard library packages in
~/miniconda3/python3.8/
,
of other installed packages (e.g., by conda or pip) in
~/miniconda3/python3.8/site-packages
,
as well as your current directory
(./
).
Here we will learn how to add additional paths to the
sys.path
variable so that we an import any code.
This is particularly useful (1) during code development; or (2)
for importing a small number of scripts that do not compose a
full library.
Later we will learn to design packages that
are installable, meaning that they will be copied into the
(~/miniconda3/python3.8/site-packages
) dir
where other packages are located.
Open a new notebook from in the notebooks dir and rename it import-test
and follow along. From inside the notebook import the sys
package from
the standard library and examine the sys.path
variable. (see the video
tutorial above to make sure you are following as intended.)
# import sys from the standard lib
import sys
print(sys.path)
To add new locations for Python to find packages you can append a new string
to the sys.path
list. This new string should point to the parent directory of
your package (the directory containing the package directory). The path can be written
as either a full path or relative path. Because we are currently located within
a notebook in the notebooks/
dir, the mypackage
dir is located up one directory
(in the parent directory of our current dir). See the tree
output above to
confirm this. Therefore we can add our current parent dir to the sys.path
to make the mypackage/
folder importable. (Using a relative path as opposed
to a full path here is actually preferred, since if someone else cloned our repo and
ran the code in this notebook, it would be able to find and import the code from
the parent dir (..
) without requiring them to change the path, which they otherwise
would need to do if writing a fullpath.)
import sys
# append your current parent dir to the sys.path list
sys.path.append("../")
# show the updated sys.path
print(sys.path)
['/home/deren/miniconda3/envs/dev/bin', '/home/deren/miniconda3/envs/dev/lib/python38.zip', '/home/deren/miniconda3/envs/dev/lib/python3.8', '/home/deren/miniconda3/envs/dev/lib/python3.8/lib-dynload', '/home/deren/miniconda3/envs/dev/lib/python3.8/site-packages', '/home/deren/miniconda3/envs/dev/lib/python3.8/site-packages/IPython/extensions', '/home/deren/.ipython', '..']
Why does this make mypackage
importable?¶
Any folder that is located inside of one of the folders listed above can
be imported. The mypackage
folder is located in the filepath that we
appended to the end of the list (../
). This will make the following
import
statements available to us that will allow us to access the
magic_eight_ball()
function in the mymodule
script. You can test this
from your notebook, and you can also explore what is accessible to import
from each object by using tab-completion.
import mypackage.mymodule
mypackage.mymodule.magic_eight_ball()
from mypackage import mymodule
mymodule.magic_eight_ball()
from mypackage.mymodule import magic_eight_ball
magic_eight_ball()
The only method that is not yet supported is to be able to import the package name alone and access all objects nested within it. This is slightly different from the first example above, and would look like this:
# this workflow is not yet supported
import mypackage
mypackage.mymodule.magic_eight_ball()
The __init__.py
script¶
To support this last mode for import
we need to learn about a special
file called __init__.py
. You can tell it is special because
it uses the dunder naming convention. An init file is a file that is
automatically run when a package is imported. It is placed inside of
a folder and used to import
other files or folders that are nested
within this folder. By using an __init__.py
file to only select some
of the subfolders or files in a folder you can limit or expand
the scope of what the user will see when using tab-completion to search
for possible importable modules. This is a useful design feature that
can be used to organize your code so that all of the most useful class
and function objects are accessible from the top level package name, or
from particular modules. Let's create an init file and edit its contents:
# add an __init__.py file to the mypackage dir
cd ~/hacks/hack-7-python/
touch mypackage/__init__.py
nano mypackage/__init__.py
In the __init__.py
file we will add a shebang and docstring,
and then add an import statement. In this case the import is making
it so that the package will automatically import the module. In other
words, mypackage.mymodule
will be automatically imported. Here we
use the convention from .
to tell it where to the mymodule module
is located, where .
means the current directory.
#!/usr/bin/env python
"""
The mypackage package is used to learn about package filestructure.
"""
from . import mymodule
Now if we restart the notebook and update our sys.path
variable as
before, we should be able to access all contents of the mypackage
folder from the top level name. In addition, we can view a docstring
for the package which we defined in the init file.
# add our local package scope to sys.path
import sys
sys.path.append("..")
# access our module function from the package-level import
import mypackage
mypackage.mymodule.magic_eight_ball()
# show the package-level docstring
mypackage?
Summary¶
- modules are Python scripts located inside folders.
- packages are folders containing one or more modules.
- Both of these things can be imported, allowing you to access folders, files, or code objects within them.
- An
__init__.py
file can be used to make objects nested within a package (modules or their contents) accessible from the higher-level imported object (package). - Packages or modules can be imported if they are in your
sys.path
variable. - You can edit
sys.path
to add new paths to it to make your code importable. - We will learn later how to make packages 'installable', such that they will
automatically be added to your
sys.path
.