On Conda and Sand Mandalas¶
What is conda?¶
Conda is an open-source software management tool for installing software packages, as well as their dependencies, and creating sandboxed environments for executing code. Using the conda command line tool you can use simple commands to search for software packages, select specific versions, and install them locally on your machine. This automated process makes installing and removing software simple and reproducible, which makes it easier to design, distribute, and use working software.
Why use conda?¶
The many advantages of using conda include:
- command-line convenience: The conda command line program allows you to search for and install tools with simple commands that can even be written as scripts for automation. This makes it easy to replicate the set of software tools installed on one computer onto another machine.
- finding dependencies: Almost every software program builds on and requires other software packages as dependencies. Rather than telling a user to go find and install each of these dependencies on their own (a sure sign of a poorly developed tool by today's standards), a software package manager can instead fetch and install all of the dependencies for them. This might even include different dependencies or versions depending on their specific operating system. This is a very complex task and something conda does very well.
- sandboxed directory: conda installs software into a sandboxed location on your computer (usually a directory within HOME), which is done purposefully to keep your conda software completely separate and isolated from your system-wide software (which is usually in /bin or /usr/bin). This gives you peace of mind to install, update, and remove packages as much as you want inside of your conda directory without having to worry that it might ever impact your system programs.
- environments: In addition to allowing you to install software programs into a sandboxed location, conda also allows you to keep many separate environments, where you can keep different sets of software or versions of them. This makes it easy to test software tools across different versions of dependencies, or to keep software separate when it relies on conflicting dependencies.
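To make the automation point concrete, here is a minimal sketch of a setup script. It is a dry run that only prints the conda commands it would execute. The environment name `analysis` is made up for illustration, and the package list is arbitrary; channels and environments are explained later on this page.

```shell
# Hypothetical setup script: print (rather than run) the conda commands
# that would recreate one environment. Remove 'echo' to run them for real.
env_name="analysis"          # made-up environment name
channel="conda-forge"        # channel to install from
packages="requests cowpy"    # space-separated list of packages

setup_commands="$(
    echo conda create -n "$env_name" -y
    echo conda install -n "$env_name" -c "$channel" -y $packages
)"
echo "$setup_commands"
```

Copying a script like this to another machine and running it there would recreate the same set of tools.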
Install conda (miniconda3)¶
There are two main flavors of conda that you can install: Anaconda and Miniconda. Both include a version of Python and the conda program (which is written in Python) as well as a few dependencies of conda for fetching information about packages online. However, the two flavors differ in terms of which other tools come pre-loaded with these base resources. Anaconda comes fully loaded with dozens of commonly used Python packages, whereas Miniconda is totally minimal, and doesn't come with anything extra at all. I always recommend installing Miniconda, and then adding to it any software that you want to install.
To install Miniconda you can google search 'Miniconda install' and it will point you to the Miniconda install page. There you will see installation instructions for different versions of Miniconda, one for each operating system (Windows, macOS, Linux). If you are on a Mac then select from the Mac section; if you are on Linux or Windows Subsystem for Linux then select the Linux version. Do not install the Windows version. We can download and install conda directly from the command line following these steps (be sure to choose the instructions for your OS):
# Linux (or Windows Subsystem for Linux)
# cd to your HOME directory
cd ~
# download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# call bash to install conda from the .sh install script in 'batch' (auto) mode
bash Miniconda3-latest-Linux-x86_64.sh -b
# check that the miniconda3 folder now appears in HOME
ls -l ~
# Mac
# cd to your HOME directory
cd ~
# download Miniconda installer (if you have an older intel mac you will need
# to use a different link by replacing `arm64` with `x86_64`)
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
# call bash to install conda from the .sh install script in 'batch' (auto) mode
bash Miniconda3-latest-MacOSX-arm64.sh -b
# check that the miniconda3 folder now appears in HOME
ls -l ~
When you execute the bash command above it will start running the installer script. By default, this will select ~/miniconda3 as the location to install conda and its associated folders. That's good. It should take a minute or less to finish.
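The installer file names above differ only in the operating system and CPU architecture. As a rough sketch, the right name can be derived from the output of `uname`. The helper name `miniconda_installer` is hypothetical, and only the combinations listed in the comments are handled.

```shell
# Hypothetical helper: map an OS name and CPU architecture (as reported by
# `uname -s` and `uname -m`) to a Miniconda installer file name.
miniconda_installer() {
    os="$1"
    arch="$2"
    case "$os" in
        Linux)  platform="Linux" ;;
        Darwin) platform="MacOSX" ;;
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    case "$arch" in
        x86_64)  cpu="x86_64" ;;     # Intel/AMD machines
        arm64)   cpu="arm64" ;;      # Apple silicon Macs
        aarch64) cpu="aarch64" ;;    # ARM Linux machines
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    echo "Miniconda3-latest-${platform}-${cpu}.sh"
}

# Print the download link for the machine this is run on
installer="$(miniconda_installer "$(uname -s)" "$(uname -m)")"
echo "https://repo.anaconda.com/miniconda/${installer}"
```

The printed link can then be passed to wget or curl -O as in the steps above.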
Adding conda to your PATH¶
At this point we have created a new folder containing conda, but it is not yet added to our PATH, meaning that we can't easily use this software yet. Our goal here will be to set the miniconda3/ directory at the front of our PATH variable. This means that our shell will look here first for any software, and only look in the other folders in the PATH if the software was not found here. One specific reason this is useful is that your shell will find the new version of Python (3.8) in Miniconda and use it instead of some stodgy old system-wide version that is likely lurking deep in your computer somewhere. Before we proceed let's look at both the PATH environment variable (to see where programs are being searched for), and also which version of Python is currently set as your default (i.e., at the front of your PATH).
# Print the PATH environment variable
echo $PATH
# if a python is in your PATH this will show which one is found first
which python
# this will show all versions of Python on your system
whereis python
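The "first match wins" behavior of the PATH can be seen without touching your real setup. The sketch below is only a demonstration: the tool name `mytool` and both directories are made up, and everything happens in a temporary folder that is removed at the end.

```shell
# Build two throwaway directories, each with its own 'mytool' script
# ('mytool' and both directories are made up for this demonstration).
demo="$(mktemp -d)"
mkdir -p "$demo/first/bin" "$demo/second/bin"
printf '#!/bin/sh\necho "mytool from first"\n'  > "$demo/first/bin/mytool"
printf '#!/bin/sh\necho "mytool from second"\n' > "$demo/second/bin/mytool"
chmod +x "$demo/first/bin/mytool" "$demo/second/bin/mytool"

# Put 'second' at the front of the PATH (inside a subshell only):
# the shell finds that copy first
found_second="$(PATH="$demo/second/bin:$demo/first/bin:$PATH"; mytool)"
echo "$found_second"

# Swap the order: now the copy in 'first' wins
found_first="$(PATH="$demo/first/bin:$demo/second/bin:$PATH"; mytool)"
echo "$found_first"

# Clean up the throwaway directories
rm -rf "$demo"
```

This is the same mechanism that lets a Python in miniconda3/bin take precedence over a system-wide Python.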
Let's now add conda to your PATH by editing the dotfile (e.g., .bashrc) in your HOME directory. While we could do this by hand, conda has developed a convenient script that can do it for us, and do it in kind of a fancy way. So let's use this tool, called conda init. To use it we will need to provide the full path to the binary file since it is not yet in our PATH. Run the command below and you should see output similar to the example that follows it.
~/miniconda3/condabin/conda init $SHELL
no change     /home/deren/miniconda3/condabin/conda
no change     /home/deren/miniconda3/bin/conda
no change     /home/deren/miniconda3/bin/conda-env
no change     /home/deren/miniconda3/bin/activate
no change     /home/deren/miniconda3/bin/deactivate
no change     /home/deren/miniconda3/etc/profile.d/conda.sh
no change     /home/deren/miniconda3/etc/fish/conf.d/conda.fish
no change     /home/deren/miniconda3/shell/condabin/Conda.psm1
no change     /home/deren/miniconda3/shell/condabin/conda-hook.ps1
no change     /home/deren/miniconda3/lib/python3.8/site-packages/xontrib/conda.xsh
no change     /home/deren/miniconda3/etc/profile.d/conda.csh
modified      /home/deren/.bashrc

==> For changes to take effect, close and re-open your current shell <==
You can see that this edited the dotfile in my home directory. Remember, this is the file that is run every time you open a terminal; it loads a bunch of variables, including the PATH where software is found, and sets the style and colors of the prompt. When you run conda init $SHELL it writes a new block at the end of the dotfile for your specific shell telling it to make your terminal aware of conda whenever it starts up. If you wanted to stop it from doing this you would only need to remove that block of text from the dotfile. The code in this block does two things: (1) it tells your prompt to show the name of the conda environment; and (2) it adds the filepath of your miniconda3 directory to the front of your PATH variable so that you can find all of the tools there. For this to go into effect close and reopen your terminal.
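As a sketch of what "removing that block" would involve, the example below works on a throwaway sample file, not a real dotfile. It assumes the block is delimited by the `# >>> conda initialize >>>` and `# <<< conda initialize <<<` marker comments that conda init writes; the lines in between are stand-ins, as are the alias and EDITOR lines.

```shell
# Work on a throwaway sample file rather than a real ~/.bashrc.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
alias ll='ls -l'
# >>> conda initialize >>>
# (placeholder for the lines that conda init writes)
# <<< conda initialize <<<
export EDITOR=nano
EOF

# Count the marker lines: 2 means the file contains one conda init block
markers="$(grep -c 'conda initialize' "$sample")"
echo "marker lines: $markers"

# Print the file with everything from the opening marker to the
# closing marker removed (the sample file itself is not modified)
cleaned="$(sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$sample")"
echo "$cleaned"

rm -f "$sample"
```

Running the same grep against your own dotfile is a quick way to check whether conda init has already edited it.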
First let's check that the PATH has actually been modified, which should show something like this:
echo $PATH
/home/jovyan/miniconda3/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
You can further test whether the conda command line tool is in your PATH by typing the following command, which will print info about your conda directory.
conda info
     active environment : base
    active env location : /home/deren/miniconda3
            shell level : 1
       user config file : /home/deren/.condarc
 populated config files :
          conda version : 4.9.2
    conda-build version : not installed
         python version : 3.8.5.final.0
       virtual packages : __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/deren/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/deren/miniconda3/pkgs
                          /home/deren/.conda/pkgs
       envs directories : /home/deren/miniconda3/envs
                          /home/deren/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.25.1 CPython/3.8.5 Linux/5.8.0-38-generic ubuntu/20.04.1 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
Let's also look again for where Python is installed on our system. We should see a version of Python located in our miniconda3 directory at the front of the PATH:
# this should show the miniconda Python path
which python
/home/deren/miniconda3/bin/python
Info
Take note of this result. This is the path to the version of Python that we will be using extensively throughout this course. It is located inside of your miniconda3/ directory, and then inside of a directory called bin/. This latter subdirectory is where all binaries (executable callable programs) installed by conda will be located.
Using conda¶
The command line tool is called conda. This is a binary (executable program) written in Python and installed in your miniconda directory. Because this directory is in your PATH the conda binary is also in your PATH. This is why you can call it from your terminal without needing to write the full path to the location of the binary. Let's try installing some other binaries using conda. The syntax for installing a program with conda is conda install {program_name} -c {channel_name}. Try the example below:
conda install cowpy -c conda-forge
This installed a goofy Python program called cowpy that can be used to make funny ASCII drawings in your terminal. Under the hood it installed a Python package, and also a binary command line tool that ships with this Python library. The package files are stored in miniconda3/pkgs/ and the binary is installed into miniconda3/bin/.
Let's try out the cowpy binary by writing some text in the terminal with echo and piping it to the cowpy program. For more details on the cowpy program, see its GitHub page.
echo "hello world" | cowpy
Conda channels¶
Different software packages are found on different channels on conda. Channels are just big online folders where conda packages are stored. Most software is available on one of three channels: "defaults", "conda-forge", and "bioconda". The bioconda channel includes many bioinformatics-specific tools. The conda-forge channel contains the cutting-edge versions of almost all software, and is very actively community maintained. The defaults channel is maintained by the Anaconda organization, tends to be a little out of date, and has the least software. You can find out which channel a software package is available on by googling conda {software_name}, or by using the anaconda-client package to search all available channels.
Let's practice installing packages by installing anaconda-client.
conda install -c conda-forge anaconda-client
Let's test it out by searching for vcftools, a commonly used tool for manipulating VCF files ("variant call format") which are used primarily for SNP data.
# `anaconda` is the tool we can use for searching channels for software
anaconda search vcftools
Using Anaconda API: https://api.anaconda.org
Packages:
Name | Version | Package Types | Platforms | Builds
-------------------------------- | ------ | ----------------- | --------------- | ----------
BioBuilds/vcftools | 0.1.15 | Conda | linux-64, osx-64, linux-ppc64le | pl522h7632db0_0, pl522h49bf30a_0, pl522hf9702e9_0, pl5.22.0_0
bioconda/gvcftools | 0.17.0 | Conda | linux-64, osx-64 | boost1.60_0, pl5.22.0_2, he941832_3, pl5.22.0_1
bioconda/perl-vcftools-vcf | 0.1.16 | Conda | linux-64, osx-64, noarch | pl526_1, pl5321hdfd78af_4, pl526_2, pl526_0, 2, pl5.22.0_1, pl5.22.0_0, 3
    : cpanm ready distribution of VCFtools Perl libraries
bioconda/vcftools | 0.1.16 | Conda | linux-64, osx-64, linux-aarch64, osx-arm64 | 5, pl5321h1e84f2d_12, pl5321h6057758_11, pl5321h66d0458_8, pl5321hdcf5f25_11, h9a82719_5, h87af4ef_5, he860b03_3, 4, 0, pl526h8b12597_1, pl5321hdcf5f25_8, pl5321h7f4e536_11, pl5.22.0_2, 2, pl5321hd03093a_8, pl5321hda5e58c_12, he941832_2, pl526hdbcaa40_0, pl5321h6151dfb_7, pl5321h077b44d_12, pl5321hdcf5f25_10, pl5321hdcf5f25_9, he513fc3_4, pl5262hfd59bb5_2, ha92aebf_2, pl526hd174df1_1, h7475705_4, pl5321h447d7a5_11, pl5.22.0_1, pl5321hdf58011_9, pl526hd9629dc_0, pl5321h7f4e536_10, 1, pl5262h2e03b76_2, pl5321h87af4ef_6, pl5321hdf58011_10, pl5321hd03093a_7, pl5321h2ec61ea_12, pl5321h9a82719_6, pl5.22.0_0, pl5321h2e03b76_3, 3, h5c9b4e4_3
    : A set of tools written in Perl and C++ for working with VCF files. This package only contains the C++ libraries whereas the package perl-vcftools-vcf contains the perl libraries
brown-data-science/vcftools | 0.1.15 | Conda | linux-64 | 1
compbiocore/perl-vcftools-vcf | 0.840 | Conda | linux-64 | pl526_1
    : cpanm ready distribution of VCFtools Perl libraries
compbiocore/vcftools | 0.1.15 | Conda | linux-64 | h1d3419f_0, 1
pstey/vcftools | 0.1.15 | Conda | linux-64 | 1
Found 8 packages

Run 'anaconda show' to get installation details
Specifying -c conda-forge when we installed anaconda-client tells conda to search for the package in this channel (rather than others). After years of using conda I strongly recommend setting the conda-forge channel as your default channel. Rather than typing in conda install -c conda-forge <package> every time (which is tedious), you can set conda-forge as the default using conda config. This means that it will always look here first for a requested package or any of its dependencies, and only look at other channels after first looking here. This is a good thing.
conda config --add channels conda-forge
Here we are setting a configuration preference. So how do you think that was done? That's right, it wrote it to a dotfile. In this case it simply added the preferred conda channel order to a file called ~/.condarc.
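To see what that channel order looks like on disk, here is a small sketch. The file contents below are an assumption about what ~/.condarc holds after the single conda config command above, and the example reads a temporary sample file, not your real configuration.

```shell
# Sample of what ~/.condarc is assumed to contain after running
# 'conda config --add channels conda-forge' on a fresh install.
condarc="$(mktemp)"
cat > "$condarc" <<'EOF'
channels:
  - conda-forge
  - defaults
EOF

# List the entries in priority order (first = searched first).
# This simple pattern assumes the file holds only the channels list.
channel_order="$(sed -n 's/^ *- *//p' "$condarc")"
echo "$channel_order"

rm -f "$condarc"
```

On a machine with conda installed, running conda config --show channels should report the same ordering without reading the file by hand.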
Let's test this out by installing another package. Now we no longer need to tell it -c conda-forge since it will look there by default. (Personally, I still usually write it anyways just out of habit, so you may see it in future instructions).
# install the Python 'requests' package
conda install requests
And you can see that the order of channels to search through has been modified by our change to the conda configuration.
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Conda environments¶
In this class we will probably mostly use only the 'base' conda environment. However, you can create and load many separate conda environments, where each contains a different isolated set of software. Let's go ahead and install vcftools in a new, clean conda environment.
# create a new conda environment named 'vcftools' and activate it
conda create -n vcftools
conda activate vcftools
Notice that when you activate a new environment conda by default will change your command line prompt to indicate that you are now in the (vcftools) environment. This is useful!
Now install vcftools in this new environment, remembering that when we searched for vcftools earlier we found it in the bioconda channel.
# Add -y to the command line to skip the prompt of whether to proceed with the install
conda install -c bioconda vcftools -y
After the install completes you can test to verify that vcftools is installed.
vcftools --version
VCFtools (0.1.16)
Also, take a look at your PATH environment variable. Conda is manipulating this behind the scenes to isolate installed packages within the environments you specify.
echo $PATH
/home/jovyan/miniconda3/envs/vcftools/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Notice that the first element of the PATH is now ~/miniconda3/envs/vcftools/bin.
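A long PATH is easier to read one entry per line. The sketch below uses the sample PATH string from the output above as its input, so it runs anywhere; on your own machine you would substitute "$PATH".

```shell
# Sample PATH copied from the output above; on your machine use "$PATH"
sample_path="/home/jovyan/miniconda3/envs/vcftools/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# The first entry is everything before the first colon
first_entry="${sample_path%%:*}"
echo "searched first: $first_entry"

# One entry per line, numbered in the order the shell searches them
echo "$sample_path" | tr ':' '\n' | nl
```

Comparing this listing before and after activating an environment shows exactly which entry conda swapped in.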
Switch back to the base conda environment.
conda activate base
# Alternatively (instead of the command above), 'deactivate' the current
# environment, which will fall back to the previous environment by default.
conda deactivate
When you switch back to base two things happen: 1) the prompt changes again to reflect that you are back in the (base) environment; and 2) your PATH also changes, which you can verify:
echo $PATH
/home/jovyan/miniconda3/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Verify that vcftools is indeed no longer available in the base environment:
which vcftools
Right now you only have two environments, so it's not hard to remember which is which, but if you are doing lots of data science you might have numerous environments for all the different analytical tools in your toolkit. You can see which environments are available to you by asking conda to list them.
conda env list
base                  *  /home/deren/miniconda3
vcftools                 /home/deren/miniconda3/envs/vcftools
On my machine I have lots of conda environments, so my output looks like this:
# conda environments:
#
base                     /Users/isaac/miniconda3
PTA                      /Users/isaac/miniconda3/envs/PTA
bart                     /Users/isaac/miniconda3/envs/bart
bci                      /Users/isaac/miniconda3/envs/bci
ee                    *  /Users/isaac/miniconda3/envs/ee
ipyrad                   /Users/isaac/miniconda3/envs/ipyrad
mess                     /Users/isaac/miniconda3/envs/mess
tmp                      /Users/isaac/miniconda3/envs/tmp
The purposeful impermanence of conda¶
In Tibetan Buddhism there is a tradition involving the creation of large complex mosaic mandalas carefully constructed from colored sand, often involving many weeks of work. Upon completion of the artwork it is then destroyed. This meditative exercise is intended to make one reflect on the impermanence of life, and to experience the act of letting go.
The sand mandala tradition has a lot to teach us about our relationship with our conda software directory. After working with conda for several weeks you may feel that you have installed the perfect set of software tools that includes everything you will ever need. And you become very attached to it. But you should feel free to let it go.
Software becomes outdated quickly, and updating many software packages to new versions can sometimes create conflicts with other programs in your environment. Usually conda can fix these problems and find the right versions that work together. But in some cases it can't: it has simply worked itself into a corner where it basically needs to uninstall and reinstall everything again. This is the point when you should consider destroying it.
Remove a conda environment¶
You can remove a specific environment by name, if you are no longer using it.
conda env remove -n vcftools
Check that this indeed removed your vcftools environment:
conda env list
# conda environments:
#
base                  *  /home/jovyan/miniconda3
Remove the entire conda directory¶
Or, you can remove the entire conda directory altogether. Remember, we just installed it above, and it took just a single command. It's very fast and easy. To remove the conda directory you can use the rm command along with the options -r and -f. The -r option tells it to remove a folder recursively, including everything inside it. The -f option means 'force': delete without asking for confirmation. This is a slightly dangerous command if you were to tell it to delete something that you shouldn't. So take care that the file path after this command is the one that you actually want to remove and not something else (like your entire filesystem).
# this command would remove your conda directory completely
# rm -rf ~/miniconda3
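One way to take some of the danger out of rm -rf is to wrap it in a check. The sketch below is only an illustration: the helper name `remove_conda_dir` is hypothetical, and the test for a condabin subfolder is an assumed heuristic for "this looks like a conda install". It is demonstrated on throwaway directories, not on a real install.

```shell
# Hypothetical helper: remove a directory only if it looks like a conda install.
remove_conda_dir() {
    target="$1"
    # Refuse empty paths, the root directory, and HOME itself
    if [ -z "$target" ] || [ "$target" = "/" ] || [ "$target" = "$HOME" ]; then
        echo "refusing to remove '$target'" >&2
        return 1
    fi
    # Assumed heuristic: a conda install contains a 'condabin' folder
    if [ ! -d "$target/condabin" ]; then
        echo "'$target' does not look like a conda directory" >&2
        return 1
    fi
    rm -rf "$target"
}

# Demonstrate on throwaway directories, not on a real install
demo="$(mktemp -d)"
mkdir -p "$demo/miniconda3/condabin" "$demo/documents"
remove_conda_dir "$demo/miniconda3" && echo "removed fake conda directory"
remove_conda_dir "$demo/documents" || echo "left documents alone"
```

With a helper like this, a mistyped path fails the check and nothing is deleted.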
Practice creating environments and installing software¶
For the next exercise you'll practice creating new environments, installing software, changing environments, and removing environments.
- Create a new conda environment called bedtools
- Change to this new environment and verify that your PATH has changed
- Install bedtools from the bioconda channel
- Verify that bedtools is installed: bedtools --version
- Deactivate the bedtools environment
- Check that bedtools is no longer available
- Create a new environment called msprime and activate it
- Install msprime from the conda-forge channel
- Check that it is installed with msp -V
- Activate the base environment
- Verify your current PATH and check that bedtools and msp are not available here
- Remove the bedtools and msprime environments
- Blow away your entire conda install with rm -rf ~/miniconda3
- Reinstall miniconda following the directions above
- In the new clean conda base environment install anaconda-client
- Now you're ready for the next exercise