Skip to content

On Conda and Sand Mandalas

What is conda?

Conda is an open-source software management tool for installing software packages, as well as their dependencies, and creating sandboxed environments for executing code. Using the conda command line tool you can use simple commands to search for software packages, select specific versions, and install them locally on your machine. This automated process makes installing and removing software simple and reproducible which makes it easier to design, distribute and use working software.

Why use conda?

The many advantages of using conda include:

  • command-line convenience: the conda command line program allows you to search for and install tools with simple commands that can even be written as scripts for automation. This makes it easy to replicate the set of software tools installed on one computer onto another machine.

  • finding dependencies: Almost every software program builds on and requires other software packages as dependencies. Rather than telling a user to go find and install each of these dependencies on their own (a sure sign of a poorly developed tool by today's standards) a software package manager can instead fetch and install of the dependencies for them. This might even include different dependencies or versions depending on their specific operating system. This is a very complex task and something conda does very well.

  • sandboxed directory: conda installs software into a sandboxed location on your computer (usually a directory within HOME), which is done purposefully to keep your conda software completely separate and isolated from your system-wide software (which is usually in /bin or /usr/bin). This gives you peace of mind to install, update, and remove packages as much as you want inside of your conda directory without having to worry that it might ever impact your system programs.

  • environments: In addition to allowing you to install software programs into a sandboxed location, conda also allows you to keep many separate environments, where you can keep different sets of software or versions of them. This makes it easy to test software tools across different version of dependencies, or to keep software separate that uses different conflicting dependencies.

Install conda (miniconda3)

There are two main flavors of conda that you can install: Anaconda and Miniconda. Both include a version of Python and the conda program (which is written in Python) as well as a few dependencies of conda for fetching information about packages online. However, the two flavors differ in terms of which other tools come pre-loaded with these base resources. Anaconda comes fully loaded with dozens of commonly used Python packages, whereas Miniconda in totally minimal, and doesn't come with anything extra at all. I always recommend installing Miniconda, and then adding to it any software that you want to install.

To install Miniconda you can google search 'Miniconda install' and it will point you to the following miniconda install page. Here you will see installation instructions different versions of Miniconda. First, there is a version for different operating systems (Window, MacOSX, Linux). If you are on a Mac then select from the Mac section, if you are on Linux or Windows Subsystem for Linux then select the Linux version. Do not install the Windows version. We can download and install conda directly from the command line following these steps (be sure to choose the instructions for your OS):

# cd to your HOME directory
cd ~

# download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# call bash to install conda from the .sh install script in 'batch' (auto) mode
bash Miniconda3-latest-Linux-x86_64.sh -b

# check that the miniconda3 folder now appears in HOME
ls -l ~
# cd to your HOME directory
cd ~

# download Miniconda installer (if you have an older intel mac you will need
# to use a different link by replacing `arm64` with `x86_64`)
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh

# call bash to install conda from the .sh install script in 'batch' (auto) mode
bash Miniconda3-latest-MacOSX-x86_64.sh -b

# check that the miniconda3 folder now appears in HOME
ls -l ~

When you execute the bash command above it will start running the installer script. By default, this will select ~/miniconda3 as the location to install conda and its associated folders. That's good. It should take a minute or less to finish.

Adding conda to your PATH

At this point we have now created a new folder containing conda, but it is not yet added to our PATH, meaning that we can't easily use this software yet. Our goal here will be set the miniconda3/ directory at the front of our PATH variable. This means that our shell will look here first for any software, and only look in the other folders in the PATH for software if it was not already found here. One specific reason this is useful is that your shell will find this new version of Python (3.8) in Miniconda and use it instead of some stodgy old system-wide version that is likely lurking deep in your computer somewhere. Before we proceed let's look at both the PATH environment variable (to see where programs are being searched for), and alsowhich version of Python is currently set as your default (i.e., at the front of you PATH).

# Print the PATH environment variable
echo $PATH

# IF one is in your path this will show it.
which python

# this will show all versions of Python on your system
whereis python

Let's now add conda to your PATH by editing the dotfile (e.g., .bashrc) in your HOME directory. While we could do this by hand, conda has developed a convenient script that can do it for us, and do it in kind of a fancy way. So let's use this tool called conda init. To use it we will need to provide the full path to the binary file since it is not yet in our PATH. Run the command below and you should see an example output similar to below.

~/miniconda3/condabin/conda init $SHELL
no change     /home/deren/miniconda3/condabin/conda
no change     /home/deren/miniconda3/bin/conda
no change     /home/deren/miniconda3/bin/conda-env
no change     /home/deren/miniconda3/bin/activate
no change     /home/deren/miniconda3/bin/deactivate
no change     /home/deren/miniconda3/etc/profile.d/conda.sh
no change     /home/deren/miniconda3/etc/fish/conf.d/conda.fish
no change     /home/deren/miniconda3/shell/condabin/Conda.psm1
no change     /home/deren/miniconda3/shell/condabin/conda-hook.ps1
no change     /home/deren/miniconda3/lib/python3.8/site-packages/xontrib/conda.xsh
no change     /home/deren/miniconda3/etc/profile.d/conda.csh
modified     /home/deren/.bashrc

==> For changes to take effect, close and re-open your current shell <==

You can see that this edited the dotfile in my home directory. Remember, this is the file that is run every time you open a terminal which loads a bunch of variables including the PATH where software is found and to set the style and colors of the prompt. When you run conda init $SHELL it writes a new block at the end of the dotfile for your specific shell telling it to make your terminal aware of conda whenever it starts up. If you wanted to stop it from doing this you would only need to remove that block of text from the dotfile. The code in this block does two things: (1) it tells your prompt to show the name of the conda environment; and (2) it adds the filepath of your miniconda3 directory to the front of your PATH variable so that you can find all of the tools there. For this to go into effect close and reopen your terminal.

First let's check that the PATH has actually been modified, which should show something like this:

echo $PATH
/home/jovyan/miniconda3/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

You can further test whether the conda command line tool is in your PATH by typing the following command, which will print info about your conda directory.

conda info
     active environment : base
    active env location : /home/deren/miniconda3
            shell level : 1
       user config file : /home/deren/.condarc
 populated config files : 
          conda version : 4.9.2
    conda-build version : not installed
         python version : 3.8.5.final.0
       virtual packages : __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/deren/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/deren/miniconda3/pkgs
                          /home/deren/.conda/pkgs
       envs directories : /home/deren/miniconda3/envs
                          /home/deren/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.25.1 CPython/3.8.5 Linux/5.8.0-38-generic ubuntu/20.04.1 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

Let's also look again for where Python is installed on our system. We should see a version of Python located in our miniconda3 directory at the front of the PATH:

# this should show the miniconda Python path
which python
/home/deren/miniconda3/bin/python

Info

Take note of this result. This is the path to the version of Python that we will be using extensively throughout this course. It is located inside of your miniconda3/ directory, and then inside of a directory called bin/. This latter subdirectory is where all binaries (executable callable programs) installed by conda will be located.

Using conda

The command line tool is called conda. This is a binary (executable program) written in Python and installed in your miniconda directory. Because this directory is in your PATH the conda binary is also in your PATH. This is why you can call it from your terminal without needing to write the full path to the location of the binary. Let's try installing some other binaries using conda. The syntax for installing a program with conda is conda {program_name} -c {channel_name}. Try the example below:

conda install cowpy -c conda-forge

This installed a goofy Python program called cowsay that can be used to make funny ASCII drawings in your terminal. Under the hood it installed a Python package, and also a binary command line tool that ships with this Python library. The Python library is installed into miniconda3/pkgs/ and the binary is installed into miniconda3/bin/ Let's try out the cowpy binary by writing some text in the terminal with echo and piping it to the cowpy program. For more details on the cowpy program you can find it on GitHub

echo "hello world" | cowpy

Conda channels

Different software packages are found on different channels on conda. Channels are just big online folders where conda recipes are stored. Most software is available on one of three channels: "default", "conda-forge", and "bioconda". The bioconda channel includes many bioinformatics specific tools. The conda-forge channel contains the cutting-edge versions of almost all software, and is very actively community maintained. The default channel is maintained by the anaconda organization, and tends to be a little out of date, and has the least software. You can find out which channel a software package is available on by googling conda {software_name}, or by using the anaconda-client package to search all available channels. Let's practice installing packages by installing anaconda-client.

conda install -c conda-forge anaconda-client

Let's test it out by searching for vcftools, a commonly used tool for manipulating VCF files ("variant call format") which are used primarily for SNP data.

# `anaconda` is the tool we can use for searching channels for software
anaconda search vcftools
anaconda search vcftools                                                                                                     
Using Anaconda API: https://api.anaconda.org                                                                                                               
Packages:                                                                                                                                                  
     Name                             |  Version | Package Types     | Platforms       | Builds                                                            
     -------------------------------- |   ------ | ----------------- | --------------- | ----------                                                        
     BioBuilds/vcftools               |   0.1.15 | Conda             | linux-64, osx-64, linux-ppc64le | pl522h7632db0_0, pl522h49bf30a_0, pl522hf9702e9_0, pl5.22.0_0                                                                                                                                                
     bioconda/gvcftools               |   0.17.0 | Conda             | linux-64, osx-64 | boost1.60_0, pl5.22.0_2, he941832_3, pl5.22.0_1                  
     bioconda/perl-vcftools-vcf       |   0.1.16 | Conda             | linux-64, osx-64, noarch | pl526_1, pl5321hdfd78af_4, pl526_2, pl526_0, 2, pl5.22.0_1, pl5.22.0_0, 3                                                                                                                                           
                                          : cpanm ready distribution of VCFtools Perl libraries                                                            
     bioconda/vcftools                |   0.1.16 | Conda             | linux-64, osx-64, linux-aarch64, osx-arm64 | 5, pl5321h1e84f2d_12, pl5321h6057758_11, pl5321h66d0458_8, pl5321hdcf5f25_11, h9a82719_5, h87af4ef_5, he860b03_3, 4, 0, pl526h8b12597_1, pl5321hdcf5f25_8, pl5321h7f4e536_11, pl5.22.0_2, 2, pl5321hd03093a_8, pl5321hda5e58c_12, he941832_2, pl526hdbcaa40_0, pl5321h6151dfb_7, pl5321h077b44d_12, pl5321hdcf5f25_10, pl5321hdcf5f25_9, he513fc3_4, pl5262hfd59bb5_2, ha92aebf_2, pl526hd174df1_1, h7475705_4, pl5321h447d7a5_11, pl5.22.0_1, pl5321hdf58011_9, pl526hd9629dc_0, pl5321h7f4e536_10, 1, pl5262h2e03b76_2, pl5321h87af4ef_6, pl5321hdf58011_10, pl5321hd03093a_7, pl5321h2ec61ea_12, pl5321h9a82719_6, pl5.22.0_0, pl5321h2e03b76_3, 3, h5c9b4e4_3                  
                                          : A set of tools written in Perl and C++ for working with VCF files. This package only contains the C++ libraries whereas the package perl-vcftools-vcf contains the perl libraries
     brown-data-science/vcftools      |   0.1.15 | Conda             | linux-64        | 1         
     compbiocore/perl-vcftools-vcf    |    0.840 | Conda             | linux-64        | pl526_1   
                                          : cpanm ready distribution of VCFtools Perl libraries
     compbiocore/vcftools             |   0.1.15 | Conda             | linux-64        | h1d3419f_0, 1
     pstey/vcftools                   |   0.1.15 | Conda             | linux-64        | 1         
Found 8 packages

Run 'anaconda show ' to get installation details

Specifying -c conda-forge when we installed anaconda-client tells conda to search for the package in this channel (rather than others). After years of using conda I strongly recommend setting the conda-forge channel as your default channel. Rather than typing in conda install -c conda-forge <package> every time (which is tedious), you can set conda-forge as the default using conda config. This means that it will always look here first for a requested package or any of its dependencies, and only look at other channels after first looking here. This is a good thing.

conda config --add channels conda-forge

Here we are setting a configuration preference. So how do you think that was done? That's right, it wrote it to a dotfile. In this case it simply added the preferred conda channel order to a file called ~/.condarc. Let's test this out by installing another package. Now we no longer need to tell it -c conda-forge since it will look there by default. (Personally, I still usually write it anyways just out of habit, so you may see it in future instructions).

# install the Python 'requests' packages
conda install requests

And you can see that the order of channels to search through as been modified by our change of the conda configuration.

Channels:
 - conda-forge
 - defaults
Platform: linux-64

Conda environments

In this class we will probably mostly use only the 'base' conda environment. However, you can create and load many separate conda environments, where each contains a different isolated set of software. Let's go ahead and install vcftools in a new, clean conda environment.

# create a new Python environment names 'vcftools' and activate it
conda create -n vcftools
conda activate vcftools

Notice that when you activate a new environment conda by default will change your command line prompt to indicate that you are now in the (vcftools) environment. This is useful!

Now install vcftools in this new environment, remembering that when we searched for vcftools earlier that we found it in the bioconda channel.

# Add -y to the command line to skip the prompt of whether to proceed with the install
conda install -c bioconda vcftools -y

After the install completes you can test to verify that vcftools is installed.

vcftools --version
VCFtools (0.1.16)

Also, take a look at your PATH environment variable. Conda is manipulating this behind the scenes to isolate installed packages within the environments you specify.

echo $PATH
/home/jovyan/miniconda3/envs/vcftools/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Notice that the first element of the PATH is now ~/miniconda3/envs/vcftools/bin.

Switch back to the base conda environment.

conda activate base

#Another alternative is to 'deactivate' the current environment, which will
#fall back to the previous environment by default.
conda deactivate

When you switch back to base two things happen: 1) The prompt changes again to reflect that you are back in the (base) environment; and 2) your PATH also changes, which you can verify:

echo $PATH
/home/jovyan/miniconda3/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Verify that vcftools is indeed no longer available in the base environment:

which vcftools

Right now you only have two environments, so it's not hard to remember which is which, but if you are doing lots of data science you might have numerous environments for all the different analytical tools in your toolkit. You can see which environments are available to you by asking conda to list them.

conda env list
base                  *  /home/deren/miniconda3
vcftools                      /home/deren/miniconda3/envs/vcftools

On my machine I have lots of conda environments, so my output looks like this:

# conda environments:
#
base                     /Users/isaac/miniconda3
PTA                      /Users/isaac/miniconda3/envs/PTA
bart                     /Users/isaac/miniconda3/envs/bart
bci                      /Users/isaac/miniconda3/envs/bci
ee                    *  /Users/isaac/miniconda3/envs/ee
ipyrad                   /Users/isaac/miniconda3/envs/ipyrad
mess                     /Users/isaac/miniconda3/envs/mess
tmp                      /Users/isaac/miniconda3/envs/tmp

The purposeful impermanence of conda

In Tibetan Buddhism there is a tradition involving the creation of large complex mosaic mandalas carefully constructed from colored sand, often involving many weeks of work. Upon completion of the artwork it is then destroyed. This meditative exercise is intended to make one reflect on the impermanence of life, and to experience the act of letting go.

The sand mandala tradition has a lot to teach us about our relationships with our conda software directory. You may feel that after working with conda for several weeks that you have installed the perfect set of software tools that includes everything you will ever need. And you become very attached to it. But, you should feel free to let it go.

Software becomes outdated quickly, and updating many many software packages to new versions can sometimes conflict with other programs in your environment to cause problems. Usually conda can fix these problems and find the right versions that work together. But in some cases it can't. It simply worked itself into a corner where it basically needs to uninstall and reinstall everything again. This is the point when you should consider destroying it.

Remove a conda environment

You can remove a specific environment by name, if you are no longer using it.

conda env remove -n vcftools

Check that this indeed removed your vcftools environment:

conda env list

# conda environments:
#
base                 * /home/jovyan/miniconda3

Remove the entire conda directory

Or, you can also remove the entire conda directory altogether. Remember, we just installed it above, and took just a single command. It's very fast and easy. To remove the conda directory you can use the rm command along with the options -r and -f. The -r option tells it that the thing we want to remove is a folder. The -f command means 'force'. This is a slightly dangerous command if you were to tell it to delete something that you shouldn't. So take care that the file path after this command is the one that you actually want to remove and not something else (like your entire filesystem).

# this command would remove your conda directory completely
# rm -rf ~/miniconda3

Practice creating environments and installing software

For the next exercise you'll practice creating new environments, installing software, changing environments, and removing environments.

  • Create a new conda environment called bedtools
  • Change to this new environment and verify that your PATH has changed
  • Install bedtools from the bioconda channel
  • Verify that bedtools is installed: bedtools --version
  • Deactivate the bedtools environment
  • Check that bedtools is no longer available
  • Create a new environment called msprime and activate it
  • Install msprime from the conda-forge channel
  • Check that it is installed with msp -V
  • Activate the base environment
  • Verify your current PATH and check that bedtools and msp are not available here
  • Remove the bedtools and msprime environments
  • Blow away your entire conda install with rm -rf ~/miniconda3
  • Reinstall miniconda following the directions above
  • In the new clean conda base environment install anaconda-client
  • Now you're ready for the next exercise