Programming and Data Science for Biology
(EEEB G4050)

Lecture 4: conda package management & intro to python

Lecture 4.0 Outline:

  • Troubleshooting git
  • Conda for package management
  • Introduction to python for data science

Troubleshooting git

Version control systems (particularly git) are useful for managing development of a codebase that is shared across multiple local repositories. For example, you might be working on a large project with several co-developers, each of whom has their own copy of the code and is responsible for updating specific parts of it.
It's simplest to think of the remote copy of the repository on GitHub as the 'authoritative' copy (but this isn't true in a strict sense).

If you and a partner are working on repositories that share a 'remote', the git workflow prevents conflicting changes from silently overwriting one another. If you change something and 'push', your partner needs to 'pull' and merge any changes (fixing any conflicts) before they can push their own changes.

In practice, if you have a local repository and you are also modifying files on GitHub.com through the UI, then you are essentially 'collaborating with yourself': you are modifying two different copies of the code (local and remote), with the potential to introduce merge conflicts.


To https://github.com/iao2122/hack-2-shell.git
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://github.com/iao2122/hack-2-shell.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
          

The simplest thing to do here is to follow the `hint`: run `git pull` to integrate the remote changes into your local copy, then push again. You can also 'stash' local changes, but this is a topic for another time.

conda software manager

In your assignment this week you will learn to install and use the conda software package manager. This is a popular and powerful tool for scientific programming. It automates the process of finding and installing software and of resolving its dependencies.


# install a Python package with conda
conda install numpy

# install a binary program with conda from a specific channel
conda install samtools -c bioconda
          

Later we will also learn how to write conda recipes so that you can make the software tools that you develop available on conda.

Why do we need conda? What is the problem it's solving?

Prior to conda there were many alternative ways to install software packages (and dependencies), which installed files and programs in different places. This would usually result in a "hairball" of conflicts among installed software, leading to things breaking and being a huge mess to clean up.

Conda uses 'environments' to tame this hairball

Conda solves this problem by allowing you to create and manage 'environments' for installing software and dependencies. Environments are isolated from one another, so if something breaks in one environment it doesn't break your whole system, and it can be easily cleaned up by simply removing the broken environment.
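
One way to see this isolation from inside Python is to ask the interpreter where it lives. The following is a minimal sketch, assuming Python was installed through conda; `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` are environment variables that conda sets when an environment is activated, and they will simply be missing otherwise.

```python
import os
import sys

# Directory that holds the Python installation running this code.
# Inside an activated conda environment this is the environment's folder.
print("python prefix:    ", sys.prefix)

# Full path to the Python executable itself.
print("python executable:", sys.executable)

# conda sets these variables when an environment is activated;
# .get() returns None if conda is not in use.
active_env = os.environ.get("CONDA_DEFAULT_ENV")
env_prefix = os.environ.get("CONDA_PREFIX")
print("active conda env: ", active_env)
print("conda env prefix: ", env_prefix)
```

Running this in two different environments should print two different prefixes, which is why removing one environment leaves the other untouched.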

How does the conda magic work?

In your assignment this week we will see in more detail how conda works. We'll learn where it installs software, why it uses this location, and how it manipulates your PATH to make the magic of environments seamless.


# show the PATH variable
echo $PATH
          

/home/deren/gems/bin:/home/deren/miniconda3/bin:/home/deren/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
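
The same PATH variable can be inspected from Python. This sketch splits it into its directories and shows the search order the shell uses to find a program. The program name `python3` is only an illustrative choice, and the directories printed will differ on every machine.

```python
import os
import shutil

# PATH is one string of directories joined by os.pathsep (":" on Linux/macOS).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# The shell searches these directories in order and runs the first match.
for position, directory in enumerate(path_dirs):
    print(position, directory)

# shutil.which performs the same first-match search for a program name.
# It returns None if no directory on PATH contains the program.
found = shutil.which("python3")
print(found)
```

Because the first match wins, placing an environment's `bin` directory earlier in PATH is enough to change which copy of a program runs.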
          

Learning Python: Why Python?

  • Python is easy to read and easy to learn
  • Python has a very active open source development community
  • Python has one of the most rapidly growing data science communities
  • Python is a general purpose language (not domain specific)
  • Python is object-oriented with well developed style guides
  • But Python is relatively slow (compared to C, C++, Fortran, etc.)

Python introduction (objects)

Everything in Python is an *object*. This is inherent in the design of the language, and something that we will explore further as we become more advanced. An integer is an object, a function is an object, and a class is an object.

At its base, Python includes a small number of basic *object types*. Once these basic types are defined, it turns out that you can create almost anything by using them in creative ways. In this way, much of Python's standard library is itself written in Python. These basic types include integers, floats, strings, dictionaries, lists, etc. You will learn about these types this week.
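
A short sketch of the idea that everything is an object, using the built-in `type()` and `isinstance()` functions. The function `double` and the class `Sample` are made-up names used only for illustration.

```python
def double(value):
    """Return twice the input value."""
    return value * 2


class Sample:
    """An empty class, used only for illustration."""


# type() reports what kind of object each value is
print(type(42))          # <class 'int'>
print(type(3.14))        # <class 'float'>
print(type("ACGT"))      # <class 'str'>
print(type([1, 2, 3]))   # <class 'list'>
print(type(double))      # <class 'function'>
print(type(Sample))      # <class 'type'>

# every one of them is an instance of the base 'object' type
things = [42, 3.14, "ACGT", [1, 2, 3], double, Sample]
all_objects = all(isinstance(thing, object) for thing in things)
print(all_objects)       # True
```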

Python introduction (string types)

String type objects are used to represent one or more characters. Strings are created by using either single or double quotes. You can perform operations on strings, such as combining them or reformatting them to accomplish very complex tasks.

To start, we need to create a string type object. We do this by assigning a string value to a variable name. String objects have more than just their value attached to them. We can also access other attributes and functions of the string object by typing a dot after the object's name.


# create a string variable called x
x = "a string of text"

# call the .upper() function of the string object
print(x.upper())
          

A STRING OF TEXT
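
A few more string operations, as a minimal sketch of combining and reformatting. The variable names and the DNA-like example values are invented for illustration.

```python
# single and double quotes create the same kind of object
first = 'ACGT'
second = "TTGA"
print(first == "ACGT")              # True

# combine (concatenate) two strings with +
combined = first + second
print(combined)                     # ACGTTTGA

# reformat: lowercase, replace characters, and count occurrences
print(combined.lower())             # acgtttga
print(combined.replace("T", "U"))   # ACGUUUGA
print(combined.count("T"))          # 3

# insert values into a template with an f-string
summary = f"{combined} has {len(combined)} bases"
print(summary)                      # ACGTTTGA has 8 bases
```

Note that string methods return new strings; `combined` itself is unchanged by `.lower()` or `.replace()`.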
          

Object oriented programming

Object oriented programming is a style of programming design in which the functions that are meant to be called on a given type of object are *attached* to that object. This makes it easy to find all of the available options given some object, and ensures you call the right type of function on the right type of object.
Development tools like Jupyter, or a good code editor like VS Code, make it easier to write Python code by taking advantage of this design. When writing Python code you can use tab-completion to see all options available from your current object.
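
Tab-completion is an interactive feature, but the same information is available in code through the built-in `dir()` function. This sketch lists the functions attached to a string object and shows that they are not attached to an integer.

```python
x = "a string of text"

# dir() lists every attribute attached to an object; names that start
# with an underscore are internal, so they are filtered out here
public_names = [name for name in dir(x) if not name.startswith("_")]
print(public_names[:5])   # ['capitalize', 'casefold', 'center', 'count', 'encode']

# string functions are attached to string objects, not to integers
print(hasattr(x, "upper"))    # True
print(hasattr(42, "upper"))   # False
```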

This week's types

This week you will learn about strings, numbers, and lists. These are some of the basic object types in Python. You also have a short reading to introduce these topics. By the end of next week we will have covered the remaining basic types, including dictionaries, sets, and tuples.
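
As a brief preview of numbers and lists, here is a minimal sketch. The variable names and values are made up for illustration.

```python
# numbers: integers and floats support the usual arithmetic
n_samples = 12
mean_length = 3.5
print(n_samples * mean_length)   # 42.0
print(n_samples // 5)            # 2  (integer division)
print(n_samples % 5)             # 2  (remainder)

# lists: ordered collections that can grow and be indexed
species = ["oak", "maple"]
species.append("birch")
print(species)                   # ['oak', 'maple', 'birch']
print(species[0])                # oak
print(len(species))              # 3

# strings and lists work together: join list items into one string
joined = ", ".join(species)
print(joined)                    # oak, maple, birch
```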