Learning the `git` command line tool¶

Learning objectives¶

By the end of this tutorial you should be familiar with git terminology, and be able to make changes to git repositories using the command line git program. This tutorial will likely take 15-30 minutes to complete.

Introduction¶

The git program is a powerful tool not only for programmers but also for any scientist interested in keeping data and analysis scripts synced over multiple machines, and shared online. But learning git can be intimidating to many beginners. Many tutorials are aimed at developers working in large groups, where the collaborative workflows can seem overly complex and unnecessary to coders who are mostly writing code on their own, at least to begin with. But learning to use git and GitHub is still useful even when writing code on your own, and, it is actually much simpler to learn and use in this context. We will start simple and work up to more complex workflows. By the end of this session you should be comfortable with creating new remote repos on GitHub, syncing them to your local computer, and pushing changes back to the remote repo using git.

Here I will try to provide a simple introduction to git, and will also link to other resources online that I have found useful. Please see the official git book if you get stuck, or want to pursue further reading.

Setting up git for remote authentication¶

GitHub is a very powerful, industry standard repository for source code, which means that they take access control very seriously. Until recently it was enough to authenticate with a username and password, but it turns out this is not a very safe mechanism for authentication. Passwords can be stolen or cracked, so more intricate authentication mechanisms are becoming more prominent (e.g. 2 factor auth, as you might use for accessing CU systems). In the case of GitHub we need to create a "Personal Access Token" (PAT) to use as a more secure password. The steps for this are documented on the GitHub page for managing your personal access tokens, and they are somewhat complicated, which is why we are walking through it together in class.

Go to your GitHub home page
Click on your avatar image in the upper right hand corner
Choose Settings->Developer settings (all the way at the bottom)
Then choose Personal access tokens->Tokens (classic)
Choose Generate New Token (classic)
Add a Note (you might choose PDSB-2025 or something related to class)
Set Expiration to 'No expiration'
Check the boxes to select all available Scopes
Then click Generate token
This will take you to a new page which shows your token which will be a very long string of numbers and letters. Copy this and save it!

Alert

Action: Save your PAT somewhere secure, you won't be able to access it again.

Now that we have our PATs we can move forward with cloning a repo, making changes and pushing the changes to GitHub.

Installation¶

Open a terminal and type git. If it is not yet installed your shell will prompt you to install it. If you are on Linux (or WSL2) it will be installed already.

`git` is specific to your location¶

When you create a git repository, or convert an existing folder into one, all this means is that the folder will contain a hidden directory named .git/. This directory is where git will store all of the information it needs to do its work. You will never need to look inside of that folder. Instead, you will use commands from the program git to programmatically tell it how to operate.

To use git on a repository, you must be located in that repo. In other words, you must cd from your terminal into that location. If you try a git command from a folder that is not a git repo (i.e., does not contain a .git/ subfolder) then git will raise an error. Let's try this now. We will call the git status command from our $HOME folder, which itself should not be a git repository. This should raise the following error.

# move to home folder and call git status
cd ~
git status

fatal: not a git repository (or any of the parent directories): .git

Always read an error message carefully when you receive it. It is telling us that we are not in a git repo.

Setup git¶

The program git is designed for collaboration. For this reason, it wants to keep track of who makes changes to any given file. So before we start using it we need to configure it to tell it who we are. This is done using the git config commands. You can set different configurations for specific git repositories, and/or you can also set a global configuration. In general, you will only need to set the global config, and only need to do this once. The global settings are stored in ~/.gitconfig and can be set with the command below. You can read more about git configuration in the git book chapter 1.6.

Let's first check our existing global configuration. This shows the config of your username and email address, and shows the file location where it is stored.

git config --list --show-origin --global

If you have not yet configured git locally it will look like this:

fatal: unable to read config file '/home/jovyan/.gitconfig': No such file or directory

Because my git is already configured it looks like this, and this will be what we'll do for you in the first part of this exercise:

file:/home/deren/.gitconfig     user.email=de2356@columbia.edu
file:/home/deren/.gitconfig     user.name=Deren
file:/home/deren/.gitconfig     core.editor=nano

To set your global configuration run the following and enter your name and email in place of mine. You can choose any name and email that you want, but you probably want to set it to the same as your GitHub profile. If there is a space in your name then you must surround it with quotes like in the example below.

git config --global user.name "John Doe"
git config --global user.email johndoe@example.com

I also prefer to set the default code editor that git will use when it asks you to make changes to a file (something that git does when it encounters a conflict). This needs to be set to an editor that is installed on your system. I prefer to set a fast and simple terminal based editor for this, such as nano.

git config --global core.editor nano

Commit changes to a repo¶

Cloning a git repo¶

Let's clone our first repository. This creates an exact copy of a remote repo onto your local machine. We will start by cloning the repo you created in the last session, called hack-2-shell. To clone a repository you can go to its GitHub page online and click on the button labeled Code to select the URL. (For now, we will select the first option, HTTPS, but later we will switch to using the SSH option.) When we clone this repo it will create a copy into your current directory, so beware of your location before doing so. Let's cd into our home directory and create a new directory called hacks/ where you can store all your repos for this class.

# cd into $HOME
cd $HOME

# make a new directory here for storing class repos
mkdir hacks/

# cd into the hacks/ dir
cd hacks/

Finding the link for a github repo you are interested in is very easy, just navigate to the repository, click the green "Code" button and copy the link provided (it's easier and more fool-proof than trying to type it in from memory). Practice by navigating to your own hack-2-shell repository and copying the link directly from there.

# clone 'hack-2-shell' (replace USERNAME with YOUR git username)
# Pasting the link you just copied here should look identical
git clone https://github.com/USERNAME/hack-2-shell

# cd into hack-2-shell repo
cd hack-2-shell/

Edit the README file¶

Let's make a change to the README file. To do this, we'll use the simple nano text editor to open the file, write additional text into it, and then save and close the text editor. Remember, nano requires you to enter a set of hotkey commands to do things like save and close. These are listed along the bottom of the editor when it is open. The most important one is labeled Exit. For me, Exit is ^X, which means hold the control key and press X (this may vary on Mac or Windows, maybe it is the Alt or Windows key, etc.). This will ask you if you want to save any changes, and then it will close the editor.

Alert

Action: Edit the README.md file to add a link (using Markdown) to any videos or webpages that you used when creating your dotfiles. Add a description, such as "I followed instructions from this video to edit my .zshrc file."

# open the README file in the nano text editor
nano README.md

# remember you can exit nano by entering the 'Exit'
# hotkey commands listed at the bottom of the editor.

Sync to the remote¶

We will now sync these changes to the remote repo on GitHub. This involves three steps, which is all you need to memorize to become a git user. These are the core basic commands: add, commit, and push. Run the commands below to sync your changes to the remote. In the next section we will break this down step by step to better comprehend what each step is doing. The final step will sync your changes to GitHub, and will require you to enter your username and password. After it syncs, refresh your repo page on GitHub to see if the changes appear.

git add ./README.md
git commit -m "added links to README file"
git push

Now, the first time you push a change to a cloned repository you must authenticate. It will prompt you for your username and password. Enter your username at the following prompt (for example iao2122):

Username for 'https://github.com': iao2122

Then when prompted for your password you must copy your auth token (which we created earlier) and paste it in here. The paste action won't appear to do anything because the password is masked:

Password for 'https://iao2122@github.com':

If it succeeds you'll see a message like this, confirming the authorization and executing the transaction:

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Writing objects: 100% (3/3), 270 bytes | 270.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/iao2122/hack-2-shell.git
   5827f76..608da6e  main -> main

Alert

Remember: The process of authenticating with your username and PAT only ever needs to be done once per cloned github repo, so you won't have to do this over and over again, but you will have to do it again when you clone a new repository, so it's good to know how it works.

Essential git commands¶

git status¶

In addition to the three core git commands above, you will also want to know the essential command git status. This will tell you at any given time which of the three commands above you should call given the changes that have been made to your files. Since we just pushed our changes, there should be no differences between our local repo and the remote. Thus, when we call git status we should see the following:

git status

On branch main 
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

So what does this mean? First, it is telling us which branch we are on (main). Here, we did not create a separate branch on which to make our changes, instead we are still currently on the main branch. As I mentioned in class, this is not the "best practice" when working on a collaborative project, but when you are working alone it usually fine, and it is a much simpler framework for initially learning git.

Next, it says that our branch is up to date with 'origin/main'. Here the name 'origin' is referring to the remote copy of this repository, the one that is on GitHub. This is easy to remember if you think of 'origin' as referring to the place where the repo was first created.

Finally, it says there is nothing to commit. This makes sense, since we just pushed our changes to the remote, so they are already synced. Below we'll make some changes to files and see how this message changes.

git add¶

The command git add is the first in our series of three commands. It is used to tell git that a file has been updated or added to the list of files to be tracked. It the git lingo this is means the file has been "staged for commit".

Before we learn more, let's create three new files that we will use throughout the next few sections to demonstrate what it looks like for files to be at different stages along the three-step process of syncing.

# create three new file in this directory
echo "hello world" > ./file-1.md
echo "hello world" > ./file-2.md
echo "hello world" > ./file-3.md

OK, we've created three new files inside of our git repo folder, each with a different name and containing the text "hello world". Does this mean that these files are now being tracked by git? The answer is no. You can have files in a git repo folder that are not added to it, and thus are not going to be synced between machines. These are termed untracked files. Let's check git status to see this.

# git status will now show something different.
git status

On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    file-1.md
    file-2.md
    file-3.md

nothing added to commit but untracked files present (use "git add" to track)

You can see the output has now changed, there is a new section called "Untracked files" that shows the name of the three files that are not being tracked. We could leave these files as untracked if we wanted, but let's imagine we want some of these files to be part of our git repo. Reading the comment under the Untracked files section above we can see that it is giving us instructions for how to start tracking these files. It says to use git add. Let's do this for two of the files, file-1.md and file-2.md.

# add two files to the local git repo
git add file-1.md
git add file-2.md

Now we'll call git status again to see what has changed.

git status

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   file-1.md
    new file:   file-2.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    file-3.md

Looking at the first few lines of this message, you might ask "why does it say that my branch is up to date with origin/main if I just made changes to this repo? Shouldn't that mean that they are out of sync?". The answer is: not yet. Although we've edited some files in this repo, we haven't yet committed those changes, and so the changes are considered temporary for now.

You can also see that a new category has appeared in the status report, in addition to Untracked files we now also see a list of files under Changes to be committed. This means that git is now tracking these files -- it knows they are different (or new). The name of this new section hints at what we should do next. These have changes that need to be committed. So let's commit them.

git commit¶

You can think of a commit as a timestamp. To the extent git works like a time machine, we are saving a point in time that you might later want to rewind to. A commit at this stage will save the changes that have been made to all files listed in the to be committed category (i.e., we don't need to call commit separately for every call of add). It should be accompanied by a short but informative message about what the commit includes, by using the -m option:

# commit changes with a message about what it entails
git commit -m "added files 1 and 2."

Congratulations, you've made your first commit using git. Your local repo now has a time-stamped version of this new state. Let's take a look at the status again to see what this looks like.

git status

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    file-3.md

nothing added to commit but untracked files present (use "git add" to track)

There is now a new status near the top of the message, telling us that our branch is ahead of origin/main by 1 commit. And once again it tells us what to do about it: use git push to publish our local commits.

git push¶

This final command is used to push our commits to the remote repo. After this step the new files and changes that we've made will appear on GitHub when we visit the repo online. Given our current setup, it will ask you to enter your GitHub username and password to authenticate that you have permission to push these changes to the remote repo. We will learn later how to enable password-less authentication. If we had created a separate branch we could specify the name of the branch that we want to push here as well. By default it will push to origin main. The output that it writes below is not very interesting.

# we could write `git push origin main` to be more explicit.
git push

Username for 'https://github.com': hackers-test
Password for 'https://hackers-test@github.com': 
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 320 bytes | 320.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/hackers-test/hack-2-shell
   90cdd7d..4a1782a  main -> main

At this stage you can call git status again and you will see that our branch is once again 'up to date' with origin/main. This is true even though file-3.md exists as an untracked file in your local repo directory. It is fine to have some files that you keep locally that you do not want to sync, in fact it is very common.

Learning the git command line tool¶

Learning objectives¶

Introduction¶

Setting up git for remote authentication¶

Installation¶

git is specific to your location¶

Setup git¶

Commit changes to a repo¶

Cloning a git repo¶

Edit the README file¶

Sync to the remote¶

Essential git commands¶

git status¶

git add¶

git commit¶

git push¶

Learning the `git` command line tool¶

`git` is specific to your location¶