Learning the git command line tool
Learning objectives
By the end of this tutorial you should be familiar with git terminology, and be able to make changes to git repositories using the command line git program. This tutorial will likely take 15-30 minutes to complete.
Introduction
The git program is a powerful tool not only for programmers but also for any scientist interested in keeping data and analysis scripts synced across multiple machines and shared online. But learning git can be intimidating to many beginners. Many tutorials are aimed at developers working in large groups, where the collaborative workflows can seem overly complex and unnecessary to coders who are mostly writing code on their own, at least to begin with. But learning to use git and GitHub is still useful even when writing code on your own, and it is actually much simpler to learn and use in this context. We will start simple and work up to more complex workflows. By the end of this session you should be comfortable creating new remote repos on GitHub, syncing them to your local computer, and pushing changes back to the remote repo using git.
Here I will try to provide a simple introduction to git, and will also link to other resources online that I have found useful. Please see the official git book if you get stuck or want to pursue further reading.
Setting up git for remote authentication
GitHub is a very powerful, industry-standard hosting service for source code, which means that it takes access control very seriously. Until recently it was enough to authenticate with a username and password, but this turns out not to be a very safe mechanism: passwords can be stolen or cracked, so more robust authentication mechanisms are becoming more common (e.g., two-factor authentication, as you might use for accessing CU systems). In the case of GitHub, we need to create a "Personal Access Token" (PAT), which we will use in place of a password when git asks for one. The steps for this are documented on the GitHub page for managing your personal access tokens, and they are somewhat complicated, which is why we are walking through them together in class.
- Go to your GitHub home page
- Click on your avatar image in the upper right hand corner
- Choose Settings->Developer settings (all the way at the bottom)
- Then choose Personal access tokens->Tokens (classic)
- Choose Generate New Token (classic)
- Add a Note (you might choose PDSB-2025 or something related to class)
- Set Expiration to 'No expiration'
- Check the boxes to select all available Scopes
- Then click Generate token
- This will take you to a new page that shows your token, a very long string of letters and numbers. Copy it and save it!
Alert
Action: Save your PAT somewhere secure, you won't be able to access it again.
Now that we have our PATs we can move forward with cloning a repo, making changes and pushing the changes to GitHub.
Installation
Open a terminal and type git. If it is not yet installed, your system may prompt you to install it. If you are on Linux (or WSL2), it is most likely installed already.
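A quick way to check from the terminal whether git is available is sketched below. It uses only the standard command -v shell builtin and git --version; the exact version string you see will depend on your system.

```shell
# Check whether a git executable can be found on your PATH
if command -v git >/dev/null 2>&1; then
    # Prints something like "git version 2.x.y"
    git --version
else
    echo "git is not installed yet"
fi
```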
git is specific to your location
When you create a git repository, or convert an existing folder into one, all this means is that the folder will contain a hidden directory named .git/. This directory is where git stores all of the information it needs to do its work. You will rarely, if ever, need to look inside that folder. Instead, you will use commands from the git program to tell it how to operate.
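To see this for yourself, the sketch below turns a brand-new folder into a git repository with git init. The folder is a throwaway directory made with mktemp and the name my-project is hypothetical; in practice you would run git init inside your own project folder.

```shell
# Create an empty throwaway folder (hypothetical project name)
demo="$(mktemp -d)/my-project"
mkdir -p "$demo"
cd "$demo"

# Turn the folder into a git repository
git init -q

# The only visible change is the new hidden .git/ directory
ls -A    # prints: .git
```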
To use git on a repository, you must be located in that repo. In other words, you must cd from your terminal into that location (or into one of its subfolders). If you try a git command from a folder that is not part of a git repo (i.e., neither it nor any of its parent folders contains a .git/ subfolder), then git will raise an error. Let's try this now. We will call the git status command from our $HOME folder, which itself should not be a git repository. This should raise the following error.
# move to home folder and call git status
cd ~
git status
fatal: not a git repository (or any of the parent directories): .git
Always read an error message carefully when you receive it. This one tells us that neither the current folder nor any of its parent folders is a git repo.
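git does not require you to be in the top folder of a repo: it looks for a .git/ directory in the current folder and then in each parent folder in turn, which is why the error mentions parent directories. The sketch below demonstrates this in a throwaway repo (the folder names are hypothetical) using git rev-parse, which can report whether you are inside a repository and where its top level is.

```shell
# Make a throwaway repo with a nested subfolder (hypothetical names)
demo="$(mktemp -d)"
git init -q "$demo"
mkdir -p "$demo/analysis/scripts"
cd "$demo/analysis/scripts"

# Works here because git finds $demo/.git by searching parent folders
git rev-parse --is-inside-work-tree    # prints: true

# Print the repo's top-level folder, i.e. the folder that holds .git/
git rev-parse --show-toplevel
```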
Set up git
The program git is designed for collaboration. For this reason, it wants to keep track of who makes changes to any given file. So before we start using it, we need to configure it to tell it who we are. This is done using the git config command. You can set different configurations for specific git repositories, and/or you can set a global configuration. In general, you will only need to set the global config, and only need to do this once. The global settings are stored in ~/.gitconfig and can be set with the commands below. You can read more about git configuration in the git book chapter 1.6.
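As a sketch of the difference between the two, running git config without --global inside a repo writes the setting to that repo's own .git/config file, and a repo setting takes precedence over the global one. The throwaway repo and the name "Lab Computer" below are hypothetical, chosen only for illustration; the --show-origin flag reports which file the effective value was read from.

```shell
# Throwaway repository to experiment in (hypothetical location)
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# No --global: these values apply to this repository only
git config user.name "Lab Computer"
git config user.email lab@example.com

# Show the value git will use here and the file it was read from
git config --show-origin --get user.name
```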
Let's first check our existing global configuration. This lists your global settings, such as your username and email address, along with the file where each one is stored.
git config --list --show-origin --global
fatal: unable to read config file '/home/jovyan/.gitconfig': No such file or directory
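If git has not been configured yet, you will see an error like the one above, because the ~/.gitconfig file does not exist yet. My git is already configured, so my output looks like the following, and this is what we will set up in the first part of this exercise: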
file:/home/deren/.gitconfig    user.email=de2356@columbia.edu
file:/home/deren/.gitconfig    user.name=Deren
file:/home/deren/.gitconfig    core.editor=nano
To set your global configuration, run the following, entering your own name and email in place of the placeholders. You can choose any name and email that you want, but you probably want to use the same ones as in your GitHub profile. If there is a space in your name, you must surround it with quotes as in the example below.
git config --global user.name "John Doe"
git config --global user.email johndoe@example.com
I also prefer to set the default code editor that git will open when it needs you to write or edit text, for example a commit message when you don't supply one with -m, or a merge message after resolving a conflict. This needs to be set to an editor that is installed on your system. I prefer a fast and simple terminal-based editor for this, such as nano.
git config --global core.editor nano
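To confirm that the settings were saved, you can read each one back with git config --global --get. So that this sketch can be run without touching your real ~/.gitconfig, it points HOME (and XDG_CONFIG_HOME) at a scratch directory inside a subshell; to check your actual configuration, run just the --get lines directly in your terminal.

```shell
# Scratch home folder so this demo leaves your real ~/.gitconfig alone
scratch_home="$(mktemp -d)"
(
    export HOME="$scratch_home"
    export XDG_CONFIG_HOME="$scratch_home/.config"

    git config --global user.name "John Doe"
    git config --global user.email johndoe@example.com
    git config --global core.editor nano

    # Read the values back; these print "John Doe" and then "nano"
    git config --global --get user.name
    git config --global --get core.editor
)
```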
Commit changes to a repo
Cloning a git repo
Let's clone our first repository. This creates an exact copy of a remote repo on your local machine. We will start by cloning the repo you created in the last session, called hack-2-shell. To clone a repository, you can go to its GitHub page online and click on the button labeled Code to select the URL. (For now, we will select the first option, HTTPS, but later we will switch to using the SSH option.)
When we clone this repo, git will create the copy inside your current directory, so be aware of your location before doing so. Let's cd into our home directory and create a new directory called hacks/ where you can store all your repos for this class.
# cd into $HOME
cd $HOME
# make a new directory here for storing class repos
mkdir hacks/
# cd into the hacks/ dir
cd hacks/
Finding the link for a GitHub repo you are interested in is very easy: just navigate to the repository, click the green "Code" button, and copy the link provided (it's easier and more foolproof than trying to type it in from memory). Practice by navigating to your own hack-2-shell repository and copying the link directly from there.

# clone 'hack-2-shell' (replace USERNAME with YOUR GitHub username)
# the link you copied from GitHub should look like this
git clone https://github.com/USERNAME/hack-2-shell
# cd into hack-2-shell repo
cd hack-2-shell/
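Cloning from GitHub requires a network connection, but git clone works the same way on a local folder path. The sketch below uses a local repository as a stand-in for GitHub (all names and paths are hypothetical) to show what cloning produces: a new folder named after the repository, containing its files plus a remote named origin that points back to the source. It assumes git 2.28 or newer for the -b option of git init.

```shell
# Stand-in for a GitHub repo: a local repo with one commit (hypothetical)
sandbox="$(mktemp -d)"
git init -q -b main "$sandbox/remote/hack-2-shell"
cd "$sandbox/remote/hack-2-shell"
git config user.name "Example Student"
git config user.email student@example.com
echo "# hack-2-shell" > README.md
git add README.md
git commit -q -m "initial commit"

# Clone it from inside a hacks/ folder; git creates hacks/hack-2-shell/
mkdir "$sandbox/hacks"
cd "$sandbox/hacks"
git clone -q "$sandbox/remote/hack-2-shell"
cd hack-2-shell

# The clone remembers where it came from under the name 'origin'
git remote -v
```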
Edit the README file
Let's make a change to the README file. To do this, we'll use the simple nano text editor to open the file, write additional text into it, and then save and close the editor. Remember, nano requires you to enter hotkey commands to do things like save and close. These are listed along the bottom of the editor when it is open. The most important one is labeled Exit. Exit is ^X, which means hold the Control key and press X (in nano's hotkey list, ^ always stands for the Control key, including on a Mac). This will ask whether you want to save any changes, and then it will close the editor.
Alert
Action: Edit the README.md file to add a link (using Markdown) to any videos or webpages that you used when creating your dotfiles. Add a description, such as "I followed instructions from this video to edit my .zshrc file."
# open the README file in the nano text editor
nano README.md
# remember you can exit nano by entering the 'Exit'
# hotkey commands listed at the bottom of the editor.
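If you prefer not to use an interactive editor, you can also append a line to a file straight from the shell. The sketch below works on a scratch copy of a README, and the URL in it is a placeholder to replace with the video or page you actually used. Note that >> appends to the end of the file, whereas a single > would overwrite it.

```shell
# Scratch folder with a one-line README (hypothetical content)
scratch="$(mktemp -d)"
cd "$scratch"
echo "# hack-2-shell" > README.md

# Append a Markdown link of the form [link text](URL); the URL is a placeholder
echo "I followed instructions from [this video](https://example.com/dotfiles-video) to edit my .zshrc file." >> README.md

# Show the result: the original heading followed by the new line
cat README.md
```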
Sync to the remote
We will now sync these changes to the remote repo on GitHub. This involves three steps, which are all you need to memorize to become a git user. These are the core commands: add, commit, and push. Run the commands below to sync your changes to the remote. In the next section we will break this down step by step to better understand what each step is doing. The final step will sync your changes to GitHub and will require you to enter your GitHub username and, in place of a password, your PAT. After it syncs, refresh your repo page on GitHub to see if the changes appear.
git add ./README.md
git commit -m "added links to README file"
git push
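To practice the three steps without touching your real GitHub repo, you can run the same add, commit, push sequence against a local stand-in. The sketch below uses a bare repository (one with no working files, like the copy GitHub stores) in a scratch folder as a hypothetical remote named origin. Because this working copy was not cloned, the first push uses -u to record origin/main as the branch that a plain git push should target afterwards. It assumes git 2.28 or newer for the -b option.

```shell
# Hypothetical remote: a bare repo standing in for GitHub
sandbox="$(mktemp -d)"
git init -q --bare -b main "$sandbox/hack-2-shell.git"

# Local working copy connected to that remote as 'origin'
git init -q -b main "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Student"
git config user.email student@example.com
git remote add origin "$sandbox/hack-2-shell.git"

# Step 1: stage the change
echo "Links to resources I used for my dotfiles" > README.md
git add ./README.md
# Step 2: record it in the local history
git commit -q -m "added links to README file"
# Step 3: send the commit to the remote (-u sets the upstream branch)
git push -q -u origin main

# Later pushes from this branch need no extra arguments
echo "another line" >> README.md
git add ./README.md
git commit -q -m "added another line"
git push -q
```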
The first time you push a change to a cloned repository, you must authenticate: git will prompt you for your username and password. Enter your GitHub username at the following prompt (for example, iao2122):
Username for 'https://github.com': iao2122
Then, when prompted for your password, paste in the PAT we created earlier. Nothing will appear to happen when you paste, because the password input is hidden:
Password for 'https://iao2122@github.com':
If it succeeds, you'll see output like this, confirming the authorization and completing the push:
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Writing objects: 100% (3/3), 270 bytes | 270.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/iao2122/hack-2-shell.git
   5827f76..608da6e  main -> main
Alert
Remember: whether git remembers your username and PAT after the first push depends on whether a credential helper is configured on your system. If one is, you won't have to enter them over and over again; if not, git will prompt you each time you push. Either way, you will need to authenticate again on a new machine or with a new token, so it's good to know how it works.
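Whether git remembers your credentials is controlled by its credential.helper setting. The sketch below checks the current helper and then sets git's built-in cache helper, which keeps credentials in memory for a limited time (one hour here; the timeout is given in seconds). It runs against a scratch HOME inside a subshell so your real configuration is untouched; the cache helper is typically available on Linux and macOS.

```shell
# Scratch home folder so this demo leaves your real ~/.gitconfig alone
scratch_home="$(mktemp -d)"
(
    export HOME="$scratch_home"
    export XDG_CONFIG_HOME="$scratch_home/.config"

    # Print the current helper; --get exits non-zero when none is set
    git config --global --get credential.helper || echo "no credential helper set"

    # Keep credentials in memory for 3600 seconds after you enter them
    git config --global credential.helper "cache --timeout=3600"
    git config --global --get credential.helper
)
```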
Essential git commands
git status
In addition to the three core git commands above, you will also want to know the essential command git status. This will tell you at any given time which of the three commands above you should call, given the changes that have been made to your files.
Since we just pushed our changes, there should be no differences between our local repo and the remote. Thus, when we call git status we should see the following:
git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
So what does this mean? First, it tells us which branch we are on (main). Here we did not create a separate branch on which to make our changes; instead, we are still on the main branch. As I mentioned in class, this is not "best practice" when working on a collaborative project, but when you are working alone it is usually fine, and it is a much simpler framework for initially learning git.
Next, it says that our branch is up to date with 'origin/main'. Here 'origin' is the name git gives by default to the remote repository we cloned from, which in this case is the copy on GitHub. This is easy to remember if you think of 'origin' as the place your local copy originated from.
Finally, it says there is nothing to commit. This makes sense, since we just pushed our changes to the remote, so they are already synced. Below we'll make some changes to files and see how this message changes.
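git can also report some of this information in a more compact form. In the sketch below (a throwaway repo with one hypothetical commit; it assumes git 2.28 or newer for init -b and 2.22 or newer for --show-current), git branch --show-current prints the current branch name, and git status --short prints one line per changed or untracked file, so it prints nothing at all when there is nothing to commit.

```shell
# Throwaway repo with one commit (hypothetical content)
repo="$(mktemp -d)"
git init -q -b main "$repo"
cd "$repo"
git config user.name "Example Student"
git config user.email student@example.com
echo "# demo" > README.md
git add README.md
git commit -q -m "initial commit"

# Name of the branch you are on
git branch --show-current    # prints: main

# One line per changed or untracked file; empty output means a clean tree
git status --short
```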
git add
The command git add is the first in our series of three commands. It is used to tell git that a file has been updated, or should be added to the list of files to be tracked. In git lingo, this means the file has been "staged for commit".
Before we learn more, let's create three new files that we will use throughout the next few sections to demonstrate what it looks like for files to be at different stages along the three-step process of syncing.
# create three new files in this directory
echo "hello world" > ./file-1.md
echo "hello world" > ./file-2.md
echo "hello world" > ./file-3.md
OK, we've created three new files inside our git repo folder, each with a different name and containing the text "hello world". Does this mean that these files are now being tracked by git? The answer is no. You can have files in a git repo folder that are not added to it, and thus are not going to be synced between machines. These are termed untracked files. Let's check git status to see this.
# git status will now show something different.
git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        file-1.md
        file-2.md
        file-3.md

nothing added to commit but untracked files present (use "git add" to track)
You can see that the output has now changed: there is a new section called "Untracked files" that shows the names of the three files that are not being tracked. We could leave these files untracked if we wanted, but let's imagine we want some of these files to be part of our git repo. Reading the comment under the Untracked files section above, we can see that it is giving us instructions for how to start tracking these files. It says to use git add. Let's do this for two of the files, file-1.md and file-2.md.
# add two files to the local git repo
git add file-1.md
git add file-2.md
Run git status again to see what has changed.
git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   file-1.md
        new file:   file-2.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        file-3.md
Looking at the first few lines of this message, you might ask: "Why does it say that my branch is up to date with origin/main if I just made changes to this repo? Shouldn't they be out of sync?" The answer is: not yet. git compares commits here, not files, and although we've created some files in this repo, we haven't committed them yet, so our local branch still has exactly the same commits as origin/main. Until they are committed, the changes are considered temporary.
You can also see that a new category has appeared in the status report: in addition to Untracked files, we now also see a list of files under Changes to be committed. This means that git is now tracking these files; it knows they are different (or new).
The name of this new section hints at what we should do next: these files have changes that need to be committed. So let's commit them.
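Two related commands are handy at this stage: git diff --cached --name-only lists exactly what is staged, and git restore --staged (the command suggested in the status output above) unstages a file without changing its contents. The sketch below recreates the three files in a throwaway repo with hypothetical content; it makes one initial commit first, because git restore --staged resets the staged copy from the last commit. It assumes git 2.28 or newer.

```shell
# Throwaway repo with one initial commit (hypothetical content)
repo="$(mktemp -d)"
git init -q -b main "$repo"
cd "$repo"
git config user.name "Example Student"
git config user.email student@example.com
echo "# demo" > README.md
git add README.md
git commit -q -m "initial commit"

# Recreate the three example files and stage two of them
echo "hello world" > file-1.md
echo "hello world" > file-2.md
echo "hello world" > file-3.md
git add file-1.md file-2.md

# List only the staged files (prints file-1.md and file-2.md)
git diff --cached --name-only

# Unstage file-2.md; the file stays on disk, now untracked again
git restore --staged file-2.md
git status --short
```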
git commit
You can think of a commit as a timestamped snapshot. To the extent that git works like a time machine, a commit saves a point in time that you might later want to rewind to. A commit at this stage will save the changes that have been made to all files listed in the Changes to be committed category (i.e., we don't need to call commit separately for every call to add). It should be accompanied by a short but informative message about what the commit includes, using the -m option:
# commit changes with a message about what it entails
git commit -m "added files 1 and 2."
Congratulations, you've made your first commit using git. Your local repo now has a timestamped version of this new state. Let's take a look at the status again to see what this looks like.
git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        file-3.md

nothing added to commit but untracked files present (use "git add" to track)
There is now a new status near the top of the message, telling us that our branch is ahead of origin/main by 1 commit. And once again it tells us what to do about it: use git push to publish our local commits.
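To see which commits are waiting to be pushed, you can ask git log for the commits that are on your branch but not yet on origin/main; the range syntax origin/main..HEAD means exactly that. The sketch below builds that situation from scratch, with a bare repo in a scratch folder standing in for GitHub (hypothetical names; assumes git 2.28 or newer).

```shell
# Hypothetical remote (bare repo) and a local working copy
sandbox="$(mktemp -d)"
git init -q --bare -b main "$sandbox/remote.git"
git init -q -b main "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Student"
git config user.email student@example.com
git remote add origin "$sandbox/remote.git"

# First commit, pushed so that origin/main exists
echo "# demo" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin main

# A second commit that stays local for now
echo "hello world" > file-1.md
git add file-1.md
git commit -q -m "added file 1."

# Commits on HEAD that origin/main does not have yet
git log --oneline origin/main..HEAD
# Compact status; the first line ends with "[ahead 1]"
git status --short --branch
```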
git push
This final command is used to push our commits to the remote repo. After this step, the new files and changes that we've made will appear on GitHub when we visit the repo online. Given our current setup, git may ask you to enter your GitHub username and, in place of a password, your PAT, to authenticate that you have permission to push these changes to the remote repo. We will learn later how to enable password-less authentication. If we had created a separate branch, we could specify the name of the branch that we want to push here as well. By default, git push sends the current branch to the remote branch it tracks, which here is main on origin. The output that it writes below is not very interesting.
# we could write `git push origin main` to be more explicit.
git push
Username for 'https://github.com': hackers-test
Password for 'https://hackers-test@github.com':
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 320 bytes | 320.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/hackers-test/hack-2-shell
   90cdd7d..4a1782a  main -> main
At this stage you can call git status again and you will see that our branch is once again 'up to date' with origin/main. This is true even though file-3.md exists as an untracked file in your local repo directory. It is fine to have some files that you keep locally but do not want to sync; in fact, it is very common.
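One way to convince yourself that untracked files stay local is to clone the remote into a second folder and look at what arrives. The sketch below does this offline, with a bare repo in a scratch folder standing in for GitHub (hypothetical names; assumes git 2.28 or newer): file-1.md and file-2.md are committed and pushed, file-3.md is left untracked, and the fresh clone contains only the first two.

```shell
# Hypothetical remote (bare repo) and a local working copy
sandbox="$(mktemp -d)"
git init -q --bare -b main "$sandbox/remote.git"
git init -q -b main "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Student"
git config user.email student@example.com
git remote add origin "$sandbox/remote.git"

# Commit and push two files; leave the third untracked
echo "hello world" > file-1.md
echo "hello world" > file-2.md
echo "hello world" > file-3.md
git add file-1.md file-2.md
git commit -q -m "added files 1 and 2."
git push -q -u origin main

# A fresh clone only receives what was committed and pushed
git clone -q "$sandbox/remote.git" "$sandbox/fresh"
ls "$sandbox/fresh"    # lists file-1.md and file-2.md, but not file-3.md
```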