Malaga, Spain 2019
Deren Eaton, Columbia University
Isaac Overcast, City College of New York
By the end of class you should be able to:
1. Describe the structure and format of RAD-seq data.
2. Assemble RAD-seq data sets using ipyrad to produce files for analyses.
3. Understand the use of jupyter-notebooks for reproducible analyses.
4. Create notebooks for conducting reproducible genomic analyses in Python.
Please follow along on https://radcamp.github.io/IBS2019/
Reconstruct demography and estimate population divergence and introgression. With reference-mapped data, this can be done spatially along chromosomes.
Infer gene trees and species trees, even over relatively deep evolutionary time scales (~100 Ma).
Sequence to higher coverage by selecting fewer genomic regions.
Concentrate sequencing to the same regions across all samples.
Multiplex many samples onto a sequencing run.
Why not just sequence the whole genome at low coverage?
Many genomes are very large, so it remains too expensive,
especially when studying many samples/pops/species.
Low-coverage base calling can estimate population-wide
statistics well, but not individual-level statistics.
Low-coverage data make it difficult to extract loci
for individuals (e.g., for phylogenetics) because so much data is missing.
Variation in the presence/absence of a restriction
cut site can cause loci to be present in some samples but not others.
Compared to many other methods (e.g., transcriptomes, UCEs), RAD-seq still
provides substantially more data (e.g., SNPs). The important concern is how your
analysis tools deal with missing information.
Lines starting with a hash (#) are comments; the shell ignores them.
# This is the general format of unix command line tools
$ program -option1 -option2 target
# e.g., the 'pwd' program with no option or target prints your current directory
$ pwd
I'll use a grey background to show the returned value
/home/deren/
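As a concrete instance of the program/option/target pattern, here is a minimal sketch using the standard `head` program. The file name `samples.txt` and its contents are made up for illustration, and the commands are written without the `$` prompt so they can be pasted directly into a terminal.

```shell
# Create a scratch directory and a small example file (hypothetical contents).
scratch=$(mktemp -d)
printf 'sample_A\nsample_B\nsample_C\n' > "$scratch/samples.txt"

# program = head, option = -n 2 (print only the first 2 lines), target = the file
first_two=$(head -n 2 "$scratch/samples.txt")
echo "$first_two"
```

Changing the option (for example `-n 1`) or the target changes what the same program returns.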
We'll cover this soon in the RADcamp tutorial.
# The ipyrad CLI can be used in a terminal
# (-p params file, -s steps to run, -t threads per job, -c number of cores)
$ ipyrad -p params-data.txt -s 123 -t 4 -c 16
Always know where you are and where your files are.
# The root (top) of the entire filesystem (used for writing full paths).
$ /
# Here, in my current directory (used for writing relative paths).
$ ./
# Up one directory from my current directory (a relative path).
$ ../
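To see how the three path prefixes above relate, the following sketch reaches the same directory once with an absolute path and once with a relative path. The `project/data` folder names are hypothetical, and everything happens inside a temporary directory so none of your own files are touched.

```shell
# Work in a throwaway directory (folder names are made up for illustration).
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"
cd "$workdir/project/data"

# Absolute path: starts from the root (/) and works from any location.
abs_parent=$(cd "$workdir/project" && pwd -P)

# Relative path: '..' means one directory up from the current directory.
rel_parent=$(cd .. && pwd -P)

echo "absolute: $abs_parent"
echo "relative: $rel_parent"
```

Both lines print the same location. The relative form only works because the current directory is `project/data`; the absolute form does not depend on where you are.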
# show the files and folders in a location (default target is the current directory)
$ ls
# show the result as a long (detailed) list for the current directory
$ ls -l ./
# show another location on the filesystem
$ ls -l /bin/
# move to a new location. This becomes your new current directory.
$ cd folder/
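The effect of `cd` on what `ls` shows can be tried safely with a small sketch like the one below. The folder `raw_data` and the file names are hypothetical and are created in a temporary directory.

```shell
# Build a tiny example layout (all names are made up for illustration).
demo=$(mktemp -d)
mkdir "$demo/raw_data"
touch "$demo/notes.txt" "$demo/raw_data/sample_A.fastq"

# With no target, ls lists the current directory.
cd "$demo"
ls

# cd changes the current directory, so the same ls command now lists a new location.
cd raw_data/
listing=$(ls)
echo "$listing"
```

The first `ls` shows `notes.txt` and `raw_data`; after `cd raw_data/` the same command shows only the file inside that folder.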
Your location (current directory) starts from / (the root) and is described by a nested set of directory names leading to your location.
# use the 'pwd' program with no option or target to ask: where am I now?
$ pwd
/home/deren/
We can make new directories and change our location.
# make a new directory (mkdir is the program, genomics is the target)
$ mkdir genomics
# change directory (move) into the new directory and run pwd again
$ cd genomics
$ pwd
/home/deren/genomics
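Building on the `mkdir`, `cd`, and `pwd` steps above, this sketch creates a nested directory in one step and then moves back up. The `assembly` subfolder is a hypothetical name, and the example runs in a temporary directory instead of `/home/deren/`.

```shell
# Start in a throwaway directory.
base=$(mktemp -d)
cd "$base"

# The -p option tells mkdir to create parent directories as needed.
mkdir -p genomics/assembly

# Move into the nested directory and record where we are.
cd genomics/assembly
deep=$(pwd -P)
echo "$deep"

# '../..' moves up two levels, back to the starting directory.
cd ../..
back=$(pwd -P)
echo "$back"
```

The first printed path ends in `genomics/assembly`, and the second is the directory where the example started.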