1. Understand and describe the structure and format of RAD-seq data.
2. Assemble RAD-seq data sets using ipyrad to produce files for analyses.
3. Understand the use of jupyter-notebooks for reproducible analyses.
4. Create notebooks for conducting reproducible genomic analyses in Python.
Please follow along on
https://radcamp.github.io/NYC2019/
Integrating collections-based fieldwork, genomics, comparative methods, and bioinformatics to study phylogeny, introgression, and speciation in flowering plants.
And it's an exciting time for this! Genomic technologies are revolutionizing the study of ecology and evolution.
Reconstruct demography, and calculate population divergence and introgression. With reference-mapped data this can be done spatially along chromosomes (e.g., Nadeau et al. 2013).
Infer gene trees and species trees, over shallow or relatively deep evolutionary time scales (<100 Ma); e.g., Eaton et al. 2015.
The aim of RAD-seq is to select a subset (reduced representation) of orthologous genomic regions across many sampled individuals. This concentrates read coverage on regions of the genome that can be used in comparative analyses. In addition, you can multiplex many samples to attain high efficiency on Illumina machines (e.g., combinatorial indexing).
Variant methods differ in their cost, complexity, and efficiency. We used the 3RAD method, a very inexpensive and accurate new variant of RAD-seq.
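To make "reduced representation" concrete, here is a toy in silico digest. Everything in it is an assumption for illustration: the sequence is invented, the EcoRI recognition site (GAATTC) is just one example enzyme, the size window is arbitrary, and splitting on the whole recognition site is a simplification of how the enzyme actually cuts. It is a sketch of the idea, not the 3RAD protocol.

```shell
# Toy "genome" (invented) containing three EcoRI recognition sites.
seq="ACGTACGTGAATTCTTTTGAATTCACGTACGTACGTACGTACGTGAATTCCC"
site="GAATTC"   # example enzyme site, assumed for this demo
min_len=5       # arbitrary size-selection window
max_len=25

# Split the sequence at every site, then keep only fragments whose
# length falls inside the size-selection window.
kept=$(printf '%s\n' "$seq" | awk -F"$site" -v lo="$min_len" -v hi="$max_len" '
    { for (i = 1; i <= NF; i++) { n = length($i); if (n >= lo && n <= hi) k++ } }
    END { print k + 0 }')

echo "fragments kept after size selection: $kept"
```

Only the fragments that pass size selection would be sequenced, so the same small set of regions is targeted in every sample that shares those cut sites.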
Why not just sequence the entire genome at low-coverage?
Many genomes are very large, so sequencing them in full is expensive.
In addition, most population and phylogenetic questions can be answered
using a subsample of many loci from the genome.
Low coverage data can be powerful for studying very closely related
populations where missing data can be imputed, but when comparing distant
relatives it is important to obtain higher coverage per sample.
What's the deal with missing data in RAD-seq?
It depends. Different library types have different rates of
allele dropout, and other sources of missing data, such as
sequencing coverage, can be relevant too.
At what phylogenetic scale can I use RAD-seq?
RAD-seq methods have been successfully applied in silico
and empirically up to ~100 Ma. However, the efficiency of
RAD-seq compared to other methods is lower at this scale, and you
are often better off choosing a different method.
A strength of RAD-seq is that one type of data can be used
across many scales, from studying pedigrees to millions of years
of divergence.
# This is the general format of unix command line tools
$ program -option1 -option2 target
Example (the output returned by the program is shown in grey):
# e.g., the 'pwd' program with no option or target prints your cur dir
$ pwd
/home/deren/
ipyrad works the same way:
# The ipyrad CLI can be used in a terminal
$ ipyrad -p params-data.txt -s 123 -t 4 -c 16
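Reading that command against the general pattern: as far as I understand the ipyrad command-line help, `-p` names the params file, `-s` lists the assembly steps to run, `-t` tunes threading, and `-c` sets the number of cores. The sketch below spells this out with one variable per option. The variable names are mine, and it only launches ipyrad if the program and the params file are both present; otherwise it just prints the command.

```shell
# Options from the example above, one per variable so each can be annotated.
params="params-data.txt"   # -p: the params file describing the assembly
steps="123"                # -s: which assembly steps to run (here 1, 2 and 3)
threads=4                  # -t: threading setting for multi-threaded jobs
cores=16                   # -c: number of CPU cores to use

cmd="ipyrad -p $params -s $steps -t $threads -c $cores"

if command -v ipyrad >/dev/null 2>&1 && [ -f "$params" ]; then
    # Only run when ipyrad is installed and the params file exists.
    $cmd
else
    echo "would run: $cmd"
fi
```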
# The root (top) of the entire filesystem (used for writing full paths).
/
# Here, in my current directory (used for writing relative paths).
./
# Up one directory from my current directory (a relative path).
../
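To see how these notations resolve in practice, here is a small sketch that builds a throwaway directory tree and compares a full path with a relative one. The directory names `project` and `data` are hypothetical and exist only for this demo.

```shell
# Build a throwaway tree so nothing in your real files is touched.
# 'project' and 'data' are made-up names used only for this demo.
top=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$top/project/data"

cd "$top/project/data"

# Full path: starts at the root (/), so it works from anywhere.
abs=$(cd "$top/project" && pwd -P)

# Relative path: ../ means "one directory up from where I am now".
rel=$(cd ../ && pwd -P)

# ./ means "right here", so it resolves to the current directory.
here=$(cd ./ && pwd -P)

echo "full path: $abs"
echo "relative:  $rel"
echo "here:      $here"
```

The first two lines print the same directory: a relative path is just a shorter way of naming a location, and its meaning depends on where you currently are.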
# show the files and folders in a location (default target is cur dir)
$ ls
# show result as a list for cur dir.
$ ls -l ./
# show another location on the filesystem
$ ls -l /bin/
# move to a new location. This becomes your new cur dir.
$ cd folder/
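As a quick check of how `ls` and `cd` interact, the following sketch lists a directory first from the outside and then from the inside. It uses a temporary directory and two empty files whose names are invented for the example.

```shell
# Throwaway directory with two empty files; the file names are made up.
work=$(mktemp -d)
touch "$work/sample_A.fastq" "$work/sample_B.fastq"

# Give ls a target to look inside a location without moving there.
ls -l "$work"

# Move there; with no target, ls now lists the new current directory.
cd "$work"
n_files=$(ls | wc -l | tr -d ' ')
echo "files in current directory: $n_files"
```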
Your location (current directory) starts from / (the root) and is described by a nested set of directory names leading to your location.
# use 'pwd' program with no option or target to ask where am I now?
$ pwd
/home/deren/
# make a new directory (mkdir is the program, genomics is the target)
$ mkdir genomics
# change directory (move) into the new directory and run pwd again
$ cd genomics
$ pwd
/home/deren/genomics