Malaga, Spain 2019

#### Part 1: Introduction to RAD-seq

Deren Eaton, Columbia University

Isaac Overcast, City College of New York

## Why are you here today?

### Population genomic analyses

Reconstruct demography, calculate population divergence and introgression. With reference-mapped data, this can be done spatially along chromosomes.

### Phylogenomic analyses

Infer gene trees and species trees, even over relatively deep evolutionary time scales (~100 Ma).

#### The goal is to attain a reduced representation of the genome

Sequence to higher coverage by selecting fewer genomic regions.

Concentrate sequencing to the same regions across all samples.

Multiplex many samples onto a sequencing run.

##### Variant methods differ in cost and selectivity

Why not just sequence the whole genome at low-coverage?

Many genomes are very large, so it remains too expensive,
especially when studying many samples/populations/species.

Low-coverage base calling can estimate population-wide
statistics well, but not at the individual level.

Low coverage is messy/difficult for extracting loci
for individuals (e.g., for phylogenetics) because so much data is missing.

### RAD-seq has a lot of missing data

Variation in the presence/absence of a restriction
cut site can cause loci to be present in some samples but not others.

Compared to many other methods (e.g., transcriptomes, UCEs) RAD-seq still
provides substantially more data (e.g., SNPs). The important concern is how your analysis tools deal with missing information.

### Allele dropout can affect base calls


### A quick introduction to the command line

Lines starting with hash (#) are only comments


# This is the general format of unix command line tools
$ program -option1 -option2 target

# e.g., the 'pwd' program with no option or target prints your cur dir
$ pwd


I'll use a grey background to show the returned value

    /home/deren/        
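The general format above can be tried directly. This is a minimal sketch that assumes a POSIX-style shell such as bash; the leading `$` prompt from the slides is left out so the lines can be pasted as-is, and `here_dir` is a hypothetical variable name chosen for illustration.

```shell
# A program with no option and no target: print the current directory.
pwd

# A program with one option (-l, long listing) and one target (/, the root).
ls -l /

# Capture the value a program returns in a variable so it can be reused.
here_dir=$(pwd)
echo "I am in: $here_dir"
```

The output differs between machines, since it depends on where the shell was started and what the filesystem contains.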

### The ipyrad command line interface

We'll cover this soon in the RADcamp tutorial.


# The ipyrad CLI can be used in a terminal
$ ipyrad -p params-data.txt -s 123 -t 4 -c 16

### Hierarchical File System

Always know where you are and where your files are.

# The root (top) of the entire filesystem (used for writing full paths).
$ /

# Here, in my current directory (used for writing relative paths).
$ ./

# Up one directory from my current directory (a relative path).
$ ../


# show the files and folders in a location (default target is cur dir)
$ ls

# show result as a list for cur dir.
$ ls -l ./

# show another location on the filesystem
$ ls -l /bin/

# move to a new location. This becomes your new cur dir.
$ cd folder/


Your location (current directory) starts from / (the root) and is described by a nested set of directory names leading to your location.
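To make the "nested set of directory names" concrete, the sketch below splits a path on `/` and prints one directory per line, indented by depth. It assumes a POSIX-style shell such as bash; the path is the example location used in these slides, and `location`, `depth`, `indent` and `old_ifs` are hypothetical variable names.

```shell
# Example location (the path does not need to exist on your machine).
location="/home/deren/genomics"

echo "/ (root)"
depth=0
indent=""

# Temporarily split words on '/' instead of on whitespace.
old_ifs=$IFS
IFS='/'
for name in $location; do
    # The leading '/' produces one empty field; skip it.
    [ -z "$name" ] && continue
    depth=$((depth + 1))
    indent="$indent  "
    echo "$indent$name/"
done
IFS=$old_ifs

echo "Directories below the root: $depth"
```

Each level of indentation is one step further from the root, which is the order in which a full path is written from left to right.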


# use 'pwd' program with no option or target to ask where am I now?
$ pwd

    /home/deren/

We can make new directories and change our location.

# make a new directory (mkdir is the program, genomics is the target)
$ mkdir genomics

# change directory (move) into the new directory and run pwd again
$ cd genomics
$ pwd

    /home/deren/genomics
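The same steps can be practiced without touching your home directory. This is a minimal sketch that assumes a POSIX-style shell and the common `mktemp -d` utility for creating a throwaway directory; `start_dir`, `scratch` and `new_dir` are hypothetical variable names.

```shell
# Remember where we started so we can come back.
start_dir=$(pwd)

# Create a throwaway directory and move into it.
scratch=$(mktemp -d)
cd "$scratch"

# Make a new directory, move into it, and ask where we are.
mkdir genomics
cd genomics
new_dir=$(pwd)
echo "$new_dir"

# Go up one level with a relative path, then back to the start with a full path.
cd ..
cd "$start_dir"
```

The printed path ends in `genomics`, but its leading part depends on where your system keeps temporary files.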