Notebook 12.2: Phylogenetics in practice¶

Introduction¶

Orthologs are homologous gene copies that diverged from a common ancestral gene during speciation events. After speciation, lineages evolve independently such that orthologous gene copies accumulate different mutations over time. This process can continue through many speciation events. As mutations arise in the periods between speciation they leave behind a record of the existence of the common ancestral sequence. In this way, the mutational process constructs a record that we can eventually use to reconstruct the order and timing of speciation events. By comparing the orthologous gene sequences of the descendant lineages alive today we can contruct a phylogeny as a snapshot of the past.

Inferring phylogenies¶

In this notebook we will not dive too deep into the algorithmic and methodological complexity of phylogenetic inference. In its most basic form, the goal of molecular phylogenetics is to count the number of character differences (e.g., substitutions) between multiple orthologous sequences, and to build a tree that minimizes the number of steps needed to explain those differences. We can do this using simple parsimony methods (the fewest steps is the best), or with methods that model DNA substitution as a probabilistic process (e.g., maximum likelihood).

The details of how various phylogenetic software tools differ, and what assumptions they make, is important to learn and relevant if you use phylogenetics for research. And we'll get to some of those details later. But for now, we'll focus on the applied end of things. How do I infer a phylogenetic tree in practice? We'll turn our attention back to the more theoretical and statistical details later.

Practical tree inference¶

There are three basic steps involved in inferring a phylogenetic tree. We'll complete a practical example from start to finish in this notebook. Then at the end you will try to replicate this in an organism of your choice.

Generate/retrieve sequence data from orthologous gene regions. For this we will use the NCBI API utilities that we learned about in an earlier session to query and fetch fasta files directly from a large online database.

Align the data to identify substitutions at homologous sites. For this we will use a command line program called muscle. There are many alignment programs available, I chose this one because it's easy to use. We will input a fasta file that includes all of our sequence data, but for which each ortholog may be a different length and may start or stop at a different position in the gene. The muscle program will ouput a new fasta file where the sequences are aligned by imputing "-" symbols to improve the alignment. We'll see an example of this below.

Infer a tree by entering the aligned data into a phylogenetic inference program. For this we'll enter the aligned fasta files into a program called RAxML. This is a commonly used phylogenetic inference program that uses Maximum Likelihood to infer the best tree to fit the data. As output RAxML produces a newick formatted tree file (like we learned about in the last notebook). Finally, we'll plot the tree result.

import os
import subprocess
import toytree
import requests
import pandas as pd

Question [5]: What are the 3 steps involved in practical phylogenetic inference? Answer concisely in Markdown below.

- collect character data (e.g., sequences) for each sample you want to study. - align the characters/sequences to identify homology. - infer a tree using one of several possible inference methods. We will also accept detailed answers about tree inference: - propose a starting tree - calculate score of the tree (e.g., parsimony or likelihood) - propose a change to the tree and repeat.

Retrieving sequence data¶

To find individual genes or gene regions sampled from particular species the best place to look is the NCBI Genbank sequence archive. This database includes both Sanger sequenced DNA regions that were targeted by researchers for sequencing using primers, as well as genes or gene regions from sequenced and annotated genomes. In an earlier notebook we used API tools to access fasta data directly from the database. Here we'll use that code again, written as Python functions. Execute the cells to load these functions, and follow along by reading the code and comments in it. We'll mostly focus on the results that the code produces.

Retrieve unique IDs for some number of samples¶

The esearch API returns unique IDs for the search term that we use. Our query below is searching for a particular marker that is commonly used in plant phylogenetic studies (trnK). This is a chloroplast gene that has an important function and so tends to be highly conserved, which makes it easy to sequence by designing primers to amplify it. However, it also contains an intron sequence nearby that can be sequenced along with it, and this region is more variable, and thus provides more phylogenetic information than the conserved part. It has historically been difficult to find genomic regions that are both easy to amplify and variable enough to work as good phylogenetic markers. The few regions that were first discovered to fit this category have been used extensively. And because of this inertia they have become what is called universal markers -- a few gene regions that have been sequenced across thousands of organisms by many research groups over several decades. These markers provide the best sampled phylogenetic data that we have, and through a community effort of collaboration and data sharing have made it possible to combine available data to build enormous phylogenies for thousands of species that help to elucidate the tree of life.

In addition to targeting the trnK gene we're also restricting our query to a specific organism, the plant genus Pedicularis. This is a group of plants that I study. Please follow along using this search term for now, you will have a chance to explore other search terms later below.

# execute this cell to create the get_uids function

def get_uids(term, retmax=10):
    "Search NCBI nucleotide database and return N sequence matches"
    
    # make a request to esearch 
    res = requests.get(
        url="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi", 
        params={
            "db": "nucleotide",
            "term": term,
            "sort": "length",
            "retmode": "text",
            "retmax": retmax,
            "tool": "genomics-course", 
            "email": "student@university.edu",
            },
        )
    
    # parse the xml output
    count = res.text.split("<Count>")[1].split("</Count>")[0].strip()
    
    # if nothing found then bail out
    if not int(count):
        raise ValueError("No UIDs found")
    
    # return the list of UIDs
    uids = []
    ids = res.text.split("<IdList>")[1].split("</IdList>")[0].strip()
    for item in ids.split("\n"):
        uids.append(item[4:-5])
    return uids

Our query statement to the database is stored as the variable term below. We enter this as an argument to the get_uids() function and tell it to return at most 20 results. The function returns a list of strings that represent unique IDs matching our query.

# the search term
term = "trnk[GENE] AND Pedicularis[ORGN]"

# call the function to get uids and store as a list
ingroups = get_uids(term=term, retmax=20)

# show the first ten values
ingroups[:10]

['1720554516',
 '1720554514',
 '1720554512',
 '1720554510',
 '1720554508',
 '1720554506',
 '1720554504',
 '1720554502',
 '63253964',
 '63253962']

Outgroup sample¶

As we learned in the last notebook, it is important to root your tree so that you know the direction of evolution when interpreting evolutionary relationships. One way to do this is to include an outgroup sample in your analysis -- a sample that you know is more distantly related to everything else. Here we'll include 2 samples from a closely related genus. To do this I changed the search term string below to look for the same gene but from a different organism. Finally we add the list results of this search (2 UIDs) with the list results of the first one (20 UIDs) to have a total of 22 samples in the final list uids. We could combine many searches to select samples from a variety of species, or genera, or families in this way.

# the search term
term = "trnk[GENE] AND Orobanche[ORGN]"

# call the function to get uids and store as a list
outgroups = get_uids(term=term, retmax=2)

# show these uids
print(outgroups)

# add these uids to the list
uids = ingroups + outgroups

['1690492016', '1690492014']

Retrieve fasta data for the UIDs¶

The next function fetches fasta data from the database based on the unique IDs that we got from the previous search. In the API call here we specify the strand of DNA, which will ensure they are all returned in the same orientation, and we also specify a start and stop limit just to ensure we do not get back sequences longer than 2000 base pairs. Here automation can sometimes be dangerous because there are errors in the metadata of the database. For example, people sometimes enter the wrong strand information into the metadata when they upload the sequences, which can cause your sequences to not align well. It is important to visually inspect your data to check for these types of errors, as we will do below.

# execute this cell to create the get_fasta function

def get_fasta(uids):
    """
    Fetch fasta sequence data from NCBI for a list of uids
    and return as a dictionary of {name: sequence}.
    """
    # make a request to efetch 
    res = requests.get(
        url="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi", 
            params={
            "db": "nucleotide",
            "id": ",".join(uids),
            "seq_start": "1",
            "seq_stop": "2000",
            "strand": "1",
            "retmode": "text",   
            "rettype": "fasta",
            "tool": "genomics-course", 
            "email": "student@university.edu",
            },
        )
    
    # split fasta string into separate fasta sequences
    fastas = res.text.strip().split("\n\n")
    
    # write to output file
    with open("sequences.fasta", 'w') as out:
        for fasta in fastas:

            # separate name and sequence
            name, sequence = fasta.split("\n", 1)

            # reorder and shorten fasta names for easier reading
            bits = name.split()
            genus = bits[1][0]
            species = bits[2]
            accession = bits[0][1:].split(":")[0]
            name = ">{}_{}_{}".format(genus, species, accession)

            # remove line breaks from sequence
            sequence = sequence.replace("\n", "")

            # write to file
            out.write("{}\n{}\n".format(name, sequence))
    
    # print statement
    print("Wrote {} sequences to ./sequences.fasta".format(len(fastas)))

# run the get_fasta command
fastadict = get_fasta(uids)

Wrote 22 sequences to ./sequences.fasta

Looking at the fasta file¶

We have now downloaded orthologous gene sequences for 22 samples (20 Pedicularis and 2 Orobanche). The function above printed a statement saying that it printed the data to ./sequences.fasta. You can open the file in your file viewer to look at it. Or, we can also peek at it here in the notebook. Let's look at just the first few columns and rows of data. Here I'm going to use a cool trick in Pandas to display the DNA as a DataFrame with colored cells.

# execute to create the following functions

def highlight_dna(cell):
    "A function to color cells by DNA base"
    color = {"A": 'red', "T": 'blue', "C": 'green', "G": 'yellow', "-": "lightgrey"}
    return 'background-color: {}'.format(color[cell])


def colored_slice(fasta, start, stop):
    "returns a colored dataframe over a given slice"
    # load seq data 
    with open(fasta) as infile:

        # load sequence file as a dictionary
        fdict = {}
        for fa in infile.read().strip()[1:].split("\n>"):
            name, seq = fa.split("\n")
            fdict[name] = list(seq)[start: stop]

        # make dataframe from dictionary
        df = pd.DataFrame(fdict, index=range(start, stop)).T

    # show dataframe as colored cells
    return df.style.applymap(highlight_dna)

We can now use this function to look at a slice of this fasta file. You can scroll to the right to see more columns as well. Each column here represents one sequenced base. When all of the samples have the same base at a position this means that the site is not variable (at least given our sampling). No mutations have occurred at that site since these species diverged from each other. You can see that most columns for the Pedicularis samples are invariant, but then there is a lot of mismatch with the two outgroup samples in the bottom two rows.

Now remember, the sequences in this file have not been aligned yet, so we expect that they may look pretty bad at this stage. As you scroll to the right you'll see that the ingroup samples (Pedicularis) also fall out of phase pretty quickly. That is because there was an insertion or deletion at some point which is not being accounted for. Let's see if aligning makes this look better.

# we can't tell yet that the sequences are not aligned
colored_slice("./sequences.fasta", 0, 100)

Alignment¶

The muscle alignment program is a command line tool that can be called as muscle -in infile -out outfile. In the function below we use the Python library subprocess to call and run the program from within Python with subprocess.call(). This is a convenient way in which to use Python code as "glue" to tie together programs written in other languages with tasks you want to complete in Python.

The program muscle uses an algorithm to insert "-" characters into the alignment so that it minimizes differences among sites. The details of this algorithm are interesting, and there are many ways to perform alignments, but that is outside the scope of this tutorial. Below we simply call the muscle program and parse the output to write it back to a file in a way that will be easy for us to load it back in as a pandas DataFrame to look at again with color.

# execute this cell to create the align_fasta function

def align_fasta(fasta_file):
    
    # the output aligned file name to use
    aligned = fasta_file + ".aligned"
    
    # run muscle alignment program on fasta file to produce output file
    cmd = ["muscle", "-in", fasta_file, "-out", aligned]
    subprocess.call(cmd)

    # read in results of aligned file
    with open(aligned) as infile:
        fastas = infile.read().strip()[1:].split("\n>")
        
    # remove newlines from aligned file
    with open(aligned, 'w') as out:
        for fasta in fastas:

            # separate name and sequence
            name, sequence = fasta.split("\n", 1)

            # remove line breaks from sequence
            sequence = sequence.replace("\n", "")

            # write to file
            out.write(">{}\n{}\n".format(name, sequence))
    
    # print statement
    print("Wrote aligned file to {}".format(aligned))

# run the alignment program
align_fasta("./sequences.fasta")

Wrote aligned file to ./sequences.fasta.aligned

Look at the aligned fasta file¶

Muscle produced a new fasta file where the data is now aligned. Let's take a look to ensure that it looks better than our previous non-aligned data. Scroll around and see that there are now grey "-" characters imputed to improve the alignment. Is likely not perfect, but hopefully it's pretty accurate. Now when there is variation at a site we expect that variation represents substitutions that have happened at a single orthologous site.

# looking in the middle of the file it's clear they are not aligned
colored_slice("./sequences.fasta.aligned", 100, 200)

Infer a phylogeny¶

Similarly to how we called the program muscle above, here we will call the program raxmlHPC using the Python subprocess library again. The raxml software is complex and requires a number of arguments. You do not need to worry about those for now. You can look at the documentation for the software to learn more. The take home message here is that in just a minute or two this software can take our input aligned fasta file and infer a phylogeny as well as an estimate of confidence/support in that estimate. We'll discuss more later the complexities of how tree inference works.

# execute this cell to create the infer_tree command

def infer_tree(aligned_fasta):
    
    # remove previous results if they exist
    oldname = "RAxML_info.{}".format(os.path.basename(aligned_fasta))
    if os.path.exists(oldname):
        os.remove(oldname)
        
    # the raxml command
    cmd = [
        "raxmlHPC", 
        "-f", "a",
        "-s", aligned_fasta, 
        "-m", "GTRGAMMA", 
        "-n", os.path.basename(aligned_fasta),
        "-p", "12345", 
        "-x", "54321",
        "-w", os.path.realpath("."),
        "-N", "100",
    ]
        
    # print the raxml command
    print("running: {}\n".format(" ".join(cmd)))
    subprocess.call(cmd)
    
    # print statement
    print("Tree file written to ./RAxML_bipartitions.{}".format(
        os.path.basename(aligned_fasta)))

# run the tree inference 
infer_tree("./sequences.fasta.aligned")

running: raxmlHPC -f a -s ./sequences.fasta.aligned -m GTRGAMMA -n sequences.fasta.aligned -p 12345 -x 54321 -w /home/deren/Documents/genomics/2020-phylogenetics-I/notebooks -N 100

Tree file written to ./RAxML_bipartitions.sequences.fasta.aligned

Plot/draw the phylogeny¶

Now that you've inferred a tree the next step is of course to check your results by visualization. Plot the tree. We can then compare the topology to our expectation based on morphology, or geography, or other expectations. Do all of the branch lengths look reasonable? If not, perhaps you have a bad sequence in the alignment. Visually inspect it and consider removing it.

In the tree below everything looks reasonable and likely correct given my knowledge of this group of organisms. I rooted the tree on the two outgroup sequences using the .root() function and a wildcard selector that will match any names starting with "O_". In the final tree I also included a scalebar. You'll notice that on this tree the tips (black lines) do not all line up at 0 (the dashed lines are added just make the name labels align to make reading it easier). This is because this is not a time-scaled tree. The branch lengths are in units of substitutions per site. Some tips branches had more substitutions than others, which may reflect a higher rate of molecular evolution. Using this phylogeny we can now begin to test other evolutionary hypotheses.

# load the tree
tre = toytree.tree("./RAxML_bestTree.sequences.fasta.aligned")

# root on the outgroups
rtre = tre.root(regex="O_*")

# draw with support values
rtre.draw(width=700, tip_labels_align=True, scalebar=True);

Challenge: Use these functions to infer a tree for some other organismal group¶

We've now written functions to automate the minimal three steps required for inferring a phylogeny. And we can now reuse these functions to perform these tasks again with a different starting query to the database.

Action [6]: Infer a phylogeny for a group of organisms of your choice, and using a genetic marker of your choice, by following the instructions below.

1. Find data¶

Every organism is not in the NCBI database for all genes. The hardest part of this challenge will be finding a good genetic marker that is sequenced across enough taxa in the group you are searching. I would recommend starting by searching the internet for the name of common phylogenetic markers used in your organism (or in animals versus plants). You can search these terms directly on NCBI https://www.ncbi.nlm.nih.gov/nuccore/. Search for terms like "universal phylogenetic marker", or do a google scholar search for "phylogeny" + the name of the organism you're interested in. Find a phylogenetic study that used a marker and try to sample that data yourself here. Search the NCBI taxonomy database if needed to find taxonomic groups to restrict your search.

Once you develop a good search term then you can insert it into the code below. If you get stuck head to the chat room. Don't try to move on to the next step until you get some UIDs printed below. If your search term does not match anything the function below will raise an error telling you that.

After you get your ingroup samples you next need to find an outgroup. This means finding an organism outside of your core group, but still pretty closely related. This will require you to do some research as well. NCBI includes a lot of information about taxonomy, as do many other sites including Wikipedia, or published phylogenetic papers.

Change the 'term' and 'oterm' search strings below to perform your own search.

# call the function to get uids and store as a list
term = "COI[GENE] AND Delphinidae[ORGN]"
ingroup = get_uids(term=term, retmax=20)

# call the function to uids from an *outgroup* and add to list
oterm = "COI[GENE] AND Phocoenidae[ORGN]"
outgroup = get_uids(term=oterm, retmax=5)

# show uids
uids = ingroup + outgroup
print(uids)

['323100323', '323100313', '323100305', '296465606', '1624685187', '1624685173', '1624685159', '1624685145', '1624685131', '1624685117', '1624685103', '1624685089', '1624685075', '1624685061', '1624685047', '1624685033', '1624685019', '1624685005', '1624684991', '1624684977', '1139695912', '1139695898', '1139695884', '1139695870', '1022717220']

# get fasta data for the UIDs
get_fasta(uids)

Wrote 25 sequences to ./sequences.fasta

2. Align the fasta file¶

Call the alignment function on your fasta file and then use the colored_slice function to look at your alignment. Does it look good, or a complete mess? If it's a mess do not proceed to the next step. Maybe you need to try a different search term, a more restrictive definition of the gene segment you are searching for.

# run the alignment program
align_fasta("./sequences.fasta")

Wrote aligned file to ./sequences.fasta.aligned

# looking in the middle of the file it's clear they are not aligned
colored_slice("./sequences.fasta.aligned", 700, 900)

Infer the tree¶

If your alignment is good then there's not much to this step. Run it and wait for it to finish.

# run the tree inference 
infer_tree("./sequences.fasta.aligned")

running: raxmlHPC -f a -s ./sequences.fasta.aligned -m GTRGAMMA -n sequences.fasta.aligned -p 12345 -x 54321 -w /home/deren/Documents/genomics/2020-phylogenetics-I/notebooks -N 100

Tree file written to ./RAxML_bipartitions.sequences.fasta.aligned

Draw your tree¶

Before drawing your tree be sure to change the code below to select the proper names of your outgroup sequences to root the tree since their names are likely different than the ones that I used. If you need help with rooting the tree you can find more information here.

# load the tree
tre = toytree.tree("./RAxML_bestTree.sequences.fasta.aligned")

# root on the outgroups
rtre = tre.root(regex="[G,N]_*")

# draw with support values
rtre.draw(width=500, tip_labels_align=True, scalebar=True);

Question [7]: Reflect on your phylogeny above. Do you think it is accurate? Can you find a published phylogeny online for this group of organisms? If so, include it in your answer. What gene/marker did they use in that study? Do your trees show the same relationships for the selected organisms?

- Yes I think the tree is pretty accurate since most species appear monophyletic. But most of the tree is not resolved suggesting we could use more data. - Find a published study and paste a link or title here. - Name the gene in the study, here it would be COI. - State whether the trees are the same or different.

Question [8]: If you were writing a program to automate fetching data from online and inferring a tree, what would you do differently? What steps could be made easier? How do you imagine this will change as more and more annotated genes from assembled genomes are added to NCBI?

- It would be nice if you could more easily examine all of the stats of the samples before downloading the fasta data so that you could select more easily which samples to include. I think this could be done with further modifications to the API code above. - I think that the 'term' used to search becomes very important for extracting data from the database. To automate it you must come up with a term that finds all of the data available while still restricting it to only the data that is relevant to your study.

Orthology and alignments¶

When performing phylogenetic inference we make the assumption that the gene regions we are studying are homologous. To do so we rely on alignment algorithms to align genomic sites that are homologous. As you can see in the empirical alignments above, especially if you tried to align fairly distantly related organisms, is that sometimes genes diverge so significantly that even if we can still identify that the genes are homologous, there are few sites that can be confidently assigned as homologous, and there is a lot of room for error in alignments.

here are additional tools that can be applied to sequence alignments after they have been aligned and before they are used for phylogenetic inference that aim to mask (i.e., remove) sites that are poorly aligned to improve phylogenetic inference. One example is the tool GBLOCKs. This can apply filtering options such as "Min. seq. for flank pos.: 85%", "Max. contig. nonconserved pos.: 8", "Min. block length: 10", "Gaps in final blocks: no". Below is a function that can be applied to clean an alignment using the default GBLOCKS options.

def gblock_filter_alignment(aligned_fasta):
    """
    Takes an aligned fasta file and applies GBLOCKS to filter it and 
    write the results to a new fasta file. 
    """
    # run muscle alignment program on fasta file to produce output file
    cmd = [
        "Gblocks", aligned_fasta,
        "-t=d",
        "-e=.filt",
        "-d=n",
    ]
    print("command:", " ".join(cmd))
    subprocess.call(cmd)
    
    # print statement
    print("Wrote filtered fastsa file to {}".format(aligned_fasta + ".filt"))

Question [9]: Clean your alignment using GBLOCKS and infer the tree again. Did you get a different result? Do you think it was an improvement? If not, can you think of an alternative way to improve your alignment?

- In this case although GBLOCKs excluded many sites from the alignment, the final inferred tree looks very similar. Other ways to improve the alignment might involve sampling more evenly so that there are fewer replicates within species, and thus more similar differences among species.

gblock_filter_alignment("./sequences.fasta.aligned")

command: Gblocks ./sequences.fasta.aligned -t=d -e=.filt -d=n
Wrote filtered fastsa file to ./sequences.fasta.aligned.filt

# run the tree inference 
infer_tree("./sequences.fasta.aligned.filt")

running: raxmlHPC -f a -s ./sequences.fasta.aligned.filt -m GTRGAMMA -n sequences.fasta.aligned.filt -p 12345 -x 54321 -w /home/deren/Documents/genomics/2020-phylogenetics-I/notebooks -N 100

Tree file written to ./RAxML_bipartitions.sequences.fasta.aligned.filt

# load the tree
tre = toytree.tree("./RAxML_bestTree.sequences.fasta.aligned.filt")

# root on the outgroups
rtre = tre.root(regex="[G,N]_*")

# draw with support values
rtre.draw(width=500, tip_labels_align=True, scalebar=True);

	100	101	102	103	104	105	106	107	108	109	110	111	112	113	114	115	116	117	118	119	120	121	122	123	124	125	126	127	128	129	130	131	132	133	134	135	136	137	138	139	140	141	142	143	144	145	146	147	148	149	150	151	152	153	154	155	156	157	158	159	160	161	162	163	164	165	166	167	168	169	170	171	172	173	174	175	176	177	178	179	180	181	182	183	184	185	186	187	188	189	190	191	192	193	194	195	196	197	198	199
O_lycoctoni_MK588421.1	T	A	T	T	T	A	C	A	G	C	T	A	G	A	A	A	G	A	T	C	T	C	A	A	C	A	A	C	A	C	A	G	T	T	T	C	C	T	A	T	A	T	C	C	A	C	T	T	A	T	T	T	T	T	C	A	G	G	A	G	T	A	T	A	T	T	T	A	T	T	C	A	T	T	T	G	C	T	C	A	T	G	A	T	C	G	T	G	G	T	T	T	C	G	G	T	A	G	A	T
O_flava_MK588420.1	C	G	A	T	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	T	T	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	C	A	A	A	A	A	-	-	-	-	T	T	G	G	A	A	-
P_davidii_AY949766.1	T	G	A	T	C	C	T	A	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	G	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_cranolopha_AY949759.1	T	G	A	T	C	C	T	A	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_thamnophila_AY949761.1	T	G	A	T	C	C	T	G	A	A	G	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_furfuracea_AY949765.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_axillaris_AY949768.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_trichocymba_AY949760.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_schistostegia_AY949767.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	G	G	G	A	-
P_szetschuanica_LC377634.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_metaszetschuanica_LC377635.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_verticillata_AY949769.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_verticillata_AY949762.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_refracta_LC377633.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_refracta_LC377632.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_refracta_LC377631.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_refracta_LC377630.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_refracta_LC377629.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_refracta_LC377628.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_anas_AY949764.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_integrifolia_AY949763.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-
P_ternata_AY949758.1	T	G	A	T	C	C	T	G	A	A	A	G	G	G	A	A	T	A	A	A	T	G	G	A	A	A	A	A	A	T	A	G	C	A	T	G	T	C	G	T	A	T	C	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	A	T	A	G	A	G	A	G	T	T	C	T	G	T	G	G	A	-

	700	701	702	703	704	705	706	707	708	709	710	711	712	713	714	715	716	717	718	719	720	721	722	723	724	725	726	727	728	729	730	731	732	733	734	735	736	737	738	739	740	741	742	743	744	745	746	747	748	749	750	751	752	753	754	755	756	757	758	759	760	761	762	763	764	765	766	767	768	769	770	771	772	773	774	775	776	777	778	779	780	781	782	783	784	785	786	787	788	789	790	791	792	793	794	795	796	797	798	799	800	801	802	803	804	805	806	807	808	809	810	811	812	813	814	815	816	817	818	819	820	821	822	823	824	825	826	827	828	829	830	831	832	833	834	835	836	837	838	839	840	841	842	843	844	845	846	847	848	849	850	851	852	853	854	855	856	857	858	859	860	861	862	863	864	865	866	867	868	869	870	871	872	873	874	875	876	877	878	879	880	881	882	883	884	885	886	887	888	889	890	891	892	893	894	895	896	897	898	899
P_phocoena_KX079493.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	G	T	A	T	A	T	A	G	G	G	A	T	G	G	T	A	T	G	A	G	C	T	A	T	G	G	T	C	T	C	T	A	T	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	C	G	G	T	T	T	C	C	T	A	G	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	G	T	T	T	T	A	T	C
G_macrorhynchus_MK790767.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	T	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790778.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790776.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790782.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790781.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790780.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790779.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790777.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790775.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790774.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790773.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790770.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790768.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790769.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790772.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
G_macrorhynchus_MK790771.1	C	A	G	C	A	A	A	C	C	C	T	A	A	A	A	A	A	G	G	A	A	T	G	A	A	A	G	T	A	A	G	C	A	C	A	A	C	T	A	T	T	G	C	A	C	G	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	A	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	C	A	A	T	A	A	G	A	A	C	A	C	C	C	C	C	T	A	A	A	C	T	T	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
N_phocaenoides_KX650871.1	C	A	G	C	A	A	A	C	C	C	T	-	A	A	A	A	A	G	G	A	A	T	A	A	A	A	G	T	A	A	G	C	A	T	A	A	T	T	A	C	A	G	C	A	C	A	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	G	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	T	A	A	C	A	A	G	A	A	C	A	C	C	C	C	T	T	A	A	G	C	C	C	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
N_phocaenoides_KX650869.1	C	A	G	C	A	A	A	C	C	C	T	-	A	A	A	A	A	G	G	A	A	T	A	A	A	A	G	T	A	A	G	C	A	T	A	A	T	T	A	C	A	G	C	A	C	A	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	G	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	T	A	A	C	A	A	G	A	A	C	A	C	C	C	C	T	T	A	A	G	C	C	C	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
N_phocaenoides_KX650872.1	C	A	G	C	A	A	A	C	C	C	T	-	A	A	A	A	A	G	G	A	A	T	A	A	A	A	G	T	A	A	G	C	A	T	A	A	T	T	A	C	A	G	C	A	C	A	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	G	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	T	A	A	C	A	A	G	A	A	C	A	C	C	C	C	T	T	A	A	G	C	C	C	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
N_phocaenoides_KX650870.1	C	A	G	C	A	A	A	C	C	C	T	-	A	A	A	A	A	G	G	A	A	T	A	A	A	A	G	T	A	A	G	C	A	T	A	A	T	T	A	C	A	G	C	A	C	A	T	A	A	A	A	A	C	G	T	T	A	G	G	T	C	A	A	G	G	T	G	T	A	A	C	C	T	A	T	G	G	G	T	T	G	G	G	A	A	G	A	A	A	T	G	G	G	C	T	A	C	A	T	T	T	T	C	T	A	T	A	A	C	A	A	G	A	A	C	A	C	C	C	C	T	T	A	A	A	C	C	C	A	C	A	C	G	A	A	A	G	T	T	T	T	T	A	T	G	A	A	A	T	C	T	A	A	A	A	A	C	T	A	A	A	G	G	A	G	G	A	T	T	T	A	G	C	A	G	T	A	A	A	T	T	A	A	G	A	A	T	A	G	A	A	T	G	C	T	T	A	A	T	T
S_guianensis_HQ918433.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	G	C	T	C	A	T	G	C	T	T	T	C	G	T	A	A	T	A	-	-	-	-	-	-	-	-	-	-	A	T	T	T	T	C	T	T	T	-	A	T	A	G	T	T	A	T	A	C	C	T	A	T	C	A	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	T	A	A	T	T	G	G	G	G	G	T	T	T	C	G	-	-	-	-	-	-	-	-	-	-	G	G	A	A	C	T	G	A	C	T	A	G	T	T	C	C	T	T
S_guianensis_HQ918428.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	G	C	T	C	A	T	G	C	T	T	T	C	G	T	A	A	T	A	-	-	-	-	-	-	-	-	-	-	A	T	T	T	T	C	T	T	T	-	A	T	A	G	T	T	A	T	A	C	C	T	A	T	C	A	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	T	A	A	T	T	G	G	G	G	G	T	T	T	C	G	-	-	-	-	-	-	-	-	-	-	G	G	A	A	C	T	G	A	C	T	A	G	T	T	C	C	T	T
S_guianensis_HQ918424.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	G	C	T	C	A	T	G	C	T	T	T	C	G	T	A	A	T	A	-	-	-	-	-	-	-	-	-	-	A	T	T	T	T	C	T	T	T	-	A	T	A	G	T	T	A	T	A	C	C	T	A	T	C	A	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	T	A	A	T	T	G	G	G	G	G	T	T	T	C	G	-	-	-	-	-	-	-	-	-	-	G	G	A	A	C	T	G	A	C	T	A	G	T	T	C	C	T	T
D_sp._GU674138.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	A	G	C	T	C	A	T	G	C	C	T	T	C	G	T	A	A	T	A	-	-	-	-	-	-	-	-	-	-	A	T	T	T	T	C	T	T	T	-	A	T	A	G	T	T	A	T	A	C	C	T	A	T	C	A	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	T	A	A	T	T	G	G	A	G	G	T	T	T	T	G	-	-	-	-	-	-	-	-	-	-	G	G	A	A	C	T	G	A	T	T	A	G	T	C	C	C	C	T