Rooting trees¶
Rooting or re-rooting trees orients the direction of ancestor-descendant relationships and thus provides "polarization" for the direction of evolution. Most tree inference algorithms return an unrooted tree as a result, and it is up to the researcher to select the placement of the root based on external information (e.g., outgroup designation) or analytical methods (e.g., based on edge lengths).
This tutorial section provides background on how rooting or re-rooting affects a tree data structure and how to choose the edge and position on which to root a tree. It also clarifies several common misconceptions and sources of error during tree rooting.
import toytree
# an example tree with outgroup (r3,r4)
tree = toytree.rtree.unittree(ntips=5, seed=123)
# create an unrooted tree
utree = tree.unroot()
# root the tree on its original outgroup
rtree = utree.root("r3", "r4")
# re-root the tree on an alternative outgroup
atree = rtree.root("r2")
Take Home
A tree can be manually rooted on an outgroup using tree.root(...), or using one of several algorithms to estimate the root placement. A tree can be unrooted using tree.unroot().
The treenode¶
All ToyTree objects contain a node that is designated the treenode, which represents the top-level Node object in the collection of nodes that make up the tree hierarchy. This node exists in a tree whether it is rooted or unrooted. We use the term treenode rather than root node to refer to this top-level node, since it is not always a true root node, as in the case of an unrooted tree. This can be a confusing point, but understanding it will help to make clear what the process of tree rooting actually represents. It is a little more complex than simply moving or relabeling a node, as described for the three operations below.
Rooting¶
When rooting an unrooted tree, a new node is inserted on an edge, splitting it into two (it helps me to think of it visually as pinching the edge and pulling it back to insert the node). The new node serves as the treenode. The number of nodes in the tree increases by 1.
Unrooting¶
When unrooting a rooted tree, the current treenode is removed, and an existing node in the tree (the previous treenode's left child) is designated as the treenode. The number of nodes in the tree decreases by 1.
Re-rooting¶
When re-rooting a rooted tree, the current treenode is removed and a new node is inserted on a different edge of the tree. The new node serves as the treenode. The number of nodes in the tree does not change.
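The node bookkeeping of these three operations can be sketched without toytree. The snippet below is only an illustration under stated assumptions: the tree is stored as a plain undirected adjacency dict, the internal node names (n5, n6, n7, root) and the helper functions root_on_edge and unroot are hypothetical, and none of this is how toytree implements rooting internally.

```python
# Illustrative sketch only: an unrooted 5-tip tree as an undirected
# adjacency dict. Node names and helper functions are hypothetical.
adj = {
    "r0": {"n5"}, "r1": {"n5"}, "r2": {"n6"}, "r3": {"n7"}, "r4": {"n7"},
    "n5": {"r0", "r1", "n6"},
    "n6": {"n5", "r2", "n7"},
    "n7": {"r3", "r4", "n6"},
}


def root_on_edge(adj, u, v, root="root"):
    """Return a copy of the tree with a new node inserted on edge (u, v)."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    new[u].remove(v)
    new[v].remove(u)
    new[u].add(root)
    new[v].add(root)
    new[root] = {u, v}
    return new


def unroot(adj, root="root"):
    """Return a copy of the tree with the degree-2 root node removed."""
    u, v = adj[root]
    new = {node: set(nbrs) for node, nbrs in adj.items() if node != root}
    new[u].discard(root)
    new[v].discard(root)
    new[u].add(v)
    new[v].add(u)
    return new


rooted = root_on_edge(adj, "n6", "n7")                # rooting: +1 node
rerooted = root_on_edge(unroot(rooted), "r2", "n6")   # re-rooting: same count
print(len(adj), len(rooted), len(rerooted))
```

Unrooting the rooted copy recovers the original adjacency exactly, which is one way to see that rooting changes where the top-level node sits but not the topology.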
# the .treenode is the top-level node
tree.treenode
<Node(idx=8)>
# it is also accessible as the last indexed node
tree[-1]
<Node(idx=8)>
Rooting visualized¶
You cannot verify that a tree is unrooted based simply on a visualization, since an unrooted tree can look the same as a rooted tree that contains a polytomy at its root. Thus, it is best practice to mention in a figure legend whether and how a tree is rooted. Another way of hinting that a tree is rooted versus unrooted is by using different tree layouts for visualization. This is demonstrated below.
This first set of drawings uses the default linear down ('d') layout. This places the treenode at the top of the drawing, which makes it easy to interpret how far each other node is from the treenode. This style makes sense for interpreting rooted trees (left and right), but is misleading for the unrooted tree (middle), since it gives the visual impression that the tree is rooted at node 7.
# draw the trees rooted
c, a, m = toytree.mtree([tree, utree, atree]).draw(ts='p', layout='d');
a[0].label.text = "rooted tree"
a[1].label.text = "unrooted tree"
a[2].label.text = "alt rooted tree"
The drawings below show the same trees but using the unrooted/undirected ('un') layout. This places the treenode near the center and projects edges away from it in a way that minimizes overlaps. You can more clearly see by comparing the three trees in this layout that the structure (topology) of the tree does not change during rooting. The only difference between the middle and two outer drawings is the addition of an extra node (node 8) that is inserted either between nodes 6 and 7 on the left tree, or between nodes 0 and 7 on the right tree (note: node idx labels change between trees with different rootings). An undirected layout would typically be used to visualize an unrooted tree (middle) but is not the most informative for rooted trees, since it is harder to interpret the distances of nodes from the root.
# draw the trees unrooted
c, a, m = toytree.mtree([tree, utree, atree]).draw(ts='p', layout='un');
a[0].label.text = "rooted tree"
a[1].label.text = "unrooted tree"
a[2].label.text = "alt rooted tree"
Tip
A key point is to recognize the difference between whether a tree is rooted or not, and whether a tree is drawn using an unrooted/undirected layout or not. These are two distinct things.
Rooting methods¶
toytree currently supports three methods for rooting a tree: (1) manually; (2) by the midpoint (Farris 1972); and (3) by the minimal ancestor deviation (Tria et al. 2017). The first requires the user to designate the outgroup and optionally specify the length along the edge at which to insert the new root node. The second automatically places the root node at the midpoint of the longest path between any two terminal nodes. The last calculates a set of statistics that can be used either to automatically place the root node or to score an alternative manual placement.
The most common methods, .root() and .unroot(), are available from a ToyTree object. These are also available in the toytree.mod subpackage, where the other rooting functions are also located. Each is demonstrated with further explanation below.
Manually set the outgroup¶
The .root() function requires manually designating an outgroup. Specifically, you are designating the node for which the edge above it will be bisected by the new treenode. The clade composed of the selected node and its descendants is designated the outgroup, and the clade of everything else on the other side of the root is the ingroup.
# root the tree using clade (r3,r4) as outgroup
new_tree = utree.root("r3", "r4")
# show the unrooted and newly rooted trees
toytree.mtree([utree, new_tree]).draw(ts='p');
Midpoint deviation¶
Rooting on the "midpoint" assumes a clock-like evolutionary rate (i.e., branch lengths are proportional to time) and may yield odd results when this assumption is violated. This algorithm calculates the path lengths between all pairs of tips in an unrooted tree, and places the treenode at the midpoint of the longest path.
# root the tree on the global midpoint and draw it
utree.mod.root_on_midpoint().draw(ts='p');
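To make the midpoint idea concrete, here is a minimal pure-Python sketch that finds the longest tip-to-tip path in a small weighted tree and reports the edge, and the offset along it, where the midpoint falls. The adjacency dict, node names, edge lengths, and the functions paths_from and midpoint_edge are all assumptions made for illustration; this is not the code behind root_on_midpoint.

```python
# Illustrative sketch only: weighted undirected tree as {node: {nbr: length}}.
# Edge lengths are hypothetical and chosen so the tree is clock-like.
adj_w = {
    "r0": {"n5": 1.0}, "r1": {"n5": 1.0}, "r2": {"n6": 2.0},
    "r3": {"n7": 2.0}, "r4": {"n7": 2.0},
    "n5": {"r0": 1.0, "r1": 1.0, "n6": 1.0},
    "n6": {"n5": 1.0, "r2": 2.0, "n7": 2.0},
    "n7": {"r3": 2.0, "r4": 2.0, "n6": 2.0},
}


def paths_from(adj, start):
    """Return {node: (distance, path)} from start (paths in a tree are unique)."""
    out = {start: (0.0, [start])}
    stack = [start]
    while stack:
        node = stack.pop()
        dist, path = out[node]
        for nbr, length in adj[node].items():
            if nbr not in out:
                out[nbr] = (dist + length, path + [nbr])
                stack.append(nbr)
    return out


def midpoint_edge(adj):
    """Return (u, v, offset): the midpoint of the longest tip-to-tip path
    lies on edge (u, v), at `offset` length units from u toward v."""
    tips = {node for node, nbrs in adj.items() if len(nbrs) == 1}
    best_dist, best_path = -1.0, None
    for tip in tips and sorted(tips):
        for other, (dist, path) in paths_from(adj, tip).items():
            if other in tips and dist > best_dist:
                best_dist, best_path = dist, path
    half = best_dist / 2
    walked = 0.0
    for u, v in zip(best_path, best_path[1:]):
        length = adj[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length


result = midpoint_edge(adj_w)
print(result)
```

In this example the longest paths all cross the edge between n6 and n7, so the midpoint lands on that edge, which is the same edge that separates the outgroup (r3,r4) from the rest of the tree.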
Minimal-ancestor-deviation¶
The minimal ancestor deviation (MAD) rooting method is intended to accommodate rate heterogeneity among edges of a tree when inferring the root state of an unrooted tree. It assumes that branch lengths are additive and that the true tree is ultrametric (i.e., tip height variation results from rate heterogeneity). This method finds the point on every edge that minimizes the deviations from all pairwise midpoint rooting positions. The optimal rooting position is on the edge with the lowest MAD score, but the user can also manually select a suboptimal edge and assess its relative score compared to alternative root placements (See Inferring the root below.)
# get a rooted tree with MAD scores stored as features
tree.mod.root_on_minimal_ancestor_deviation().draw(ts='p');
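The intuition behind the deviation score can be sketched in a few lines. The snippet below follows my reading of Tria et al. (2017): for each pair of tips, the position of their common ancestor is compared to the midpoint of the path between them, and the relative deviations are summarized as a root mean square. It scores one fixed rooting only, it does not optimize the position along each edge, and the parent dict, node names, and function names are hypothetical; it should not be taken as toytree's implementation.

```python
from itertools import combinations
from math import sqrt


def dists_to_ancestors(parents, node):
    """Return {ancestor: distance from node}, including node itself at 0."""
    out = {node: 0.0}
    dist = 0.0
    while node in parents:
        node, length = parents[node]
        dist += length
        out[node] = dist
    return out


def ancestor_deviation(parents, tips):
    """Root mean square of relative midpoint deviations over all tip pairs."""
    devs = []
    for b, c in combinations(tips, 2):
        up_b = dists_to_ancestors(parents, b)
        up_c = dists_to_ancestors(parents, c)
        # the common ancestor closest to b is the pair's ancestor
        anc = min((n for n in up_b if n in up_c), key=up_b.get)
        d_ab, d_ac = up_b[anc], up_c[anc]
        devs.append(abs(2 * d_ab / (d_ab + d_ac) - 1))
    return sqrt(sum(d * d for d in devs) / len(devs))


tips = ["r0", "r1", "r2", "r3", "r4"]
# hypothetical rooted tree as {child: (parent, dist)}; every tip is 3.0 from root
clocklike = {
    "r0": ("n5", 1.0), "r1": ("n5", 1.0), "n5": ("n6", 1.0), "r2": ("n6", 2.0),
    "r3": ("n7", 2.0), "r4": ("n7", 2.0), "n6": ("root", 1.0), "n7": ("root", 1.0),
}
# same topology, but the root is shifted along its edge toward the outgroup
shifted = dict(clocklike, n6=("root", 1.5), n7=("root", 0.5))

score_clocklike = ancestor_deviation(clocklike, tips)
score_shifted = ancestor_deviation(shifted, tips)
print(score_clocklike, score_shifted)
```

A rooting that makes the tree ultrametric scores zero, and moving the root away from that position raises the score, which is the behavior the MAD method exploits when comparing candidate root placements.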
Root dist¶
When rooting a tree it is important not only to select the correct edge on which to place the treenode, but also the correct position on that edge. For example, the edge could be split at its midpoint, or closer to one child node than the other. The true rooting position is not known, and so this is a place where a model-based inference can be useful. One common assumption is that the tree should be as close to ultrametric as possible, and thus a position should be selected on the edge that best aligns the tip nodes. This is the approach taken by the midpoint and minimal-ancestor-deviation methods. In addition, the user can set a position manually using the manual rooting method. If the root_dist arg is left at its default=None setting in the root function then the edge midpoint is used.
# manual set rooting position 0.1 height units above clade (r3,r4)
utree.root("r3", "r4", root_dist=0.1).draw(ts='p');
# manual set rooting position 0.6 height units above clade (r3,r4)
utree.root("r3", "r4", root_dist=0.6).draw(ts='p');
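The effect of root_dist on the two edges created at the root can be written as simple arithmetic. The helper below is a hypothetical sketch, not a toytree function: it assumes root_dist is measured upward from the outgroup node, and the bounds check is my own assumption about reasonable behavior.

```python
def split_edge(length, root_dist=None):
    """Split an edge of the given length at root_dist above the outgroup node.

    Returns (outgroup_dist, ingroup_dist). With root_dist=None the edge is
    split at its midpoint. Hypothetical helper for illustration only.
    """
    if root_dist is None:
        root_dist = length / 2
    if not 0 <= root_dist <= length:
        raise ValueError("root_dist must fall within the edge length")
    return root_dist, length - root_dist


# an edge of length 2/3, as on the example unit tree above clade (r3,r4)
edge = 2 / 3
print(split_edge(edge))        # midpoint split
print(split_edge(edge, 0.1))   # root placed close to the outgroup clade
print(split_edge(edge, 0.6))   # root placed close to the ingroup clade
```

Whatever position is chosen, the two new edge lengths sum to the original edge length, so path lengths between tips are unchanged by the choice.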
Inferring the root¶
It is always best practice to include an outgroup in phylogenetic analyses, but in some cases the outgroup may be unknown or unavailable. In such cases it can be useful to apply methods for inferring the most likely root position based on edge length information. The best method for this currently implemented in toytree is the minimal ancestor deviation score.
The root_on_minimal_ancestor_deviation function in toytree calculates the MAD score and the root probability for each edge in the tree. By default these scores are stored as edge features on the returned tree. In addition, global statistics can be returned by using the return_stats=True argument. These include the minimal_ancestor_deviation score for the rooting edge (lower is better); the root_ambiguity_index (whether another edge is as good as the selected one; lower is better); and the root_clock_coefficient_of_variation (how variable rates are, i.e., how non-ultrametric the tree is). This is demonstrated below for an example where the rooted tree is ultrametric, and for a case where it is very much not.
MAD statistics¶
In this tree (shown in examples above) the values of each statistic are very low.
tree, stats = tree.mod.root_on_minimal_ancestor_deviation(return_stats=True)
stats
{'minimal_ancestor_deviation': 0.0, 'root_ambiguity_index': 0.0, 'root_clock_coefficient_of_variation': 1.922962686383564e-14}
In contrast, for this example non-ultrametric tree the MAD, ambiguity, and clock variation are all much higher.
# create a non-ultrametric tree and draw it
non_ultrametric_tree = toytree.rtree.rtree(10)
non_ultrametric_tree.draw(layout='d');
# calculate and return the global MAD stats
_, stats = non_ultrametric_tree.mod.root_on_minimal_ancestor_deviation(return_stats=True)
stats
{'minimal_ancestor_deviation': 0.19333523444378065, 'root_ambiguity_index': 0.9783776829860145, 'root_clock_coefficient_of_variation': 21.521660778220504}
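As a rough intuition for the clock statistic, the variation among root-to-tip distances can be summarized as a coefficient of variation. The sketch below is a simple proxy under my own assumptions (population standard deviation, expressed as a percentage); the function name is hypothetical and this is not necessarily the exact formula toytree uses for root_clock_coefficient_of_variation.

```python
from statistics import mean, pstdev


def root_to_tip_cv(root_to_tip_dists):
    """Coefficient of variation (in percent) of root-to-tip distances."""
    return 100 * pstdev(root_to_tip_dists) / mean(root_to_tip_dists)


# hypothetical root-to-tip distances for two rooted trees
ultrametric = [1.0, 1.0, 1.0, 1.0, 1.0]
uneven = [0.4, 1.0, 1.6]

cv_ultrametric = root_to_tip_cv(ultrametric)
cv_uneven = root_to_tip_cv(uneven)
print(cv_ultrametric, cv_uneven)
```

When all tips are equally distant from the root the value is zero, and it grows as tip heights become more uneven.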
Compare MAD rootings¶
One of the real strengths of the MAD approach is that it not only finds the best edge on which to root a tree, but also reports scores for all alternative rootings, showing how much better one is than another. These are returned for each edge of the tree as "MAD" and "MAD_root_prob" scores. For example, in the tree below, the MAD score for the correct root position is 0.0, indicating that the tree is perfectly ultrametric when rooted at this position. The MAD rooting function correctly infers that this is the best root position and assigns it as the root. The MAD_root_prob for this edge is 0.201 (the same probability is assigned to nodes 6 and 7, since they share the edge on which the root node is placed). As we saw above, the global root_ambiguity_index for this rooting was 0.0, meaning that no other edge is as good as the selected one: its 0.201 root probability is clearly higher than the 0.149 root probability of the next highest scoring edge.
# get a rooted tree with MAD scores stored as features
mad_tree = tree.mod.root_on_minimal_ancestor_deviation()
mad_tree.get_node_data()
| | idx | name | height | dist | support | MAD | MAD_root_prob |
|---|---|---|---|---|---|---|---|
| 0 | 0 | r0 | 0.000000e+00 | 0.333333 | NaN | 0.500000 | 0.10066 |
| 1 | 1 | r1 | 0.000000e+00 | 0.333333 | NaN | 0.500000 | 0.10066 |
| 2 | 2 | r2 | 0.000000e+00 | 0.666667 | NaN | 0.258199 | 0.14934 |
| 3 | 3 | r3 | 2.220446e-16 | 0.666667 | NaN | 0.258199 | 0.14934 |
| 4 | 4 | r4 | 2.220446e-16 | 0.666667 | NaN | 0.258199 | 0.14934 |
| 5 | 5 | | 3.333333e-01 | 0.333333 | NaN | 0.258199 | 0.14934 |
| 6 | 6 | | 6.666667e-01 | 0.333333 | NaN | 0.000000 | 0.20132 |
| 7 | 7 | | 6.666667e-01 | 0.333333 | NaN | 0.000000 | 0.20132 |
| 8 | 8 | root | 1.000000e+00 | 0.000000 | NaN | NaN | NaN |
Finally, we could plot the MAD or MAD_root_prob scores on the edges of a tree easily, since they are stored as features to the returned rooted tree.
# plot and show the MAD rooting probability for other edges
c, a, m = mad_tree.draw('p', width=450);
mad_tree.annotate.add_edge_markers(a, "r3x1", size=14, color="lightgrey", mask=False, xshift=0)
mad_tree.annotate.add_edge_labels(a, "MAD_root_prob", mask=False, font_size=11);
Check root status¶
The is_rooted() method checks whether a tree is rooted based on the resolution of the treenode. Note that it cannot distinguish between a truly unrooted tree and a rooted tree whose treenode is a polytomy. It simply returns False if the treenode has more than two children, and True otherwise. It is nevertheless still quite useful.
# returns False if the treenode has >2 children, True otherwise
tree.is_rooted()
True
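The logic of this check, and its blind spot, can be sketched with a plain dict of children. This is a hypothetical stand-in written for illustration (the function name and data layout are assumptions), not the source of the ToyTree method.

```python
def looks_rooted(children, treenode):
    """Return False if the top-level node has more than two children."""
    return len(children.get(treenode, ())) <= 2


# hypothetical trees stored as {node: [children]}
rooted = {"root": ["n6", "n7"], "n6": ["n5", "r2"],
          "n5": ["r0", "r1"], "n7": ["r3", "r4"]}
unrooted = {"n6": ["n5", "r2", "n7"],
            "n5": ["r0", "r1"], "n7": ["r3", "r4"]}
# a rooted tree whose root is an unresolved three-way split
rooted_polytomy = {"root": ["a", "b", "c"]}

print(looks_rooted(rooted, "root"))           # True
print(looks_rooted(unrooted, "n6"))           # False
print(looks_rooted(rooted_polytomy, "root"))  # False, although it is rooted
```

The last case shows why the result should be read as "the treenode is bifurcating" rather than as proof of how the tree was rooted.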
Unrooting¶
The unroot function can be called to unroot a rooted tree. In an unrooted tree the treenode is always a polytomy. A rooted bifurcating tree has nnodes = (ntips * 2) - 1, whereas an unrooted bifurcating tree has nnodes = (ntips * 2) - 2. In other words, converting from a rooted to unrooted state removes one node (the former treenode) from the tree, and assigns an existing node as the new treenode.
# get an unrooted copy of the tree
tree.unroot()
<toytree.ToyTree at 0x7f21c8bab580>
Features/Data and Rooting¶
The processes of rooting, unrooting, or re-rooting trees should be reversible, meaning that the operations can be performed in any order without the loss of information about the topology, branch lengths, or any associated meta-data/features. This is the goal in toytree, and it is always achieved for the topology and branch lengths, but it requires some user knowledge when dealing with arbitrary additional data features assigned to the tree.
This is because data can be stored to a tree as either a feature of nodes, or of edges (see Node-vs-edge-features).
Some data stored to a tree are intended to represent information about the edges (splits) in a tree, rather than information about the nodes. This is important as these types of data must be treated differently when doing things like re-rooting a tree, and in some cases, for visualization.
(See the section on Information Loss for how other metadata in the tree can be affected, though.)
Support values¶
The way in which support values are displayed on trees is often a source of confusion. A support score typically represents confidence in a bipartition (split) of the topology, and is thus a feature of an edge of the tree rather than of a node. However, support scores are often plotted on the nodes of a tree, which can lead to misinterpretation of their meaning, especially with regard to nodes near the root of a tree.
For this reason, it is actually incorrect to list a separate support score for each of the two edges that descend from the treenode of a rooted tree, since both edges represent the same bipartition.
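The link between edges and bipartitions can be checked directly on a small example. The sketch below uses a hypothetical rooted tree stored as a dict of children and two illustrative helper functions (not part of toytree) to compute the split of tips induced by the edge above a node.

```python
def tips_below(children, node):
    """Return the set of tip names descended from node (or node if a tip)."""
    kids = children.get(node, [])
    if not kids:
        return frozenset([node])
    return frozenset().union(*(tips_below(children, kid) for kid in kids))


def bipartition(children, root, node):
    """Return the split of all tips induced by the edge above node."""
    inside = tips_below(children, node)
    outside = tips_below(children, root) - inside
    return frozenset([inside, outside])


# hypothetical rooted tree: (((r0,r1)n5,r2)n6,(r3,r4)n7)root
children = {"root": ["n6", "n7"], "n6": ["n5", "r2"],
            "n5": ["r0", "r1"], "n7": ["r3", "r4"]}

split_n6 = bipartition(children, "root", "n6")
split_n7 = bipartition(children, "root", "n7")
split_n5 = bipartition(children, "root", "n5")
print(split_n6 == split_n7)  # True: both root edges describe one split
print(split_n5 == split_n6)  # False: a different edge, a different split
```

Because the two edges below the treenode describe a single split, a rooted bifurcating tree carries one fewer independent support value than it has internal edges.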
ctre = toytree.tree("https://eaton-lab.org/data/Cyathophora.tre")
ctre.draw(ts='r', node_labels="support");
ctre.draw(
layout='unr',
node_labels="support", node_as_edge_data=True, node_sizes=16,
node_markers="r2x1", node_colors="lightgrey",
);
toytree.tree("((a[2],b[1])[3],c[100])[4];", feature_prefix=None,).draw('r', node_mask=False, node_labels="label");
toytree.tree("/home/deren/R/x86_64-pc-linux-gnu-library/4.2/phangorn/extdata/trees/RAxML_bipartitionsBranchLabels.AIs", feature_prefix=None,
).draw('r', node_labels="label", node_hover=True);
toytree.tree("/home/deren/R/x86_64-pc-linux-gnu-library/4.2/phangorn/extdata/trees/RAxML_bipartitions.3moles").draw('s', node_labels="support");
support_tree = tree.set_node_data("support", default=100)
support_tree.get_node_data()
# the proper
tree.set_node_data("support", default=100).draw(node_labels="support", node_sizes=20, node_as_edge_data=True);
# """: Example dataset with inner labels as edge data."""
# self.supp = toytree.tree("(a,b,((c,d)CD[&support=100],(e,f)EF[&support=80])X[&support=90])AB;")
# """: Tree w/ internal names and supports"""
# self.itree = toytree.rtree.imbtree(10, seed=123, treeheight=10)
# self.btree = toytree.rtree.baltree(10, seed=123, treeheight=10)
# self.utree = toytree.rtree.unittree(10, seed=123, treeheight=10)
# self.trees = [self.itree, self.btree, self.utree]
Information loss¶
Is information lost when a tree is unrooted and then re-rooted? The answer is usually no, but there are instances in which data can be lost.
tree.unroot().root('r2').unroot().root('r3', 'r4').draw('p');
Example from paper¶
This problem was well described in "A critical review on the use of support values in tree viewers and bioinformatics toolkits" by Czech et al. (2017).
# unrooted tree from Czech et al...
czech = "((C,D)1,(A,(B,X)3)2,E)R;"
ctree = toytree.tree(czech, internal_labels="name")
# set data to label nodes and edges of the unrooted tree
colors = {'1': 'red', '2': 'green', '3': 'orange'}
ctree.set_node_data("ecolor", colors, default="black", inplace=True)
ctree.set_node_data("ncolor", colors, inplace=True);
# create a style dict
kwargs = {
'layout': 'd',
'use_edge_lengths': False,
'node_sizes': 10,
'node_labels': 'name',
'node_labels_style': {
'font-size': 20,
'baseline-shift': 10,
'-toyplot-anchor-shift': 10,
}}
# draw original unrooted tree
ctree.draw(node_colors="ncolor", edge_colors="ecolor", **kwargs);
When we root the tree on the edge above "X" this changes the orientation of several nodes on the tree, such that some which were previously parents of another node now appear as children of that node. This has an important consequence for how the edge between these nodes is interpreted...
For example, the orange edge which previously represented information about the split separating (B,X) from every other node now instead represents the split between X and every other node. Similarly, the green edge which previously represented the split between (A,B,X) versus (C,D,E) now represents (B,X) versus (A,C,D,E). This is incorrect.
...
# root without declaring 'ecolor' as an edge feature: edge colors end up on the wrong splits
rtree = ctree.root("X")
rtree.draw(node_colors="ncolor", edge_colors="ecolor", **kwargs);
# re-root, treating 'ecolor' but not 'ncolor' as an edge feature.
rtree = ctree.root("X", edge_features=["ecolor"])
rtree.draw(
node_colors=rtree.get_node_data('ncolor', missing='black'),
edge_colors=rtree.get_node_data('ecolor', missing='black'),
**kwargs,
);
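Why edge features must be re-polarized can be shown with a toy version of this example. The sketch below stores the same tree as a dict of parent pointers and keeps each edge's color on the child node of that edge, which is the usual convention. The function root_above, the "root" node name, and the dict layout are hypothetical and only illustrate the bookkeeping; they are not toytree's implementation.

```python
def root_above(parent, feature, tip, as_edge_feature, default="black"):
    """Insert a new root on the edge above tip; return (parents, features).

    Edges on the path from tip to the old treenode flip direction. If the
    feature describes edges, its values are shifted along that path so each
    value stays attached to the same edge (split).
    """
    path = []
    node = parent[tip]
    while node is not None:
        path.append(node)
        node = parent[node]
    new_parent = dict(parent, root=None)
    new_parent[tip] = "root"
    below = "root"
    for node in path:
        new_parent[node] = below
        below = node
    new_feature = dict(feature)
    if as_edge_feature:
        carried = feature.get(tip, default)
        for node in path:
            carried, new_feature[node] = feature.get(node, default), carried
    return new_parent, new_feature


# ((C,D)1,(A,(B,X)3)2,E)R as {child: parent}; R is the treenode
parent = {"R": None, "1": "R", "2": "R", "E": "R", "C": "1",
          "D": "1", "A": "2", "3": "2", "B": "3", "X": "3"}
ecolor = {"1": "red", "2": "green", "3": "orange"}

_, naive = root_above(parent, ecolor, "X", as_edge_feature=False)
new_parent, fixed = root_above(parent, ecolor, "X", as_edge_feature=True)
print(naive)
print(fixed)
```

In the corrected result the orange value sits on node 2, whose edge to node 3 still separates (B,X) from everything else, and the green value sits on R, whose edge to node 2 still separates (A,B,X) from (C,D,E). In the naive result the colors stay on nodes 3 and 2 and so describe the wrong splits.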