Topology changes¶
These methods let the user modify a ToyTree object programmatically and efficiently without directly modifying TreeNode objects. While manual editing of TreeNodes is possible, it is not recommended because of the danger of creating a ToyTree with an invalid coordinate structure. These functions ensure that ToyTree objects remain intact after topology changes, so that, when used in combination, trees can be safely modified to fit the structure required for any visualization or algorithmic analysis while remaining compatible with the rest of the Toytree package's methods.
The foundation of topology modification consists of adding, removing, or changing the relationships among TreeNodes. There are also methods that work at the subtree level to make separation particularly efficient, as well as methods that restructure whole trees for visual clarity.
These methods are organized into three categories and contain the following functions:
Node-level modification
- add_internal_node()
- add_child_node()
- add_sister_node()
- add_internal_node_and_child()
- add_internal_node_and_subtree()
- remove_nodes()
- remove_unary_nodes()
- collapse_nodes()
- rotate_node()
Subtree-level modification
- prune()
- bisect()
Tree-level modification
- resolve_polytomies()
- ladderize()
import toytree
We will use the following style dict throughout this notebook.
# style to show nodes w/ name labels
style = dict(
node_mask = False,
node_sizes = 16,
node_markers = "s",
node_style = {"fill": "black"},
node_labels = "name",
node_labels_style= {"fill": "white", "font-size": 12}
)
Node-level modification¶
Adding nodes¶
The mod subpackage includes many methods to add nodes to a ToyTree object. A Node can be added as an internal node, child node, or sister node. You can also add nodes as a parent-child pair or as entire subtrees to be merged into a ToyTree object.
add_internal_node introduces a new Node object to the tree by splitting the edge above a queried Node. It creates a new unary Node along this edge, with a name passed in as the name= argument.
# starting tree
tree = toytree.tree("(B,(A,X)AX)AB;")
# add internal node "C" ABOVE node "A"
tree2 = tree.mod.add_internal_node("A", name="C", dist=None)
# draw both trees
toytree.mtree([tree, tree2]).draw(**style);
Here, the dist= argument determines how much of the original child-parent distance between the queried Node and its parent is assigned to the new child-parent distance created by add_internal_node() (the parent being the newly introduced Node). A float $0<x<1$ is passed in, representing the proportion of the original distance transferred to the new distance. The example below shows how the tree drawn above changes as the dist= argument is varied.
# assign 75% of the original distance to the new tip-parent distance
tree3 = tree.mod.add_internal_node("A", name="C", dist=0.75)
# assign 25% of the original distance to the new tip-parent distance
tree4 = tree.mod.add_internal_node("A", name="C", dist=0.25)
toytree.mtree([tree3, tree4]).draw(**style);
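The proportional split described above can be checked with plain arithmetic. The sketch below is not toytree code: split_edge is a hypothetical helper that follows the description in the text (dist as a proportion of the original edge) and assumes that leaving the value unset places the new Node at the midpoint.

```python
def split_edge(edge_length, proportion=None):
    """Split one edge into (lower, upper) parts around a newly inserted node.

    `lower` is the new distance from the queried node up to the inserted
    node; `upper` is what remains between the inserted node and the old parent.
    """
    if proportion is None:
        proportion = 0.5  # assumption: no value means the midpoint
    if not 0 < proportion < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    lower = edge_length * proportion
    return lower, edge_length - lower


print(split_edge(1.0, 0.75))  # (0.75, 0.25)
print(split_edge(1.0, 0.25))  # (0.25, 0.75)
```

Whatever proportion is chosen, the two parts always sum to the original edge length, so the height of the queried Node does not change.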
add_child_node works similarly to add_internal_node and pairs well with it. It inserts a Node as a child of a queried Node within a tree such that, for example, a newly inserted internal node would no longer be unary. It can also create a polytomy if the function is called on a binary Node.
# add internal node named "C" ABOVE node "A"
tree2 = tree.mod.add_internal_node("A", name="C", dist=None)
# add child node to new internal node "C" to make binary
tree3 = tree2.mod.add_child_node("C", name="D", dist=None)
toytree.mtree([tree2, tree3]).draw(**style);
# add another child node to "C", creating a polytomy
polytree = tree3.mod.add_child_node("C", name="Y")
toytree.mtree([tree3, polytree]).draw(**style);
add_sister_node can also be used in conjunction with add_internal_node, the only difference being which node is queried. This function inserts a Node as the sister of a queried Node. In other words, it adds a child Node to the parent of the queried Node. Similarly, this can either turn a unary node into a binary one or cause a binary node to become a polytomy.
# add internal node named "C" ABOVE node "A"
tree2 = tree.mod.add_internal_node("A", name="C", dist=None)
# add sister node to node "A" to make "C" binary
tree3 = tree2.mod.add_sister_node("A", name="D")
toytree.mtree([tree2, tree3]).draw(**style);
# add another sister node to "A", creating a polytomy
polytree = tree3.mod.add_sister_node("A", name="Y")
polytree.draw(**style);
add_internal_node_and_child combines the previous steps into one command. A parent-child pair is inserted into the tree, splitting the edge above the queried Node. The internal Node and the new child Node must be named with parent_name= and name=, respectively. If no value is entered for parent_dist, the parent Node is inserted at the midpoint of the edge. If a parent_dist value is given, it must fit within the length of the query Node's dist. The new child Node does not share this constraint. If no value is entered for dist, it is set to match the dist of its sister Node.
# add internal node named "C" with child node named "D" ABOVE node "A"
tree2 = tree.mod.add_internal_node_and_child("A", name="D", parent_name="C")
toytree.mtree([tree, tree2]).draw(**style);
add_internal_node_and_subtree takes the previous function one step further by allowing the user to pass in a subtree. As with the other functions, this splits the edge between the queried node and its parent, but this time with a new ancestral node to which a subtree (passed in as a ToyTree object) is connected. The name of the ancestral Node is passed to the parent_name= argument, and the distances of the parent and of the subtree stem can be set with parent_dist= and subtree_stem_dist=, respectively. By default, these are set to $0.5$. You can also choose to rescale the subtree so that it fits in the distance between the sister Node height and the stem height.
# two small trees
tree = toytree.tree("(A,(B,C));")
sub = toytree.tree("(X,(Y,Z));")
# add subtree "sub" to original tree above node C
merged = tree.mod.add_internal_node_and_subtree(
"C", subtree=sub, parent_name="D", subtree_rescale=True)
# draw subtrees and new merged tree
toytree.mtree([tree, sub, merged]).draw();
Removing nodes¶
remove_nodes simply deletes the nodes that are queried. By default, the orphans created by deleting internal nodes (perhaps this metaphor has gone too far) inherit their deleted parents' distances, so that each orphan's edge reaches its grandparent's original height. Alternatively, the user can pass in preserve_dists=False to have children retain their original distances (while still being connected to their grandparents).
tree = toytree.tree("(a,b,((c,d)CD,(e,f)EF)X)AB;")
mod_tree = tree.mod.remove_nodes("b", "c", "EF")
toytree.mtree([tree, mod_tree]).draw(ts='c', scale_bar=True, layout='r');
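The effect of preserve_dists on orphaned children can be illustrated without toytree. The following is a minimal sketch under stated assumptions: the Node dataclass and the remove_child helper are hypothetical stand-ins, not the library's classes, and only the removal of a direct child is handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not toytree's Node class)."""
    name: str
    dist: float = 1.0
    children: list = field(default_factory=list)


def remove_child(parent, name, preserve_dists=True):
    """Delete the child of `parent` called `name` and adopt its children."""
    kept = []
    for child in parent.children:
        if child.name != name:
            kept.append(child)
            continue
        for orphan in child.children:
            if preserve_dists:
                orphan.dist += child.dist  # orphan edge now spans both edges
            kept.append(orphan)
    parent.children = kept


# X has an internal child EF (dist 1) with tips e and f (dist 1 each)
x = Node("X", children=[Node("EF", children=[Node("e"), Node("f")])])
remove_child(x, "EF")
print([(n.name, n.dist) for n in x.children])  # [('e', 2.0), ('f', 2.0)]
```

With preserve_dists=False the same call would leave e and f at their original dist of 1.0, so they would sit closer to X than before.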
Removing unary nodes¶
The .mod subpackage also offers remove_unary_nodes(), a method to quickly remove all unary Nodes, that is, Nodes that have exactly 1 child. The returned ToyTree object will therefore contain only tips and internal Nodes with $\geq 2$ children. Apart from the ToyTree object itself (which is implicit when called as tree.mod.remove_unary_nodes()), this method takes only inplace=, which determines whether to modify the original tree or return a modified copy.
tree = toytree.tree("(A,(B,C)X)Y;")
tree = (
tree
.mod.add_child_node("C", name="E")
.mod.add_child_node("B", name="F")
.mod.add_child_node("A", name="G")
)
simplified = tree.mod.remove_unary_nodes().mod.rotate_node("Y")
toytree.mtree([tree, simplified]).draw(**style);
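As a rough illustration of what removing unary Nodes involves, the sketch below splices out every single-child node in a small hand-built tree. The Node dataclass and remove_unary are hypothetical names, and adding the removed node's dist to its child is an assumption made here so that tip heights stay the same; the text above does not specify how toytree handles distances.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not toytree's Node class)."""
    name: str
    dist: float = 1.0
    children: list = field(default_factory=list)


def remove_unary(node):
    """Return the subtree rooted at `node` with single-child nodes spliced out."""
    node.children = [remove_unary(child) for child in node.children]
    if len(node.children) == 1:
        only = node.children[0]
        only.dist += node.dist  # assumption: the merged edge keeps the total length
        return only
    return node


# same shape as the example above: A, B and C each gained a single child
tree = Node("Y", 0.0, [
    Node("A", children=[Node("G")]),
    Node("X", children=[
        Node("B", children=[Node("F")]),
        Node("C", children=[Node("E")]),
    ]),
])
tree = remove_unary(tree)
print([c.name for c in tree.children])              # ['G', 'X']
print([c.name for c in tree.children[1].children])  # ['F', 'E']
```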
Collapsing nodes¶
mod.collapse_nodes() can be called on an internal node to collapse it into a multifurcating polytomy. Nodes can be selected either by passing in particular Node labels or by providing a minimum distance min_dist or minimum support value min_support. These are the minimum values a Node needs in order to be kept. That is, every internal Node with a value less than the minimum provided will be collapsed.
# modifying tree with previous methods, adding complexities
tree = (
toytree.tree("(A,(B,(C,D)X)Y)Z;")
.mod.add_child_node("C", name="E")
.mod.add_child_node("B", name="F")
.mod.add_internal_node_and_child("A", name="G", parent_name="H")
)
# collapsing specific nodes by name - collapse X and H
collapsed1 = tree.mod.collapse_nodes("X", "H")
toytree.mtree([tree, collapsed1]).draw(**style);
tree = toytree.tree("(A,((B,E)H,((C,G)J,(D,F)K)X)Y)Z;")
# setting distances to be different values
tree.set_node_data("dist", {7: 3, 8: 4, 9: 5}, inplace=True)
# keep only internal nodes with parental edge length >= 1.5
# this only removes the internal nodes with particularly short branch lengths
collapsed3 = tree.mod.collapse_nodes(min_dist=1.5)
# keep only internal nodes with parental edge length >= 5, collapse the rest.
# in this case, K is the only internal node with edge length >= 5
collapsed4 = tree.mod.collapse_nodes(min_dist=5)
toytree.mtree([tree, collapsed3, collapsed4]).draw(ts='c', **style);
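The threshold rule (collapse every internal Node whose dist is below min_dist) can be sketched on a hand-built copy of the tree above. Node, build and collapse_short are hypothetical names used only for this illustration, and as a simplification the sketch leaves the dists of re-attached children untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not toytree's Node class)."""
    name: str
    dist: float = 1.0
    children: list = field(default_factory=list)


def collapse_short(node, min_dist):
    """Collapse internal descendants of `node` whose dist is below `min_dist`."""
    kept = []
    for child in node.children:
        collapse_short(child, min_dist)
        if child.children and child.dist < min_dist:
            kept.extend(child.children)  # child disappears, its children move up
        else:
            kept.append(child)
    node.children = kept


def build():
    """Same topology as above, with dists H=3, J=4, K=5 and 1 elsewhere."""
    return Node("Z", 0.0, [
        Node("A"),
        Node("Y", 1.0, [
            Node("H", 3.0, [Node("B"), Node("E")]),
            Node("X", 1.0, [
                Node("J", 4.0, [Node("C"), Node("G")]),
                Node("K", 5.0, [Node("D"), Node("F")]),
            ]),
        ]),
    ])


tree = build()
collapse_short(tree, min_dist=5)
print([c.name for c in tree.children])  # ['A', 'B', 'E', 'C', 'G', 'K']
```

K survives because its dist of 5 is not less than the threshold, while H, J, X and Y are all folded into the root.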
Rotating nodes¶
mod.rotate_node() rotates a particular node such that the order of its children is reversed. A node can be queried using either its index or its name, and internal nodes can be accessed by passing in multiple nodes, which will rotate the node representing their MRCA.
By default, this returns a modified copy of the tree without changing the original tree; passing inplace=True instead changes and returns the original tree.
# simple tree from newick string
tree = toytree.tree("(Alligator,(Bunny,(Cat,Dog)X)Y)Z;")
# rotate tree at node idx=4 (first internal node)
tree2 = toytree.mod.rotate_node(tree, 4)
# look for the difference in order between Cat and Dog
toytree.mtree([tree, tree2]).draw(ts='s', tip_labels_colors="name");
Note: This command updates the indices after the rotation, so notice that the "name" features of the tips are rotated, but the indices are renumbered in visual order up the tips and down the internal Nodes.
Multiple calls of rotate_node can be chained to efficiently produce a specific arrangement.
# more complex newick string
tree = toytree.tree("(a,((b,c)BC,(d,(e,f))DE)X)AB;")
# multiple calls chained together, accessing internal nodes by MRCA of two tips
rotated = tree.mod.rotate_node('c', 'd').mod.rotate_node('f','d')
toytree.mtree([tree, rotated]).draw(ts='s');
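Selecting an internal node through the MRCA of two tips and then reversing its children can be sketched with nested Python lists standing in for the tree. The helpers tip_names, mrca and rotate are hypothetical and not part of toytree; unlike the default behavior described above, this sketch modifies the nested list in place and expects at least two tip names.

```python
def tip_names(tree):
    """Return the set of tip names below a nested-list tree (tips are strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tip_names(child) for child in tree))


def mrca(tree, names):
    """Return the smallest sub-list that contains every name in `names`."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if names <= tip_names(child):
            return mrca(child, names)
    return tree


def rotate(tree, *names):
    """Reverse the child order of the MRCA of `names` (in place)."""
    mrca(tree, set(names)).reverse()
    return tree


# same topology as the chained example above
tree = ["a", [["b", "c"], ["d", ["e", "f"]]]]
rotate(tree, "c", "d")
rotate(tree, "f", "d")
print(tree)  # ['a', [[['e', 'f'], 'd'], ['b', 'c']]]
```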
Subtree-level modification¶
Pruning¶
The prune() method returns a tree with a subset of queried Nodes along with the minimal spanning edges required to connect them. Nodes can be queried as individual arguments or as a list of indices, e.g. prune([0,1,2]).
When called on a rooted tree, the user can require the original root to be retained in the pruned tree using require_root=True. By default, this is False and the MRCA of the queried Nodes is instead kept as the new root.
When internal Nodes are discarded by prune(), their distances are merged into the distance of the queried Node such that the original distance between the root and the queried Node remains the same. If preserve_dists=False, then only the original distances assigned to the queried Nodes are retained.
# rooted tree with 3 tips, all dists =1 except 'c' dist =2
tree = toytree.tree("((a,b)d,c:2)e;")
# modify original tree to only keep nodes 'a' , 'b' , and MRCA
pruned = tree.mod.prune('a','b')
# draw both trees
toytree.mtree([tree, pruned]).draw(ts='s', scale_bar=True);
Retaining original root:
# include original tree's root 'e'
pruned = tree.mod.prune('a','b', require_root=True)
pruned.draw(ts='s', scale_bar=True);
Using the same example tree, the following panels show the difference between preserving all distance values (preserve_dists=True) and only those that belong to the queried Nodes themselves (preserve_dists=False). By default, when 'a' is queried for pruning, it inherits both its own dist (+1) and that of the intermediate Node 'd' (+1) required to traverse to its MRCA with the other queried Node 'c'. However, when preserve_dists=False, it retains only its original dist of 1.
# Default behavior
pruned_preserved = tree.mod.prune('a', 'c', preserve_dists=True)
# NOT retaining distances from all intermediate nodes/edges
pruned_not_preserved = tree.mod.prune('a','c', preserve_dists=False)
# rotate at 'e' for easier visual comparison to original tree
pruned_preserved.mod.rotate_node('e', inplace=True)
pruned_not_preserved.mod.rotate_node('e', inplace=True)
# draw and label trees
mtree = toytree.mtree([tree, pruned_preserved, pruned_not_preserved])
c, axes, marks = mtree.draw(ts='c', scale_bar=True);
axes[0].label.text = "original tree"
axes[1].label.text = "preserved dist"
axes[2].label.text = "not preserved dist"
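The distance bookkeeping in these pruning examples can be reproduced with a small stand-alone sketch. Node and prune_tips below are hypothetical names rather than toytree's implementation; the sketch only accepts tip names as queries and always behaves like require_root=False.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not toytree's Node class)."""
    name: str
    dist: float = 1.0
    children: list = field(default_factory=list)


def prune_tips(node, keep, preserve_dists=True):
    """Return a pruned copy holding only tips named in `keep`, or None."""
    if not node.children:
        return Node(node.name, node.dist) if node.name in keep else None
    kids = [prune_tips(c, keep, preserve_dists) for c in node.children]
    kids = [k for k in kids if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        only = kids[0]
        if preserve_dists:
            only.dist += node.dist  # absorb the discarded node's edge
        return only
    return Node(node.name, node.dist, kids)


# ((a,b)d,c:2)e; with a root dist of 0
tree = Node("e", 0.0, [Node("d", 1.0, [Node("a"), Node("b")]), Node("c", 2.0)])

kept = prune_tips(tree, {"a", "c"})
print([(n.name, n.dist) for n in kept.children])  # [('a', 2.0), ('c', 2.0)]

bare = prune_tips(tree, {"a", "c"}, preserve_dists=False)
print([(n.name, n.dist) for n in bare.children])  # [('a', 1.0), ('c', 2.0)]
```

Pruning to 'a' and 'b' with the same function returns the subtree rooted at d, which matches the default behavior of keeping the MRCA as the new root.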
Bisecting¶
The bisect() method returns a tree bisected into two subtrees on a selected edge. This edge is the edge above a queried Node.
When used on a rooted tree, querying the root TreeNode returns a subtree for each child as its own TreeNode with its original distance value. When used on an unrooted tree, the root TreeNode cannot be queried.
When any other Node is queried, the edge above it is split to create two subtrees, with the queried Node being the TreeNode of one new subtree. Including the argument reroot=True causes the Node above the query to become the new TreeNode in its subtree; otherwise the original TreeNode is retained.
By default, the subtree below the query inherits the entire distance of the split edge, but the dist_partition= argument can designate the proportion of that distance $(0.0 \leq x \leq 1.0)$ to assign to the subtree below.
tree = toytree.tree("((A,B)E,(C,D)F)G;").root('G')
sub1, sub2 = tree.mod.bisect('G')
toytree.mtree([tree, sub1, sub2]).draw(ts='c', scale_bar=True);
Tree-level modification¶
Resolving polytomies¶
resolve_polytomies() chooses one or more polytomies to resolve either partially or completely. The algorithm resolves a polytomy stemming from a queried Node with $n$ children by choosing a random child to represent the left child and creating a New Node to represent the right child. This New Node has an assignable distance dist= and support value support=. The remaining original children of the queried Node are then connected to the New Node, which will then have $(n-1)$ children.
By default, resolve_polytomies() completely resolves polytomies with $n>3$ children by recursively running the algorithm on the New Node with $(n-1)$ children until $n=2$ (the New Node is binary). However, using recursive=False resolves the polytomy only partially, returning a tree with a multifurcation of $n-1$ children connected to the New Node.
The user can query Nodes by name, index, or Node object. The queried Nodes are the parents of polytomies. For reproducible random resolutions, a numpy seed can be included with seed=.
# multifurcation of n=5
tree = toytree.tree("((a,b,c,d,e),f);")
# non recursive, so one tip kept at random and the other 4 become children of New Node
tree1 = tree.mod.resolve_polytomies(dist=1, recursive=False)
# completely resolves the rest of the polytomies such that all internal nodes are binary
tree2 = tree.mod.resolve_polytomies(dist=1)
# draw trees
toytree.mtree([tree, tree1, tree2]).draw(ts='c');
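The resolution procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: Node and resolve_polytomy are hypothetical names, the new node is simply called "new", and Python's random.Random is swapped in for the numpy seed mentioned in the text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not toytree's Node class)."""
    name: str
    dist: float = 1.0
    children: list = field(default_factory=list)


def resolve_polytomy(node, rng, dist=1.0, recursive=True):
    """Keep one random child of `node`; hang the others from a new node."""
    if len(node.children) <= 2:
        return
    rest = list(node.children)
    left = rest.pop(rng.randrange(len(rest)))  # randomly chosen "left" child
    new = Node("new", dist, rest)              # new node takes the other n-1
    node.children = [left, new]
    if recursive:
        resolve_polytomy(new, rng, dist, recursive)


# polytomy with n=5 children, resolved one step only
poly = Node("P", children=[Node(x) for x in "abcde"])
resolve_polytomy(poly, random.Random(123), recursive=False)
print(len(poly.children), len(poly.children[1].children))  # 2 4
```

With recursive=True the same call keeps splitting the new node until every internal node has exactly two children; which tips end up where depends on the seed.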
Ladderize¶
ladderize() formats a tree such that a Node's right/lower child always has at least as many descendants as its left/upper child. The user can also reverse the order, so that the left/upper child has more descendants, by using the argument direction=True.
# generate random tree with 12 tips
tree = toytree.rtree.bdtree(ntips=12, seed=123)
tree1 = tree.ladderize(direction=0)
tree2 = tree.ladderize(direction=1)
toytree.mtree([tree, tree1, tree2]).draw(ts='c');
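The ordering rule behind ladderizing amounts to sorting each node's children by the number of tips below them. The sketch below uses nested lists as a stand-in tree; count_tips and ladderize_nested are hypothetical helpers, and the sketch assumes children are listed from left/upper to right/lower.

```python
def count_tips(tree):
    """Count the tips below a nested-list tree (tips are strings)."""
    if isinstance(tree, str):
        return 1
    return sum(count_tips(child) for child in tree)


def ladderize_nested(tree, reverse=False):
    """Return a copy with every node's children sorted by tip count."""
    if isinstance(tree, str):
        return tree
    kids = [ladderize_nested(child, reverse) for child in tree]
    # ascending order puts the child with more tips last; sorting is stable,
    # so children with equal tip counts keep their original relative order
    return sorted(kids, key=count_tips, reverse=reverse)


tree = [[["c", "d"], "b"], "a"]
print(ladderize_nested(tree))                # ['a', ['b', ['c', 'd']]]
print(ladderize_nested(tree, reverse=True))  # [[['c', 'd'], 'b'], 'a']
```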