toytree.Node¶
The toytree.Node
class is primarily used for data storage. Minimally, it contains attributes storing a .name
, .dist
(edge length), and .support
values, as well as attributes .up
and .children
which point to other Node
objects to represent connections between them.
A single Node
instance is generally of little use; only when nodes are connected do they gain emergent properties in the form of a network/tree structure. Thus, most methods in the toytree
library are associated with ToyTree
objects, which are containers around a collection of Nodes
. However, Node
objects themselves are important to understand as the underlying object storing data within trees. This section describes the structure of Node
objects and the design behind their intended use.
import toytree
# create an example tree
tree = toytree.rtree.rtree(ntips=8, seed=321)
tree.draw('c');
The Node class¶
The Node
class is accessible from toytree.Node
and can be used to create new instances or to check or validate the type of a Node
instance. Unless you are a developer you are not likely to create new Node
objects often; instead, you will mostly interact with them by selecting them from within ToyTrees
.
# create a new Node
single_node = toytree.Node(name="single")
single_node
<Node(name='single')>
# select a Node from a ToyTree
node3 = tree[3]
node3
<Node(idx=3, name='r3')>
# check that an object's type is a Node
isinstance(node3, toytree.Node)
True
Attributes¶
name: str¶
The default name
attribute is an empty string. Leaf nodes usually have names associated with them whereas internal nodes usually do not. This will depend on the data that a tree is parsed or constructed from, and whether additional names are added. Some characters are not allowed in node names ([:;(),\[\]\t\n\r=]
) as they would interfere with Newick string parsing when written to a file. Names can be accessed from a Node
's .name
attribute, and can be used to query nodes from a ToyTree
.
# a name can be accessed from a Node
single_node.name
'single'
# a name can be accessed from a Node in a ToyTree
tree[3].name
'r3'
# returns .name from Nodes in the order they will be plotted (idxorder)
tree.get_tip_labels()
['r0', 'r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']
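The character restriction described above can be checked before names are assigned. The following is a minimal sketch using only the standard library. The helper name `find_forbidden_chars` is hypothetical and not part of toytree; the pattern is copied from the character class quoted above.

```python
import re

# Character class quoted in the text above. The helper below is a
# hypothetical illustration, not a toytree function.
FORBIDDEN = re.compile(r"[:;(),\[\]\t\n\r=]")


def find_forbidden_chars(name):
    """Return the characters in `name` that would clash with Newick syntax."""
    return FORBIDDEN.findall(name)


print(find_forbidden_chars("r3"))            # a name with no forbidden characters
print(find_forbidden_chars("Homo:sapiens"))  # a name containing a colon
```

An empty list means the name is safe to write to a Newick string under this rule.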
idx: int¶
The default idx
attribute is an int value of -1, which means that the node is not part of a ToyTree
. If a node is in a ToyTree
then it will be assigned a unique idx integer between 0 and nnodes-1. The leaf nodes in a tree have idx values between 0 and ntips - 1, and all internal nodes are labeled by increasing numbers in a post-order left-then-right traversal. This is termed an idxorder traversal. When a tree structure changes (e.g. during re-rooting) the idx values of nodes are updated and can change (see Traversal). A node's idx value can be checked from its .idx
attribute, or if it is in a ToyTree
then by calling .get_node_data()
or plotting the tree to visualize idx values.
# a Node that is not part of a ToyTree has idx=-1
single_node.idx
-1
# Nodes in a ToyTree have unique idx values between 0 and nnodes - 1
node3.idx
3
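To make the idxorder labeling concrete, here is a small sketch in plain Python that does not use toytree. It assumes tips are numbered left to right and that internal nodes are numbered in post-order after all tips, as described above. The nested-tuple topology and the function names are hypothetical. The topology was chosen to match the parent-child pairs of the example tree, and the left-to-right order of children is an assumption.

```python
# Nested tuples stand in for internal nodes and strings stand in for tips.
topology = ((("r0", "r1"), "r2"), (("r3", "r4"), (("r5", "r6"), "r7")))


def count_tips(node):
    """Count the tips below (and including) a node."""
    if isinstance(node, str):
        return 1
    return sum(count_tips(child) for child in node)


def label_idxorder(root):
    """Return sorted (child_idx, parent_idx) pairs for the whole tree."""
    next_tip = 0
    next_internal = count_tips(root)  # internal labels start at ntips
    edges = []

    def visit(node):
        nonlocal next_tip, next_internal
        if isinstance(node, str):
            idx = next_tip
            next_tip += 1
            return idx
        # visit children left then right before labeling the parent
        child_idxs = [visit(child) for child in node]
        idx = next_internal
        next_internal += 1
        edges.extend((child, idx) for child in child_idxs)
        return idx

    visit(root)
    return sorted(edges)


edges = label_idxorder(topology)
print(edges)
```

Under these assumptions the sketch labels the parent of tip 3 as node 10 and the root as node 14, which agrees with the idx values shown for the example tree on this page.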
dist: float¶
The default dist
attribute is a float of 0. The value represents the distance from a node to its parent. In other words, it is the length of the edge connecting them. The dist attribute is thus not actually a feature of a node but of an edge between nodes; it is nevertheless stored on a Node
object. We call this an edge_feature
of a Node
, since it will change if the tree is re-rooted, changing which Node is parent to another. The value of a dist can range from very small to very large values, such as when representing the expected number of substitutions per site on a phylogeny, or divergence times in millions of years.
# default Node dist
single_node.dist
0.0
# the dist from node 3 to its parent
node3.dist
1.0
support: float¶
The default support
value is numpy.nan
, which represents the absence of support information. Tip (leaf) nodes are not expected to have support information, since they do not represent a split in a tree. Similarly, the root node support is nan
since it does not represent a true split.
# the default support value
single_node.support
nan
up: Node¶
The .up
attribute references a node's parent. The default value is None
. This is also the value of the .up
attribute of the root Node
in a ToyTree
, since it has no parent. A Node
can only have one parent. If a tree is re-rooted the relationship between nodes can change such that a Node
that was previously a child can become a parent, and thus the Node
attributes are automatically updated during this process.
# the default .up is None (no value is returned here)
single_node.up
# node3's parent is Node 10
node3.up
<Node(idx=10)>
# the parent of node3's parent is Node 13
node3.up.up
<Node(idx=13)>
children: tuple¶
The .children
attribute is a tuple of zero or more Node
objects that are descended from a node. The default is an empty tuple. If a tree is re-rooted the relationship between nodes can change such that a Node
that was previously a child can become a parent, and thus the Node
attributes are automatically updated during this process.
# this single Node has no children
single_node.children
()
# internal Node 8 in the tree has two children
tree[8].children
(<Node(idx=0, name='r0')>, <Node(idx=1, name='r1')>)
height: float¶
The default height
value is a float of 0. The height of a Node
is an emergent property of a tree of connected nodes. It is the node's height above the leaf that is farthest from the root. This value is automatically updated for every node in a ToyTree
when a tree is modified during the cached traversal.
# a single node has a height of 0
single_node.height
0.0
# leaf node 3 height
node3.height
1.0
# internal node 8 height
tree[8].height
2.0
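The relationship between dist and height can be sketched without toytree. The example below assumes that a node's height equals the largest root-to-tip distance minus the node's own distance from the root, which is one reading of the definition above. The tree, its values, and all names are hypothetical.

```python
# child -> parent links and edge lengths for a hypothetical 3-tip tree
parent = {"a": "ab", "b": "ab", "ab": "root", "c": "root", "root": None}
dist = {"a": 3.0, "b": 2.0, "ab": 1.0, "c": 1.5, "root": 0.0}


def depth(node):
    """Sum the edge lengths on the path from a node up to the root."""
    total = 0.0
    while parent[node] is not None:
        total += dist[node]
        node = parent[node]
    return total


# the farthest tip from the root defines height 0
tree_height = max(depth(node) for node in parent)
height = {node: tree_height - depth(node) for node in parent}
print(height)
```

In this sketch only tip "a" has a height of 0. The other tips sit above it because their paths to the root are shorter, which is why a leaf can report a non-zero height in a tree that is not ultrametric.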
Methods¶
The Node
object provides a number of functions for fetching information about a node's position relative to other connected nodes. Some of this information is also accessible from a ToyTree
object, but it is sometimes easier to access from a Node object directly.
node3.is_leaf()
True
node3.is_root()
False
node3.get_ancestors()
(<Node(idx=10)>, <Node(idx=13)>, <Node(idx=14)>)
node3.get_descendants()
(<Node(idx=3, name='r3')>,)
node3.get_leaves()
[<Node(idx=3, name='r3')>]
node3.get_sisters()
(<Node(idx=4, name='r4')>,)
node3.get_leaf_names()
['r3']
Each of the get_[x]
functions above is also available as a generator function named iter_[x]
, which is more efficient for fetching such data over very large trees, or for terminating a traversal over part of the tree once a condition has been met. The traverse()
function is also a generator function.
node3.iter_ancestors()
<generator object Node.iter_ancestors at 0x7f3ba6ca9bd0>
node3.traverse("idxorder")
<generator object Node._traverse_idxorder at 0x7f3ba6ca9d20>
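The difference between the eager and lazy forms can be illustrated with a stand-in class. This is a sketch only: `SketchNode` is hypothetical and models just a name and a parent reference, so it shows the general generator pattern rather than toytree's implementation.

```python
class SketchNode:
    """Hypothetical stand-in with only a name and a parent reference."""

    def __init__(self, name, up=None):
        self.name = name
        self.up = up

    def iter_ancestors(self):
        """Yield parents one at a time, walking toward the root."""
        node = self.up
        while node is not None:
            yield node
            node = node.up

    def get_ancestors(self):
        """Collect the full path to the root before returning."""
        return tuple(self.iter_ancestors())


root = SketchNode("n14")
n13 = SketchNode("n13", up=root)
n10 = SketchNode("n10", up=n13)
tip = SketchNode("r3", up=n10)

# eager: the whole path is built before anything is inspected
all_names = [node.name for node in tip.get_ancestors()]

# lazy: the walk stops at the first ancestor that meets the condition
first_match = next(node for node in tip.iter_ancestors() if node.name == "n13")
print(all_names, first_match.name)
```

With the lazy form the root is never visited here, because the search ends as soon as the matching ancestor is found.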
Nodes vs 'Edges'¶
Notably, toytree
does not implement a separate "Edge" class to represent edges in a tree. Instead, edges are simply represented by the connections between Node
objects -- by their .up
and .children
attributes. (This can be important when storing new data types to a tree; see Edge features). Thus you can think of edges as pairs of nodes. You can fetch the edge information from a ToyTree
in a variety of ways. Below we use the function get_edges
which has options for returning this information in a number of tabular formats.
# edges are simply pairs of Nodes with a child,parent relationship
tree.get_edges(feature='idx', df=True)
| | child | parent |
|---|---|---|
| 0 | 0 | 8 |
| 1 | 1 | 8 |
| 2 | 2 | 9 |
| 3 | 3 | 10 |
| 4 | 4 | 10 |
| 5 | 5 | 11 |
| 6 | 6 | 11 |
| 7 | 7 | 12 |
| 8 | 8 | 9 |
| 9 | 9 | 14 |
| 10 | 10 | 13 |
| 11 | 11 | 12 |
| 12 | 12 | 13 |
| 13 | 13 | 14 |
Mutability of Nodes¶
The data assigned to nodes may represent a feature of the node itself, or it may represent a feature of the edge connecting that node to its parent. In the latter case, it is important that the data be treated appropriately if the tree is modified, such as when a node is pruned from the tree, or the tree is re-rooted. In these cases, the edge features, such as the .dist
, .support
, and the connection information .up
and .children
, need to be automatically updated. Similarly, emergent properties of nodes in a tree, such as the .height
of a node relative to the farthest leaf must be re-computed.
The automatic updating of these attributes is done at the level of a ToyTree
, not within individual Nodes
, and thus we have intentionally designed these elements of Node
objects to be immutable (you cannot modify them directly). Thus, users cannot call node.idx = 3
or node.height = 100
to set these attributes to a new value, since these attributes are properties of the node's placement with respect to other nodes in the tree, which also need to be updated. If you try to set one of these values a ToytreeError
exception will be raised, as in the example below, where we catch the exception and print it. For developers, a simple workaround is described further below.
# catch the 'ToytreeError' exception raised when trying to modify a Node attribute
try:
    single_node.idx = 10
except toytree.utils.ToytreeError as exc:
    print("ToyTreeError:", exc)
ToyTreeError: Cannot set .idx attribute of a Node. If you are an advanced user then you can do so by setting ._idx. See the docs section on Modifying Nodes and Tree Topology.
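One common way to get this kind of behavior in Python is a property whose setter raises, backed by a private attribute. The sketch below illustrates that general pattern only; it is not taken from toytree's source. The class name is hypothetical, and `AttributeError` stands in for the library's own exception type.

```python
class ReadOnlyIdxNode:
    """Hypothetical stand-in, not toytree's Node implementation."""

    def __init__(self):
        self._idx = -1  # private storage, meant for tree-level code

    @property
    def idx(self):
        """Public read access to the stored value."""
        return self._idx

    @idx.setter
    def idx(self, value):
        # reject public writes so the value cannot drift out of sync
        raise AttributeError("Cannot set .idx directly; set ._idx instead.")


node = ReadOnlyIdxNode()
try:
    node.idx = 10
except AttributeError as exc:
    message = str(exc)
    print(message)

node._idx = 10  # the private attribute can still be changed
print(node.idx)
```

The design keeps the public attribute readable while funneling writes through code that knows it must also refresh the rest of the tree.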
Calling mod functions¶
Rather than modifying a node's attributes directly, you should call one of the tree modification functions from the toytree.mod
subpackage, which ensures that the rest of the tree data is updated along with the modified node data. Examples include .root
, .drop_tips
, .prune
, .ladderize
, .rotate_nodes
, .edges_set_node_heights
, and many others which modify one or more .up
, .children
, .idx
, .dist
, or .height
attributes of nodes in unison.
# an example toytree.mod function that modifies node attributes
rtree = tree.mod.root("r4")
# the new tree has different idx values b/c the traversal order changed
toytree.mtree([tree, rtree]).draw(ts='p');
Developing mod functions¶
Sometimes, however, you may really want to directly modify one or more core features of a Node
. This is possible, but you should be aware of the considerations needed to avoid errors in your code. You can examine the source code of the many .mod
subpackage functions above for examples. Each of these core attributes is available as a private attribute (e.g., ._dist
, ._idx
) which can be modified without raising an exception. The key, however, is that after one or more private node attributes have been modified, the ToyTree
traversal caching function named ._update()
must be called at the end to ensure that all of the linked attributes of nodes are updated.
# create a new tree copy
modtree = tree.copy()
# modify one or more private node attributes
modtree[0]._dist += 2
modtree[1]._dist += 3
# call update to update idxs, heights, etc.
modtree._update()
# show the old and new tree with longer .dists for nodes 0,1 and .heights for all nodes
toytree.mtree([tree, modtree]).draw(ts='p', scale_bar=True);
Building trees from Nodes¶
There are several ways of constructing trees in toytree
from scratch. The simplest is to use one of the random tree generation functions from the toytree.rtree
subpackage. A second method is to write a Newick string and parse it using the toytree.tree
function. A third is to build or modify a tree using one or more functions from toytree.mod
such as .add_child_node
. And finally, the fourth method is to link together Node
objects manually. The last is the lowest-level method, and it eventually requires calling ToyTree._update()
to cache the traversal order and store idx values. Each of these is demonstrated below.
- Generate random or fixed trees. See the
rtree
documentation section for more details. This includes options to generate trees under a variety of algorithms and of different sizes.
# generate a 6-tip balanced tree with crown height of 1M units
toytree.rtree.baltree(6, treeheight=1e6).draw(scale_bar=True);
- Parse a Newick string to generate a tree from scratch with desired characteristics.
# generate a ToyTree with this specific data
toytree.tree("(((a:3,b:2):1),(c:3,d:2):5);").draw(scale_bar=True);
- Modify a tree using one or more
toytree.mod
functions:
# get a 4-tip balanced tree
tree4 = toytree.rtree.baltree(4)
# add a new sister (internal and tip node) to tip node 'r1'
modtree4 = toytree.mod.add_internal_node_and_child(tree4, 'r1', name="child", parent_name="parent")
# draw to highlight new parent and child nodes
modtree4.draw('r', node_mask=modtree4.get_node_mask(5), node_colors="lightgrey");
- Create connections among
Node
objects and create a ToyTree
from them. You can do this by setting ._up
, ._children
, and ._dist
values on a set of nodes.
# create several tip nodes
nodeA = toytree.Node("A", dist=1)
nodeB = toytree.Node("B", dist=1)
nodeC = toytree.Node("C", dist=1)
# create several internal Nodes
nodeAB = toytree.Node("AB", dist=1)
nodeABC = toytree.Node("ABC", dist=1)
# connect the nodes
nodeA._up = nodeAB
nodeB._up = nodeAB
nodeC._up = nodeABC
nodeAB._up = nodeABC
nodeAB._children = (nodeA, nodeB)
nodeABC._children = (nodeAB, nodeC)
# draw the tree (the tree traversal data is cached at this step)
toytree.tree(nodeABC).draw(ts='r', node_colors="lightgrey");
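When linking nodes by hand it is easy to set `._children` on a parent and forget the matching `._up` on a child. The sketch below shows one way to check for that before building a tree. `StubNode` and `find_broken_links` are hypothetical names and not part of toytree; the check only reads the `._up` and `._children` attributes named above, so it should in principle apply to any object that exposes them.

```python
class StubNode:
    """Hypothetical stand-in exposing only ._up and ._children."""

    def __init__(self, name):
        self.name = name
        self._up = None
        self._children = ()


def find_broken_links(root):
    """Return names of nodes whose ._up is not the parent that lists them."""
    broken = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent._children:
            if child._up is not parent:
                broken.append(child.name)
            stack.append(child)
    return broken


a, b, c, ab, abc = (StubNode(name) for name in ("A", "B", "C", "AB", "ABC"))
ab._children = (a, b)
abc._children = (ab, c)
a._up = ab
b._up = ab
ab._up = abc
# c._up is deliberately left unset to show what the check reports
print(find_broken_links(abc))
```

Once `c._up = abc` is set, the same call returns an empty list.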
Similarly, this process could be applied to an existing tree to add or remove connections by changing the same types of node attributes. The important thing is that the ToyTree._update()
function is called at the end to update values across connected nodes. The Node
object includes convenience functions _add_child
and _remove_child
which change the ._up
and ._children
attributes together, but setting them manually may be clearer.
# get a 4-tip balanced tree
tree4 = toytree.rtree.baltree(4, treeheight=2)
# add two new child nodes to tip node 0
tree4[0]._add_child(toytree.Node("child0", dist=1))
tree4[0]._add_child(toytree.Node("child1", dist=1))
# connect node data across the tree
tree4._update()
# draw to highlight new nodes. Note former node (idx=0, name='r0') is now node idx=5
tree4.draw('r', node_mask=tree4.get_node_mask(5), node_colors="lightgrey");