Quick Guide¶
This tutorial introduces several key concepts and applications of the toytree package to provide a general overview. Follow the links throughout this guide, and explore the broader documentation, to find more detailed instructions on each topic.
The toytree package¶
toytree is a Python library for working with tree data. It provides a custom class for storing and representing tree data alongside an extensive library of methods for performing numerical and evolutionary analyses on trees. In addition, toytree is particularly well suited for use inside jupyter notebooks: it provides interactive tree plotting methods and a well-organized, modular, and documented code base.
This tutorial was created as a jupyter notebook, and you can follow along in a notebook of your own by executing code from the top to the bottom of the document. To begin, let's import the toytree
package.
import toytree
toytree.__version__
'3.0.dev9'
A simple example¶
The code block below contains three lines of code to parse a tree object from data, modify it, and generate a tree drawing, respectively. This simple operation demonstrates several key features of toytree functionality and design. The first line uses the function toytree.tree (see Tree Parsing (i/o)) to parse tree data (in this case from a public URL) into a ToyTree class object, and stores it in the variable utree.
The next line of code calls a method of the ToyTree object to root the tree on a specified edge, and stores the rooted tree in the variable rtree. The ToyTree class object is the main class for representing trees in toytree and has many methods associated with it. In this case, we pass "~prz" as an argument to .root(), which is interpreted as a regular expression matching any names on the tree that contain "prz" (see Node/Name Query). This is a convenient way to select an internal edge on which to root the tree.
Finally, the last code line calls the .draw()
method of the ToyTree
to generate a tree drawing (See Tree Drawing). The drawing is automatically displayed in the notebook output cell, and can be optionally stored as a variable and further modified, or saved to disk in a variety of formats. We provide arguments to align the tip names and show markers at the nodes with interactive tooltip information (try it by hovering your cursor over the nodes).
# load a toytree from a newick string at a URL
utree = toytree.tree("https://eaton-lab.org/data/Cyathophora.tre")
# re-root on internal edge selected using a regex string
rtree = utree.root("~prz")
# draw the rooted tree
rtree.draw(node_hover=True, node_sizes=8, tip_labels_align=True);
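To make the "~prz" query concrete, here is a minimal sketch of how a name query with a leading "~" could be resolved against a list of tip names. The `match_names` helper and the tip names are hypothetical and exist only for illustration; this is not toytree's own implementation, and the tree loaded above is not needed to run it.

```python
import re

def match_names(query, names):
    """Return the names selected by a query string.

    Assumed convention for this sketch: a leading '~' marks the rest of
    the query as a regular expression; otherwise the match is exact.
    """
    if query.startswith("~"):
        pattern = re.compile(query[1:])
        return [name for name in names if pattern.search(name)]
    return [name for name in names if name == query]

# hypothetical tip names, for illustration only
tips = ["sp1_przewalskii", "sp2_przewalskii", "sp3_superba", "sp4_rex"]

# the regex query selects every name containing "prz"
selected = match_names("~prz", tips)
print(selected)
```

Selecting several tips at once in this way is what makes a short pattern sufficient to identify the clade, and therefore the edge, on which to root.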
Parsing tree data¶
ToyTree
objects can be flexibly loaded from a range of input types using the toytree.tree()
function. Tree data can be loaded from a string, file path, or public URL. The data can be formatted as Newick, NHX, or NEXUS format. Complex metadata can be parsed from extended Newick strings, and/or added to trees and saved to extended Newick formatted strings. Here we parse a newick string into a ToyTree
object and call the get_node_data()
function of the tree, which returns a summary table of the data in the tree. Viewing this table we can see how it has assigned the "name", "dist", and "support" values from the Newick data to the Nodes of the tree.
# newick str with edge-lengths & support values
newick = "((a:1,b:1)90:3,(c:3,(d:1,e:1)100:2)100:1);"
# load as ToyTree
tree = toytree.tree(newick)
# show tree data parsed from Newick str
tree.get_node_data()
| | idx | name | height | dist | support |
|---|---|---|---|---|---|
| 0 | 0 | a | 0.0 | 1.0 | NaN |
| 1 | 1 | b | 0.0 | 1.0 | NaN |
| 2 | 2 | c | 0.0 | 3.0 | NaN |
| 3 | 3 | d | 0.0 | 1.0 | NaN |
| 4 | 4 | e | 0.0 | 1.0 | NaN |
| 5 | 5 | | 1.0 | 3.0 | 90.0 |
| 6 | 6 | | 1.0 | 2.0 | 100.0 |
| 7 | 7 | | 3.0 | 1.0 | 100.0 |
| 8 | 8 | | 4.0 | 0.0 | NaN |
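To illustrate how "name", "dist", and "support" values can be read out of a Newick string, the following is a minimal recursive-descent parser written with the standard library only. It is a sketch under simplifying assumptions (no quoted labels, comments, or NHX metadata, and labels on internal nodes are treated as support values), not the parser used by toytree; `parse_newick` and `iter_nodes` are hypothetical names.

```python
def parse_newick(newick):
    """Parse a simple Newick string into nested dicts."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": "", "dist": 0.0, "support": None, "children": []}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # consume ")"
                break
        # read the label up to the next structural character
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        if node["children"] and label:
            node["support"] = float(label)  # assumption: internal labels are supports
        else:
            node["name"] = label
        # read the optional edge length after ":"
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node["dist"] = float(text[start:pos])
        return node

    return parse_node()

def iter_nodes(node):
    """Yield nodes in postorder (children before their parent)."""
    for child in node["children"]:
        yield from iter_nodes(child)
    yield node

root = parse_newick("((a:1,b:1)90:3,(c:3,(d:1,e:1)100:2)100:1);")
for node in iter_nodes(root):
    print(node["name"] or "-", node["dist"], node["support"])
```

The printed rows carry the same name, dist, and support values as the summary table above, although in postorder rather than in idx order.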
The data that make up a tree can be easily accessed in a number of ways, including during tree visualization. Here a number of options are provided to the .draw() function to style the drawing. The "node_labels" option is set to "idx", which matches one of the features in the tree data, and so the data (numeric labels) are mapped to the appropriate nodes in the tree. Similarly, the "edge_colors" option is set to "dist", which automatically applies a colormap to the values of the "dist" data in the tree. This approach of assigning data to Nodes of the tree and then mapping these values to a drawing is less error-prone than trying to correctly order and enter values as a list. See Data/Features for more details.
# tree drawing showing Node idx labels and edges colored by dist
tree.draw(
layout='d',
node_labels="idx", node_sizes=15, node_mask=False, node_colors="lightgrey",
edge_colors="dist", edge_widths=3,
tip_labels_style={"font-size": 20, "anchor-shift": 20},
scale_bar=True,
);
Class objects¶
The main class objects in toytree are structured as a nested hierarchy. At the lowest level are toytree.Node class objects. A collection of connected Nodes forms the data for a toytree.ToyTree class object, which is the primary class in toytree. In addition, a collection of trees can form the data of a higher-level class object, called a toytree.MultiTree. Each object type contains attributes and methods that are designed for its specific place in the hierarchy. For example, a Node contains attributes and methods for extracting information about that specific Node and its connections (edges) to other nodes. A ToyTree contains attributes and methods for extracting information about an entire tree of connected nodes, including emergent properties of these collections, and methods for operating on them. Finally, a MultiTree contains attributes and methods for operating on collections of trees, such as consensus tree inference and methods for drawing multiple trees together.
Node: See the Node class documentation. Users primarily interact with Node objects by selecting them from a tree through indexing, slicing, or traversal methods. Nodes are used to store the basic data that makes up a tree, such as the connections between nodes and their distances. In addition, nodes can be used to store any additional arbitrary data. Each Node in a ToyTree has a unique integer ID, referred to as its idx label, which represents its order in a tip-to-root traversal of the tree (see Traversal and Node selection).
# you can create a Node object on its own
toytree.Node(name="X")
<Node(name='X')>
# but more often you will select Nodes from a tree by slicing or indexing
tree[0]
<Node(idx=0, name='a')>
# a Node's parent is at .up
tree[0].up
<Node(idx=5)>
# a Node's children are at .children
tree[5].children
(<Node(idx=0, name='a')>, <Node(idx=1, name='b')>)
# select one or more Nodes from a ToyTree by name
tree.get_nodes("a", "c")
[<Node(idx=0, name='a')>, <Node(idx=2, name='c')>]
# access data from attributes of a Node in a ToyTree
tree[0].idx, tree[0].name, tree[0].dist
(0, 'a', 1.0)
ToyTree: See the ToyTree class documentation. A ToyTree stores a cached representation of the connections among a set of Node objects in memory and contains numerous methods for operating on these data. As we saw above, ToyTree class objects are usually created by parsing tree data from newick strings. However, a ToyTree can also be created by passing a Node object as the root of a new tree, as shown below. This makes clear that a ToyTree is a container for Node objects. ToyTree objects also contain functions for modifying the connections and data of nodes, such as rooting, pruning, grafting, modifying edge lengths or support values, and storing new data, to name a few. If an operation changes the tree structure, the ToyTree automatically stores a new cached representation of the tree traversal and assigns new idx labels to nodes. This caching allows very fast retrieval of information from nodes, and makes it possible to store tree attributes that are emergent properties of the collection of nodes, such as node heights and the number of tips and nodes.
# create a tree from a Node object to serve as its root Node
toytree.ToyTree(toytree.Node("root"))
<toytree.ToyTree at 0x7fd8ef38a5c0>
# parse a tree from newick data
toytree.tree("((a,b),c);")
<toytree.ToyTree at 0x7fd8ef389a50>
tree.ntips
5
tree.nnodes
9
tree[5].height
1.0
# all nodes in the cached idx order (tips first then postorder traversal)
tree[:]
[<Node(idx=0, name='a')>, <Node(idx=1, name='b')>, <Node(idx=2, name='c')>, <Node(idx=3, name='d')>, <Node(idx=4, name='e')>, <Node(idx=5)>, <Node(idx=6)>, <Node(idx=7)>, <Node(idx=8)>]
# or, use .traverse() to visit Nodes in other traversal orders
list(tree.traverse("postorder"))
[<Node(idx=0, name='a')>, <Node(idx=1, name='b')>, <Node(idx=5)>, <Node(idx=2, name='c')>, <Node(idx=3, name='d')>, <Node(idx=4, name='e')>, <Node(idx=6)>, <Node(idx=7)>, <Node(idx=8)>]
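The relationship between the cached idx order and a postorder traversal can be reproduced with plain Python. The sketch below represents the example topology as nested tuples (an assumption made for illustration, not how toytree stores trees), numbers tips first and then internal nodes in postorder, and reports the idx labels in the order a postorder traversal visits them.

```python
# nested tuples stand in for the example topology ((a,b),(c,(d,e)))
topology = (("a", "b"), ("c", ("d", "e")))

def postorder(node):
    """Yield every node after all of its children, left to right."""
    if isinstance(node, tuple):
        for child in node:
            yield from postorder(child)
    yield node

nodes = list(postorder(topology))

# idx labels: tips first, then internal nodes, each in postorder
tips = [node for node in nodes if isinstance(node, str)]
internal = [node for node in nodes if isinstance(node, tuple)]
idx_order = tips + internal

# idx label of each node in the order a postorder traversal visits it
postorder_idx = [idx_order.index(node) for node in nodes]
print(postorder_idx)
```

The resulting sequence of labels matches the order of Nodes in the `tree.traverse("postorder")` output shown above.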
MultiTree: See the MultiTree documentation. A MultiTree object is a container type for multiple ToyTree objects. It has a number of attributes and methods specific to operating on and visualizing sets of trees. The toytree.mtree() function can parse multiple input types, similar to how the toytree.tree() function parses data for individual trees. In addition, toytree.MultiTree() can accept a collection of ToyTree objects as input, demonstrating that multitrees are collections of trees.
# create a MultiTree containing three copies of 'tree' rooted differently
mtree = toytree.mtree([tree, tree.root('c'), tree.root('d', 'e')])
mtree
<toytree.MultiTree ntrees=3>
# select individual ToyTrees by indexing or slicing
mtree[0]
<toytree.ToyTree at 0x7fd8ef3e4c10>
# visualization methods for multiple trees; takes arguments similar to ToyTree.draw()
mtree.draw(tip_labels_style={"font-size": 16});
Learning to use toytree¶
When first learning toytree it is very helpful to experiment in an interactive environment, such as a jupyter notebook or an IDE, that provides tab-completion/auto-complete features showing all available methods and attributes of an object. In this way, you can easily explore the many possibilities associated with a ToyTree object without having to study the entire documentation. To try this feature in a notebook, type the name of a ToyTree variable (e.g., tree below) followed by a dot and then press tab. A few of the many methods that will pop up are shown below.
tree.copy()
<toytree.ToyTree at 0x7fd8ef3f2da0>
tree.is_rooted()
True
tree.is_monophyletic("a", "b")
True
tree.get_ancestors("a")
{<Node(idx=0, name='a')>, <Node(idx=5)>, <Node(idx=8)>}
tree.get_mrca_node("d", "e")
<Node(idx=6)>
tree.get_tip_labels()
['a', 'b', 'c', 'd', 'e']
Selecting nodes¶
Many methods in toytree require selecting one or more nodes from a tree to operate on. This can be challenging, since most nodes in a tree usually do not have unique names assigned to them, and selecting nodes by numeric index can be error-prone if the indices change. We have tried to design the node query and selection methods in toytree to be flexible and easy to use, while also preventing simple and common mistakes.
The get_nodes
and get_mrca_node
methods of ToyTree
objects provide a flexible approach to selecting one or more nodes either by name or by their unique integer indices. See the Node Query/Selection documentation section for details.
# select nodes by name
tree.get_nodes("a", "b")
[<Node(idx=1, name='b')>, <Node(idx=0, name='a')>]
# select nodes by regular expression
tree.get_nodes("~[a-c]")
[<Node(idx=1, name='b')>, <Node(idx=0, name='a')>, <Node(idx=2, name='c')>]
# select internal node by mrca of tip names
tree.get_mrca_node("a", "b")
<Node(idx=5)>
# or, select a node directly by its idx label
tree[5]
<Node(idx=5)>
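As a rough illustration of what selecting an internal node by the most recent common ancestor (MRCA) of tips involves, the sketch below walks child-to-parent links toward the root. The `parent` mapping is inferred from the idx labels printed in this guide for the example tree; `path_to_root` and `mrca` are hypothetical helpers, not toytree functions.

```python
# child idx -> parent idx for the example tree, inferred from the outputs above
parent = {0: 5, 1: 5, 3: 6, 4: 6, 2: 7, 6: 7, 5: 8, 7: 8}

def path_to_root(idx):
    """Return the list of idx labels from a node up to and including the root."""
    path = [idx]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def mrca(*idxs):
    """Return the idx of the most recent common ancestor of the given nodes."""
    shared = set(path_to_root(idxs[0]))
    for idx in idxs[1:]:
        shared &= set(path_to_root(idx))
    # walking up from the first node, the first shared node is the MRCA
    return next(node for node in path_to_root(idxs[0]) if node in shared)

print(mrca(0, 1), mrca(3, 4), mrca(2, 3), mrca(0, 2))
```

Because the MRCA is defined by the tips rather than by a numeric label, a query of this kind stays valid even when idx labels are reassigned after the tree structure changes.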
Subpackages¶
There are many possible operations, algorithms, statistics, and metrics that can be implemented or computed on tree data, and if we simply added every method as an additional function of a ToyTree object it would become crowded, making its more common attributes and functions difficult to find. Therefore, we have instead organized many of these additional methods into subpackages where functions with similar themes are grouped together. For example, the rtree subpackage is used to generate random trees under a variety of algorithms, and the mod subpackage groups together many functions for modifying tree data. The methods in each of these subpackages are explained in much greater detail in their specific sections of the documentation. Here we provide just a brief introduction to each.
rtree: Random tree generation functions. The rtree
subpackage provides a variety of algorithms for generating random trees that can be used for learning, testing, or analyses. For example, random, balanced, imbalanced, birth-death, coalescent, and other forms of trees can be generated with a variety of options for setting data on the trees.
# generate a birth-death tree
btree = toytree.rtree.bdtree(ntips=8, b=1, d=0.1, seed=123, random_names=True)
btree.draw(scale_bar=True);
enum: enumeration of tree data. Many algorithms for working with trees involve analyzing and comparing subsets of trees, such as bipartitions or quartets created by edges, or require knowing the number of possible trees of a given size. The enum
subpackage provides a number of exact calculations, or generator functions, for accessing these data efficiently with a variety of formatting options. These methods are particularly useful for implementing or testing tree analysis methods.
# expand a generator over all quartets in a tree
list(toytree.enum.iter_quartets(tree))
[({'a', 'b'}, {'d', 'e'}), ({'a', 'b'}, {'c', 'e'}), ({'a', 'b'}, {'c', 'd'}), ({'d', 'e'}, {'b', 'c'}), ({'d', 'e'}, {'a', 'c'})]
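For a sense of the counts involved, the number of fully bifurcating topologies and the number of quartets can be computed directly with standard combinatorial formulas. This is a standalone sketch using only the standard library; the function names are hypothetical and are not part of the enum subpackage.

```python
from math import comb

def double_factorial(n):
    """Return n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def count_unrooted_trees(ntips):
    """Unrooted, fully bifurcating topologies for ntips >= 3 labeled tips."""
    return double_factorial(2 * ntips - 5)

def count_rooted_trees(ntips):
    """Rooted, fully bifurcating topologies for ntips >= 2 labeled tips."""
    return double_factorial(2 * ntips - 3)

def count_quartets(ntips):
    """Number of four-tip subsets, one quartet each in a fully resolved tree."""
    return comb(ntips, 4)

print(count_unrooted_trees(5), count_rooted_trees(5), count_quartets(5))
```

For the five-tip example tree the quartet count is 5, which agrees with the length of the list returned above.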
distance: node and tree distance metrics. Distances are a common type of measurement associated with trees, either in the form of measuring distances between nodes in a single tree, or comparing two or more trees using metrics of their (dis)similarity. A variety of methods for measuring node and tree distances are available in the distance
subpackage.
# return a matrix of distances between tips in a tree
toytree.distance.get_tip_distance_matrix(tree, df=True)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| a | 0.0 | 2.0 | 8.0 | 8.0 | 8.0 |
| b | 2.0 | 0.0 | 8.0 | 8.0 | 8.0 |
| c | 8.0 | 8.0 | 0.0 | 6.0 | 6.0 |
| d | 8.0 | 8.0 | 6.0 | 0.0 | 2.0 |
| e | 8.0 | 8.0 | 6.0 | 2.0 | 0.0 |
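The entries of the tip distance matrix above are sums of edge lengths along the path connecting two tips. The sketch below recomputes a few of them from plain dictionaries holding the "dist" value and parent of each node in the example tree (values taken from the node data table earlier in this guide). The helper names are hypothetical, and this is not how the distance subpackage is implemented.

```python
# edge length above each node (its "dist") and child -> parent links, by idx
dist = {0: 1.0, 1: 1.0, 2: 3.0, 3: 1.0, 4: 1.0, 5: 3.0, 6: 2.0, 7: 1.0, 8: 0.0}
parent = {0: 5, 1: 5, 3: 6, 4: 6, 2: 7, 6: 7, 5: 8, 7: 8}

def path_to_root(idx):
    """Return the list of idx labels from a node up to and including the root."""
    path = [idx]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tip_distance(a, b):
    """Sum the edge lengths on the path between two nodes."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    shared = set(path_a) & set(path_b)
    # edges above nodes that are ancestors of both tips are not on the path
    return sum(dist[idx] for idx in path_a + path_b if idx not in shared)

# distances for the tip pairs (a, b), (a, c), (c, d), and (d, e)
print(tip_distance(0, 1), tip_distance(0, 2), tip_distance(2, 3), tip_distance(3, 4))
```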
# return the Robinson-Foulds tree distance between two random 10-tip trees
rtree1 = toytree.rtree.rtree(10, seed=123)
rtree2 = toytree.rtree.rtree(10, seed=321)
toytree.distance.get_treedist_rf(rtree1, rtree2, normalize=True)
0.8571428571428571
mod: Tree modifications. Methods for manipulating and modifying tree data can be used to generate new trees, change the structure or data contained in a tree, and to store new data in trees. This is currently the largest subpackage with many common tree manipulation algorithms implemented.
# return a tree with edges scaled so root is at height=100
modified = toytree.mod.edges_scale_to_root_height(tree, 100)
# show the original and modified trees side by side
toytree.mtree([tree, modified]).draw(scale_bar=True);
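Scaling a tree to a target root height amounts to multiplying every edge length by one common factor. The sketch below shows that arithmetic on plain dictionaries for the example tree, assuming that the root height is the largest root-to-tip distance. The helper names are hypothetical and this is not the code behind `toytree.mod`.

```python
# edge length above each node (its "dist") and child -> parent links, by idx
dist = {0: 1.0, 1: 1.0, 2: 3.0, 3: 1.0, 4: 1.0, 5: 3.0, 6: 2.0, 7: 1.0, 8: 0.0}
parent = {0: 5, 1: 5, 3: 6, 4: 6, 2: 7, 6: 7, 5: 8, 7: 8}

def root_to_node(idx, dist, parent):
    """Sum the edge lengths on the path from a node up to the root."""
    total = 0.0
    while idx in parent:
        total += dist[idx]
        idx = parent[idx]
    return total

def scale_to_root_height(dist, parent, target):
    """Return new edge lengths scaled so the farthest node is `target` from the root."""
    current = max(root_to_node(idx, dist, parent) for idx in dist)
    factor = target / current
    return {idx: length * factor for idx, length in dist.items()}

scaled = scale_to_root_height(dist, parent, 100)
print(scaled[0], scaled[5], root_to_node(0, scaled, parent))
```

Because every edge is multiplied by the same factor, relative branch lengths are preserved and only the units of the scale bar change.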
# return a tree with a new split (internal node and child) added
modified = toytree.mod.add_internal_node_and_child(tree, "d", name="x")
# show the original and modified trees side by side
toytree.mtree([tree, modified]).draw();
annotate: add annotations to tree drawings. The ToyTree.draw() function accepts a large number of arguments that allow it to style tree drawings in a variety of ways. However, it is difficult to make all options available within one function without causing confusion. Therefore, we have developed the annotate subpackage to house many additional methods for adding annotations to tree drawings after they are initially created. Some may prefer the use of this module to create tree drawing code that is more readable and atomized.
# draw a tree and store returned objects
canvas, axes, mark = tree.draw()
# annotate method to add tip markers
toytree.annotate.add_tip_markers(tree, axes, color="salmon", size=12);
# annotate method to add edge labels
toytree.annotate.add_edge_labels(tree, axes, labels="idx", font_size=15, yshift=-12, mask=False);
pcm: phylogenetic comparative methods. This module has a long way to go towards offering many of the numerous comparative methods that have been developed over decades for studying evolution on trees. Currently, a number of simulation and model fitting approaches are available for discrete and continuous traits. (This is a great place to contribute to toytree!)
# get variance-covariance matrix from tree
toytree.pcm.get_vcv_matrix_from_tree(tree, df=True)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| a | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| b | 3.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| c | 0.0 | 0.0 | 3.0 | 1.0 | 1.0 |
| d | 0.0 | 0.0 | 1.0 | 1.0 | 3.0 |
| e | 0.0 | 0.0 | 1.0 | 3.0 | 1.0 |
# simulate a discrete trait under a Markov transition model
toytree.pcm.simulate_discrete_data(tree, nstates=3, model="ER", nreplicates=5)
| | t0 | t1 | t2 | t3 | t4 |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 2 | 1 | 1 |
| 1 | 2 | 2 | 1 | 0 | 1 |
| 2 | 2 | 1 | 1 | 0 | 0 |
| 3 | 2 | 2 | 2 | 1 | 2 |
| 4 | 1 | 0 | 0 | 0 | 2 |
| 5 | 0 | 0 | 1 | 1 | 1 |
| 6 | 2 | 0 | 0 | 2 | 2 |
| 7 | 0 | 2 | 2 | 2 | 2 |
| 8 | 2 | 2 | 1 | 1 | 0 |
Storing data to trees¶
Any arbitrary data can be stored to trees by assigning it to the Node objects in the tree. This can be done most simply by iterating over the nodes in a tree and assigning values to them, or by using a tree's .set_node_data() function, which has options for making it easy to assign values to some nodes but not others. The get_node_data() function is especially useful here in that it collects data from all Nodes and can provide NaN, or a custom value, for nodes that either have no value for a feature or lack the feature altogether, depending on how it was assigned. Several examples for setting data to nodes are shown:
# make a copy of tree on which we will add a bunch of data
dtree = tree.copy()
# add a feature and set all Nodes to a default value
dtree = dtree.set_node_data("trait1", default=10)
# or set some to specific values and others to a default
dtree = dtree.set_node_data("trait2", {i: 5 for i in range(dtree.ntips)}, default=1)
# or add some to specific values and leave others as NaN
dtree = dtree.set_node_data("trait3", {0: "X", 1: "Y"})
# or, add a feature by assigning as an attribute to one or more Nodes
dtree[6].trait4 = "special"
# show the data
dtree.get_node_data()
| | idx | name | height | dist | support | trait1 | trait2 | trait3 | trait4 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | a | 0.0 | 1.0 | NaN | 10 | 5 | X | NaN |
| 1 | 1 | b | 0.0 | 1.0 | NaN | 10 | 5 | Y | NaN |
| 2 | 2 | c | 0.0 | 3.0 | NaN | 10 | 5 | NaN | NaN |
| 3 | 3 | d | 0.0 | 1.0 | NaN | 10 | 5 | NaN | NaN |
| 4 | 4 | e | 0.0 | 1.0 | NaN | 10 | 5 | NaN | NaN |
| 5 | 5 | | 1.0 | 3.0 | 90.0 | 10 | 1 | NaN | NaN |
| 6 | 6 | | 1.0 | 2.0 | 100.0 | 10 | 1 | NaN | special |
| 7 | 7 | | 3.0 | 1.0 | 100.0 | 10 | 1 | NaN | NaN |
| 8 | 8 | | 4.0 | 0.0 | NaN | 10 | 1 | NaN | NaN |
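The "mapping plus default" pattern used for trait2 and trait3 above can be summarized in a few lines of plain Python. In this sketch `assign_feature` is a hypothetical helper that returns one value per node idx, using the mapping where a key is present and the default (NaN unless specified) everywhere else. It mirrors the columns in the table but is not toytree's implementation.

```python
import math

def assign_feature(nnodes, mapping=None, default=math.nan):
    """Return one value per node idx: the mapped value if present, else the default."""
    mapping = mapping or {}
    return [mapping.get(idx, default) for idx in range(nnodes)]

# tips (idx 0-4) get 5 and all other nodes get the default of 1
trait2 = assign_feature(9, {i: 5 for i in range(5)}, default=1)

# two nodes get explicit values and the rest are left as NaN
trait3 = assign_feature(9, {0: "X", 1: "Y"})

print(trait2)
print(trait3)
```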
Tree drawings¶
When you call .draw()
on a tree it returns three objects, a Canvas, Cartesian, and Mark. This follows the design principle of the toyplot plotting library that toytree
uses as its default plotting backend. The Canvas describes the dimensions of the plot space, the Cartesian coordinates define how to project points onto that space, and a Mark represents plotted data. One Canvas can have multiple Cartesian coordinates, and each Cartesian object can have multiple Marks. After a plot is generated, each of these objects can be interacted with to set additional styling to axes, ticks, font sizes, etc. The three returned objects can be seen in the output field of the cell below.
# the draw function returns three objects
tree.draw()
(<toyplot.canvas.Canvas at 0x7fd8ef420940>, <toyplot.coordinates.Cartesian at 0x7fd8ef421600>, <toytree.drawing.src.mark_toytree.ToyTreeMark at 0x7fd8ef4209a0>)
As you may have noticed, I end many drawing commands with a semicolon, which simply hides the return values when we don't intend to save them to variables. In a notebook the Canvas will automatically render in the output area below the cell even if you do not save it as a variable. If you save the notebook, the rendered figure is saved with the output.
# the semicolon hides the printed return value (the Canvas, Cartesian, and Mark objects)
tree.draw();
Finally, you can store the three returned objects, in which case you can add additional styling and/or save to a file. In this example I add additional styling to the Cartesian axes.
# or, we can store them as variables
canvas, axes, mark = tree.draw(scale_bar=True)
# and then optionally add additional styling
axes.x.label.text = "Time (Mya)"
axes.x.label.style["font-size"] = 14
axes.x.label.offset = 20
Styling tree drawings¶
There are countless ways to style tree drawings. In addition to individual options that change one style component at a time, we also provide a number of built-in "tree-style" arguments, which change the default style on top of which additional changes can be made. Users can also easily create their own tree-style dictionaries. You can view the draw function docstring for more details on available arguments, or you can see which styles are available by accessing the .style dict-like object of a ToyTree. See the Styling documentation for more details.
# drawing with pre-built tree_styles
tree.draw(tree_style='c'); # coalescent-style
tree.draw(tree_style='d'); # dark-style
# 'ts' is also a shortcut for tree_style
tree.draw(ts='o'); # umlaut-style
# define a style dictionary
mystyle = {
"layout": 'd',
"edge_type": 'p',
"edge_style": {
"stroke": "darkcyan",
"stroke-width": 2.5,
},
"tip_labels_colors": "black",
"tip_labels_style": {
"font-size": "16px"
},
"node_sizes": 8,
"node_colors": "dist",
"node_labels": "support",
"node_labels_style": {"baseline-shift": 12, "anchor-shift": 15, "font-size": 12},
"node_mask": (0, 1, 0),
}
# use your custom style dictionary in one or more tree drawings
tree.draw(height=300, **mystyle);
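The idea of a tree-style acting as a new default "on top of which additional changes can be made" can be pictured as layered dictionaries. The sketch below is only an illustration of that layering: the base values, the preset named "x", and the `resolve_style` helper are all hypothetical and do not reflect toytree's actual defaults or internals.

```python
# hypothetical base defaults and presets, for illustration only
BASE_STYLE = {"layout": "r", "edge_widths": 2, "node_sizes": 0, "tip_labels_align": False}
PRESETS = {
    "n": {},                                  # a preset that changes nothing
    "x": {"edge_widths": 3, "node_sizes": 6}, # a made-up preset
}

def resolve_style(tree_style="n", **overrides):
    """Layer a preset over the base defaults, then apply individual overrides."""
    style = dict(BASE_STYLE)           # copy so the defaults are never mutated
    style.update(PRESETS[tree_style])  # the preset replaces selected defaults
    style.update(overrides)            # explicit arguments take precedence
    return style

print(resolve_style("x", node_sizes=10))
```

In this picture, explicit arguments always win over the preset, and the preset wins over the base defaults.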
Saving tree drawings¶
Tree drawings can be saved to disk in a variety of formats, including HTML, SVG, PDF, or PNG. The simplest way to save a canvas drawing is using the toytree.save()
function, where the file format will be inferred from the filename suffix. As demonstrated below, you can also save a canvas using options from the toyplot
library.
# draw a plot and store the Canvas object to a variable
canvas, axes, mark = tree.draw(ts='p');
HTML is the default format. It saves the figure as a vector graphic (SVG) wrapped in HTML, with optional JavaScript that enables interactive features. You can share the file with others, and anyone can open it in a browser. You can also embed it on your website, or even display it in emails.
# HTML allows for interactivity and embedding in web sites
toytree.save(canvas, "/tmp/tree-plot.html")
# SVG for figures you will further edit in Illustrator/Inkscape
toytree.save(canvas, "/tmp/tree-plot.svg")
# PDF for final shareable figures
toytree.save(canvas, "/tmp/tree-plot.pdf")
# PNG for small and easy to share figures
toytree.save(canvas, "/tmp/tree-plot.png")
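Inferring the output format from the filename suffix can be pictured as a simple lookup. The following is a minimal sketch of that idea with a hypothetical `infer_format` helper and lookup table; it is not the code used by `toytree.save()`.

```python
from pathlib import Path

# suffix -> format name for the formats mentioned in this guide
SUPPORTED_FORMATS = {".html": "html", ".svg": "svg", ".pdf": "pdf", ".png": "png"}

def infer_format(path):
    """Return the output format implied by a filename suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file suffix: {suffix!r}")
    return SUPPORTED_FORMATS[suffix]

for name in ["/tmp/tree-plot.html", "/tmp/tree-plot.svg", "/tmp/tree-plot.PNG"]:
    print(name, "->", infer_format(name))
```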
The toyplot library also has options for saving canvases. See the toyplot documentation. The toytree.save()
function above is simply a convenient wrapper around these functions.
import toyplot.html
toyplot.html.render(canvas, "/tmp/tree-plot.html")
import toyplot.svg
toyplot.svg.render(canvas, "/tmp/tree-plot.svg")
import toyplot.pdf
toyplot.pdf.render(canvas, "/tmp/tree-plot.pdf")
import toyplot.png
toyplot.png.render(canvas, "/tmp/tree-plot.png")