Writing tree data (I/O)¶
Tree data can be serialized into a str as Newick, NHX, or NEXUS format using the .write() function, available as toytree.io.write(tree, ...) or from a ToyTree object as tree.write(...). This function accepts several additional arguments to optionally format float data or include additional metadata, and to write the str result to a file path.
import toytree
import numpy as np
# get a balanced 4-tip tree
tree = toytree.rtree.baltree(ntips=4)
# write the tree to serialized newick format
tree.write()
'((r0:0.5,r1:0.5):0.5,(r2:0.5,r3:0.5):0.5);'
Take Home
Write tree data to a serialized text format (Newick, NHX, Nexus) using tree.write(...).
Example data¶
To demonstrate, let's start by generating a ToyTree with several types of node and edge data to use for examples. Here we assign names to all internal nodes, support values to internal nodes/edges except the root, and a feature named "X" with a random float value to every node.
# add internal node names "A", "B", and "C"
tree.set_node_data("name", {4: "A", 5: "B", 6: "C"}, inplace=True)
# add internal node support values (100 and 90; the root has none)
tree.set_node_data("support", {4: 100, 5: 90}, inplace=True)
# add X as node feature with random float values
tree.set_node_data("X", np.random.normal(0, 2, tree.nnodes), inplace=True)
# show the tree data
tree.get_node_data()
| | idx | name | height | dist | support | X |
|---|---|---|---|---|---|---|
| 0 | 0 | r0 | 0.0 | 0.5 | NaN | 3.241786 |
| 1 | 1 | r1 | 0.0 | 0.5 | NaN | 0.669912 |
| 2 | 2 | r2 | 0.0 | 0.5 | NaN | -3.078850 |
| 3 | 3 | r3 | 0.0 | 0.5 | NaN | 1.993329 |
| 4 | 4 | A | 0.5 | 0.5 | 100.0 | -2.234693 |
| 5 | 5 | B | 0.5 | 0.5 | 90.0 | 2.277798 |
| 6 | 6 | C | 1.0 | 0.0 | NaN | 0.931252 |
The write function¶
The default arguments to the .write() function return a Newick string with edge lengths (if present) formatted as "%.12g", with internal labels as "support" values (if present) formatted as "%.12g", and no additional features (metadata). However, all of these options can be modified, as demonstrated below.
# Newick str from using default arguments to write()
tree.write()
'((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);'
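The "%.12g" default is an ordinary Python format spec rather than anything specific to toytree. As a small illustration in plain Python (the sample values are made up for this sketch), the "g" type keeps at most 12 significant digits and drops trailing zeros, which is why 0.5 and 100.0 appear as 0.5 and 100 in the output above:

```python
# Plain-Python illustration of the "%.12g" format spec (toytree is not needed).
# "g" keeps at most 12 significant digits and strips trailing zeros.
values = [0.5, 100.0, 1 / 3, 1234567.891]  # made-up sample values
formatted = ["%.12g" % v for v in values]
print(formatted)
```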
path: save to disk¶
The first argument to write() is path, which accepts a file path as a str, Path, or None. If a path is given, the data is written to that file and None is returned. If path=None, nothing is written to file and the serialized tree data is returned as a str (like above). This is useful when you want to store the str as a variable and do something with it. After this example, I use the default path=None throughout the rest of this document so that the output is displayed.
# writes to file path, returns None
tree.write(path="/tmp/test.nwk")
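The write-or-return behavior described above can be sketched in plain Python. The helper name write_or_return is hypothetical and is not part of toytree; it only mirrors the documented semantics (return the str when path is None, otherwise write the file and return None), and a temporary directory stands in for /tmp.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def write_or_return(data: str, path: Optional[Union[str, Path]] = None) -> Optional[str]:
    """Hypothetical helper (not toytree API) mirroring the `path` semantics."""
    if path is None:
        # nothing is written; the serialized string is handed back
        return data
    # write to disk and return None, like tree.write(path=...)
    Path(path).write_text(data)
    return None


newick = "((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);"

# path=None: the string is returned
returned = write_or_return(newick)

# path given: the file is written and None is returned
with tempfile.TemporaryDirectory() as tmpdir:
    outfile = Path(tmpdir) / "test.nwk"
    result = write_or_return(newick, outfile)
    on_disk = outfile.read_text()
```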
Newick¶
As we saw above, the default output format of .write() is a Newick str, and when called with the default arguments it writes the dist edge lengths as well as internal node labels. By modifying these arguments you can either suppress these additional data or modify their formatting.
# write topology only by setting these args to None
tree.write(path=None, dist_formatter=None, internal_labels=None)
'((r0,r1),(r2,r3));'
# short-hand for simplest tree serialization
tree.write(None, None, None)
'((r0,r1),(r2,r3));'
dist_formatter: edge lengths¶
The dist_formatter argument can be used to include or exclude edge lengths, and to format them when they are included. Setting dist_formatter=None hides edge lengths. Otherwise it takes a Python format string in one of two supported styles, using percent sign or curly brackets, e.g., "%.12g" or "{:.12g}". See the Python documentation for further explanation of string formatting. Here I set internal_labels=None just to hide internal labels, which makes the edge lengths easier to see.
# hide edge lengths
tree.write(dist_formatter=None, internal_labels=None)
'((r0,r1),(r2,r3));'
# format edge lengths with two fixed decimal places
tree.write(dist_formatter="%.2f", internal_labels=None)
'((r0:0.50,r1:0.50):0.50,(r2:0.50,r3:0.50):0.50);'
# format edge lengths with at most 4 significant digits
tree.write(dist_formatter="%.4g", internal_labels=None)
'((r0:0.5,r1:0.5):0.5,(r2:0.5,r3:0.5):0.5);'
# format edge lengths as integers (the fractional part is truncated)
tree.write(dist_formatter="%d", internal_labels=None)
'((r0:0,r1:0):0,(r2:0,r3:0):0);'
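Both formatter styles are standard Python string formatting, so their effect on a single edge length can be checked without a tree. The sketch below uses a hypothetical helper, apply_formatter, that dispatches on the first character of the format string; this is only an assumption about how such a dispatch could work, not a description of toytree's internals.

```python
def apply_formatter(fmt, value):
    """Hypothetical helper (not toytree API): format a float with either style."""
    if fmt is None:
        # None means "do not show this value"
        return None
    if fmt.startswith("%"):
        # percent style, e.g. "%.2f"
        return fmt % value
    # curly-bracket style, e.g. "{:.2f}"
    return fmt.format(value)


percent_fixed = apply_formatter("%.2f", 0.5)     # two fixed decimal places
curly_fixed = apply_formatter("{:.2f}", 0.5)     # same result, other style
significant = apply_formatter("%.4g", 0.123456)  # at most 4 significant digits
as_integer = apply_formatter("%d", 0.5)          # fractional part truncated
hidden = apply_formatter(None, 0.5)              # value is not shown
```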
internal_labels¶
As discussed in the Parsing tree data docs, the internal label in a Newick string is ambiguous: it may store internal node names, edge support values, or possibly other types of data. The internal_labels arg takes the name of a feature as a str. A ToyTree always has "name" and "support" features that can be selected, and empty values are ignored. Here I set dist_formatter=None just to hide edge lengths, which makes the internal labels easier to see.
# None excludes internal labels
tree.write(dist_formatter=None, internal_labels=None)
'((r0,r1),(r2,r3));'
# use support floats as internal labels
tree.write(dist_formatter=None, internal_labels="support")
'((r0,r1)100,(r2,r3)90);'
# use name str as internal labels
tree.write(dist_formatter=None, internal_labels="name")
'((r0,r1)A,(r2,r3)B)C;'
# use other existing feature in tree as internal labels
tree.write(dist_formatter=None, internal_labels="X")
'((r0,r1)-2.2019018558,(r2,r3)-1.51247041326)0.0903984949236;'
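To make the role of internal labels concrete, here is a minimal recursive Newick serializer in plain Python. It is a sketch under stated assumptions, not toytree's implementation: the Node dataclass and the to_newick function are hypothetical, only percent-style formatters are handled, and missing float values (NaN) are skipped to imitate the "ignored if empty" behavior described above.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal node type for this sketch (not toytree's Node)."""
    name: str = ""
    dist: float = 0.0
    support: float = math.nan
    children: List["Node"] = field(default_factory=list)


def to_newick(node: Node, internal_labels: Optional[str] = None,
              dist_formatter: Optional[str] = None, is_root: bool = True) -> str:
    """Serialize a subtree; `internal_labels` names the attribute to write."""
    if node.children:
        inner = ",".join(
            to_newick(child, internal_labels, dist_formatter, False)
            for child in node.children
        )
        label = ""
        if internal_labels is not None:
            value = getattr(node, internal_labels)
            if isinstance(value, float):
                # skip missing values, format the rest like the default
                label = "" if math.isnan(value) else "%.12g" % value
            else:
                label = str(value)
        text = f"({inner}){label}"
    else:
        # tips always show their name
        text = node.name
    if dist_formatter is not None and not is_root:
        # percent-style formatter only in this sketch
        text += ":" + dist_formatter % node.dist
    return text + (";" if is_root else "")


# rebuild the example tree: clades A and B under root C
clade_a = Node("A", 0.5, 100.0, [Node("r0", 0.5), Node("r1", 0.5)])
clade_b = Node("B", 0.5, 90.0, [Node("r2", 0.5), Node("r3", 0.5)])
root = Node("C", 0.0, math.nan, [clade_a, clade_b])

topology = to_newick(root)
with_support = to_newick(root, internal_labels="support")
with_names = to_newick(root, internal_labels="name")
with_dists = to_newick(root, internal_labels="support", dist_formatter="%.12g")
```

The same attribute lookup works for any feature stored on the nodes, which is the idea behind passing a feature name such as "X" to internal_labels.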
internal_labels_formatter¶
As with the dist_formatter arg above, you can apply string formatting to internal_labels when they are floats. This has no effect on internal names, but is useful for support values or other float features.
# None applies no string formatting
tree.write(internal_labels_formatter=None)
'((r0:0.5,r1:0.5):0.5,(r2:0.5,r3:0.5):0.5);'
# float format the 'support' values with at most 12 significant digits
tree.write(internal_labels="support", internal_labels_formatter="%.12g")
'((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);'
# float format the 'support' values w/ 2 fixed decimal places
tree.write(internal_labels="support", internal_labels_formatter="{:.2f}")
'((r0:0.5,r1:0.5)100.00:0.5,(r2:0.5,r3:0.5)90.00:0.5);'
# float format the 'support' values as ints
tree.write(internal_labels="support", internal_labels_formatter="%d")
'((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);'
Write NHX¶
The extended New Hampshire format (NHX) is an extension of the Newick format with metadata stored inside square brackets after nodes and/or edges. The data/features in a ToyTree represent any data stored to one or more Node objects of the tree (see Data/Features). These data may have been generated by some analysis tool, or could be stored manually in toytree. You can view the features of a ToyTree using get_node_data(), which shows the value of each feature for each Node. You can list the feature names by calling .features and see which subset of features applies to edges by calling .edge_features. This distinction matters because the .write() function appends edge features as metadata to edges, and node features as metadata to nodes, in the NHX format.
features¶
# see the features of a tree
tree.features
('idx', 'name', 'height', 'dist', 'support', 'X')
# see which features are edge (not node) data
tree.edge_features
{'dist', 'support'}
# write NHX w/ "X" as node feature
tree.write(features=["X"])
'((r0[&X=0.608116507902]:0.5,r1[&X=-1.12762954173]:0.5)100[&X=-2.2019018558]:0.5,(r2[&X=-3.42995006373]:0.5,r3[&X=-1.19754491671]:0.5)90[&X=-1.51247041326]:0.5)[&X=0.0903984949236];'
# write NHX w/ "support" as edge feature
tree.write(features=["support"])
'((r0:0.5,r1:0.5)100:0.5[&support=100],(r2:0.5,r3:0.5)90:0.5[&support=90]);'
features_formatter¶
# write NHX string with one node metadata feature
tree.write(features=["X"], features_formatter="%.3f")
'((r0[&X=0.608]:0.5,r1[&X=-1.128]:0.5)100[&X=-2.202]:0.5,(r2[&X=-3.430]:0.5,r3[&X=-1.198]:0.5)90[&X=-1.512]:0.5)[&X=0.090];'
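The placement of NHX metadata can be sketched with plain string handling. The helper nhx_comment below is hypothetical (not toytree API) and assumes a percent-style formatter applied only to float values. It shows the two positions seen in the outputs above: node metadata directly after the node label, and edge metadata after the edge length.

```python
def nhx_comment(features, formatter=None):
    """Hypothetical helper (not toytree API): build a '[&key=value,...]' string."""
    if not features:
        return ""
    items = []
    for key, value in features.items():
        if isinstance(value, float) and formatter is not None:
            # only floats are formatted; str values pass through unchanged
            value = formatter % value
        items.append(f"{key}={value}")
    return "[&" + ",".join(items) + "]"


# node features annotate the node; edge features annotate the edge
node_meta = nhx_comment({"name": "A", "X": -2.2019018558}, "%.2f")
edge_meta = nhx_comment({"support": 100.0}, "%.2f")

# label, then node metadata, then ':' + edge length, then edge metadata
subtree = "(r0:0.5,r1:0.5)" + "100" + node_meta + ":0.5" + edge_meta
```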
Write NEXUS¶
Converting tree data into NEXUS format is simple: add the nexus=True argument to write(). You can still use any of the formatting options above to format the Newick/NHX string, but it will now be written inside a "trees" block with a "#NEXUS" header, with tip names translated into integers and listed in a translation section.
# write tree in Newick format wrapped in Nexus
nexus = tree.write(nexus=True)
print(nexus)
#NEXUS
begin trees;
    translate
        0 r0,
        1 r1,
        2 r2,
        3 r3,
    ;
    tree 0 = [&R] ((0:0.5,1:0.5)100:0.5,(2:0.5,3:0.5)90:0.5);
end;
# write tree in NHX format wrapped in Nexus
nexus = tree.write(features=["support", "name", "X"], nexus=True, features_formatter="%.2f")
print(nexus)
#NEXUS
begin trees;
    translate
        0 r0,
        1 r1,
        2 r2,
        3 r3,
    ;
    tree 0 = [&R] ((0[&name=r0,X=0.61]:0.5,1[&name=r1,X=-1.13]:0.5)100[&name=A,X=-2.20]:0.5[&support=100.00],(2[&name=r2,X=-3.43]:0.5,3[&name=r3,X=-1.20]:0.5)90[&name=B,X=-1.51]:0.5[&support=90.00])[&name=C,X=0.09];
end;
# write tree to file as Nexus
tree.write(path="/tmp/test.nex", nexus=True)
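The NEXUS wrapping described above amounts to translating tip names to integers and placing the Newick string inside a trees block. The following sketch does this with a regular expression in plain Python. The function to_nexus is hypothetical, and the indentation and line breaks it produces are assumptions; only the tokens and their order follow the output shown in this section.

```python
import re


def to_nexus(newick, tip_names):
    """Hypothetical helper (not toytree API): wrap a Newick str in a trees block."""
    mapping = {name: str(idx) for idx, name in enumerate(tip_names)}
    # tip labels are the tokens that directly follow '(' or ','
    translated = re.sub(
        r"(?<=[(,])([^(),:;\[\]]+)",
        lambda match: mapping.get(match.group(1), match.group(1)),
        newick,
    )
    lines = ["#NEXUS", "begin trees;", "    translate"]
    lines += [f"        {idx} {name}," for name, idx in mapping.items()]
    lines += ["    ;", f"    tree 0 = [&R] {translated}", "end;"]
    return "\n".join(lines)


newick = "((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);"
nexus = to_nexus(newick, ["r0", "r1", "r2", "r3"])
print(nexus)
```

Matching only tokens that follow "(" or "," leaves internal labels such as 100 and 90 untouched, since those follow a closing parenthesis.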
Write MultiTrees¶
MultiTrees have a .write() function that works much like ToyTree.write() but applies to each tree in order. A multi-Newick file contains trees separated by newline characters, whereas a multi-tree NEXUS file contains trees labeled by increasing number in the trees block.
# create a MultiTree
mtree = toytree.mtree([tree, tree, tree])
# write multi-Newick
print(mtree.write())
((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);
((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);
((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);
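Because a multi-Newick string is just one tree per line, it can be assembled and split again with ordinary string methods. This is a plain-Python sketch of the layout described above and does not call toytree.

```python
newick = "((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);"

# multi-Newick: one serialized tree per line
multi_newick = "\n".join([newick] * 3)

# reading it back: every non-empty line is one tree
trees = [line for line in multi_newick.splitlines() if line.strip()]
print(len(trees))
```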