Writing tree data (I/O)¶

Tree data can be serialized to a str in Newick, NHX, or NEXUS format using the write() function, available as toytree.io.write(tree, ...) or as the ToyTree method tree.write(...). Optional arguments control how float data are formatted, which metadata are included, and whether the resulting str is written to a file path.

In [1]:

import toytree
import numpy as np

In [3]:

# get a balanced 4-tip tree
tree = toytree.rtree.baltree(ntips=4)

# write the tree to serialized newick format
tree.write()

Out[3]:

'((r0:0.5,r1:0.5):0.5,(r2:0.5,r3:0.5):0.5);'

Take Home

Write tree data to a serialized text format (Newick, NHX, Nexus) using tree.write(...).

Example data¶

To demonstrate, let's generate a ToyTree with several types of node and edge data. Here we assign names to all internal nodes, support values to the internal nodes/edges other than the root, and a feature named "X" holding a random float value for every node. Because "X" is drawn at random, its values differ each time the cell is run.

In [4]:

# add internal node names "A", "B", and "C"
tree.set_node_data("name", {4: "A", 5: "B", 6: "C"}, inplace=True)

# add support values (100 and 90) to internal nodes other than the root
tree.set_node_data("support", {4: 100, 5: 90}, inplace=True)

# add X as node feature with random float values
tree.set_node_data("X", np.random.normal(0, 2, tree.nnodes), inplace=True)

# show the tree data
tree.get_node_data()

Out[4]:

	idx	name	height	dist	support	X
0	0	r0	0.0	0.5	NaN	3.241786
1	1	r1	0.0	0.5	NaN	0.669912
2	2	r2	0.0	0.5	NaN	-3.078850
3	3	r3	0.0	0.5	NaN	1.993329
4	4	A	0.5	0.5	100.0	-2.234693
5	5	B	0.5	0.5	90.0	2.277798
6	6	C	1.0	0.0	NaN	0.931252

The write function¶

The default arguments to the .write() function return a newick string with edge lengths (if present) formatted as "%.12g", with internal labels as "support" values (if present) formatted as "%.12g", and no additional features (metadata). However, all of these options can be modified, as demonstrated below.

In [3]:

# Newick str from using default arguments to write()
tree.write()

Out[3]:

'((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);'

path: save to disk¶

The first argument to write() is path, which accepts a file path as a str or Path, or None. If a path is given, the data are written to that file and None is returned. If path=None, nothing is written and the serialized tree data are returned as a str (as above), which is useful when you want to store the result in a variable for further use. After this example, the rest of this document uses the default path=None.
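The return-value behavior described here can be sketched in plain Python. This is a minimal illustration and not toytree's implementation: write_or_return is a hypothetical helper, and the temporary directory is an assumption made so the example cleans up after itself.

```python
import tempfile
from pathlib import Path


def write_or_return(data, path=None):
    """Hypothetical helper: return `data` if path is None, else write it and return None."""
    if path is None:
        return data
    Path(path).write_text(data)
    return None


newick = "((r0,r1),(r2,r3));"

# path=None: nothing is written and the str is returned
result = write_or_return(newick)

# path given: the str is written to the file and None is returned
with tempfile.TemporaryDirectory() as tmpdir:
    outfile = Path(tmpdir) / "test.nwk"
    returned = write_or_return(newick, outfile)
    on_disk = outfile.read_text()
```

Accepting either a str or a Path works here because pathlib.Path() accepts both.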

In [4]:

# writes to file path, returns None
tree.write(path="/tmp/test.nwk")

Newick¶

As we saw above, the default output format of .write is a Newick str, and when called with the default arguments it writes the dist edge lengths as well as internal node labels. By modifying these arguments you can either suppress these additional data or modify their formatting.
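To make the role of these two arguments concrete, here is a small stand-alone sketch of a recursive Newick serializer. It is not toytree's code: the Node class and to_newick function are hypothetical, and for brevity the sketch accepts only percent-style format strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal node: a name, an edge length, a support value, children."""
    name: str = ""
    dist: float = 0.0
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def to_newick(node, dist_formatter="%.12g", internal_labels="support", is_root=True):
    """Serialize a Node and its descendants to a Newick str."""
    if node.children:
        inner = ",".join(
            to_newick(child, dist_formatter, internal_labels, False)
            for child in node.children
        )
        label = ""
        if internal_labels is not None:
            value = getattr(node, internal_labels, None)
            if isinstance(value, (int, float)):
                label = "%.12g" % value   # numeric labels, e.g. support
            elif value:
                label = str(value)        # str labels, e.g. name
        text = f"({inner}){label}"
    else:
        text = node.name
    if is_root:
        return text + ";"                 # the root has no edge length
    if dist_formatter is not None:
        text += ":" + dist_formatter % node.dist
    return text


# rebuild the 4-tip example tree by hand
tips = [Node(f"r{i}", 0.5) for i in range(4)]
root = Node("C", 0.0, None, [
    Node("A", 0.5, 100.0, tips[:2]),
    Node("B", 0.5, 90.0, tips[2:]),
])

full = to_newick(root)                    # edge lengths and support labels
topology = to_newick(root, None, None)    # topology only
named = to_newick(root, None, "name")     # internal names as labels
```

Setting either argument to None simply skips the corresponding piece of text, which is why the topology-only string contains no colons or internal labels.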

In [5]:

# write topology only by setting these args to None
tree.write(path=None, dist_formatter=None, internal_labels=None)

Out[5]:

'((r0,r1),(r2,r3));'

In [6]:

# short-hand for simplest tree serialization
tree.write(None, None, None)

Out[6]:

'((r0,r1),(r2,r3));'

dist_formatter: edge lengths¶

The dist_formatter argument controls whether edge lengths are included and how they are formatted. Setting dist_formatter=None excludes edge lengths. Otherwise it takes a Python format string in one of two supported styles, percent sign or curly brackets, e.g., "%.12g" or "{:.12g}". See the Python documentation for further explanation of Python string formatting (or this resource). Here I set internal_labels=None just to hide internal labels, making the edge lengths easier to see.
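The two format-string styles can be tried out in plain Python before passing them to write(). The helper below, apply_formatter, is a hypothetical name used only for this sketch; it shows how the same float is rendered by several formatters, and is not how toytree applies them internally.

```python
def apply_formatter(value, formatter):
    """Hypothetical helper: format a float with a percent- or brace-style string."""
    if "%" in formatter:
        return formatter % value
    return formatter.format(value)


examples = {
    fmt: apply_formatter(0.5, fmt)
    for fmt in ("%.12g", "{:.12g}", "%.2f", "{:.2f}", "%.4g", "%d")
}

for fmt, text in examples.items():
    print(f"{fmt!r:>10} -> {text}")
```

The "g" styles drop trailing zeros and limit significant digits, the "f" styles fix the number of decimal places, and "%d" truncates the value to an integer.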

In [7]:

# hide edge lengths
tree.write(dist_formatter=None, internal_labels=None)

Out[7]:

'((r0,r1),(r2,r3));'

In [8]:

# format edge lengths with two fixed decimal places
tree.write(dist_formatter="%.2f", internal_labels=None)

Out[8]:

'((r0:0.50,r1:0.50):0.50,(r2:0.50,r3:0.50):0.50);'

In [9]:

# format edge lengths with at most 4 significant digits
tree.write(dist_formatter="%.4g", internal_labels=None)

Out[9]:

'((r0:0.5,r1:0.5):0.5,(r2:0.5,r3:0.5):0.5);'

In [10]:

# format edge lengths as integers (0.5 is truncated to 0)
tree.write(dist_formatter="%d", internal_labels=None)

Out[10]:

'((r0:0,r1:0):0,(r2:0,r3:0):0);'

internal_labels¶

As discussed in the Parsing tree data docs, the internal label in a newick string can be ambiguous in its usage for storing either internal node names, edge support values, or possibly other types of data. The internal_labels arg takes a str feature name as an argument. A ToyTree always has "name" and "support" features that can be selected, and if empty, they will be ignored. Here I set dist_formatter=None just to hide edge lengths to make it easier to see the internal_labels.

In [11]:

# None excludes internal labels
tree.write(dist_formatter=None, internal_labels=None)

Out[11]:

'((r0,r1),(r2,r3));'

In [12]:

# use support floats as internal labels
tree.write(dist_formatter=None, internal_labels="support")

Out[12]:

'((r0,r1)100,(r2,r3)90);'

In [13]:

# use name str as internal labels
tree.write(dist_formatter=None, internal_labels="name")

Out[13]:

'((r0,r1)A,(r2,r3)B)C;'

In [14]:

# use other existing feature in tree as internal labels
tree.write(dist_formatter=None, internal_labels="X")

Out[14]:

'((r0,r1)-2.2019018558,(r2,r3)-1.51247041326)0.0903984949236;'

internal_labels_formatter¶

As with the dist_formatter arg above, you can apply string formatting to internal_labels when they are floats. This has no effect on internal names, but is useful for support values or other numeric features.
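The distinction between str and float labels can be sketched as follows. The function format_label is a hypothetical name for this illustration, and the logic is an assumption about the behavior described here rather than toytree's own code: str labels pass through untouched, while numeric labels are run through the formatter.

```python
def format_label(value, formatter="%.12g"):
    """Hypothetical helper: format numeric labels, leave str labels unchanged."""
    if isinstance(value, str):
        return value                      # names are never reformatted
    if "%" in formatter:
        return formatter % value          # percent style, e.g. "%.12g"
    return formatter.format(value)        # brace style, e.g. "{:.2f}"


default_support = format_label(100.0)             # max 12 significant digits
fixed_support = format_label(100.0, "{:.2f}")     # 2 fixed decimal places
integer_support = format_label(90.0, "%d")        # integer
name_label = format_label("A", "%d")              # str label is unaffected
```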

In [15]:

# None applies no string formatting
tree.write(internal_labels_formatter=None)

Out[15]:

'((r0:0.5,r1:0.5):0.5,(r2:0.5,r3:0.5):0.5);'

In [16]:

# float format the 'support' values with at most 12 significant digits
tree.write(internal_labels="support", internal_labels_formatter="%.12g")

Out[16]:

'((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);'

In [17]:

# float format the 'support' values w/ 2 fixed decimal places
tree.write(internal_labels="support", internal_labels_formatter="{:.2f}")

Out[17]:

'((r0:0.5,r1:0.5)100.00:0.5,(r2:0.5,r3:0.5)90.00:0.5);'

In [18]:

# float format the 'support' values as ints
tree.write(internal_labels="support", internal_labels_formatter="%d")

Out[18]:

'((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);'

Write NHX¶

The extended New Hampshire format (NHX) is an extension of the Newick format in which metadata are stored inside square brackets after nodes and/or edges. The features of a ToyTree are any data stored on one or more Node objects of the tree (see Data/Features). These data may have been generated by an analysis tool or assigned manually in toytree. Calling get_node_data() shows the value of every feature for each Node, .features lists the feature names, and .edge_features shows which subset of features apply to edges. The distinction matters because, in the NHX format, .write() appends edge features as metadata to edges and node features as metadata to nodes.
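The placement of node versus edge metadata can be illustrated with a short sketch. The helpers nhx_block and nhx_node are hypothetical names, and the bracket layout ("[&key=value,...]", node metadata before the edge length and edge metadata after it) is inferred from the example outputs on this page rather than taken from toytree's source.

```python
def nhx_block(data, formatter="%.12g"):
    """Hypothetical helper: render a dict of features as an NHX metadata block."""
    if not data:
        return ""
    items = []
    for key, value in data.items():
        if isinstance(value, float):
            value = formatter % value     # only floats are reformatted
        items.append(f"{key}={value}")
    return "[&" + ",".join(items) + "]"


def nhx_node(label, dist, node_data=None, edge_data=None, formatter="%.12g"):
    """Node metadata follows the label; edge metadata follows the edge length."""
    node_meta = nhx_block(node_data, formatter)
    edge_meta = nhx_block(edge_data, formatter)
    return f"{label}{node_meta}:{dist:.12g}{edge_meta}"


# a tip with one node feature
tip = nhx_node("r0", 0.5, node_data={"X": 0.608116507902}, formatter="%.3f")

# an internal node with one edge feature
clade = nhx_node("(r0:0.5,r1:0.5)100", 0.5, edge_data={"support": 100.0})
```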

features¶

In [19]:

# see the features of a tree
tree.features

Out[19]:

('idx', 'name', 'height', 'dist', 'support', 'X')

In [20]:

# see which features are edge (not node) data
tree.edge_features

Out[20]:

{'dist', 'support'}

In [21]:

# write NHX w/ "X" as node feature
tree.write(features=["X"])

Out[21]:

'((r0[&X=0.608116507902]:0.5,r1[&X=-1.12762954173]:0.5)100[&X=-2.2019018558]:0.5,(r2[&X=-3.42995006373]:0.5,r3[&X=-1.19754491671]:0.5)90[&X=-1.51247041326]:0.5)[&X=0.0903984949236];'

In [22]:

# write NHX w/ "support" as edge feature
tree.write(features=["support"])

Out[22]:

'((r0:0.5,r1:0.5)100:0.5[&support=100],(r2:0.5,r3:0.5)90:0.5[&support=90]);'

features_formatter¶

In [23]:

# write NHX string with node feature "X" formatted to 3 decimal places
tree.write(features=["X"], features_formatter="%.3f")

Out[23]:

'((r0[&X=0.608]:0.5,r1[&X=-1.128]:0.5)100[&X=-2.202]:0.5,(r2[&X=-3.430]:0.5,r3[&X=-1.198]:0.5)90[&X=-1.512]:0.5)[&X=0.090];'

Write NEXUS¶

Converting tree data to NEXUS format is simple: add the nexus=True argument to write(). You can still use any of the formatting options above to format the Newick/NHX string, but it will now be written inside a "trees" block with a "#NEXUS" header, and with tip names translated into integers that are listed in a translate section.
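The structure of the NEXUS wrapper can be sketched with a few lines of plain Python. The function wrap_nexus is a hypothetical name, and the layout it produces (indentation, the "tree 0 = [&R]" prefix) is modeled on the example output on this page, not on toytree's source; the regular expression is a simplification that assumes tip names appear directly after "(" or ",".

```python
import re


def wrap_nexus(newick, tip_names):
    """Hypothetical helper: wrap a Newick str in a NEXUS trees block."""
    translation = {name: str(idx) for idx, name in enumerate(tip_names)}

    # replace each tip name with its integer, matching only whole tip labels
    names = "|".join(re.escape(name) for name in tip_names)
    pattern = re.compile(r"(?<=[(,])(" + names + r")(?=[:,)\[])")
    translated = pattern.sub(lambda m: translation[m.group(1)], newick)

    lines = ["#NEXUS", "begin trees;", "    translate"]
    lines += [f"        {idx} {name}," for name, idx in translation.items()]
    lines += ["    ;", f"    tree 0 = [&R] {translated}", "end;"]
    return "\n".join(lines)


nexus_text = wrap_nexus(
    "((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);",
    ["r0", "r1", "r2", "r3"],
)
print(nexus_text)
```

Translating names to integers keeps the tree line compact when tip names are long, since each name is then written once in the translate section.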

In [24]:

# write tree in Newick format wrapped in Nexus
nexus = tree.write(nexus=True)
print(nexus)

#NEXUS
begin trees;
    translate
        0 r0,
        1 r1,
        2 r2,
        3 r3,
    ;
    tree 0 = [&R] ((0:0.5,1:0.5)100:0.5,(2:0.5,3:0.5)90:0.5);
end;

In [25]:

# write tree in NHX format wrapped in Nexus
nexus = tree.write(features=["support", "name", "X"], nexus=True, features_formatter="%.2f")
print(nexus)

#NEXUS
begin trees;
    translate
        0 r0,
        1 r1,
        2 r2,
        3 r3,
    ;
    tree 0 = [&R] ((0[&name=r0,X=0.61]:0.5,1[&name=r1,X=-1.13]:0.5)100[&name=A,X=-2.20]:0.5[&support=100.00],(2[&name=r2,X=-3.43]:0.5,3[&name=r3,X=-1.20]:0.5)90[&name=B,X=-1.51]:0.5[&support=90.00])[&name=C,X=0.09];
end;

In [26]:

# write tree to file as Nexus
tree.write(path="/tmp/test.nex", nexus=True)

Write MultiTrees¶

MultiTree objects have a .write() function that works very similarly to ToyTree.write() but applies to each tree in order. A multi-Newick file contains trees separated by newline characters, whereas a multi-Nexus file contains trees labeled by increasing number in the trees block.
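The two multi-tree layouts can be sketched as follows. Both function names are hypothetical, and the tree-line prefix ("tree N = [&R]") is assumed from the single-tree NEXUS output shown earlier; the sketch shows only how trees are separated or numbered, and omits the NEXUS header and translate section.

```python
def join_multi_newick(newicks):
    """Hypothetical helper: one Newick str per line."""
    return "\n".join(newicks)


def number_nexus_trees(newicks):
    """Hypothetical helper: tree lines labeled 0, 1, 2, ... for a trees block."""
    return "\n".join(
        f"    tree {idx} = [&R] {newick}" for idx, newick in enumerate(newicks)
    )


trees = ["((r0,r1),(r2,r3));"] * 3

multi_newick = join_multi_newick(trees)
tree_lines = number_nexus_trees(trees)
```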

In [27]:

# create a MultiTree
mtree = toytree.mtree([tree, tree, tree])

In [28]:

# write multi-Newick
print(mtree.write())

((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);
((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);
((r0:0.5,r1:0.5)100:0.5,(r2:0.5,r3:0.5)90:0.5);