Bipartition methods¶
A tree can be bisected along any branch to split it into two smaller subtrees, termed bipartitions. There are typically many bipartitions in a tree, and any two trees with different topologies will differ in their bipartition sets. For this reason, comparing the bipartition sets between trees is used in many contexts to quantify their differences. Because extracting and comparing bipartitions is a common procedure for many tree-based methods, we provide a convenient and powerful method for extracting and formatting bipartitions from trees using the iter_bipartitions
method.
import toytree
# unrooted tree w/ internal Node labels used in examples
tree = toytree.tree("(a,b,((c,d)CD,(e,f)EF)X)T;")
iter_bipartitions¶
The method iter_bipartitions()
returns a generator that can be used to iterate over the bipartitions in a tree. This takes several arguments that can toggle the type of information returned (e.g., Nodes, node names, node features); the type of Nodes to include (only tips, or tips and internal nodes); the type of object to be returned containing the bipartitions (e.g., set, tuple); and whether or not to sort the bipartitions. These options are each demonstrated further below. First, we show a simple demonstration of the default options for extracting bipartitions from the example tree below.
# visualize the example tree w/ node and edge labels
c, a, m = tree.draw(use_edge_lengths=False)
tree.annotate.add_edge_markers(a, size=18)
tree.annotate.add_edge_labels(a)
tree.annotate.add_node_markers(a, size=18, marker="s", color="orange")
tree.annotate.add_node_labels(a, "name");
When run with the default arguments this returns only the bipartitions created by internal edges in the tree. The three edges labeled in green above correspond to the three bipartitions shown below, where each is returned as a tuple with two sets of tip Node names that exist on either side of each split.
# iterate and print the bipartitions
for bipartition in tree.iter_bipartitions():
print(bipartition)
({'c', 'd'}, {'b', 'e', 'a', 'f'}) ({'f', 'e'}, {'b', 'c', 'a', 'd'}) ({'c', 'f', 'e', 'd'}, {'b', 'a'})
Module-level versus object-level API¶
You can access the iter_bipartitions
from three places in toytree
: from the module level using toytree.enum
; from the ToyTree.enum
subpackage API associated with any Toytree
object; and finally, also from a ToyTree
object directly, since it is a relatively common method. Each is demonstrated below, where the module-level method accepts a ToyTree
as its first argument, while the latter two examples use context to know which tree to extract bipartitions from.
toytree.enum.iter_bipartitions(tree)
<generator object iter_bipartitions at 0x78d181191800>
tree.enum.iter_bipartitions()
<generator object iter_bipartitions at 0x78d181191f80>
tree.iter_bipartitions()
<generator object iter_bipartitions at 0x78d181191e40>
feature¶
The default behavior is to represent nodes on either side of a bipartition using node names. In other words, we return the "name" feature of Node objects. This is determined by the argument feature
, which can be toggled to instead return Node objects, or any other desired feature of nodes. This can be particularly useful when nodes do not have unique names. In that case, returning Node objects, or their unique integer "idx" labels is likely more useful.
# default: feature=name
for bipart in tree.iter_bipartitions():
print(bipart)
({'c', 'd'}, {'b', 'e', 'a', 'f'}) ({'f', 'e'}, {'b', 'c', 'a', 'd'}) ({'c', 'f', 'e', 'd'}, {'b', 'a'})
# feature=idx is better when names are not unique
for bipart in tree.iter_bipartitions(feature='idx'):
print(bipart)
({2, 3}, {0, 1, 4, 5}) ({4, 5}, {0, 1, 2, 3}) ({2, 3, 4, 5}, {0, 1})
# or feature=None to get Node objects
for bipart in tree.iter_bipartitions(feature=None):
print(bipart)
({<Node(idx=2, name='c')>, <Node(idx=3, name='d')>}, {<Node(idx=4, name='e')>, <Node(idx=1, name='b')>, <Node(idx=0, name='a')>, <Node(idx=5, name='f')>}) ({<Node(idx=4, name='e')>, <Node(idx=5, name='f')>}, {<Node(idx=0, name='a')>, <Node(idx=1, name='b')>, <Node(idx=2, name='c')>, <Node(idx=3, name='d')>}) ({<Node(idx=4, name='e')>, <Node(idx=2, name='c')>, <Node(idx=5, name='f')>, <Node(idx=3, name='d')>}, {<Node(idx=0, name='a')>, <Node(idx=1, name='b')>})
type¶
The items within these tuples can be specified to be represented by a particular object type (e.g., set
, frozenset
, list
, tuple
) using the argument type
. The conversion of data to the specified type is performed efficiently within the iter_bipartitions
method, and so selecting the appropriate type here is typically better than converting yourself afterwards. It is particularly important to be aware of the shortcomings of the set
type, which is the default since it is most useful for comparing bipartitions between trees, but it not ideal for other types of operations or data, since sets cannot be sorted, or store items with identical values. In those cases, setting type=tuple
is commonly useful.
# tuples of tuples
for bipart in tree.iter_bipartitions(type=tuple):
print(bipart)
(('c', 'd'), ('a', 'b', 'e', 'f')) (('e', 'f'), ('a', 'b', 'c', 'd')) (('c', 'd', 'e', 'f'), ('a', 'b'))
# tuples of lists
for bipart in tree.iter_bipartitions(type=list):
print(bipart)
(['c', 'd'], ['a', 'b', 'e', 'f']) (['e', 'f'], ['a', 'b', 'c', 'd']) (['c', 'd', 'e', 'f'], ['a', 'b'])
compatibile features & types¶
To demonstrate an example mishap, consider a case where we want to measure the branch lengths of nodes on either side of each bipartition. We could select dist
(branch length) as the feature to get this feature extracted from each Node on each side of each bipartition. However, because some nodes have identical dist values in this example, they are collapsed in a set, such that we lose data. This can be addressed by changing the returned type from a set to a tuple, as shown below.
# beware when selecting node features stored as a set (default)
for bipart in tree.iter_bipartitions(feature="dist"):
print(bipart)
({1.0}, {1.0}) ({1.0}, {1.0}) ({1.0}, {1.0})
# use type=tuple instead of set when identical node features exist
for bipart in tree.iter_bipartitions(feature="dist", type=tuple):
print(bipart)
((1.0, 1.0), (1.0, 1.0, 1.0, 1.0)) ((1.0, 1.0), (1.0, 1.0, 1.0, 1.0)) ((1.0, 1.0, 1.0, 1.0), (1.0, 1.0))
singleton partitions¶
By default, singleton bipartitions (e.g., (A | B,C,D)) which only separate a single tip Node from the rest of the tree are excluded. This is because these are implicitly shared between any two trees that have the same tips, and thus these bipartitions are not informative about tree differences. However, they can be informative for other reasons. And so if you wish to include singleton bipartitions in the returned generator you can use the argument include_singleton_partitions=True
.
# include singleton splits
for bipartition in tree.iter_bipartitions(include_singleton_partitions=True):
print(bipartition)
({'a'}, {'f', 'b', 'c', 'e', 'd'}) ({'b'}, {'f', 'a', 'c', 'e', 'd'}) ({'c'}, {'f', 'a', 'b', 'e', 'd'}) ({'d'}, {'f', 'a', 'b', 'c', 'e'}) ({'e'}, {'f', 'a', 'b', 'c', 'd'}) ({'f'}, {'a', 'b', 'c', 'e', 'd'}) ({'c', 'd'}, {'b', 'e', 'a', 'f'}) ({'f', 'e'}, {'b', 'c', 'a', 'd'}) ({'c', 'f', 'e', 'd'}, {'b', 'a'})
Sort¶
There are three orders to be aware of when iterating over bipartitions: (1) first is the order in which bipartitions are returned given the tree topology; (2) second is the order of the two partitions the compose each bipartition; and (3) finally the order of items within a partition.
The first is always the same. Bipartitions are returned in idx order by iterating over the tree starting from the edge above Node 0, and ending with an edge below the treenode.
The second is modified if
sort=True
. By default each bipartition is generated as a tuple containing the nodes (below, above) a given edge. When sorted, these will instead be ordered first by length, so that the shorter partition comes first, and second by the lowest item value (e.g., alphanumeric is str names). Thus, if the two partitions are of equal length the one containing the lowest name comes first (e.g.,({'a', 'b'}, {'c', 'd'})
).Finally, the order of items within partitions is also sorted if
sort=True
, if possible. This depends on thetype
argument used as well. If the partition object type is sortable (e.g., it is a tuple and not a set) then the items within the partition are also sorted.
An example is shown below on a tree with randomly ordered names.
# generate a random tree
rtree = toytree.rtree.unittree(ntips=8, random_names=True, seed=123)
# draw the tree
rtree.draw();
# show the unordered bipartitions
list(rtree.iter_bipartitions(sort=False))
[({'r0', 'r3'}, {'r1', 'r2', 'r4', 'r5', 'r6', 'r7'}), ({'r5', 'r6'}, {'r0', 'r1', 'r2', 'r3', 'r4', 'r7'}), ({'r0', 'r3', 'r5', 'r6'}, {'r1', 'r2', 'r4', 'r7'}), ({'r4', 'r7'}, {'r0', 'r1', 'r2', 'r3', 'r5', 'r6'}), ({'r0', 'r3', 'r4', 'r5', 'r6', 'r7'}, {'r1', 'r2'})]
# show the ordered bipartitions
list(rtree.iter_bipartitions(sort=True))
[({'r0', 'r3'}, {'r1', 'r2', 'r4', 'r5', 'r6', 'r7'}), ({'r5', 'r6'}, {'r0', 'r1', 'r2', 'r3', 'r4', 'r7'}), ({'r0', 'r3', 'r5', 'r6'}, {'r1', 'r2', 'r4', 'r7'}), ({'r4', 'r7'}, {'r0', 'r1', 'r2', 'r3', 'r5', 'r6'}), ({'r1', 'r2'}, {'r0', 'r3', 'r4', 'r5', 'r6', 'r7'})]
# show the ordered bipartitions
list(rtree.iter_bipartitions(sort=True, type=tuple))
[(('r0', 'r3'), ('r1', 'r2', 'r4', 'r5', 'r6', 'r7')), (('r5', 'r6'), ('r0', 'r1', 'r2', 'r3', 'r4', 'r7')), (('r0', 'r3', 'r5', 'r6'), ('r1', 'r2', 'r4', 'r7')), (('r4', 'r7'), ('r0', 'r1', 'r2', 'r3', 'r5', 'r6')), (('r1', 'r2'), ('r0', 'r3', 'r4', 'r5', 'r6', 'r7'))]
Internal nodes¶
The default behavior is to only represent tip nodes in the returned bipartitions, as this provides sufficient information for comparing tree topologies. But there can be other cases where you want to compare the internal nodes as well on either side of each bipartition, particularly when the internal nodes have features associated with them, such as internal node names. You can toggle the behavior to also include internal nodes in bipartitions using include_internal_nodes=True
. Note that the treenode (here labeled T) is included in this set by default (see below for further options concerning the treenode).
# include internal nodes
for bipart in tree.iter_bipartitions(include_internal_nodes=True):
print(bipart)
({'CD', 'c', 'd'}, {'f', 'a', 'b', 'e', 'T', 'EF', 'X'}) ({'EF', 'f', 'e'}, {'a', 'b', 'c', 'T', 'CD', 'd', 'X'}) ({'f', 'c', 'e', 'CD', 'EF', 'X', 'd'}, {'T', 'b', 'a'})
treenode (root)¶
Note that the treenode can be moved to difference places by re-rooting a tree and this does not change the number or identity of the bipartitions, since the number of splits in the tree remains the same, since we do not consider a split induced by the root as a bipartition. This is demonstrated below by comparing a rooted versus unrooted tree, and comparing two alternatively rooted trees. We show the internal node names to show that this is true whether or not internal nodes are included.
rooted versus unrooted¶
# bipartitions of the tree when unrooted
list(tree.iter_bipartitions(include_internal_nodes=True, sort=True))
[({'CD', 'c', 'd'}, {'EF', 'T', 'X', 'a', 'b', 'e', 'f'}), ({'EF', 'e', 'f'}, {'CD', 'T', 'X', 'a', 'b', 'c', 'd'}), ({'T', 'a', 'b'}, {'CD', 'EF', 'X', 'c', 'd', 'e', 'f'})]
# bipartitions of the tree after rooting
list(tree.root("c", "d").iter_bipartitions(include_internal_nodes=True, sort=True))
[({'CD', 'c', 'd'}, {'EF', 'T', 'X', 'a', 'b', 'e', 'f'}), ({'EF', 'e', 'f'}, {'CD', 'T', 'X', 'a', 'b', 'c', 'd'}), ({'T', 'a', 'b'}, {'CD', 'EF', 'X', 'c', 'd', 'e', 'f'})]
two alternative rootings¶
Two alternative rootings of the tree return the exact same bipartitions, and only differ in the order that those bipartitions are returned.
# bipartitions of the tree after rooting
list(tree.root("a").iter_bipartitions(include_internal_nodes=True, sort=True))
[({'CD', 'c', 'd'}, {'EF', 'T', 'X', 'a', 'b', 'e', 'f'}), ({'EF', 'e', 'f'}, {'CD', 'T', 'X', 'a', 'b', 'c', 'd'}), ({'T', 'a', 'b'}, {'CD', 'EF', 'X', 'c', 'd', 'e', 'f'})]
# bipartitions of the tree after rooting
list(tree.root("c", "d").iter_bipartitions(include_internal_nodes=True, sort=True))
[({'CD', 'c', 'd'}, {'EF', 'T', 'X', 'a', 'b', 'e', 'f'}), ({'EF', 'e', 'f'}, {'CD', 'T', 'X', 'a', 'b', 'c', 'd'}), ({'T', 'a', 'b'}, {'CD', 'EF', 'X', 'c', 'd', 'e', 'f'})]
Comparing bipartitions¶
Below are some example set operations used to compare bipartitions between two trees. This is merely shown for demonstration for users who want to develop their own metrics based on bipartitions. See the toytree.distance
module for many examples of existing tree distance metrics already implemented in toytree
.
# three trees to compare
tree_1 = toytree.tree("(a,b,((c,d),(e,f)));")
tree_2 = toytree.tree("(c,d,((a,b),(e,f)));")
tree_3 = toytree.tree("(a,c,((b,d),(e,f)));")
# get SORTED bipartitions
biparts_1 = set(tree_1.iter_bipartitions(type=frozenset, sort=True))
biparts_1
{(frozenset({'c', 'd'}), frozenset({'a', 'b', 'e', 'f'})), (frozenset({'e', 'f'}), frozenset({'a', 'b', 'c', 'd'})), (frozenset({'a', 'b'}), frozenset({'c', 'd', 'e', 'f'}))}
# get SORTED bipartitions
biparts_2 = set(tree_2.iter_bipartitions(type=frozenset, sort=True))
biparts_2
{(frozenset({'c', 'd'}), frozenset({'a', 'b', 'e', 'f'})), (frozenset({'e', 'f'}), frozenset({'a', 'b', 'c', 'd'})), (frozenset({'a', 'b'}), frozenset({'c', 'd', 'e', 'f'}))}
# get SORTED bipartitions
biparts_3 = set(tree_3.iter_bipartitions(type=frozenset, sort=True))
biparts_3
{(frozenset({'e', 'f'}), frozenset({'a', 'b', 'c', 'd'})), (frozenset({'b', 'd'}), frozenset({'a', 'c', 'e', 'f'})), (frozenset({'a', 'c'}), frozenset({'b', 'd', 'e', 'f'}))}
shared biparts¶
# get bipartitions shared by both trees
biparts_1.intersection(biparts_2)
{(frozenset({'c', 'd'}), frozenset({'a', 'b', 'e', 'f'})), (frozenset({'e', 'f'}), frozenset({'a', 'b', 'c', 'd'})), (frozenset({'a', 'b'}), frozenset({'c', 'd', 'e', 'f'}))}
# get bipartitions shared by both trees
biparts_1.intersection(biparts_3)
{(frozenset({'e', 'f'}), frozenset({'a', 'b', 'c', 'd'}))}
# get bipartitions shared by both trees
biparts_2.intersection(biparts_3)
{(frozenset({'e', 'f'}), frozenset({'a', 'b', 'c', 'd'}))}
differences¶
# bipartitions in set1 not in set2
biparts_1 - biparts_2
set()
# bipartitions in set1 not in set3
biparts_1 - biparts_3
{(frozenset({'c', 'd'}), frozenset({'a', 'b', 'e', 'f'})), (frozenset({'a', 'b'}), frozenset({'c', 'd', 'e', 'f'}))}
# all bipartitions not shared by both trees
biparts_1.symmetric_difference(biparts_3)
{(frozenset({'b', 'd'}), frozenset({'a', 'c', 'e', 'f'})), (frozenset({'a', 'c'}), frozenset({'b', 'd', 'e', 'f'})), (frozenset({'a', 'b'}), frozenset({'c', 'd', 'e', 'f'})), (frozenset({'c', 'd'}), frozenset({'a', 'b', 'e', 'f'}))}