Distance & Dissimilarity Functions
The toytree .distance subpackage has two main purposes: (1) to provide the user with efficient methods to measure or describe paths between nodes in a tree, and (2) to provide many methods of describing dissimilarities between two trees. All dissimilarity metrics currently implemented are quantified by quartet and bipartition differences, which are explained in tree distances.
Node-level distances
The functions for studying node-level distances generally come in get_ and iter_ variants. get_ functions return paths or distances as tuples, dictionaries, or matrices, while iter_ functions are generators. All currently implemented node-level distance functions are shown with examples below.
Distances can be measured either as patristic distance (the default), which is the sum of the edge lengths along the shortest path between two nodes, or as topological distance, which is simply the number of edges separating two nodes. For topological distance, use topology_only=True.
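To make the two measures concrete, here is a minimal sketch on a hand-built tree. It is not toytree's implementation: the `PARENT` mapping and the `node_distance` helper are hypothetical names invented for illustration, and only the `topology_only` flag mirrors the option described above.

```python
# Hypothetical toy tree, not a toytree object: child -> (parent, edge length).
PARENT = {
    "a": ("x", 1.0),
    "b": ("x", 2.0),
    "x": ("root", 3.0),
    "c": ("root", 5.0),
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]][0])
    return path


def node_distance(n0, n1, topology_only=False):
    """Sum edge lengths (or count edges) on the path between two nodes."""
    up0, up1 = path_to_root(n0), path_to_root(n1)
    # The common ancestor is the first node on n0's climb that n1 also reaches.
    mrca = next(n for n in up0 if n in up1)
    total = 0
    for path in (up0, up1):
        # Every node below the common ancestor contributes the edge above it.
        for node in path[:path.index(mrca)]:
            total += 1 if topology_only else PARENT[node][1]
    return total


print(node_distance("a", "c"))                      # patristic: 1.0 + 3.0 + 5.0
print(node_distance("a", "c", topology_only=True))  # topological: 3 edges
```

The two results differ whenever edge lengths are not all equal to 1, which is why the choice of measure matters for trees with branch lengths.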
Node Paths
import toytree

# generate a random topology with 16 tips
tree = toytree.rtree.rtree(ntips=16)
# draw the tree with every node labeled by its idx
tree.draw(ts='s', tip_labels=False, node_labels='idx');
get_node_path returns a tuple of the Nodes on the path connecting two queried Nodes of a tree, including both end Nodes.
toytree.distance.get_node_path(tree, 15, 0)
(<Node(idx=15, name='r15')>, <Node(idx=29)>, <Node(idx=30)>, <Node(idx=28)>, <Node(idx=23)>, <Node(idx=21)>, <Node(idx=18)>, <Node(idx=17)>, <Node(idx=16)>, <Node(idx=0, name='r0')>)
iter_node_path is the generator version.
from toytree.distance import iter_node_path

for node in iter_node_path(tree, 15, 0):
    print(node.idx)
15 29 30 28 23 21 18 17 16 0
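The relationship between a generator and its eager counterpart can be sketched on a toy tree. This is an illustration under stated assumptions, not toytree's code: the `PARENT` mapping and the `iter_path` and `get_path` helpers are hypothetical, and the path is found by climbing from both nodes to their common ancestor.

```python
# Hypothetical toy tree, not a toytree object: child -> parent.
PARENT = {"a": "x", "b": "x", "x": "root", "c": "root"}


def ancestors(node):
    """Return `node` followed by each of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def iter_path(start, end):
    """Yield the nodes from `start` to `end`, including both ends."""
    up, down = ancestors(start), ancestors(end)
    mrca = next(n for n in up if n in down)
    # Climb from start to the common ancestor...
    yield from up[:up.index(mrca) + 1]
    # ...then descend to end by reversing end's climb.
    yield from reversed(down[:down.index(mrca)])


def get_path(start, end):
    """Eager counterpart: collect the generator into a tuple."""
    return tuple(iter_path(start, end))


print(get_path("a", "c"))
for name in iter_path("a", "b"):
    print(name)
```

The generator form is handy when the loop may stop early, while the tuple form suits indexing or repeated traversal.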
Node Distances
# Newick string generated in R with V.PhyloMaker2
newick = "(((Sambucus_nigra:112.340729,(Arctostaphylos_viscida:1.761115,Arctostaphylos_patula:1.761115):110.579613)mrcaott248ott650:11.393508,((Lupinus_sparsiflorus:112.701196,(((Ceanothus_leucodermis:4.464401,Ceanothus_cuneatus:4.464401):46.93409,(Frangula_rubra:10.957388,Rhamnus_ilicifolia:10.957388):40.441103):59.749516,(Quercus_douglasii:11.776698,Quercus_wislizeni:11.776699):99.371309)mrcaott371ott2511:1.553188)mrcaott371ott579:5.877408,Aesculus_californica:118.578604)mrcaott2ott96:5.155633)Pentapetalae:201.315791,Pinus_sabiniana:325.050028)Spermatophyta;"
#generate ToyTree from Newick string
tree = toytree.tree(newick)
tree.draw('s');
Yi Jin, Hong Qian, V.PhyloMaker2: An updated and enlarged R package that can generate very large phylogenies for vascular plants, Plant Diversity, Volume 44, Issue 4, 2022, Pages 335-339, ISSN 2468-2659, https://doi.org/10.1016/j.pld.2022.05.005.
get_node_distance returns the patristic distance (the sum of the edge lengths along the shortest path) between two Nodes on a ToyTree. With topology_only=True it returns the number of edges on that path instead.
toytree.distance.get_node_distance(tree, 15, 17)
199.561928
toytree.distance.get_node_distance(tree, 15, 17, topology_only= True)
3
get_node_distance_matrix returns the pairwise distance matrix for every node in the tree. For more specific distance matrices, use get_internal_node_distance_matrix or get_tip_distance_matrix.
The matrix is returned as a np.ndarray with rows and columns ordered by Node int idx labels, or, with df=True, as a pd.DataFrame whose row and column names are the str Node names for leaf Nodes and the idx labels for internal Nodes.
tree.distance.get_internal_node_distance_matrix(df= True, topology_only=True)
| | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 0 | 1 | 7 | 7 | 6 | 6 | 5 | 4 | 3 | 2 | 3 |
| 13 | 1 | 0 | 6 | 6 | 5 | 5 | 4 | 3 | 2 | 1 | 2 |
| 14 | 7 | 6 | 0 | 2 | 1 | 3 | 2 | 3 | 4 | 5 | 6 |
| 15 | 7 | 6 | 2 | 0 | 1 | 3 | 2 | 3 | 4 | 5 | 6 |
| 16 | 6 | 5 | 1 | 1 | 0 | 2 | 1 | 2 | 3 | 4 | 5 |
| 17 | 6 | 5 | 3 | 3 | 2 | 0 | 1 | 2 | 3 | 4 | 5 |
| 18 | 5 | 4 | 2 | 2 | 1 | 1 | 0 | 1 | 2 | 3 | 4 |
| 19 | 4 | 3 | 3 | 3 | 2 | 2 | 1 | 0 | 1 | 2 | 3 |
| 20 | 3 | 2 | 4 | 4 | 3 | 3 | 2 | 1 | 0 | 1 | 2 |
| 21 | 2 | 1 | 5 | 5 | 4 | 4 | 3 | 2 | 1 | 0 | 1 |
| 22 | 3 | 2 | 6 | 6 | 5 | 5 | 4 | 3 | 2 | 1 | 0 |
get_descendant_dists returns a dictionary of {Node: dist} pairs for all descendants of a queried node. The queried node itself is included with a distance of 0. If no node is given, distances are measured from the root node. Entries are generated in preorder traversal order (each node before its descendants, left child before right).
A generator version, iter_descendant_dists, is also provided.
tree.distance.get_descendant_dists(18)
{<Node(idx=18, name='mrcaott371ott2511')>: 0, <Node(idx=16)>: 59.749516, <Node(idx=14)>: 106.683606, <Node(idx=4, name='Ceanothus_leucodermis')>: 111.14800699999999, <Node(idx=5, name='Ceanothus_cuneatus')>: 111.14800699999999, <Node(idx=15)>: 100.190619, <Node(idx=6, name='Frangula_rubra')>: 111.14800699999999, <Node(idx=7, name='Rhamnus_ilicifolia')>: 111.14800699999999, <Node(idx=17)>: 99.371309, <Node(idx=8, name='Quercus_douglasii')>: 111.14800699999999, <Node(idx=9, name='Quercus_wislizeni')>: 111.148008}
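The ordering of the entries above follows from a preorder walk that carries a running distance. Below is a minimal sketch of that idea on a toy tree. The `CHILDREN` mapping and the `iter_dists_preorder` helper are hypothetical names, and nothing here is taken from toytree's source.

```python
# Hypothetical toy tree, not a toytree object:
# parent -> [(child, edge length), ...], children listed left to right.
CHILDREN = {
    "root": [("x", 3.0), ("c", 5.0)],
    "x": [("a", 1.0), ("b", 2.0)],
}


def iter_dists_preorder(node, dist=0.0):
    """Yield (descendant, distance from the query node) in preorder."""
    # Visit the node itself first, then each subtree from left to right.
    yield node, dist
    for child, length in CHILDREN.get(node, []):
        yield from iter_dists_preorder(child, dist + length)


# Dicts preserve insertion order, so the preorder sequence is kept.
dists = dict(iter_dists_preorder("root"))
print(dists)
print(dict(iter_dists_preorder("x")))
```

Starting the walk at an internal node restricts the result to that node's subtree, with distances measured from that node rather than from the root.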
get_farthest_node returns the Node farthest from a selected Node, and get_farthest_node_distance returns the distance between the two.
node = tree.distance.get_farthest_node(11)
dist = tree.distance.get_farthest_node_distance(11)
print(node, dist)
<Node(idx=0, name='Sambucus_nigra')> 650.100056
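One way to find a farthest node is to treat the tree as an undirected graph and accumulate distances outward from the query node. The sketch below assumes that approach and may differ from how toytree computes its result. `EDGES` and `farthest_from` are hypothetical names used only for illustration.

```python
# Hypothetical toy tree as undirected weighted edges, not a toytree object.
EDGES = [
    ("root", "x", 3.0),
    ("root", "c", 5.0),
    ("x", "a", 1.0),
    ("x", "b", 2.0),
]


def farthest_from(start):
    """Return (node, distance) for the node farthest from `start`."""
    neighbors = {}
    for u, v, length in EDGES:
        neighbors.setdefault(u, []).append((v, length))
        neighbors.setdefault(v, []).append((u, length))

    # A tree has exactly one path between any two nodes, so a single
    # traversal from `start` gives the distance to every other node.
    dists = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in neighbors[node]:
            if nbr not in dists:
                dists[nbr] = dists[node] + length
                stack.append(nbr)

    best = max(dists, key=dists.get)
    return best, dists[best]


print(farthest_from("a"))
print(farthest_from("c"))
```

Note that the farthest node depends on the starting point: from "a" it is "c", while from "c" it is "b".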