tree distances
Tree-level dissimilarities¶
To quantify the difference between two trees, these methods decompose each tree into a set of bipartitions or quartets and measure the differences between those sets. For a quick overview of the quartet-based distance scores, use get_treedist_quartets. This overview shows all tree distances based on quartet metrics, where:
$Q =$ Total possible quartets
$S =$ Resolved in the same way between the two trees
$D =$ Resolved differently between the two trees
$R1 =$ Unresolved in tree 1, resolved in tree 2
$R2 =$ Unresolved in tree 2, resolved in tree 1
$U =$ Unresolved in both trees
$N = S + D + R1 + R2 + U$
get_treedist_quartets takes the arguments (tree1, tree2, similarity=False). When similarity=True, scores are reported as similarities (1 - distance).
From these counts, get_treedist_quartets also calculates a list of scores. The scores are described in the following paper:
Estabrook GF, McMorris FR, Meacham CA (1985). “Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.” Systematic Zoology, 34(2), 193-200. doi:10.2307/2413326.
import toytree
tree1 = toytree.rtree.rtree(ntips=10, seed=123)
tree2 = toytree.rtree.rtree(ntips=10, seed=321)
tree1.draw('s')
tree2.draw('s')
toytree.distance.get_treedist_quartets(tree1, tree2)
Q                              210.000000
S                              107.000000
D                              103.000000
U                                0.000000
R1                               0.000000
R2                               0.000000
N                              210.000000
do_not_conflict                  0.490476
explicitly_agree                 0.490476
strict_joint_assertions          0.490476
semistrict_joint_assertions      0.490476
steel_and_penny                  0.490476
symmetric_difference             0.490476
symmetric_divergence             0.019048
similarity_to_reference          0.490476
marczewski_steinhaus             0.658147
dtype: float64
Note: For reference, these two trees will be used for the rest of this notebook's examples.
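As a rough check on how the counts relate to the scores, the sketch below recomputes a few of the printed values from S, D, R1, R2 and U. The formulas are assumptions based on how these Estabrook-style scores are commonly defined, not taken from toytree's source. They reproduce the values printed above, but because R1, R2 and U are all zero in this example, that agreement does not confirm them for trees with unresolved quartets.

```python
# Quartet counts copied from the get_treedist_quartets output above.
S, D, R1, R2, U = 107, 103, 0, 0, 0
N = S + D + R1 + R2 + U  # 210 quartets for 10 tips

# Assumed distance formulas (hypothetical; not read from toytree's source).
scores = {
    "do_not_conflict": D / N,
    "explicitly_agree": 1 - S / N,
    "strict_joint_assertions": D / (D + S),
    "semistrict_joint_assertions": D / (D + S + U),
    "symmetric_difference": (2 * D + R1 + R2) / (2 * D + 2 * S + R1 + R2),
    "marczewski_steinhaus": (2 * D + R1 + R2) / (2 * D + S + R1 + R2),
}

# similarity=True corresponds to reporting 1 - distance for each score.
for name, distance in scores.items():
    print(f"{name:30s} distance={distance:.6f} similarity={1 - distance:.6f}")
```

With these counts, every score except marczewski_steinhaus reduces to D / N, which is why so many rows of the overview share the same value.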
Robinson-Foulds distance¶
The Robinson-Foulds (RF) distance counts the bipartitions that are induced by one tree but not by the other. In other words, it is the size of the symmetric difference between the two trees' bipartition sets. Larger values indicate that the two trees are more different.
By default the raw count is returned. To get the normalized score, in which the count is divided by the total number of bipartitions in both sets, use normalize=True.
normalized = toytree.distance.get_treedist_rf(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rf(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)
normalized: 0.8571428571428571 default: 12
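To make the definition concrete, here is a minimal pure-Python sketch of the same calculation on two small hand-written trees. It does not use toytree: the nested-tuple tree encoding and the helper names tips and bipartitions are illustrative assumptions, and the trees are treated as unrooted.

```python
def tips(node):
    """Return the set of tip names below a node (a str is a tip)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def bipartitions(tree):
    """Collect the non-trivial bipartitions of a nested-tuple tree.

    Each bipartition is stored as the side that excludes a fixed
    reference tip, so the same split always has the same key.
    """
    all_tips = tips(tree)
    ref = min(all_tips)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = tips(node)
        side = clade if ref not in clade else all_tips - clade
        # Skip trivial splits (a single tip versus everything else).
        if 1 < len(side) < len(all_tips) - 1:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


tree_a = (("a", "b"), ("c", ("d", "e")))
tree_b = (("a", "c"), ("b", ("d", "e")))

splits_a = bipartitions(tree_a)
splits_b = bipartitions(tree_b)

rf = len(splits_a ^ splits_b)                        # symmetric difference
rf_norm = rf / (len(splits_a) + len(splits_b))       # divide by total splits
print(rf, rf_norm)
```

Both trees share the split that separates d and e from the rest, and each has one split the other lacks, so the raw distance is 2 out of 4 bipartitions in total.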
Information-corrected Robinson-Foulds distance¶
The information-corrected Robinson-Foulds distance (RFI) measures the sum of the phylogenetic information of the edges that differ between two trees. The information of a split is derived from the probability that a randomly sampled binary tree of the same size contains that split: the lower the probability, the more information the split carries. Splits that contain less information (e.g., a cherry vs. a deep split) are more likely to arise by chance, and thus contribute less to the metric.
Setting normalize=True normalizes the score relative to the sum of phylogenetic information present in both trees.
normalized = toytree.distance.get_treedist_rfi(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfi(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)
normalized: 0.8944865320126851 default: 66.2410417642415
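The idea that a cherry carries less information than a deep split can be illustrated with a small sketch. It assumes unrooted binary trees and the standard counting result that (2n-5)!! such trees exist on n tips, and it takes information to be the negative log2 of the split probability. The helper names are hypothetical, and toytree's internal calculation may differ in its details.

```python
import math


def double_factorial(k):
    """Return k!! for odd k, treating values of 1 or below as 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def split_probability(ntips, size_a):
    """Probability that a random unrooted binary tree contains a given
    split with size_a tips on one side and ntips - size_a on the other."""
    size_b = ntips - size_a
    consistent = double_factorial(2 * size_a - 3) * double_factorial(2 * size_b - 3)
    total = double_factorial(2 * ntips - 5)
    return consistent / total


def split_information(ntips, size_a):
    """Information content of a split in bits (assumed: -log2 of probability)."""
    return -math.log2(split_probability(ntips, size_a))


cherry = split_information(10, 2)  # a split of 2 tips versus 8
deep = split_information(10, 5)    # an even split of 5 tips versus 5
print(f"cherry: {cherry:.3f} bits, deep split: {deep:.3f} bits")
```

Under these assumptions a given cherry appears in 1 of every 15 random 10-tip trees, while a given 5 versus 5 split is far rarer, so a disagreement over the deep split adds more to the distance.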
Generalized Robinson-Foulds Matching Split distance¶
normalized = toytree.distance.get_treedist_rfg_ms(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfg_ms(tree1, tree2, normalize=False)
print(' normalized: ', normalized, '\n','default: ',default)
⚠️ toytree | treedist_utils:get_trees_matching_split_dist | no normalization method for matching split distance.
normalized: 15.0 default: 15.0
# toytree.distance.get_treedist_rfg_mci(tree1, tree2)
# toytree.distance.get_treedist_rfg_msi(tree1, tree2)
# toytree.distance.get_treedist_rfg_spi(tree1, tree2)
Matching Split Information Distance¶
normalized = toytree.distance.get_treedist_rfg_msi(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfg_msi(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)
normalized: 0.610108956007992 default: 0.610108956007992
Generalized Robinson-Foulds Distance based on Shared Phylogenetic Information¶
normalized = toytree.distance.get_treedist_rfg_spi(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfg_spi(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)
normalized: 0.7141801751537229 default: 52.88848642930732
C:\Users\natet\Desktop\eatonlab\toytree_NT\toytree\distance\_src\treedist_utils.py:208: RuntimeWarning: divide by zero encountered in log2 return -np.log2(_get_phylo_prob_two_splits(ntips, size_a1, size_a2))
Generalized Robinson-Foulds Distance based on Mutual Clustering Information¶
normalized = toytree.distance.get_treedist_rfg_mci(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfg_mci(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)