Tree distance metrics
toytree implements several tree distance metrics that quantify the difference between two trees, each based on a different criterion.
import toytree
# example trees
tree1 = toytree.rtree.unittree(ntips=8, seed=123)
tree2 = toytree.rtree.unittree(ntips=8, seed=987)
toytree.mtree([tree1, tree2]).draw();
Quartet tree distance
The function get_treedist_quartets returns a pandas.Series containing many tree distance metrics computed from the set of quartets (subsets of four tips) in the two trees. You can select any individual statistic from this Series by name. The argument similarity=True reports similarity scores instead of dissimilarity scores. The result includes the following quartet counts, along with the statistics computed from them:
Q = Total possible quartets
S = Resolved in the same way between the two trees
D = Resolved differently between the two trees
R1 = Unresolved in tree 1, resolved in tree 2
R2 = Unresolved in tree 2, resolved in tree 1
U = Unresolved in both trees
N = S + D + R1 + R2 + U (equal to Q, since every quartet falls into exactly one of these categories)
Estabrook, G. F., McMorris, F. R., Meacham, C. A. (1985). "Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units". Systematic Zoology. 34 (2): 193–200. doi:10.2307/2413326.
toytree.distance.get_treedist_quartets(tree1, tree2)
Q                              70.000000
S                              57.000000
D                              13.000000
U                               0.000000
R1                              0.000000
R2                              0.000000
N                              70.000000
do_not_conflict                 0.185714
explicitly_agree                0.185714
strict_joint_assertions         0.185714
semistrict_joint_assertions     0.185714
steel_and_penny                 0.185714
symmetric_difference            0.185714
symmetric_divergence            0.628571
similarity_to_reference         0.185714
marczewski_steinhaus            0.313253
dtype: float64
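The sketch below makes these categories concrete by classifying every quartet of tips in plain Python. It illustrates the definitions above and is not toytree's implementation. It assumes each tree is given as a set of its non-trivial splits, with each split written as a frozenset of the tips on one side. The helper names quartet_resolution and count_quartets are hypothetical.

```python
from itertools import combinations

def quartet_resolution(quartet, splits):
    """Return how a tree resolves a 4-tip quartet, or None if it is unresolved.

    The quartet {a, b, c, d} is resolved as ab|cd when some split places
    exactly two of the four tips on each side. The returned label is the pair
    containing the quartet's smallest tip, so the three possible resolutions
    get distinct, comparable labels.
    """
    q = frozenset(quartet)
    for side in splits:
        inside = q & side
        if len(inside) == 2:
            return inside if min(q) in inside else q - inside
    return None

def count_quartets(splits1, splits2, tips):
    """Count quartets in each category (Q, S, D, R1, R2, U) plus their sum N."""
    counts = {"Q": 0, "S": 0, "D": 0, "R1": 0, "R2": 0, "U": 0}
    for quartet in combinations(sorted(tips), 4):
        r1 = quartet_resolution(quartet, splits1)
        r2 = quartet_resolution(quartet, splits2)
        counts["Q"] += 1
        if r1 is not None and r2 is not None:
            counts["S" if r1 == r2 else "D"] += 1
        elif r1 is None and r2 is not None:
            counts["R1"] += 1  # unresolved in tree 1, resolved in tree 2
        elif r1 is not None:
            counts["R2"] += 1  # unresolved in tree 2, resolved in tree 1
        else:
            counts["U"] += 1   # unresolved in both trees
    counts["N"] = sum(counts[k] for k in ("S", "D", "R1", "R2", "U"))
    return counts

# Hypothetical 5-tip trees: ((A,B),C,(D,E)) and ((A,C),B,(D,E))
tips = "ABCDE"
tree_a = {frozenset("AB"), frozenset("DE")}
tree_b = {frozenset("AC"), frozenset("DE")}
print(count_quartets(tree_a, tree_b, tips))
# {'Q': 5, 'S': 3, 'D': 2, 'R1': 0, 'R2': 0, 'U': 0, 'N': 5}
```

The same idea scales to the 8-tip trees above. There, C(8, 4) = 70, which matches the Q value in the output.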
Robinson-Foulds distances
RF
The Robinson-Foulds (RF) distance counts the bipartitions (splits) that are induced by one tree but not by the other. In other words, it is the size of the symmetric difference between the two trees' sets of bipartitions. With normalize=True, this count is divided by the total number of bipartitions in both trees, giving a score between 0 and 1. Larger values indicate that the two trees are more different.
toytree.distance.get_treedist_rf(tree1, tree2, normalize=True)
0.4
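As a rough illustration of the definition, the sketch below computes the RF distance from two sets of splits in plain Python. It is not toytree's code, and the helper names split and rf_distance are hypothetical. Splits are stored in a canonical form (the side that excludes a fixed reference tip) so that the same bipartition always compares equal.

For the two 8-tip trees above, assuming both are fully resolved, each has 8 - 3 = 5 non-trivial splits. The normalized score of 0.4 therefore corresponds to 4 of the 10 splits occurring in only one tree.

```python
TIPS = frozenset("ABCDEF")

def split(*side):
    """Canonical bipartition: the side that does not contain reference tip 'A'."""
    side = frozenset(side)
    return TIPS - side if "A" in side else side

def rf_distance(splits1, splits2, normalize=False):
    """Robinson-Foulds distance between two trees given as sets of splits."""
    differing = len(splits1 ^ splits2)  # splits found in only one of the trees
    if normalize:
        return differing / (len(splits1) + len(splits2))
    return differing

# Hypothetical 6-tip trees: ((A,B),C,(D,(E,F))) and ((A,C),B,(D,(E,F)))
tree1 = {split("A", "B"), split("A", "B", "C"), split("E", "F")}
tree2 = {split("A", "C"), split("A", "B", "C"), split("E", "F")}
print(rf_distance(tree1, tree2))                  # 2
print(rf_distance(tree1, tree2, normalize=True))  # 0.3333333333333333
```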
RFi (information-corrected)
The information-corrected Robinson-Foulds distance (RFI) sums the phylogenetic information of the splits that differ between two trees. A split's information is derived from the probability that a uniformly sampled binary tree of the same size contains that split: the less probable the split, the more information it carries (information = -log2 of that probability). Splits that carry less information (e.g., a cherry compared with a deep, evenly sized split) are more likely to arise by chance, and so they contribute less to the metric. normalize=True divides the score by the sum of the phylogenetic information present in both trees.
toytree.distance.get_treedist_rfi(tree1, tree2, normalize=True)
0.3825066230466303
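The sketch below follows the definition above. Of the (2n - 5)!! unrooted binary trees on n tips, (2a - 3)!!(2b - 3)!! contain a given split with a and b tips on its two sides. The sketch converts that probability to bits, then sums the bits over splits found in only one tree. This is a plain-Python illustration, not toytree's implementation, and the helper names are hypothetical. Splits are frozensets giving the side that excludes tip "A".

```python
import math

def odd_double_factorial(k):
    """k!! for odd k >= -1, using the convention (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def split_information(a, b):
    """Phylogenetic information (bits) of a split with a and b tips per side."""
    n_trees_with_split = odd_double_factorial(2 * a - 3) * odd_double_factorial(2 * b - 3)
    n_trees = odd_double_factorial(2 * (a + b) - 5)  # unrooted binary trees
    return -math.log2(n_trees_with_split / n_trees)

def rfi_distance(splits1, splits2, n_tips, normalize=False):
    """Sum the information of the splits present in only one tree."""
    def info(side):
        return split_information(len(side), n_tips - len(side))
    differing = sum(info(s) for s in splits1 ^ splits2)
    if normalize:
        return differing / (sum(map(info, splits1)) + sum(map(info, splits2)))
    return differing

# In an 8-tip tree, a cherry (2|6) carries less information than a 4|4 split
print(split_information(2, 6))  # log2(11), about 3.46 bits
print(split_information(4, 4))  # log2(46.2), about 5.53 bits

# Hypothetical 6-tip trees; each split is the side that excludes tip "A"
tree1 = {frozenset("CDEF"), frozenset("DEF"), frozenset("EF")}
tree2 = {frozenset("BDEF"), frozenset("DEF"), frozenset("EF")}
print(rfi_distance(tree1, tree2, n_tips=6, normalize=True))
```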
RFg_ms (matching split)
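Returns the Matching Split Distance. This generalized Robinson-Foulds distance does not simply count identical splits. Instead, it pairs each split in one tree with a split in the other, choosing the one-to-one matching with the lowest total cost, and sums over matched pairs the number of tips that must be moved to make the two splits identical.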
toytree.distance.get_treedist_rfg_ms(tree1, tree2, normalize=False)
3.0
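Here is a small brute-force sketch of the matching split distance as described above. The cost of pairing two splits is the number of tips that must move to turn one into the other, and the distance is the minimum total cost over all one-to-one matchings. Because it enumerates every permutation, the sketch is only practical for small trees. It assumes both trees are binary on the same tips, so they have equal numbers of splits. The helper names are hypothetical, and this is not toytree's implementation. A practical implementation would use an assignment algorithm such as the Hungarian method instead of enumeration.

```python
from itertools import permutations

def split_cost(side1, side2, tips):
    """Number of tips that must move to turn split side1|rest into side2|rest."""
    rest1, rest2 = tips - side1, tips - side2
    agree = max(len(side1 & side2) + len(rest1 & rest2),
                len(side1 & rest2) + len(rest1 & side2))
    return len(tips) - agree

def matching_split_distance(splits1, splits2, tips):
    """Minimum total split_cost over all one-to-one matchings (brute force)."""
    s1, s2 = list(splits1), list(splits2)
    if len(s1) != len(s2):
        raise ValueError("expected binary trees with equal numbers of splits")
    return min(sum(split_cost(a, b, tips) for a, b in zip(s1, perm))
               for perm in permutations(s2))

# Hypothetical 6-tip trees: ((A,B),C,(D,(E,F))) and ((A,C),B,(D,(E,F)))
tips = frozenset("ABCDEF")
tree1 = [frozenset("AB"), frozenset("ABC"), frozenset("EF")]
tree2 = [frozenset("AC"), frozenset("ABC"), frozenset("EF")]
print(matching_split_distance(tree1, tree2, tips))  # 2
```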
RFg_msi (matching split info)
Returns the Matching Split Information Distance, an information-theoretic generalized Robinson-Foulds distance described by Smith (2020).
toytree.distance.get_treedist_rfg_msi(tree1, tree2, normalize=True)
0.2672083416810132
RFg_mci (mutual clustering info)
A generalized Robinson-Foulds distance based on Mutual Clustering Information. Smith (2020) recommends this as the preferred tree distance metric.
Smith, Martin R. (2020). "Information theoretic Generalized Robinson-Foulds metrics for comparing phylogenetic trees". Bioinformatics. 36 (20): 5007–5013. doi:10.1093/bioinformatics/btaa614.
toytree.distance.get_treedist_rfg_mci(tree1, tree2, normalize=True)
0.2672083416810132
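The core quantity can be sketched in plain Python. Each split is treated as a clustering of the tips into two groups, and the mutual clustering information of two splits is the mutual information (in bits) between those clusterings. The distance is the total clustering entropy of both trees minus twice the best one-to-one matching of that quantity.

This follows the general construction in Smith (2020), but it is a brute-force illustration, not toytree's implementation. It may differ from toytree's output by a constant scaling, which cancels when normalizing. Normalizing by the total entropy of both trees is an assumption about what normalize=True does. The helper names are hypothetical.

```python
import math
from itertools import permutations

def entropy(counts):
    """Shannon entropy (bits) of a partition given its block sizes."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def clustering_entropy(side, tips):
    """Entropy of the two-group clustering that a split induces on the tips."""
    return entropy([len(side), len(tips - side)])

def mutual_clustering_info(side1, side2, tips):
    """Mutual information (bits) between the clusterings of two splits."""
    rest1, rest2 = tips - side1, tips - side2
    joint = [len(side1 & side2), len(side1 & rest2),
             len(rest1 & side2), len(rest1 & rest2)]
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return (clustering_entropy(side1, tips) + clustering_entropy(side2, tips)
            - entropy(joint))

def clustering_info_distance(splits1, splits2, tips, normalize=False):
    """Total entropy minus twice the best matching of mutual clustering info."""
    s1, s2 = list(splits1), list(splits2)
    if len(s1) != len(s2):
        raise ValueError("expected binary trees with equal numbers of splits")
    best = max(sum(mutual_clustering_info(a, b, tips) for a, b in zip(s1, perm))
               for perm in permutations(s2))
    total = (sum(clustering_entropy(s, tips) for s in s1)
             + sum(clustering_entropy(s, tips) for s in s2))
    distance = total - 2 * best
    return distance / total if normalize else distance

# Hypothetical 6-tip trees: ((A,B),C,(D,(E,F))) and ((A,C),B,(D,(E,F)))
tips = frozenset("ABCDEF")
tree1 = [frozenset("AB"), frozenset("ABC"), frozenset("EF")]
tree2 = [frozenset("AC"), frozenset("ABC"), frozenset("EF")]
print(clustering_info_distance(tree1, tree2, tips, normalize=True))
```

Unlike the matching split distance, this approach scores even partially overlapping splits by how much information they share. Near-miss splits therefore still reduce the distance.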
RFg_spi (shared phylo info)
Generalized Robinson-Foulds distance based on Shared Phylogenetic Information.
# BUGFIX in progress: this function is being fixed, so the example call is commented out.
# toytree.distance.get_treedist_rfg_spi(tree1, tree2, normalize=True)