Inference: Consensus Trees
Infer majority-rule consensus trees from a set of gene trees with shared tips, then map edge/node summary statistics back onto a target tree.
This page documents:
- toytree.infer.consensus_tree(...)
- toytree.infer.consensus_features(...)
- MultiTree wrappers .get_consensus_tree(...) and .get_consensus_features(...)
import toytree
import numpy as np
import pandas as pd
# Unrooted example tree set (shared tip names)
unrooted_newicks = [
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(e:2,(c:3,d:2):1):1);",
    "((a:1,b:1):3,(d:2,(c:3,e:2):1):1);",
    "((d:2,(c:3,e:2):1),(a:1,b:1):3):1;",
]
unrooted_trees = toytree.mtree(unrooted_newicks)
# Rooted ultrametric example tree set
rooted_newicks = [
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(e:3,(c:2,d:2):1):1);",
    "((a:1,b:1):3,(d:3,(c:2,e:2):1):1);",
    "((a:1,b:1):4,(d:3,(c:2,e:2):1):2);",
]
rooted_trees = toytree.mtree(rooted_newicks)
Input Requirements
Consensus inference requires that all input trees contain the same set of tip names.
Passing ultrametric=True to consensus_features additionally requires the source trees to be rooted and ultrametric.
# quick check: all unrooted example trees have aligned tip sets
[t.get_tip_labels() for t in unrooted_trees[:2]]
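The shared-tip requirement can also be checked on the raw Newick strings before any trees are built. The following is a minimal standard-library sketch, not toytree's own validation: the helpers tip_names and shared_tip_set are hypothetical names introduced here, and the regular expression assumes simple, unquoted Newick labels.

```python
import re


def tip_names(newick: str) -> frozenset:
    """Return the tip names in a simple Newick string.

    Assumption: labels are unquoted, so a tip name is any label that
    directly follows "(" or ",". Labels after ")" are internal node
    labels or branch lengths and are ignored.
    """
    return frozenset(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def shared_tip_set(newicks) -> frozenset:
    """Return the common tip set, or raise ValueError if any tree differs."""
    tip_sets = [tip_names(nwk) for nwk in newicks]
    first = tip_sets[0]
    for i, tips in enumerate(tip_sets[1:], start=1):
        if tips != first:
            # report the names present in only one of the two trees
            raise ValueError(f"tree {i} tips differ: {sorted(tips ^ first)}")
    return first
```

Applied to either example set on this page, shared_tip_set returns the five tips a through e. Applied to the pair "((a,b),c);" and "((a,b),d);", it raises a ValueError naming c and d.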
consensus_tree: topology and support
min_freq controls the minimum split frequency needed to keep a clade in the consensus.
ctree_00 = toytree.infer.consensus_tree(unrooted_trees, min_freq=0.0)
ctree_00.draw(layout="un", node_labels="support", node_as_edge_data=True, node_sizes=12, width=360, height=320)
ctree_05 = toytree.infer.consensus_tree(unrooted_trees, min_freq=0.5)
ctree_05.draw(layout="un", node_labels="support", node_as_edge_data=True, node_sizes=12, width=360, height=320)
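To make the role of min_freq concrete, the sketch below counts how often each non-trivial split occurs in the unrooted example set and keeps splits greedily in order of decreasing frequency. It is a standard-library illustration of the general majority-rule/greedy idea, not toytree's implementation. The function names (clades, splits, consensus_splits) are hypothetical, and treating the threshold as inclusive (frequency >= min_freq) is an assumption.

```python
import re
from collections import Counter

NEWICKS = [
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(c:3,(d:2,e:2):1):1);",
    "((a:1,b:1):3,(e:2,(c:3,d:2):1):1);",
    "((a:1,b:1):3,(d:2,(c:3,e:2):1):1);",
    "((d:2,(c:3,e:2):1),(a:1,b:1):3):1;",
]


def clades(newick):
    """Return the tip-name set beneath every internal node (root included)."""
    stack, found, prev = [], [], None
    for tok in re.findall(r"[(),;]|[^(),;]+", newick):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            found.append(frozenset(done))
            if stack:
                stack[-1] |= done  # tips also belong to the parent clade
        elif tok not in (",", ";") and prev in ("(", ","):
            stack[-1].add(tok.split(":")[0].strip())  # a tip label
        prev = tok
    return found


def splits(newick):
    """Non-trivial bipartitions, stored as the side lacking the first-sorted tip."""
    found = clades(newick)
    tips = max(found, key=len)  # the root clade holds every tip
    ref = min(tips)
    out = set()
    for clade in found:
        side = tips - clade if ref in clade else clade
        if 2 <= len(side) <= len(tips) - 2:
            out.add(side)
    return out


def consensus_splits(newicks, min_freq=0.0):
    """Greedily keep compatible splits with frequency >= min_freq (assumed inclusive)."""
    counts = Counter(s for nwk in newicks for s in splits(nwk))
    ranked = sorted(counts, key=lambda s: (-counts[s], sorted(s)))
    kept = {}
    for split in ranked:
        freq = counts[split] / len(newicks)
        if freq < min_freq:
            break
        # two sides that both exclude the reference tip are compatible
        # when they are nested or disjoint
        if all(split <= k or k <= split or not (split & k) for k in kept):
            kept[split] = freq
    return kept
```

In this example set the split separating {a, b} from {c, d, e} occurs in all six trees, {d, e} in three, {c, e} in two, and {c, d} in one. Under the assumptions above, both min_freq=0.0 and min_freq=0.5 keep the first two splits, because the rarer splits conflict with {d, e}; raising min_freq above 0.5 leaves only the first.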
Consensus Output Fields
By default, consensus inference stores support and distance summaries on the returned tree.
| Field | Meaning |
|---|---|
| support | Proportion of source trees containing the clade/split |
| dist_mean | Mean mapped edge distance |
| dist_median | Median mapped edge distance |
| dist_std | Standard deviation of mapped edge distance |
| dist_min | Minimum mapped edge distance |
| dist_max | Maximum mapped edge distance |
| dist_range | dist_max - dist_min |
| dist_count | Number of mapped values used for summaries |
ctree_05.get_node_data().head(10)
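The dist_* fields in the table are ordinary summary statistics over the edge lengths mapped to one edge. A minimal sketch of that arithmetic with the standard library follows. The summarize helper is a hypothetical name, and using the population standard deviation is an assumption, since this page does not say which estimator dist_std uses.

```python
import statistics


def summarize(values, prefix="dist"):
    """Summarize the values mapped to one edge as <prefix>_* fields."""
    vals = [float(v) for v in values]
    return {
        f"{prefix}_mean": statistics.fmean(vals),
        f"{prefix}_median": statistics.median(vals),
        # assumption: population SD; a sample SD would use statistics.stdev
        f"{prefix}_std": statistics.pstdev(vals),
        f"{prefix}_min": min(vals),
        f"{prefix}_max": max(vals),
        f"{prefix}_range": max(vals) - min(vals),
        f"{prefix}_count": len(vals),
    }


# terminal edge lengths of tip d in the six rooted example trees
print(summarize([2, 2, 2, 2, 3, 3]))
```

The same helper with prefix="rate" would produce field names of the form rate_mean, rate_median, and so on.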
consensus_features: default mapping
By default, consensus_features maps the support and dist_* summaries from the source tree set onto a target tree.
ftree_default = toytree.infer.consensus_features(ctree_05, unrooted_trees, conditional=False)
ftree_default.draw(layout="un", node_labels="support", node_as_edge_data=True, node_sizes=12, width=360, height=320)
[c for c in ftree_default.get_node_data().columns if c.startswith("dist") or c == "support"]
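Additional Features (features=[...])
Pass features=[...] to summarize additional numeric node features stored on the source trees. Each listed feature yields the same set of summary fields as dist (for example rate_mean, rate_median, rate_std, and rate_count).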
trees_with_rate = [t.copy() for t in unrooted_trees]
for idx, t in enumerate(trees_with_rate, start=1):
    # store a distinct rate on the (a, b) ancestor of each source tree
    node = t.get_mrca_node("a", "b")
    t.set_node_data("rate", {node.idx: float(idx)}, inplace=True)
ctree_rate = toytree.infer.consensus_tree(trees_with_rate, min_freq=0.5)
ftree_rate = toytree.infer.consensus_features(ctree_rate, trees_with_rate, features=["rate"])
ftree_rate.get_node_data(["idx", "support", "rate_mean", "rate_median", "rate_std", "rate_min", "rate_max", "rate_range", "rate_count"]).dropna(how="all")
Conditional Mapping (conditional=True)
Conditional mode changes how some tip and internal-node summaries are accumulated when matching clades are present. The comparison below tabulates dist_count for each tip under both settings.
ftree_uncond = toytree.infer.consensus_features(ctree_05, unrooted_trees, conditional=False)
ftree_cond = toytree.infer.consensus_features(ctree_05, unrooted_trees, conditional=True)
pd.DataFrame({
    "tip": [n.name for n in ftree_cond[:ftree_cond.ntips]],
    "dist_count_unconditional": [getattr(n, "dist_count", np.nan) for n in ftree_uncond[:ftree_uncond.ntips]],
    "dist_count_conditional": [getattr(n, "dist_count", np.nan) for n in ftree_cond[:ftree_cond.ntips]],
})
Ultrametric Mode (ultrametric=True)
When source trees are rooted and ultrametric, set ultrametric=True to also summarize node heights.
target_rooted = rooted_trees[0].copy()
ftree_ultra = toytree.infer.consensus_features(target_rooted, rooted_trees, ultrametric=True)
ftree_ultra.draw(node_labels="height_mean", node_as_edge_data=True, node_sizes=10, width=360, height=320)
[c for c in ftree_ultra.get_node_data().columns if c.startswith("height") or c == "support"]
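Whether a source tree qualifies as ultrametric can be checked by comparing root-to-tip distances, which must all be equal. The sketch below does this directly on a Newick string with the standard library. It is an illustration under stated assumptions, not the check toytree performs: root_to_tip and is_ultrametric are hypothetical helper names, labels are assumed unquoted, and a missing branch length is treated as 0.

```python
import math
import re


def root_to_tip(newick):
    """Map each tip name to its summed branch length from the root."""
    stack = []    # one {tip: distance} dict per open clade
    pending = {}  # the child subtree most recently completed
    prev = None
    for tok in re.findall(r"[(),;]|[^(),;]+", newick):
        if tok == "(":
            stack.append({})
        elif tok in (",", ")"):
            stack[-1].update(pending)  # attach the finished child to its parent
            if tok == ")":
                pending = stack.pop()
        elif tok != ";":
            name, _, dist = tok.partition(":")
            length = float(dist) if dist else 0.0  # assumption: missing = 0
            if prev == ")":
                # edge above the clade just closed: lengthen all its tips
                pending = {t: d + length for t, d in pending.items()}
            else:
                # a tip and its terminal edge
                pending = {name.strip(): length}
        prev = tok
    return pending


def is_ultrametric(newick, tol=1e-9):
    """True if every tip is the same distance from the root (within tol)."""
    depths = list(root_to_tip(newick).values())
    return all(math.isclose(d, depths[0], abs_tol=tol) for d in depths)
```

Each of the six rooted example Newick strings passes this check. The fourth unrooted string, "((a:1,b:1):3,(e:2,(c:3,d:2):1):1);", does not, because tips c and e sit at distances 5 and 3 from the root.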
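Validation and Error Examples
These examples trigger common input-validation errors: an invalid min_freq, source trees with mismatched tip sets, and ultrametric=True with source trees that do not meet the rooted, ultrametric requirement.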
# invalid min_freq (1.1)
try:
    toytree.infer.consensus_tree(unrooted_trees, min_freq=1.1)
except Exception as err:
    print(type(err).__name__, err)

# source trees with different tip sets
try:
    bad = [toytree.tree("((a,b),c);"), toytree.tree("((a,b),d);")]
    toytree.infer.consensus_tree(bad)
except Exception as err:
    print(type(err).__name__, err)

# ultrametric=True with the unrooted example set
try:
    toytree.infer.consensus_features(ctree_05, unrooted_trees, ultrametric=True)
except Exception as err:
    print(type(err).__name__, err)