Phylogenetic signal¶
Several metrics are available to measure "phylogenetic signal" in a trait value among the tips of a phylogeny. These metrics test the extent to which traits exhibit phylogenetic inertia, such that their values can be explained by a random walk (Brownian motion model of evolution) along the edges of a tree. A trait exhibiting low phylogenetic signal is poorly explained by this model and the specified tree, whereas a trait exhibiting high phylogenetic signal fits this model of evolution on the tree well.
import toytree
import numpy as np
Example dataset¶
Let's simulate some test data. We generate a random 20-tip tree with uniform internal edge lengths and then simulate two traits ("t0" and "t1") on the tree under the Brownian motion (BM) model. A third trait ("t2") is generated by assigning random uniform values to the tips of the tree (i.e., it is not simulated under a BM model on the tree). We should expect the first two traits to exhibit greater phylogenetic signal than the third.
# generate a random tree
tree = toytree.rtree.unittree(ntips=20, treeheight=10.0, seed=123)
# generate two traits under BM model and store to tree
tree.pcm.simulate_continuous_bm(rates=[5.5, 0.05], seed=123, tips_only=True, inplace=True)
# generate a random uniform trait (i.e., not evolved under BM)
uniform = np.random.default_rng(seed=123).uniform(-50, 50, size=tree.ntips)
tree.set_node_data("t2", {i: j for (i, j) in enumerate(uniform)}, inplace=True)
# show all feature data for tip nodes
tree.get_node_data()[:tree.ntips]
| | idx | name | height | dist | support | t0 | t1 | t2 |
---|---|---|---|---|---|---|---|---|
0 | 0 | r0 | 0.0 | 7.0 | NaN | 77.160413 | -0.245377 | 18.235186 |
1 | 1 | r1 | 0.0 | 6.0 | NaN | 40.971918 | 0.305800 | -44.617898 |
2 | 2 | r2 | 0.0 | 6.0 | NaN | 2.540054 | -0.040868 | -27.964013 |
3 | 3 | r3 | 0.0 | 6.0 | NaN | 28.549970 | -0.207394 | -31.562819 |
4 | 4 | r4 | 0.0 | 3.0 | NaN | 19.181300 | -0.070625 | -32.409410 |
5 | 5 | r5 | 0.0 | 2.0 | NaN | -1.701751 | 0.172896 | 31.209451 |
6 | 6 | r6 | 0.0 | 1.0 | NaN | 10.366843 | -0.048158 | 42.334500 |
7 | 7 | r7 | 0.0 | 1.0 | NaN | 9.120536 | -0.053057 | -22.342560 |
8 | 8 | r8 | 0.0 | 3.0 | NaN | 16.935370 | 0.186425 | 31.975456 |
9 | 9 | r9 | 0.0 | 2.0 | NaN | 8.033774 | -0.117348 | 38.989269 |
10 | 10 | r10 | 0.0 | 2.0 | NaN | 12.220348 | -0.026871 | 1.297046 |
11 | 11 | r11 | 0.0 | 5.0 | NaN | 17.805914 | -0.045131 | -25.503540 |
12 | 12 | r12 | 0.0 | 7.0 | NaN | 11.572519 | -0.033040 | 32.424160 |
13 | 13 | r13 | 0.0 | 8.0 | NaN | 99.549183 | -0.379688 | -28.623704 |
14 | 14 | r14 | 0.0 | 8.0 | NaN | -3.906753 | -0.249530 | 24.146705 |
15 | 15 | r15 | 0.0 | 7.0 | NaN | -19.235036 | -0.203506 | 12.994020 |
16 | 16 | r16 | 0.0 | 6.0 | NaN | 50.392113 | 0.000320 | 42.740726 |
17 | 17 | r17 | 0.0 | 6.0 | NaN | 27.232351 | -0.331218 | -26.809181 |
18 | 18 | r18 | 0.0 | 8.0 | NaN | 45.349668 | -0.125299 | 29.912513 |
19 | 19 | r19 | 0.0 | 9.0 | NaN | 37.123951 | 0.774358 | 1.816504 |
# draw tree with extra space reserved to the right
canvas, axes, mark = tree.draw(shrink=50, label="example tree w/ 3 traits");
# draw each trait as color-mapped square tip markers in its own column
tree.annotate.add_tip_markers(axes, marker="s", xshift=45, color=("t0", "Greys"))
tree.annotate.add_tip_markers(axes, marker="s", xshift=60, color=("t1", "Greys"));
tree.annotate.add_tip_markers(axes, marker="s", xshift=75, color=("t2", "Greys"));
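To make the BM assumption behind "t0" and "t1" more concrete: under Brownian motion, tip values are jointly multivariate normal with covariance $\sigma^2 C$, where $C_{ij}$ is the edge length shared by tips i and j on their paths from the root. The following is a minimal NumPy sketch of that idea and not toytree's implementation. The function name `simulate_bm_tips` and the hand-built 4-tip covariance matrix are assumptions made for illustration.

```python
import numpy as np


def simulate_bm_tips(C, sigma2=1.0, root=0.0, n_reps=1, seed=None):
    """Draw tip values under BM given a phylogenetic covariance matrix C.

    Hypothetical helper for illustration; not part of toytree.
    """
    C = np.asarray(C, dtype=float)
    rng = np.random.default_rng(seed)
    # every tip has the root state as its expected value
    mean = np.full(C.shape[0], root)
    # covariance among tips scales with the rate and the shared edge lengths
    return rng.multivariate_normal(mean, sigma2 * C, size=n_reps)


# hypothetical tree ((A,B),(C,D)): root height 2, both cherries at height 1,
# so sister tips share 1 unit of history and each tip has total depth 2
C = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# three replicate traits for the four tips
traits = simulate_bm_tips(C, sigma2=5.5, n_reps=3, seed=123)
print(traits.shape)  # (3, 4)
```

Sister tips share more history and therefore tend to receive more similar values, which is the pattern that the metrics on this page are designed to detect.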
Blomberg's K¶
Blomberg's K (Blomberg et al. 2003) quantifies the phylogenetic signal in a trait relative to the expectation under a Brownian motion model of evolution on the tree. Values of K>1 indicate that closely related tips are more similar than expected under BM, whereas K<1 indicates that they are less similar than expected. Permutations can be used to perform a significance test.
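For readers who want to see what the statistic measures, here is a minimal NumPy sketch of a widely used formulation of K: the ratio of the raw to the tree-corrected mean squared error, divided by the value of that ratio expected under BM for the given tree. It is an illustration under stated assumptions and not toytree's code. The function name `blomberg_k` and the small covariance matrix `C` (shared edge lengths for a hypothetical tree ((A,B),(C,D))) are invented for the example.

```python
import numpy as np


def blomberg_k(x, C):
    """Blomberg's K for trait vector x and phylogenetic covariance matrix C.

    Hypothetical helper for illustration; not part of toytree.
    """
    x = np.asarray(x, dtype=float)
    C = np.asarray(C, dtype=float)
    n = x.size
    inv_c = np.linalg.inv(C)
    # phylogenetically weighted (GLS) estimate of the root state
    root = inv_c.sum(axis=0) @ x / inv_c.sum()
    dev = x - root
    # observed ratio of raw to tree-corrected squared error
    observed = (dev @ dev) / (dev @ inv_c @ dev)
    # value of that ratio expected under BM on this tree
    expected = (np.trace(C) - n / inv_c.sum()) / (n - 1)
    return observed / expected


# hypothetical tree ((A,B),(C,D)): root height 2, both cherries at height 1
C = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

clustered = [1.0, 1.1, 5.0, 5.2]   # sister tips have similar values
scrambled = [1.0, 5.0, 1.1, 5.2]   # same values, sisters now dissimilar

print(round(blomberg_k(clustered, C), 2))  # 1.79
print(round(blomberg_k(scrambled, C), 2))  # 0.6
```

The same four values give K above 1 when similar values sit on sister tips and K below 1 when they do not, which matches the interpretation of K given above.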
Examples¶
As an example, when K is calculated for the trait "t0", which was simulated under a model of Brownian motion, we recover a K statistic very close to 1.0. By contrast, when K is calculated for trait "t2", which has uniform random values assigned to the tips, we get a K of 0.4. From this we can say that the phylogenetic signal in "t0" is greater than that of "t2". However, we don't yet know how far K is expected to deviate from the BM expectation (K=1) by chance alone. This deviation depends on the variance in the trait values and on the shape and size of the tree, so we use a permutation approach below to perform a significance test.
# measure K for BM trait 't0'
toytree.pcm.phylogenetic_signal_k(tree, "t0")
{'K': 1.0177977520950325, 'P-value': nan, 'permutations': nan}
# measure K for non-BM trait 't2'
toytree.pcm.phylogenetic_signal_k(tree, "t2")
{'K': 0.4059842428938925, 'P-value': nan, 'permutations': nan}
Significance test¶
We can perform a permutation test to ask whether the phylogenetic signal in a trait is greater than expected by chance given the tree and the variance in the trait data. The test shuffles the trait values among the tips and recalculates K many times. The P-value is the proportion of permutations that generate a K value at least as large as that of the original trait data. A P-value < 0.05 is typically considered significant evidence of phylogenetic signal.
# measure K and perform significance test
toytree.pcm.phylogenetic_signal_k(tree, "t0", test=True)
{'K': 1.0177977520950325, 'P-value': 0.007, 'permutations': 1000}
# measure K and perform significance test
toytree.pcm.phylogenetic_signal_k(tree, "t2", test=True)
{'K': 0.4059842428938925, 'P-value': 0.868, 'permutations': 1000}
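The permutation procedure described in this section can be sketched in a few lines of NumPy. This is an illustration under stated assumptions and not toytree's implementation: `blomberg_k` and `permutation_p_value` are hypothetical helpers, the covariance matrix describes an invented 8-tip tree made of four cherries, and the exact P-value convention used by toytree (for example, how ties or the observed value are counted) may differ.

```python
import numpy as np


def blomberg_k(x, C):
    """Compact Blomberg's K (hypothetical helper, not part of toytree)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    inv_c = np.linalg.inv(C)
    dev = x - inv_c.sum(axis=0) @ x / inv_c.sum()
    observed = (dev @ dev) / (dev @ inv_c @ dev)
    expected = (np.trace(C) - n / inv_c.sum()) / (n - 1)
    return observed / expected


def permutation_p_value(x, C, n_perm=1000, seed=None):
    """Return observed K and the share of shuffles with K at least as large."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = blomberg_k(x, C)
    # shuffle trait values among tips; the tree (C) stays fixed
    permuted = np.array(
        [blomberg_k(rng.permutation(x), C) for _ in range(n_perm)]
    )
    return observed, float(np.mean(permuted >= observed))


# hypothetical 8-tip tree: four cherries joined at the root
# (sister tips share 1 unit of history, each tip has total depth 2)
C = np.kron(np.eye(4), np.ones((2, 2))) + np.eye(8)

# trait values that are similar within each cherry
x = np.array([0.0, 0.1, 3.0, 3.1, 6.0, 6.2, 9.0, 9.1])

k_obs, p_value = permutation_p_value(x, C, n_perm=1000, seed=123)
print(k_obs, p_value)
```

Because only the assignment of values to tips is shuffled, the null distribution keeps the same trait variance and the same tree, so a small P-value reflects the arrangement of values on the tree rather than their spread.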
Variation (Std. Err.)¶
If a trait is measured from many individuals, then you can represent its value as both a mean and a standard error, and this variation can be taken into account when calculating K (Ives et al. 2007). In this case a model is fit to estimate the Brownian rate parameter ($\sigma^2$), which is returned along with the log-likelihood and a convergence flag. In the example below, the simulated trait "t1" is passed as the error values purely for demonstration.
# measure K for trait w/ standard error and perform significance test
toytree.pcm.phylogenetic_signal_k(tree, data="t0", error="t1", test=True)
{'K': 1.0178023908892584, 'P-value': 0.011, 'permutations': 1000, 'log-likelihood': -92.34654755664657, 'sig2': 94.18633873360999, 'convergence': True}
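One common way to account for measurement error, in the spirit of Ives et al. (2007), is to add the squared standard errors to the diagonal of the BM covariance matrix, giving $V = \sigma^2 C + \mathrm{diag}(se^2)$, and then to estimate $\sigma^2$ by maximum likelihood. The sketch below illustrates that idea with NumPy only. Whether toytree uses exactly this formulation is an assumption, and the names `bm_loglik` and `fit_sig2`, the grid search, and the example data are all hypothetical.

```python
import numpy as np


def bm_loglik(x, C, sig2, se=None):
    """Log-likelihood of tip values under BM with optional per-tip std. errors."""
    x = np.asarray(x, dtype=float)
    n = x.size
    V = sig2 * np.asarray(C, dtype=float)
    if se is not None:
        # measurement error adds independent variance to each tip
        V = V + np.diag(np.asarray(se, dtype=float) ** 2)
    inv_v = np.linalg.inv(V)
    # GLS estimate of the root state under this covariance
    root = inv_v.sum(axis=0) @ x / inv_v.sum()
    dev = x - root
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (dev @ inv_v @ dev + logdet + n * np.log(2 * np.pi))


def fit_sig2(x, C, se=None, grid=None):
    """Crude ML estimate of sig2 by scanning a log-spaced grid of candidates."""
    if grid is None:
        grid = np.logspace(-3, 3, 601)
    logliks = np.array([bm_loglik(x, C, s, se) for s in grid])
    best = int(np.argmax(logliks))
    return float(grid[best]), float(logliks[best])


# hypothetical tree ((A,B),(C,D)): root height 2, both cherries at height 1
C = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
x = np.array([1.0, 1.1, 5.0, 5.2])    # per-tip trait means
se = np.array([0.2, 0.2, 0.3, 0.3])   # per-tip standard errors (non-negative)

sig2_plain, ll_plain = fit_sig2(x, C)
sig2_err, ll_err = fit_sig2(x, C, se=se)
print(sig2_plain, sig2_err)
```

A grid scan is used here only to keep the sketch dependency-free; a numerical optimizer would normally replace it, which is presumably why the toytree result also reports a convergence flag.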
Multivariate K¶
Adams...
# TODO...
Lambda¶
# TODO...