Phylogenetic signal¶
Several metrics are available to measure "phylogenetic signal" in a trait value among the tips of a phylogeny. These metrics test the extent to which traits exhibit phylogenetic inertia, such that their values can be explained by a random walk (Brownian motion model of evolution) along the edges of a tree. A trait exhibiting low phylogenetic signal is poorly explained by this model and the specified tree, whereas a trait exhibiting high phylogenetic signal fits this model of evolution on the tree well.
Here we show how to measure Blomberg's K and Pagel's lambda.
import toytree
import numpy as np
Example data¶
Let's simulate some test data. We generate a random tree with uniform internal edge lengths and then simulate one continuous trait ("trait") under the Brownian motion (BM) model. In addition, we generate an array of random uniform values ("se") to represent the standard error of the trait value. We expect "trait" to exhibit phylogenetic signal, while "se" will not. We will also test whether "trait" exhibits phylogenetic signal when the standard error within species is taken into account.
# generate a random tree
tree = toytree.rtree.unittree(ntips=60, treeheight=1.0, seed=123)
# generate a trait value under BM model and store to tree
traits = tree.pcm.simulate_continuous_bm(rates={"trait": 1.0}, seed=123, tips_only=True)
# generate a random uniform value as measurement error
traits["se"] = np.random.default_rng(seed=123).uniform(0, 1e-1, size=tree.ntips)
# show the first few values
traits.head(10)
| | trait | se |
|---|---|---|
| 0 | -0.593520 | 0.068235 |
| 1 | -0.736497 | 0.005382 |
| 2 | 0.737992 | 0.022036 |
| 3 | -0.485426 | 0.018437 |
| 4 | -1.606551 | 0.017591 |
| 5 | -0.319333 | 0.081209 |
| 6 | -1.023654 | 0.092334 |
| 7 | 0.905526 | 0.027657 |
| 8 | 0.714016 | 0.081975 |
| 9 | 0.110878 | 0.088989 |
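For intuition about what a BM simulation produces, tip values under Brownian motion are multivariate normal, and the covariance between two tips is proportional to the edge length they share from the root. The following is a minimal NumPy sketch of that idea, not toytree's implementation. The helpers `balanced_vcv` and `simulate_bm` are hypothetical, and a balanced unit-height tree is assumed only because its covariance matrix is easy to build by hand.

```python
import numpy as np

def balanced_vcv(depth):
    """Tip covariance matrix of a balanced, unit-height tree with 2**depth tips.

    Hypothetical helper: tips i and j share `depth - bit_length(i XOR j)` of
    the `depth` equal-length levels below the root.
    """
    n = 2 ** depth
    shared = [[depth - (i ^ j).bit_length() for j in range(n)] for i in range(n)]
    return np.array(shared, dtype=float) / depth

def simulate_bm(C, sigma2=1.0, root=0.0, seed=None):
    """Draw one set of tip values from N(root, sigma2 * C)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(sigma2 * C)  # L @ L.T == sigma2 * C
    return root + L @ rng.standard_normal(C.shape[0])

C = balanced_vcv(3)                      # 8 tips
x = simulate_bm(C, sigma2=1.0, seed=123)
print(np.round(C[0], 3))                 # covariance of tip 0 with every tip
print(np.round(x, 3))
```

Sister tips share most of their history and so receive strongly correlated values, while tips on opposite sides of the root are independent.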
Let's visualize the trait and se.
# draw tree with extra space reserved to the right
canvas, axes, mark = tree.draw(layout='d', height=350, label="Example tree w/ trait and se");
# draw traits color mapped
colors = toytree.style.get_color_mapped_values(traits["trait"], "Greys")
tree.annotate.add_tip_markers(axes, marker="s", yshift=45, color=colors);
colors = toytree.style.get_color_mapped_values(traits["se"], "Greys")
tree.annotate.add_tip_markers(axes, marker="s", yshift=60, color=colors);
Blomberg's K¶
Blomberg's K (Blomberg et al. 2003) quantifies phylogenetic signal in a trait relative to the expectation under a Brownian motion model, for which K is expected to be close to 1. Values of K>1 indicate that close relatives are more similar than expected under Brownian motion, whereas K<1 indicates that close relatives are less similar than expected. Permutations can be used to perform a significance test.
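To make the statistic concrete, K can be written in terms of the tip covariance matrix C implied by the tree: it is the ratio of the mean squared error around the phylogenetic mean to the mean squared error after accounting for C, divided by the value of that ratio expected under Brownian motion. The sketch below follows that standard formulation in plain NumPy. It is an illustration under stated assumptions, not toytree's code: `blomberg_k` is a hypothetical helper, and the four-tip covariance matrix is written by hand for the tree ((A:0.5,B:0.5):0.5,(C:0.3,D:0.3):0.7).

```python
import numpy as np

def blomberg_k(C, x):
    """Blomberg's K from a tip covariance matrix C and a trait vector x."""
    n = len(x)
    invC = np.linalg.inv(C)
    a = (invC @ x).sum() / invC.sum()      # GLS estimate of the root state
    r = x - a
    observed = (r @ r) / (r @ invC @ r)    # MSE0 / MSE
    expected = (np.trace(C) - n / invC.sum()) / (n - 1)
    return observed / expected

# hand-written covariance matrix (shared root-to-MRCA edge lengths)
C = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
])

clustered = np.array([1.0, 1.2, -0.8, -1.0])   # sister tips have similar values
scattered = np.array([1.0, -0.8, 1.2, -1.0])   # sister tips have dissimilar values

k_clustered = blomberg_k(C, clustered)   # greater than 1
k_scattered = blomberg_k(C, scattered)   # less than 1
print(k_clustered, k_scattered)
```

Because K is a ratio of ratios, it does not change when the trait is shifted or rescaled, or when all edge lengths of the tree are multiplied by a constant.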
Example¶
As an example, when K is calculated for "trait", which was simulated under a model of Brownian motion, we recover a K statistic close to 1.0. By contrast, when K is calculated for "se", which is composed of uniform random values, we get a much lower K. We can say that the phylogenetic signal in "trait" is greater than in "se". However, the K value alone does not yet tell us whether this is different from a random expectation given the data.
# measure K for BM trait 'trait'
toytree.pcm.phylogenetic_signal_k(tree, traits["trait"], nsims=0)
{'K': 0.813631398963683, 'P-value': nan, 'permutations': nan}
# measure K for non-BM trait 'se'
toytree.pcm.phylogenetic_signal_k(tree, traits["se"], nsims=0)
{'K': 0.48043536702811224, 'P-value': nan, 'permutations': nan}
Significance test¶
We can use a permutation test to ask whether the phylogenetic signal in a trait is greater than expected by chance, given the tree and the variance in the trait data. The test shuffles the trait values among the tips and recalculates K many times. The P-value is the proportion of permutations that yield a K value with at least as much phylogenetic signal as the original trait data. A P-value < 0.05 is typically considered significant evidence of phylogenetic signal. The default option is to perform 1000 permutations to calculate P.
# measure K and perform significance test
toytree.pcm.phylogenetic_signal_k(tree, traits["trait"], nsims=1000)
{'K': 0.813631398963683, 'P-value': 0.001, 'permutations': 1000}
# measure K for 'se' and perform significance test
toytree.pcm.phylogenetic_signal_k(tree, traits["se"], nsims=1000)
{'K': 0.48043536702811224, 'P-value': 0.202, 'permutations': 1000}
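The permutation procedure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than toytree's implementation: `balanced_vcv`, `blomberg_k`, and `permutation_test_k` are hypothetical helpers, the tree is assumed to be balanced with unit height, and the P-value is taken as the plain proportion of permuted K values at least as large as the observed one (the exact counting convention in toytree may differ).

```python
import numpy as np

def balanced_vcv(depth):
    """Tip covariance matrix of a balanced, unit-height tree with 2**depth tips."""
    n = 2 ** depth
    shared = [[depth - (i ^ j).bit_length() for j in range(n)] for i in range(n)]
    return np.array(shared, dtype=float) / depth

def blomberg_k(C, x):
    """Blomberg's K from a tip covariance matrix C and a trait vector x."""
    n = len(x)
    invC = np.linalg.inv(C)
    a = (invC @ x).sum() / invC.sum()
    r = x - a
    return ((r @ r) / (r @ invC @ r)) / ((np.trace(C) - n / invC.sum()) / (n - 1))

def permutation_test_k(C, x, nsims=1000, seed=None):
    """Return observed K and the proportion of shuffles with K >= observed."""
    rng = np.random.default_rng(seed)
    k_obs = blomberg_k(C, x)
    k_null = np.array([blomberg_k(C, rng.permutation(x)) for _ in range(nsims)])
    return k_obs, float(np.mean(k_null >= k_obs))

C = balanced_vcv(4)  # 16 tips
# trait that differs between the two root clades, with a slight gradient to avoid ties
clustered = np.repeat([1.0, -1.0], 8) + np.linspace(0.0, 0.1, 16)
# trait with no relationship to the tree
unstructured = np.random.default_rng(123).uniform(0, 1, size=16)

k_clustered, p_clustered = permutation_test_k(C, clustered, seed=1)
k_unstructured, p_unstructured = permutation_test_k(C, unstructured, seed=1)
print(k_clustered, p_clustered)
print(k_unstructured, p_unstructured)
```

Shuffling breaks the association between trait values and tree positions while preserving the trait's variance, so the shuffled K values form the null distribution against which the observed K is compared.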
Measurement error¶
If a trait is measured from many individuals then you can measure both its mean and standard error, and the latter can be taken into account when calculating phylogenetic signal (Ives et al. 2007). Here a model is fit to estimate the Brownian rate parameter ($\sigma^2$), which is also returned along with the log-likelihood.
# measure K for trait w/ standard error and perform significance test
toytree.pcm.phylogenetic_signal_k(tree, data=traits["trait"], error=traits["se"], nsims=1000)
{'K': 0.8168893422998975, 'P-value': 0.0, 'permutations': 1000, 'log-likelihood': -30.497738879932697, 'sig2': 0.2952762091848135, 'convergence': True}
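One way to picture the model-fitting step is that the standard errors add tip-specific variance to the diagonal of the Brownian covariance matrix, so that the tips have covariance sig2 * C + diag(se^2), and sig2 is chosen to maximize the likelihood. The sketch below illustrates that idea with a simple grid search in NumPy. It is an assumption-laden illustration, not toytree's optimizer: `bm_loglik_with_error` and `fit_sig2` are hypothetical helpers, and the four-tip covariance matrix, trait values, and standard errors are made up for the example.

```python
import numpy as np

def bm_loglik_with_error(C, x, se, sig2):
    """Log-likelihood of x under BM with rate sig2 plus per-tip sampling error."""
    n = len(x)
    V = sig2 * C + np.diag(se ** 2)        # BM covariance + measurement error
    invV = np.linalg.inv(V)
    a = (invV @ x).sum() / invV.sum()      # GLS estimate of the root state
    r = x - a
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (r @ invV @ r + logdet + n * np.log(2 * np.pi))

def fit_sig2(C, x, se, grid=None):
    """Grid-search estimate of sig2 (a stand-in for a numerical optimizer)."""
    if grid is None:
        grid = np.linspace(0.01, 5.0, 500)
    ll = np.array([bm_loglik_with_error(C, x, se, s) for s in grid])
    best = int(ll.argmax())
    return float(grid[best]), float(ll[best])

# illustrative inputs (not from the tutorial data)
C = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
])
x = np.array([1.0, 1.2, -0.8, -1.0])
se = np.array([0.10, 0.05, 0.20, 0.10])

sig2_err, ll_err = fit_sig2(C, x, se)
sig2_noerr, ll_noerr = fit_sig2(C, x, np.zeros(4))
print(sig2_err, ll_err)
print(sig2_noerr, ll_noerr)
```

With all standard errors set to zero the model reduces to ordinary Brownian motion, for which the maximum-likelihood rate has a closed form, so the grid search can be checked against it.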
Pagel's λ¶
TODO: Pagel's lambda ...
TODO: Link to lambda transformation method.
Example¶
The method phylogenetic_signal_lambda estimates the optimal lambda transformation of the tree to fit the data. A significance test is performed by comparing the log-likelihood of the fit model to the log-likelihood of a model with lambda=0. A likelihood ratio test...
toytree.pcm.phylogenetic_signal_lambda(tree, traits["trait"], intervals=20)
{'lambda': 1.0819649326447975, 'P-value': 7.004273033171038e-07, 'LR_test': 24.613953274584404, 'log-likelihood_λ': 26.40277320446915, 'log-likelihood_λ0': 38.70974984176135}
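The general shape of this procedure can be sketched in NumPy: a lambda transformation multiplies the off-diagonal covariances among tips by lambda, the likelihood is evaluated over candidate lambda values, and twice the log-likelihood difference from lambda=0 is compared to a chi-squared distribution. This is a hedged sketch, not toytree's implementation. All helper names are hypothetical, a balanced unit-height tree is assumed, the search is restricted to the interval [0, 1] for simplicity, and one degree of freedom is assumed for the test (which appears consistent with the P-value and LR_test values reported above).

```python
import math
import numpy as np

def balanced_vcv(depth):
    """Tip covariance matrix of a balanced, unit-height tree with 2**depth tips."""
    n = 2 ** depth
    shared = [[depth - (i ^ j).bit_length() for j in range(n)] for i in range(n)]
    return np.array(shared, dtype=float) / depth

def lambda_transform(C, lam):
    """Scale off-diagonal covariances by lam, leaving tip variances unchanged."""
    Cl = C * lam
    np.fill_diagonal(Cl, np.diag(C))
    return Cl

def bm_loglik(C, x):
    """BM log-likelihood with the root state and rate set to their ML values."""
    n = len(x)
    invC = np.linalg.inv(C)
    a = (invC @ x).sum() / invC.sum()
    r = x - a
    sig2 = (r @ invC @ r) / n
    _, logdet = np.linalg.slogdet(sig2 * C)
    # at the ML rate the quadratic form equals n
    return -0.5 * (n + logdet + n * math.log(2 * math.pi))

def chi2_sf_1df(stat):
    """Upper tail of a chi-squared distribution with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def fit_lambda(C, x, npoints=101):
    """Grid search for lambda in [0, 1] and likelihood ratio test against lambda=0."""
    grid = np.linspace(0.0, 1.0, npoints)
    ll = np.array([bm_loglik(lambda_transform(C, lam), x) for lam in grid])
    best = int(ll.argmax())
    lr = 2.0 * (ll[best] - ll[0])          # grid[0] is lambda = 0
    return {
        "lambda": float(grid[best]),
        "LR_test": float(lr),
        "P-value": chi2_sf_1df(lr),
    }

C = balanced_vcv(4)
rng = np.random.default_rng(123)
x = np.linalg.cholesky(C) @ rng.standard_normal(16)   # one BM draw on the tree
result = fit_lambda(C, x)
print(result)
```

At lambda=0 the tips are treated as independent (a star tree), and at lambda=1 the original tree is recovered, so the test asks whether keeping some of the tree's covariance structure improves the fit.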
Measurement error¶
If a trait is measured from many individuals then you can represent its value as both a mean and standard error, and the standard error can be taken into account when calculating lambda.
toytree.pcm.phylogenetic_signal_lambda(tree, traits["trait"], error=traits["se"])
{'lambda': 1.0833333333323334, 'P-value': 2.187112884895932e-06, 'LR_test': 22.42324190223958, 'log-likelihood_λ': 27.61266023558629, 'log-likelihood_λ0': 38.82428118670608, 'sig2': 0.3301921997179848}
Multivariate K¶
Adams...
# TODO...