Simulate continuous traits¶
Simulate one or more continuous traits under one or more models of trait evolution.
import toytree
# tree used in examples
tree = toytree.rtree.unittree(ntips=6, treeheight=1.0, seed=123)
Brownian motion¶
The amount of change in a continuous trait over a given time interval can be modeled under Brownian motion as the result of a random walk. At each time step the value changes by an amount randomly sampled from a normal distribution with mean=0 and variance described by an evolutionary rate parameter, $\sigma^2$. To model the change over an interval of time (e.g. a branch length) a random value is sampled from a normal distribution with mean=0 and variance as the product of the branch length and rate parameter ($\sigma^2 t$).
simulate_continuous_bm¶
Simulated traits are labeled t0-tN for N traits, unless the rates
arg is entered as a mapping (e.g., dict) in which case traits can be given custom names. By default, simulated data are returned as a pandas DataFrame, however, you can alternatively use the argument inplace=True
to store simulated traits to Node objects of the input tree. These options and others are demonstrated below. The simulate_continuous_bm
function is available from both the module-level and object-level APIs.
# call from the module-level API
trait = toytree.pcm.simulate_continuous_bm(tree, rates=1.0)
# call from the tree-level API (equivalent to above)
trait = tree.pcm.simulate_continuous_bm(rates=1.0);
rates¶
The rates takes one or more $\sigma^2$ evolutionary rate parameters. Note that the variance in a trait over the length of a branch is a product of the rate parameter and branch length, and thus you should take into account the branch length units of your tree when selecting rate parameters.
# simulate one trait on the tree
toytree.pcm.simulate_continuous_bm(tree, rates=1.0)
t0 | |
---|---|
0 | -0.123130 |
1 | -0.367562 |
2 | 0.037406 |
3 | 0.361301 |
4 | -1.829936 |
5 | 1.160787 |
6 | -0.230188 |
7 | 0.453950 |
8 | -0.152855 |
9 | -0.488458 |
10 | 0.000000 |
# simulate three traits with different sigma2 params
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0, 3.0])
t0 | t1 | t2 | |
---|---|---|---|
0 | 0.327413 | 0.201442 | -1.901418 |
1 | -0.311619 | -0.558166 | -0.000391 |
2 | 0.008609 | 0.706444 | 1.415715 |
3 | -0.359562 | 0.846861 | 0.846022 |
4 | -0.169758 | 1.168309 | 2.318473 |
5 | -1.912181 | 1.103331 | 1.432499 |
6 | -0.029991 | -0.339502 | -0.845061 |
7 | -0.165949 | 0.614321 | 0.461416 |
8 | -0.266276 | -0.009451 | -0.513884 |
9 | -0.905512 | 0.444233 | 0.939258 |
10 | 0.000000 | 0.000000 | 0.000000 |
# use a dict to assign custom names to traits
toytree.pcm.simulate_continuous_bm(tree, rates={"size": 1.0, "speed": 5.0})
size | speed | |
---|---|---|
0 | 0.191103 | -4.119331 |
1 | 0.273508 | -1.693085 |
2 | -0.397871 | -4.093221 |
3 | -0.500289 | -1.497661 |
4 | -1.056051 | 2.326184 |
5 | -1.093082 | 0.777344 |
6 | 0.370993 | -1.802604 |
7 | -0.691834 | -4.372475 |
8 | 0.032350 | -2.116944 |
9 | -0.668472 | 1.512406 |
10 | 0.000000 | 0.000000 |
tips_only¶
The data simulated above includes a trait value for every node in the tree, including internal nodes. However, in many cases we may be only interested in the traits at the tips of the tree. The argument tips_only
will return on the simulated values for the tip nodes. (Note that the simulation process requires generating values for internal nodes, so you are effectively discarding that information when using this option, but it can be useful to keep things tidy).
# simulate traits and store only for the tips
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0], tips_only=True)
t0 | |
---|---|
0 | 1.437384 |
1 | 1.015807 |
2 | 0.019017 |
3 | 0.255915 |
4 | 0.701643 |
5 | 0.239817 |
root_states¶
You can set the root state for one or more simulated traits using the option root_states
. The default root_state is 0. You can see this in the first few simulations above where the root node (node 10) has a value of 0.0 for each trait. Below we simulate the same tree traits but with different starting (root) values.
# simulate three traits with different sigma2 params
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0, 3.0], root_states=[10, 12, 50])
t0 | t1 | t2 | |
---|---|---|---|
0 | 9.234072 | 13.078664 | 53.680268 |
1 | 9.809092 | 12.646147 | 53.121963 |
2 | 9.489151 | 12.000104 | 53.999656 |
3 | 9.893895 | 11.322589 | 51.947425 |
4 | 11.145741 | 13.803036 | 53.737769 |
5 | 9.708878 | 12.179884 | 50.373467 |
6 | 9.940027 | 12.584615 | 52.620478 |
7 | 9.271315 | 11.285911 | 52.907048 |
8 | 9.511940 | 12.815645 | 51.751943 |
9 | 10.059270 | 12.185471 | 51.298720 |
10 | 10.000000 | 12.000000 | 50.000000 |
inplace¶
By default the simulate data are returned in a pandas DataFrame where the index corresponds to the numeric idx labels of Nodes in the tree. Alternatively, you can use inplace=True
to store the simulated traits as one or more features saved to Nodes of the tree.
# save simulated traits to the ToyTree
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0], inplace=True)
# fetch simulated trait feature data from the tree
tree.get_node_data(["t0", "t1"])
t0 | t1 | |
---|---|---|
0 | -0.130727 | -1.141893 |
1 | 0.301832 | -0.302874 |
2 | -0.624571 | -1.659822 |
3 | -0.820687 | -1.623450 |
4 | -0.367290 | 0.166104 |
5 | -0.136680 | 1.225382 |
6 | 0.002804 | -0.984722 |
7 | -0.576153 | -1.148334 |
8 | -0.495188 | -0.721662 |
9 | -0.593774 | -0.603338 |
10 | 0.000000 | 0.000000 |
One motivation for this option is that it makes it very easy to visualize the traits on a tree drawing, where you can select the traits by name rather than entering in the trait variable. Here we use color mapping to draw node colors scaled to the Greys colormap.
# draw the tree and show trait t0 values
tree.draw(node_sizes=10, node_colors=("t0", "Greys"), node_mask=False, label="trait 't0'");
Multivariate Brownian motion¶
To simulate traits with correlated evolution you can enter a variance-covariance matrix for the rates
option. This can be be a list of lists, numpy array, or pandas DataFrame.
# TODO