Simulate continuous traits¶

Simulate one or more continuous traits under one or more models of trait evolution.

In [1]:

Copied!

import toytree
import toytree

In [2]:

Copied!

# tree used in examples
tree = toytree.rtree.unittree(ntips=6, treeheight=1.0, seed=123)
# tree used in examples
tree = toytree.rtree.unittree(ntips=6, treeheight=1.0, seed=123)

Brownian motion¶

The amount of change in a continuous trait over a given time interval can be modeled under Brownian motion as the result of a random walk. At each time step the value changes by an amount randomly sampled from a normal distribution with mean=0 and variance described by an evolutionary rate parameter, $\sigma^2$. To model the change over an interval of time (e.g. a branch length) a random value is sampled from a normal distribution with mean=0 and variance as the product of the branch length and rate parameter ($\sigma^2 t$).

simulate_continuous_bm¶

Simulated traits are labeled t0-tN for N traits, unless the rates arg is entered as a mapping (e.g., dict) in which case traits can be given custom names. By default, simulated data are returned as a pandas DataFrame, however, you can alternatively use the argument inplace=True to store simulated traits to Node objects of the input tree. These options and others are demonstrated below. The simulate_continuous_bm function is available from both the module-level and object-level APIs.

In [3]:

Copied!

# call from the module-level API
trait = toytree.pcm.simulate_continuous_bm(tree, rates=1.0)

# call from the tree-level API (equivalent to above)
trait = tree.pcm.simulate_continuous_bm(rates=1.0);
# call from the module-level API
trait = toytree.pcm.simulate_continuous_bm(tree, rates=1.0)

# call from the tree-level API (equivalent to above)
trait = tree.pcm.simulate_continuous_bm(rates=1.0);

rates¶

The rates takes one or more $\sigma^2$ evolutionary rate parameters. Note that the variance in a trait over the length of a branch is a product of the rate parameter and branch length, and thus you should take into account the branch length units of your tree when selecting rate parameters.

In [4]:

Copied!

# simulate one trait on the tree
toytree.pcm.simulate_continuous_bm(tree, rates=1.0)
# simulate one trait on the tree
toytree.pcm.simulate_continuous_bm(tree, rates=1.0)

Out[4]:

	t0
0	-0.123130
1	-0.367562
2	0.037406
3	0.361301
4	-1.829936
5	1.160787
6	-0.230188
7	0.453950
8	-0.152855
9	-0.488458
10	0.000000

In [5]:

Copied!

# simulate three traits with different sigma2 params
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0, 3.0])
# simulate three traits with different sigma2 params
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0, 3.0])

Out[5]:

	t0	t1	t2
0	0.327413	0.201442	-1.901418
1	-0.311619	-0.558166	-0.000391
2	0.008609	0.706444	1.415715
3	-0.359562	0.846861	0.846022
4	-0.169758	1.168309	2.318473
5	-1.912181	1.103331	1.432499
6	-0.029991	-0.339502	-0.845061
7	-0.165949	0.614321	0.461416
8	-0.266276	-0.009451	-0.513884
9	-0.905512	0.444233	0.939258
10	0.000000	0.000000	0.000000

In [6]:

Copied!

# use a dict to assign custom names to traits
toytree.pcm.simulate_continuous_bm(tree, rates={"size": 1.0, "speed": 5.0})
# use a dict to assign custom names to traits
toytree.pcm.simulate_continuous_bm(tree, rates={"size": 1.0, "speed": 5.0})

Out[6]:

	size	speed
0	0.191103	-4.119331
1	0.273508	-1.693085
2	-0.397871	-4.093221
3	-0.500289	-1.497661
4	-1.056051	2.326184
5	-1.093082	0.777344
6	0.370993	-1.802604
7	-0.691834	-4.372475
8	0.032350	-2.116944
9	-0.668472	1.512406
10	0.000000	0.000000

tips_only¶

The data simulated above includes a trait value for every node in the tree, including internal nodes. However, in many cases we may be only interested in the traits at the tips of the tree. The argument tips_only will return on the simulated values for the tip nodes. (Note that the simulation process requires generating values for internal nodes, so you are effectively discarding that information when using this option, but it can be useful to keep things tidy).

In [7]:

Copied!

# simulate traits and store only for the tips
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0], tips_only=True)
# simulate traits and store only for the tips
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0], tips_only=True)

Out[7]:

	t0
0	1.437384
1	1.015807
2	0.019017
3	0.255915
4	0.701643
5	0.239817

root_states¶

You can set the root state for one or more simulated traits using the option root_states. The default root_state is 0. You can see this in the first few simulations above where the root node (node 10) has a value of 0.0 for each trait. Below we simulate the same tree traits but with different starting (root) values.

In [8]:

Copied!

# simulate three traits with different sigma2 params
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0, 3.0], root_states=[10, 12, 50])
# simulate three traits with different sigma2 params
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0, 3.0], root_states=[10, 12, 50])

Out[8]:

	t0	t1	t2
0	9.234072	13.078664	53.680268
1	9.809092	12.646147	53.121963
2	9.489151	12.000104	53.999656
3	9.893895	11.322589	51.947425
4	11.145741	13.803036	53.737769
5	9.708878	12.179884	50.373467
6	9.940027	12.584615	52.620478
7	9.271315	11.285911	52.907048
8	9.511940	12.815645	51.751943
9	10.059270	12.185471	51.298720
10	10.000000	12.000000	50.000000

inplace¶

By default the simulate data are returned in a pandas DataFrame where the index corresponds to the numeric idx labels of Nodes in the tree. Alternatively, you can use inplace=True to store the simulated traits as one or more features saved to Nodes of the tree.

In [9]:

Copied!

# save simulated traits to the ToyTree
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0], inplace=True)

# fetch simulated trait feature data from the tree
tree.get_node_data(["t0", "t1"])
# save simulated traits to the ToyTree
toytree.pcm.simulate_continuous_bm(tree, rates=[1.0, 2.0], inplace=True)

# fetch simulated trait feature data from the tree
tree.get_node_data(["t0", "t1"])

Out[9]:

	t0	t1
0	-0.130727	-1.141893
1	0.301832	-0.302874
2	-0.624571	-1.659822
3	-0.820687	-1.623450
4	-0.367290	0.166104
5	-0.136680	1.225382
6	0.002804	-0.984722
7	-0.576153	-1.148334
8	-0.495188	-0.721662
9	-0.593774	-0.603338
10	0.000000	0.000000

One motivation for this option is that it makes it very easy to visualize the traits on a tree drawing, where you can select the traits by name rather than entering in the trait variable. Here we use color mapping to draw node colors scaled to the Greys colormap.

In [10]:

Copied!

# draw the tree and show trait t0 values 
tree.draw(node_sizes=10, node_colors=("t0", "Greys"), node_mask=False, label="trait 't0'");
# draw the tree and show trait t0 values 
tree.draw(node_sizes=10, node_colors=("t0", "Greys"), node_mask=False, label="trait 't0'");

Multivariate Brownian motion¶

To simulate traits with correlated evolution you can enter a variance-covariance matrix for the rates option. This can be be a list of lists, numpy array, or pandas DataFrame.

In [11]:

Copied!

# TODO
# TODO

In [ ]: