Simulating Discrete Traits (Markov Model)¶
This notebook demonstrates toytree.pcm.simulate_discrete_data() for
single and multiple replicates, including trait naming and plotting the
result on a tree.
Continuous-time Markov models¶
simulate_discrete_data() uses a continuous-time Markov chain (CTMC)
to model transitions among a fixed number of discrete states along
each branch. The model is defined by an instantaneous rate matrix
Q where each off-diagonal entry q_ij is the rate of change from
state i to state j. Diagonal entries are set so each row sums
to zero, and transition probabilities over time t are computed as
P(t) = expm(Q * t).
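To make this concrete, here is a small illustrative sketch (plain numpy/scipy, not toytree API) that builds a two-state ER rate matrix and computes its transition probabilities over a branch of length t:
import numpy as np
from scipy.linalg import expm
rate = 1.0                       # shared off-diagonal rate (ER)
Q = np.array([[-rate, rate],
              [rate, -rate]])    # diagonal set so each row sums to zero
t = 0.5                          # branch length (time)
P = expm(Q * t)                  # P[i, j] = probability of changing from i to j over time t
P                                # each row of P sums to one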
Supported model types differ in how Q is parameterized:
- ER (equal rates): all off-diagonal rates are the same value. This is the simplest model and assumes every state changes to every other state at the same rate.
- SYM (symmetric rates): rates are symmetric (q_ij = q_ji), but different state pairs can have different rates. This allows variation among pairs while keeping reversibility.
- ARD (all rates different): every off-diagonal rate can be different (q_ij and q_ji may differ). This is the most flexible model, allowing directional biases between states.
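The sketch below (illustrative only, not toytree API) writes out example rate matrices for a 3-state trait under each parameterization; the numeric rate values are arbitrary:
import numpy as np
def as_q_matrix(off_diagonal):
    # set diagonal entries so each row of the rate matrix sums to zero
    q = np.array(off_diagonal, dtype=float)
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q
# ER: one rate shared by every off-diagonal entry
Q_er = as_q_matrix(np.full((3, 3), 1.0))
# SYM: q_ij == q_ji, but different state pairs can have different rates
Q_sym = as_q_matrix([[0, 1.0, 2.0], [1.0, 0, 0.5], [2.0, 0.5, 0]])
# ARD: every off-diagonal rate can differ (q_ij and q_ji may differ)
Q_ard = as_q_matrix([[0, 3.0, 0.2], [0.5, 0, 1.0], [0.1, 2.0, 0]])
Q_er, Q_sym, Q_ard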
In all cases, you can provide relative_rates and state_frequencies
explicitly, or let toytree sample valid parameters for the chosen
model. The rate_scalar multiplies the relative rates to set the
overall tempo of change.
Full vs tips-only datasets¶
Empirical datasets typically include observations only at the tips
(extant taxa), which corresponds to tips_only=True. Simulations
can also return full datasets that include internal nodes, which
is useful for validating methods.
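The plotting examples in the next section assume a tree (tree_full) plus two simulated datasets on it: a full dataset with values for every node (full_series) and a tips-only dataset (tips_series). A minimal setup sketch, with an arbitrary tree size, seed, and number of states:
import toytree
import toyplot
# tree used for the node-coloring plots below
tree_full = toytree.rtree.unittree(10, seed=42)
# full dataset: trait values simulated for internal nodes and tips
full_series = toytree.pcm.simulate_discrete_data(
    tree=tree_full, nstates=3, model='ER', nreplicates=1,
    trait_name='state', tips_only=False,
)
# tips-only dataset: trait values for the tips only
tips_series = toytree.pcm.simulate_discrete_data(
    tree=tree_full, nstates=3, model='ER', nreplicates=1,
    trait_name='state', tips_only=True,
)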
Plot trait values on nodes¶
Below we color nodes by a discrete trait using a categorical colormap. For tips-only data, internal nodes are left in a neutral color.
Full dataset (all nodes)
# Plot full dataset (all nodes)
cm = toyplot.color.brewer.map('Set2')
vals = full_series.reindex(range(tree_full.nnodes)).astype(int).to_numpy()
node_colors = [cm.colors(v, 0, 2) for v in vals]
c1, a1, m1 = tree_full.draw(layout='r', node_colors=node_colors, node_sizes=14, tip_labels=True, scale_bar=True)
c1
Tips-only dataset (internal nodes neutral)
# Plot tips-only dataset (internal nodes neutral)
cm = toyplot.color.brewer.map('Set2')
neutral = '#c7c7c7'
tip_vals = tips_series.to_dict()
colors2 = []
for node in tree_full:
    if node.is_leaf():
        colors2.append(cm.colors(int(tip_vals[node.name]), 0, 2))
    else:
        colors2.append(neutral)
c2, a2, m2 = tree_full.draw(layout='r', node_colors=colors2, node_sizes=14, tip_labels=True, scale_bar=True)
c2
import toytree
tree = toytree.rtree.unittree(12, seed=123)
Single trait (returns Series)¶
When nreplicates is 1 (or less), the function returns a pd.Series.
traits = toytree.pcm.simulate_discrete_data(
    tree=tree,
    nstates=3,
    model='ER',
    nreplicates=1,
    trait_name='state',
    tips_only=True,
)
traits
Plot the trait values by coloring tip labels.
color_map = {0: '#4c78a8', 1: '#f58518', 2: '#54a24b'}
tip_colors = [color_map[traits[name]] for name in tree.get_tip_labels()]
c, a, m = tree.draw(layout='r', tip_labels=True, tip_labels_colors=tip_colors, scale_bar=True)
c
Inplace storage¶
If inplace=True, the simulated trait(s) are stored on the tree
as node features instead of being returned.
tree2 = toytree.rtree.unittree(8, seed=7)
toytree.pcm.simulate_discrete_data(
    tree=tree2,
    nstates=2,
    model='ER',
    nreplicates=1,
    trait_name='state',
    tips_only=True,
    inplace=True,
)
# Access stored feature values
tree2.get_node_data('state').head()
Model parameters and their effects¶
The parameters below control the Markov model and simulation:
- relative_rates: Off-diagonal entries of Q. Larger values increase the transition frequency between specific state pairs.
- state_frequencies: Stationary frequencies (equilibrium proportions) that influence long-run state occupancy and, for reversible models, how rates are scaled among states.
- root_state: Fixes the starting state at the root. This biases the entire simulation toward descendants of that root state.
- rate_scalar: Multiplies the relative rates, controlling the overall tempo of evolution (higher values -> more changes).
import numpy as np
tree_param = toytree.rtree.unittree(10, seed=999)
# 1) relative_rates: favor transitions 0->1 over 1->0
rates = np.array([[0, 3.0], [0.5, 0]])
traits_rates = toytree.pcm.simulate_discrete_data(
    tree=tree_param,
    nstates=2,
    model='ARD',
    relative_rates=rates,
    nreplicates=1,
    trait_name='rate_bias',
    tips_only=True,
)
# 2) state_frequencies: favor state 0 at equilibrium
freqs = np.array([0.8, 0.2])
traits_freq = toytree.pcm.simulate_discrete_data(
    tree=tree_param,
    nstates=2,
    model='ER',
    state_frequencies=freqs,
    nreplicates=1,
    trait_name='freq_bias',
    tips_only=True,
)
# 3) root_state: fix the root to state 1
traits_root = toytree.pcm.simulate_discrete_data(
    tree=tree_param,
    nstates=2,
    model='ER',
    root_state=1,
    nreplicates=1,
    trait_name='root_fixed',
    tips_only=True,
)
# 4) rate_scalar: compare slow vs fast rates
traits_slow = toytree.pcm.simulate_discrete_data(
    tree=tree_param,
    nstates=2,
    model='ER',
    rate_scalar=0.2,
    nreplicates=1,
    trait_name='slow',
    tips_only=True,
)
traits_fast = toytree.pcm.simulate_discrete_data(
    tree=tree_param,
    nstates=2,
    model='ER',
    rate_scalar=3.0,
    nreplicates=1,
    trait_name='fast',
    tips_only=True,
)
traits_rates.head(), traits_freq.head(), traits_root.head(), traits_slow.head(), traits_fast.head()
Visual comparison¶
Below we color tips for each scenario to visualize the effect of parameter changes.
cm2 = toyplot.color.brewer.map('Set2')
def plot_tip_trait(trait_series, title):
    colors = [cm2.colors(int(trait_series[name]), 0, 1) for name in tree_param.get_tip_labels()]
    c, a, m = tree_param.draw(layout='r', tip_labels=True, tip_labels_colors=colors, scale_bar=True)
    a.label.text = title
    return c
c_rate = plot_tip_trait(traits_rates, 'relative_rates bias')
c_rate
c_freq = plot_tip_trait(traits_freq, 'state_frequencies bias')
c_freq
c_root = plot_tip_trait(traits_root, 'root_state fixed')
c_root
c_slow = plot_tip_trait(traits_slow, 'rate_scalar slow')
c_slow
c_fast = plot_tip_trait(traits_fast, 'rate_scalar fast')
c_fast
Multiple replicates (returns DataFrame)¶
When nreplicates is greater than 1, the function returns a pd.DataFrame
with one column per replicate.
traits_multi = toytree.pcm.simulate_discrete_data(
    tree=tree,
    nstates=2,
    model='SYM',
    nreplicates=3,
    trait_name='trait',
    tips_only=True,
)
traits_multi.head()
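As a small follow-up sketch, you can summarize each replicate; the loop below iterates the DataFrame columns without assuming their exact names:
# count how many tips are in each state, per replicate column
for col in traits_multi.columns:
    print(col, traits_multi[col].value_counts().to_dict())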