Fit Continuous Trait Models (BM, OU, EB)¶
This notebook demonstrates fitting univariate continuous-trait models on a phylogeny with toytree.pcm.fit_continuous_ml().
It focuses on model fitting and comparison (AIC / AICc). For simulation details, see the companion page: sim continuous traits BM.
Theory background¶
For a continuous trait observed at tips, these models define the expected covariance among taxa from shared evolutionary history:
- BM: neutral random walk with constant rate $\sigma^2$, so trait variance accumulates linearly with time.
- OU: mean-reverting process pulled toward an optimum with strength $\alpha$.
- EB: BM with a rate that changes exponentially through time (parameter $r$).
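For intuition, under BM the expected covariance between two tips is $\sigma^2$ times the length of the history they share (root to their most recent common ancestor). A minimal sketch with plain NumPy, using a hypothetical ultrametric 3-taxon tree rather than the toytree API:

```python
import numpy as np

# Hypothetical ultrametric tree ((A:1, B:1):1, C:2); total depth = 2.
# Shared history (root-to-MRCA path length) for each pair of tips:
# A and B share 1.0 time unit; each pair involving C shares 0.0.
shared = np.array([
    [2.0, 1.0, 0.0],  # diagonal = total root-to-tip depth
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
])

sigma2 = 1.0
V = sigma2 * shared  # expected trait covariance among tips under BM
print(V)
```

Larger shared paths mean more strongly correlated tip values; this matrix is what the Gaussian likelihood is evaluated against.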
Models are fit by maximum likelihood under a multivariate Gaussian, and are ranked here by AICc by default (AIC is also reported). AICc is preferred when the number of tips is not large relative to the number of parameters.
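The criteria themselves are simple functions of the log-likelihood: $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$. A quick check against the EB fit reported below (these are the standard formulas, not toytree internals):

```python
def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik

def aicc(loglik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; the penalty grows as n approaches k + 1."""
    return aic(loglik, k) + (2 * k * (k + 1)) / (n - k - 1)

# EB fit below: log-likelihood -31.5603 with k=3 parameters on n=40 tips
print(round(aic(-31.5603, 3), 4))       # 69.1206
print(round(aicc(-31.5603, 3, 40), 4))  # 69.7873
```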
References¶
- Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist, 125, 1–15.
- Hansen, T. F. (1997). Stabilizing selection and the comparative analysis of adaptation. Evolution, 51, 1341–1351.
- Harmon, L. J., et al. (2010). Early bursts of body size and shape evolution are rare in comparative data. Evolution, 64, 2385–2396.
- Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference. Springer.
In [ ]:
import toytree
import numpy as np
Simulate a tree with continuous data "X" across the tips¶
In [47]:
tree = toytree.rtree.bdtree(ntips=40, seed=1234)
trait = tree.pcm.simulate_continuous_bm(sigma2=1.0, model="BM", tips_only=True, seed=456)
tree = tree.set_node_data("X", dict(trait), default=np.nan, inplace=False)
c, a, m = tree.draw(layout='d');
tree.annotate.add_node_markers(a, color=("X", "Spectral"), size=8, mask=False);
Fit Models (BM, OU, EB) to the data¶
In [33]:
fit = tree.pcm.fit_continuous_ml("X")
In [34]:
# select the best fitting model
fit["best_fit"]
Out[34]:
FitContinuousMLResult( model=EB, nobs=40, nparams=3 log_likelihood=-31.5603 mu=2.04564, sigma2=0.0970524, alpha=None, r=0.715694063614909 converged=True, message='Solution found.' )
In [35]:
# Examine the model comparison table
fit["model_table"]
Out[35]:
| model | loglik | k | AIC | dAIC | weight_AIC | evidence_ratio_vs_best_AIC | nobs | AICc | dAICc | weight_AICc | evidence_ratio_vs_best_AICc | weight | evidence_ratio_vs_best | rank_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EB | -31.560312 | 3 | 69.120624 | 0.000000 | 9.731734e-01 | 1.000000e+00 | 40 | 69.787291 | 0.000000 | 9.683237e-01 | 1.000000e+00 | 9.683237e-01 | 1.000000e+00 | AICc |
| 1 | BM | -36.151481 | 2 | 76.302962 | 7.182338 | 2.682659e-02 | 3.627645e+01 | 40 | 76.627286 | 6.839995 | 3.167630e-02 | 3.056934e+01 | 3.167630e-02 | 3.056934e+01 | AICc |
| 2 | OU | -52.127406 | 3 | 110.254812 | 41.134188 | 1.137666e-09 | 8.554122e+08 | 40 | 110.921479 | 41.134188 | 1.131997e-09 | 8.554122e+08 | 1.131997e-09 | 8.554122e+08 | AICc |
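The evidence ratio columns follow directly from the delta values: the ratio of the best model's Akaike weight to model $i$'s is $\exp(\Delta_i/2)$. For example, BM's dAICc of about 6.84 in the table above implies:

```python
import math

daicc_bm = 6.839995  # dAICc for BM from the comparison table above
evidence_ratio = math.exp(daicc_bm / 2)
print(round(evidence_ratio, 2))  # ~30.57: EB is ~30x better supported than BM
```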
In [36]:
# Examine the ancestral states "X_anc" and "X_anc_var" inferred at each node.
fit["data"].head()
Out[36]:
| X_anc | X_anc_var | |
|---|---|---|
| 0 | 2.056966 | 0.0 |
| 1 | 1.836631 | 0.0 |
| 2 | 1.466209 | 0.0 |
| 3 | 0.608835 | 0.0 |
| 4 | -0.338354 | 0.0 |
Plot the ancestral states ("X_anc") reconstructed under the best model¶
In [40]:
tree_fit = fit['tree']
c, a, m = tree_fit.draw(layout='d')
tree_fit.annotate.add_node_markers(a, color=("X_anc", "Spectral"), size=8, mask=False);
Optional parameters¶
The fitter supports several options that affect optimization, ranking, and output naming.
In [48]:
# Compare only BM vs OU
fit_bm_ou = tree.pcm.fit_continuous_ml("X", models=("BM", "OU"))
fit_bm_ou["model_table"]
Out[48]:
| model | loglik | k | AIC | dAIC | weight_AIC | evidence_ratio_vs_best_AIC | nobs | AICc | dAICc | weight_AICc | evidence_ratio_vs_best_AICc | weight | evidence_ratio_vs_best | rank_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BM | -36.151481 | 2 | 76.302962 | 0.00000 | 1.000000e+00 | 1.000000e+00 | 40 | 76.627286 | 0.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | AICc |
| 1 | OU | -52.127406 | 3 | 110.254812 | 33.95185 | 4.240815e-08 | 2.358037e+07 | 40 | 110.921479 | 34.294193 | 3.573639e-08 | 2.798268e+07 | 3.573639e-08 | 2.798268e+07 | AICc |
In [49]:
# Rank by AIC instead of AICc
fit_aic = tree.pcm.fit_continuous_ml("X", criterion="AIC")
fit_aic["model_table"][["model", "AIC", "AICc", "rank_by"]]
Out[49]:
| model | AIC | AICc | rank_by | |
|---|---|---|---|---|
| 0 | EB | 69.120624 | 69.787291 | AIC |
| 1 | BM | 76.302962 | 76.627286 | AIC |
| 2 | OU | 110.254812 | 110.921479 | AIC |
In [50]:
# Adjust optimization bounds
fit_bounds = tree.pcm.fit_continuous_ml(
    "X",
    alpha_bounds=(1e-6, 5.0),
    r_bounds=(-5.0, 5.0),
)
fit_bounds["model_table"]
Out[50]:
| model | loglik | k | AIC | dAIC | weight_AIC | evidence_ratio_vs_best_AIC | nobs | AICc | dAICc | weight_AICc | evidence_ratio_vs_best_AICc | weight | evidence_ratio_vs_best | rank_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EB | -32.513613 | 3 | 71.027227 | 0.000000 | 0.696440 | 1.000000 | 40 | 71.693893 | 0.000000 | 0.690024 | 1.000000 | 0.690024 | 1.000000 | AICc |
| 1 | OU | -33.523226 | 3 | 73.046452 | 2.019225 | 0.253755 | 2.744538 | 40 | 73.713119 | 2.019225 | 0.251417 | 2.744538 | 0.251417 | 2.744538 | AICc |
| 2 | BM | -36.151481 | 2 | 76.302962 | 5.275735 | 0.049805 | 13.983352 | 40 | 76.627286 | 4.933393 | 0.058559 | 11.783454 | 0.058559 | 11.783454 | AICc |
In [51]:
# Store outputs directly on the tree and customize feature prefix
tree2 = tree.copy()
res_inplace = tree2.pcm.fit_continuous_ml("X", inplace=True, store_prefix="size")
tree2.get_node_data(["size_anc", "size_anc_var"]).head()
Out[51]:
| size_anc | size_anc_var | |
|---|---|---|
| 0 | 2.056966 | 0.0 |
| 1 | 1.836631 | 0.0 |
| 2 | 1.466209 | 0.0 |
| 3 | 0.608835 | 0.0 |
| 4 | -0.338354 | 0.0 |
Return object structure¶
fit_continuous_ml() returns a dict with:
- `best_fit`: best model result object.
- `model_fits`: mapping from model name to fit result object.
- `model_table`: AIC/AICc comparison table.
- `data`: node-indexed DataFrame containing ancestral estimates and variances for the selected model.
- `tree`: output tree (the same object for `inplace=True`, a copy otherwise).
Practical notes¶
- This fitter handles one trait at a time (`str | pd.Series`).
- Tip values must be numeric and non-missing.
- Model ranking is relative; close AICc values indicate weak evidence separation.
- OU/EB parameter bounds can influence optimization behavior on small or noisy datasets.