Fit Continuous Trait Models (BM, OU, EB)¶
This notebook demonstrates fitting univariate continuous-trait models on a phylogeny with toytree.pcm.fit_continuous_ml().
It focuses on model fitting and comparison (AIC / AICc). For simulation details, see the companion page: sim continuous traits BM.
Theory background¶
For a continuous trait observed at tips, these models define the expected covariance among taxa from shared evolutionary history:
- BM: neutral random walk with constant rate $\sigma^2$, so trait variance accumulates linearly with time.
- OU: mean-reverting process pulled toward an optimum with strength $\alpha$.
- EB: BM with a rate that changes exponentially through time (parameter $r$).
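For intuition, under BM the expected covariance between two tips is $\sigma^2$ times the length of the history they share (root to their most recent common ancestor). A minimal sketch with plain NumPy, using a hypothetical ultrametric 3-taxon tree rather than the toytree API:

```python
import numpy as np

# Hypothetical ultrametric tree ((A:1, B:1):1, C:2); total depth = 2.
# Shared history (root-to-MRCA path length) for each pair of tips:
# A and B share 1.0 time unit; each pair involving C shares 0.0.
shared = np.array([
    [2.0, 1.0, 0.0],  # diagonal = total root-to-tip depth
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
])

sigma2 = 1.0
V = sigma2 * shared  # expected trait covariance among tips under BM
print(V)
```

Larger shared paths mean more strongly correlated tip values; this matrix is what the Gaussian likelihood is evaluated against.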
Models are fit by maximum likelihood under a multivariate Gaussian, and are ranked here by AICc by default (AIC is also reported). AICc is preferred when the number of tips is not large relative to the number of parameters.
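The criteria themselves are simple functions of the log-likelihood: $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$. A quick check against the EB fit reported below (these are the standard formulas, not toytree internals):

```python
def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik

def aicc(loglik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; the penalty grows as n approaches k + 1."""
    return aic(loglik, k) + (2 * k * (k + 1)) / (n - k - 1)

# EB fit below: log-likelihood -31.5603 with k=3 parameters on n=40 tips
print(round(aic(-31.5603, 3), 4))       # 69.1206
print(round(aicc(-31.5603, 3, 40), 4))  # 69.7873
```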
References¶
- Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist, 125, 1–15.
- Hansen, T. F. (1997). Stabilizing selection and the comparative analysis of adaptation. Evolution, 51, 1341–1351.
- Harmon, L. J., et al. (2010). Early bursts of body size and shape evolution are rare in comparative data. Evolution, 64, 2385–2396.
- Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference. Springer.
In [ ]:
import toytree
import numpy as np
Simulate a tree with continuous data "X" across the tips¶
In [47]:
tree = toytree.rtree.bdtree(ntips=40, seed=1234)
trait = tree.pcm.simulate_continuous_bm(sigma2=1.0, model="BM", tips_only=True, seed=456)
tree = tree.set_node_data("X", dict(trait), default=np.nan, inplace=False)
c, a, m = tree.draw(layout='d');
tree.annotate.add_node_markers(a, color=("X", "Spectral"), size=8, mask=False);
Fit Models (BM, OU, EB) to the data¶
In [33]:
fit = tree.pcm.fit_continuous_ml("X")
In [34]:
# select the best fitting model
fit["best_fit"]
Out[34]:
FitContinuousMLResult( model=EB, nobs=40, nparams=3 log_likelihood=-31.5603 mu=2.04564, sigma2=0.0970524, alpha=None, r=0.715694063614909 converged=True, message='Solution found.' )
In [35]:
# Examine the model comparison table
fit["model_table"]
Out[35]:
| model | loglik | k | AIC | dAIC | weight_AIC | evidence_ratio_vs_best_AIC | nobs | AICc | dAICc | weight_AICc | evidence_ratio_vs_best_AICc | weight | evidence_ratio_vs_best | rank_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EB | -31.560312 | 3 | 69.120624 | 0.000000 | 9.731734e-01 | 1.000000e+00 | 40 | 69.787291 | 0.000000 | 9.683237e-01 | 1.000000e+00 | 9.683237e-01 | 1.000000e+00 | AICc |
| 1 | BM | -36.151481 | 2 | 76.302962 | 7.182338 | 2.682659e-02 | 3.627645e+01 | 40 | 76.627286 | 6.839995 | 3.167630e-02 | 3.056934e+01 | 3.167630e-02 | 3.056934e+01 | AICc |
| 2 | OU | -52.127406 | 3 | 110.254812 | 41.134188 | 1.137666e-09 | 8.554122e+08 | 40 | 110.921479 | 41.134188 | 1.131997e-09 | 8.554122e+08 | 1.131997e-09 | 8.554122e+08 | AICc |
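The evidence ratio columns follow directly from the delta values: the ratio of the best model's Akaike weight to model $i$'s is $\exp(\Delta_i/2)$. For example, BM's dAICc of about 6.84 in the table above implies:

```python
import math

daicc_bm = 6.839995  # dAICc for BM from the comparison table above
evidence_ratio = math.exp(daicc_bm / 2)
print(round(evidence_ratio, 2))  # ~30.57: EB is ~30x better supported than BM
```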
In [36]:
# Examine the ancestral states "X_anc" and "X_anc_var" inferred at each node.
fit["data"].head()
Out[36]:
| X_anc | X_anc_var | |
|---|---|---|
| 0 | 2.056966 | 0.0 |
| 1 | 1.836631 | 0.0 |
| 2 | 1.466209 | 0.0 |
| 3 | 0.608835 | 0.0 |
| 4 | -0.338354 | 0.0 |
Plot the ancestral states ("X_anc") reconstructed under the best model¶
In [40]:
tree_fit = fit['tree']
c, a, m = tree_fit.draw(layout='d')
tree_fit.annotate.add_node_markers(a, color=("X_anc", "Spectral"), size=8, mask=False);
Optional parameters¶
The fitter supports several options that affect optimization, ranking, and output naming.
In [48]:
# Compare only BM vs OU
fit_bm_ou = tree.pcm.fit_continuous_ml("X", models=("BM", "OU"))
fit_bm_ou["model_table"]
Out[48]:
| model | loglik | k | AIC | dAIC | weight_AIC | evidence_ratio_vs_best_AIC | nobs | AICc | dAICc | weight_AICc | evidence_ratio_vs_best_AICc | weight | evidence_ratio_vs_best | rank_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BM | -36.151481 | 2 | 76.302962 | 0.00000 | 1.000000e+00 | 1.000000e+00 | 40 | 76.627286 | 0.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | AICc |
| 1 | OU | -52.127406 | 3 | 110.254812 | 33.95185 | 4.240815e-08 | 2.358037e+07 | 40 | 110.921479 | 34.294193 | 3.573639e-08 | 2.798268e+07 | 3.573639e-08 | 2.798268e+07 | AICc |
In [49]:
# Rank by AIC instead of AICc
fit_aic = tree.pcm.fit_continuous_ml("X", criterion="AIC")
fit_aic["model_table"][["model", "AIC", "AICc", "rank_by"]]
Out[49]:
| model | AIC | AICc | rank_by | |
|---|---|---|---|---|
| 0 | EB | 69.120624 | 69.787291 | AIC |
| 1 | BM | 76.302962 | 76.627286 | AIC |
| 2 | OU | 110.254812 | 110.921479 | AIC |
In [50]:
# Adjust optimization bounds
fit_bounds = tree.pcm.fit_continuous_ml(
    "X",
    alpha_bounds=(1e-6, 5.0),
    r_bounds=(-5.0, 5.0),
)
fit_bounds["model_table"]
Out[50]:
| model | loglik | k | AIC | dAIC | weight_AIC | evidence_ratio_vs_best_AIC | nobs | AICc | dAICc | weight_AICc | evidence_ratio_vs_best_AICc | weight | evidence_ratio_vs_best | rank_by | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EB | -32.513613 | 3 | 71.027227 | 0.000000 | 0.696440 | 1.000000 | 40 | 71.693893 | 0.000000 | 0.690024 | 1.000000 | 0.690024 | 1.000000 | AICc |
| 1 | OU | -33.523226 | 3 | 73.046452 | 2.019225 | 0.253755 | 2.744538 | 40 | 73.713119 | 2.019225 | 0.251417 | 2.744538 | 0.251417 | 2.744538 | AICc |
| 2 | BM | -36.151481 | 2 | 76.302962 | 5.275735 | 0.049805 | 13.983352 | 40 | 76.627286 | 4.933393 | 0.058559 | 11.783454 | 0.058559 | 11.783454 | AICc |
In [51]:
# Store outputs directly on the tree and customize feature prefix
tree2 = tree.copy()
res_inplace = tree2.pcm.fit_continuous_ml("X", inplace=True, store_prefix="size")
tree2.get_node_data(["size_anc", "size_anc_var"]).head()
Out[51]:
| size_anc | size_anc_var | |
|---|---|---|
| 0 | 2.056966 | 0.0 |
| 1 | 1.836631 | 0.0 |
| 2 | 1.466209 | 0.0 |
| 3 | 0.608835 | 0.0 |
| 4 | -0.338354 | 0.0 |
Return object structure¶
fit_continuous_ml() returns a dict with:
- `best_fit`: best model result object.
- `model_fits`: mapping from model name to fit result object.
- `model_table`: AIC/AICc comparison table.
- `data`: node-indexed DataFrame containing ancestral estimates and variances for the selected model.
- `tree`: output tree (the same object for `inplace=True`, a copy otherwise).
Practical notes¶
- This fitter handles one trait at a time (`str | pd.Series`).
- Tip values must be numeric and non-missing.
- Model ranking is relative; close AICc values indicate weak evidence separation.
- OU/EB parameter bounds can influence optimization behavior on small or noisy datasets.