EcoEnsemble Speed-Up

Luke E. K. Broadbent, Thomas I. J. Bartos, Michael J. Thomson and Michael A. Spence

2026-04-27

Summary

This document describes an option in EcoEnsemble V1.2.0 that samples the short-term discrepancies and variables of interest explicitly alongside the model parameters, hereafter referred to as the explicit approach, rather than treating them as latent states and integrating them out of the likelihood using the Kalman filter.

The sampler argument of fit_ensemble_model selects the fitting algorithm. Currently there are two options: explicit (the default), which samples the latent states explicitly, and kalman, which integrates them out using the Kalman filter.

#generate priors
priors <- EnsemblePrior(4)

#run the model
fit_sample <- fit_ensemble_model(observations = list(SSB_obs, Sigma_obs),
                                 simulators = list(list(SSB_ewe, Sigma_ewe, "EwE"),
                                                   list(SSB_lm,  Sigma_lm,  "LeMans"),
                                                   list(SSB_miz, Sigma_miz, "mizer"),
                                                   list(SSB_fs,  Sigma_fs,  "FishSUMS")),
                                 priors = priors,
                                 sampler = "kalman")

The main motivation for this change is improved computational efficiency. Below, bulk and tail effective sample size per second (ESS/s) for the hierarchical model are compared between the explicit and kalman options. The ESS shown here is computed for the variable of interest.

The explicit option samples from the same distribution as the kalman approach but does so more efficiently, resulting in a larger ESS/s. However, because the ESS per iteration is generally lower, more iterations may be required to reach a target ESS.

This does not mean that explicit is more efficient for every model; there are cases where we identified kalman as the better approach. Users should evaluate both options before committing to long runs.

Overview

This document describes the new approach and shows that it samples from the same distribution as the Kalman filter approach, followed by a description of the computational changes.

Equivalence of the samplers

Let \(\mathbf{\phi}\) be a vector containing the fixed parameters, specifically the (vectorised) covariance matrices, the (vectorised) AR(1) parameter matrices, as well as the shared long-term discrepancy \(\mathbf{\delta}\): \[ \mathbf{\phi} = (\Lambda_{1},\Lambda_{2},\ldots,\Lambda_{m},\Lambda_{\eta},\Lambda_{y},C_{\gamma},R_{1},R_{2},\ldots,R_{m},R_{\eta},\mathbf{\delta})\,. \] If we have a time series of length \(T\), the aim is to fit these parameters given the data \(\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}\). When the option sampler = "explicit" is chosen in fit_ensemble_model, the sampler generates samples from the distribution

\[ p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)})\,. \] A sample from the posterior of \(\mathbf{\phi}\) is then produced by simple marginalisation

\[ p(\mathbf{\phi}\,|\,\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}) = \int p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\,. \] Alternatively, directly sampling from the joint distribution of \(\mathbf{\phi}\) and \(\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\) can be avoided by noting that

\[ \begin{split} &p(\mathbf{\phi}\,|\,\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}) = \int p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &\propto \int p(\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)})p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &=p(\mathbf{\phi})\int \bigg(\prod_{t = 1}^{T}p(\mathbf{w}^{(t)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)\bigg(\prod_{t = 1}^{T - 1}p(\mathbf{\theta}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)p(\mathbf{\theta}^{(1)}\,|\,\mathbf{\phi})\,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &= p(\mathbf{\phi})\int \bigg(\prod_{t = 1}^{T-1}p(\mathbf{w}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t + 1)})p(\mathbf{\theta}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)p(\mathbf{\theta}^{(1)}\,|\,\mathbf{\phi})p(\mathbf{w}^{(1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(1)})\,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\,.\\ \end{split} \] The integral can be evaluated iteratively using the Kalman filter and the fixed parameters \(\mathbf{\phi}\) can be sampled using MCMC.
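To make the iterative evaluation concrete, the sketch below implements the Kalman-filter computation of this integral for a minimal univariate case: a single AR(1) latent state observed with Gaussian noise. The function name kalman_loglik and the scalar parametrisation (r for the AR(1) coefficient, q for the state innovation variance, sigma2 for the observation variance) are illustrative assumptions, not EcoEnsemble's internal code.

```r
# Minimal univariate illustration of the Kalman-filter marginalisation:
# state:       theta_t = r * theta_{t-1} + eta_t,  eta_t ~ N(0, q)
# observation: w_t     = theta_t + eps_t,          eps_t ~ N(0, sigma2)
kalman_loglik <- function(w, r, q, sigma2) {
  m <- 0             # prior mean of theta_1
  P <- q / (1 - r^2) # stationary prior variance of theta_1
  ll <- 0
  for (t in seq_along(w)) {
    S <- P + sigma2                                   # predictive variance of w_t
    ll <- ll + dnorm(w[t], mean = m, sd = sqrt(S), log = TRUE)
    K <- P / S                                        # Kalman gain
    m <- m + K * (w[t] - m)                           # filtered mean of theta_t
    P <- (1 - K) * P                                  # filtered variance of theta_t
    m <- r * m                                        # one-step prediction of theta_{t+1}
    P <- r^2 * P + q
  }
  ll
}

# Example: marginal log-likelihood of a short series
kalman_loglik(c(0.5, -0.2, 0.1), r = 0.8, q = 0.3, sigma2 = 0.1)
```

Each loop iteration evaluates the predictive density of \(\mathbf{w}^{(t)}\) given \(\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(t-1)}\), so the accumulated sum is exactly the log of the marginalised likelihood above; an MCMC sampler over \(\mathbf{\phi}\) can then use this as its log-likelihood.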

Output

The four currently available configurations of an EcoEnsemble model were tested with the new explicit option and compared with the kalman option. The runs used the non-hierarchical model (specified by LKJ priors), the hierarchical model (specified by hierarchical priors), the drivers model, and the hierarchical drivers model (both drivers models specified by drivers = TRUE). Code and data are taken from the vignettes IncludingDrivers and EcoEnsemble.

Posterior Comparison

The plot below compares posterior summaries between the explicit and kalman options. It shows the difference in posterior means and distributions for the shared state-space parameters used in posterior state reconstruction, across 14 chains. Since each chain is independent, persistent discrepancies between explicit and kalman across chains would indicate that the posterior distributions differ.

The parameters with the greatest differences are plotted below.

There are no considerable differences in the posterior output between the explicit and kalman samplers.
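As a toy illustration of this chain-wise check (the draws below are simulated with base R, not taken from EcoEnsemble output), two samplers targeting the same posterior should produce per-chain mean differences that are centred on zero with no persistent sign:

```r
# Simulate 14 independent chains of draws from the same posterior for
# two hypothetical samplers, then compare per-chain posterior means.
set.seed(42)
n_chain <- 14
n_iter  <- 1000
draws_explicit <- matrix(rnorm(n_chain * n_iter, mean = 0.5, sd = 0.1), ncol = n_chain)
draws_kalman   <- matrix(rnorm(n_chain * n_iter, mean = 0.5, sd = 0.1), ncol = n_chain)

# Differences of order the Monte Carlo standard error, with no persistent
# sign across chains, are consistent with identical posteriors.
diff_means <- colMeans(draws_explicit) - colMeans(draws_kalman)
summary(diff_means)
```

A systematic offset in diff_means across all chains, by contrast, would signal that the two samplers are not targeting the same distribution.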

Performance

The table below summarises the speed increase of the explicit option over the kalman option for the example models found in the vignettes. ESS/s is measured per second spent sampling, excluding warm-up (burn-in) time.

E = explicit, K = kalman.

| Diagnostic | Drivers (E) | Drivers (K) | Model (E) | Model (K) | Model + hier. (E) | Model + hier. (K) | Drivers + hier. (E) | Drivers + hier. (K) |
|---|---|---|---|---|---|---|---|---|
| Run time |  |  |  |  |  |  |  |  |
| Sampling run time (s) | 1004 | 7031 | 2559 | 11482 | 6381 | 17373 | 1014 | 11632 |
| Convergence |  |  |  |  |  |  |  |  |
| Max R-hat | 1.01 | 1.01 | 1.02 | 1.00 | 1.06 | 1.01 | 1.01 | 1.01 |
| Bulk diagnostics |  |  |  |  |  |  |  |  |
| Min ESS | 415 | 735 | 181 | 982 | 97.6 | 1161 | 966 | 1251 |
| Median ESS | 3288 | 3590 | 2875 | 3794 | 2158 | 3847 | 3139 | 3460 |
| Min ESS/s | 0.414 | 0.104 | 0.0707 | 0.0856 | 0.0153 | 0.0668 | 0.953 | 0.108 |
| Median ESS/s | 3.28 | 0.511 | 1.12 | 0.33 | 0.338 | 0.221 | 3.1 | 0.297 |
| Min ESS/N | 0.104 | 0.184 | 0.0453 | 0.246 | 0.0244 | 0.29 | 0.242 | 0.313 |
| Median ESS/N | 0.822 | 0.898 | 0.719 | 0.948 | 0.539 | 0.962 | 0.785 | 0.865 |
| Tail diagnostics |  |  |  |  |  |  |  |  |
| Min ESS | 840 | 232 | 153 | 1107 | 158 | 651 | 1020 | 749 |
| Median ESS | 3311 | 3404 | 2892 | 3412 | 2397 | 3423 | 3157 | 3331 |
| Min ESS/s | 0.837 | 0.033 | 0.0599 | 0.0964 | 0.0248 | 0.0375 | 1.01 | 0.0644 |
| Median ESS/s | 3.3 | 0.484 | 1.13 | 0.297 | 0.376 | 0.197 | 3.11 | 0.286 |
| Min ESS/N | 0.21 | 0.058 | 0.0383 | 0.277 | 0.0396 | 0.163 | 0.255 | 0.187 |
| Median ESS/N | 0.828 | 0.851 | 0.723 | 0.853 | 0.599 | 0.856 | 0.789 | 0.833 |

Between the two options, explicit gives a lower raw ESS for each model configuration, but a substantially higher ESS per second than kalman. ESS per iteration (ESS/N) is higher for kalman in the non-drivers cases, so with explicit further iterations may be necessary to achieve the same ESS. Regardless, the run time for both sampling and warm-up is much faster with the explicit sampler. The pattern is similar for the bulk and tail ESS.
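The ESS-per-iteration trade-off can be illustrated with a simulated chain (base R only; the single-lag AR(1) ESS formula used here is a simplifying assumption, not how Stan computes ESS):

```r
# A more autocorrelated chain yields fewer effective samples per iteration,
# so more iterations are needed to reach a target ESS.
set.seed(1)
n <- 10000
chain <- as.numeric(arima.sim(list(ar = 0.9), n = n))  # strongly autocorrelated draws
rho1  <- acf(chain, plot = FALSE)$acf[2]               # empirical lag-1 autocorrelation
ess   <- n * (1 - rho1) / (1 + rho1)                   # single-lag ESS approximation

ess / n  # effective samples per iteration, well below 1 for rho near 0.9
```

A sampler with such a chain needs roughly (1 + rho1) / (1 - rho1) times more iterations to match the ESS of an independent sampler, which is why a faster per-iteration sampler can still win on ESS/s despite a lower ESS/N.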

Sampler Warnings

The two options induce different posterior geometries, so the diagnostics from Stan's (Stan Development Team 2020) No-U-Turn Sampler can differ. The table below summarises the diagnostics for each model formulation.

E = explicit, K = kalman.

| Diagnostic | Drivers (E) | Drivers (K) | Model (E) | Model (K) | Model + hier. (E) | Model + hier. (K) | Drivers + hier. (E) | Drivers + hier. (K) |
|---|---|---|---|---|---|---|---|---|
| Divergences | 1 | 57 | 4 | 13 | 0 | 9 | 8 | 37 |
| Treedepth hits | 0 | 0 | 3996 | 0 | 4000 | 0 | 0 | 0 |
| Max treedepth seen | 9 | 8 | 10 | 8 | 10 | 8 | 9 | 9 |
| Min E-BFMI | 0.828 | 0.825 | 0.726 | 0.777 | 0.769 | 0.797 | 0.847 | 0.82 |
| Median E-BFMI | 0.863 | 0.847 | 0.81 | 0.819 | 0.852 | 0.812 | 0.946 | 0.838 |

In these examples, the explicit option frequently reaches the maximum tree depth. This is because FishSUMS, one of the simulators included in the model, does not have estimates for sole, producing a flat region in the posterior geometry. Because this warning primarily reflects sampling efficiency rather than invalid inference, it is suppressed when it occurs in isolation; however, if Stan raises multiple warnings, all warnings are printed. The explicit option's lower ESS may also trigger warnings, but these may not concern the variable of interest in your model and therefore may not affect your conclusions. See this vignette for further information.

References

Stan Development Team. 2020. “RStan: The R Interface to Stan.” https://mc-stan.org/.