This document describes an option in EcoEnsemble V1.2.0
that samples the short-term discrepancies and variables of interest
explicitly alongside the model parameters, hereafter referred to as the
explicit approach, rather than treating them as latent
states and integrating them out of the likelihood using the Kalman filter.
The fitting algorithm is chosen via the sampler argument of
fit_ensemble_model. There are currently two options:
explicit, which is the default, and kalman,
which uses the Kalman filter.
```r
# Generate priors
priors <- EnsemblePrior(4)
# Run the model using the Kalman filter sampler
fit_sample <- fit_ensemble_model(observations = list(SSB_obs, Sigma_obs),
                                 simulators = list(list(SSB_ewe, Sigma_ewe, "EwE"),
                                                   list(SSB_lm, Sigma_lm, "LeMans"),
                                                   list(SSB_miz, Sigma_miz, "mizer"),
                                                   list(SSB_fs, Sigma_fs, "FishSUMS")),
                                 priors = priors,
                                 sampler = "kalman")
```

The main motivation for this change is improved computational
efficiency. Below, the bulk and tail effective sample size per second
(ESS/s) for the hierarchical model are compared between the
explicit and kalman options. The ESS shown
here is computed for the variable of interest.
The explicit option samples from the same distribution
as the kalman approach but does so more efficiently,
giving a larger ESS/s. However, since the ESS per iteration is generally
lower, more iterations may be required to reach a given ESS.
This does not mean that explicit is more efficient for
every model: there are cases where we found
kalman to be the better approach, so users should
evaluate both options before committing to long runs.
This document describes the new approach, shows that it samples from the same distribution, and then describes the computational changes.
Let \(\mathbf{\phi}\) be a vector
containing the fixed parameters, specifically the (vectorised)
covariance matrices, the (vectorised) AR(1) parameter matrices as well
as the shared long-term discrepancy \(\delta\): \[
\mathbf{\phi} =
(\Lambda_{1},\Lambda_{2},\ldots,\Lambda_{m},\Lambda_{\eta},\Lambda_{y},C_{\gamma},R_{1},
R_{2},\ldots,R_{m},R_{\eta},\mathbf{\delta})\,.
\] If we have a time series of length \(T\), the aim is to fit these parameters
given the data \(\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}\).
When the option sampler = "explicit" is chosen in
fit_ensemble_model, the sampler generates samples from the
distribution
\[ p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)})\,. \] A sample from the posterior of \(\mathbf{\phi}\) is then produced by simple marginalisation
\[ p(\mathbf{\phi}\,|\,\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}) = \int p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\,. \] Alternatively, directly sampling from the joint distribution of \(\mathbf{\phi}\) and \(\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\) can be avoided by noting that
\[ \begin{split} &p(\mathbf{\phi}\,|\,\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}) = \int p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &\propto \int p(\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)})p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &=p(\mathbf{\phi})\int \bigg(\prod_{t = 1}^{T}p(\mathbf{w}^{(t)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)\bigg(\prod_{t = 1}^{T - 1}p(\mathbf{\theta}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)p(\mathbf{\theta}^{(1)}\,|\,\mathbf{\phi})\,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &= p(\mathbf{\phi})\int \bigg(\prod_{t = 1}^{T-1}p(\mathbf{w}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t + 1)})p(\mathbf{\theta}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)p(\mathbf{\theta}^{(1)}\,|\,\mathbf{\phi})p(\mathbf{w}^{(1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(1)})\,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\,.\\ \end{split} \] The integral can be evaluated iteratively using the Kalman filter and the fixed parameters \(\mathbf{\phi}\) can be sampled using MCMC.
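The iterative evaluation of this integral can be sketched in base R for a toy univariate AR(1) state-space model. The model, the parameter names `r`, `q` and `s`, and the function `kalman_loglik` below are illustrative assumptions, not EcoEnsemble internals; they only show how the Kalman filter accumulates the marginal likelihood one time step at a time.

```r
# Toy state-space model (illustrative, not the package's):
#   theta_t = r * theta_{t-1} + N(0, q)   (latent state)
#   w_t     = theta_t + N(0, s)           (observation)
kalman_loglik <- function(w, r, q, s) {
  # Prior for theta^(1): stationary distribution of the AR(1) state
  m <- 0
  P <- q / (1 - r^2)
  ll <- 0
  for (t in seq_along(w)) {
    # Predictive distribution of w^(t) given w^(1..t-1) is N(m, P + s);
    # this is the factor p(w^(t) | phi, w^(1..t-1)) in the product above
    S <- P + s
    ll <- ll + dnorm(w[t], mean = m, sd = sqrt(S), log = TRUE)
    # Filtering update: posterior of theta^(t) given w^(1..t)
    K <- P / S
    m <- m + K * (w[t] - m)
    P <- (1 - K) * P
    # Propagate to the prior of theta^(t+1)
    m <- r * m
    P <- r^2 * P + q
  }
  ll
}

set.seed(1)
w <- as.numeric(arima.sim(list(ar = 0.5), n = 50))
kalman_loglik(w, r = 0.5, q = 1, s = 0.1)
```

The returned value agrees with the log-density of the joint Gaussian distribution of \(\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}\), so the latent states have been integrated out exactly; an MCMC sampler over \(\mathbf{\phi}\) then only needs this marginal likelihood.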
The four currently available configurations of an
EcoEnsemble model were tested with the new
explicit option and compared with the kalman
option. The runs used the non-hierarchical model (specified by
lkj priors), the hierarchical model (specified by
hierarchical priors), and the drivers and
hierarchical drivers models (both specified by
drivers = TRUE). Code and data are taken from the IncludingDrivers and EcoEnsemble vignettes.
The plot below compares posterior summaries between the
explicit and kalman options. It shows the
difference in posterior means and distributions for the shared
state-space parameters used in posterior state reconstruction, across 14
chains. Since the chains are independent, persistent per-chain
discrepancies between explicit and kalman
would indicate that the posterior distributions differ.
The parameters with the greatest differences are plotted below.
There are no considerable differences in the posterior output between
the explicit and kalman samplers.
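This chain-wise check can be sketched as follows, with simulated draws standing in for the real sampler output; the 14-chain layout is the only detail taken from above, and all object names are illustrative.

```r
# Stand-in draws for one parameter: an iterations x chains matrix per sampler.
# If both samplers target the same posterior, per-chain mean differences
# should be small and centred on zero, with no persistent offset.
set.seed(11)
draws_explicit <- matrix(rnorm(1000 * 14), ncol = 14)  # 14 chains
draws_kalman   <- matrix(rnorm(1000 * 14), ncol = 14)
chain_diff <- colMeans(draws_explicit) - colMeans(draws_kalman)
range(chain_diff)  # small when the posteriors agree
```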
The table below summarises the speed increase of the
explicit option (E) over the kalman option
(K), with one pair of columns per model configuration, for the
example models found in the vignettes. ESS/s is computed per second
spent sampling, excluding the warm-up time.
| Diagnostic | E | K | E | K | E | K | E | K |
|---|---|---|---|---|---|---|---|---|
| Run time | ||||||||
| Sampling run time (s) | 1004 | 7031 | 2559 | 11482 | 6381 | 17373 | 1014 | 11632 |
| Convergence | ||||||||
| Max R-hat | 1.01 | 1.01 | 1.02 | 1 | 1.06 | 1.01 | 1.01 | 1.01 |
| Bulk diagnostics | ||||||||
| Min ESS | 415 | 735 | 181 | 982 | 97.6 | 1161 | 966 | 1251 |
| Median ESS | 3288 | 3590 | 2875 | 3794 | 2158 | 3847 | 3139 | 3460 |
| Min ESS/s | 0.414 | 0.104 | 0.0707 | 0.0856 | 0.0153 | 0.0668 | 0.953 | 0.108 |
| Median ESS/s | 3.28 | 0.511 | 1.12 | 0.33 | 0.338 | 0.221 | 3.1 | 0.297 |
| Min ESS/N | 0.104 | 0.184 | 0.0453 | 0.246 | 0.0244 | 0.29 | 0.242 | 0.313 |
| Median ESS/N | 0.822 | 0.898 | 0.719 | 0.948 | 0.539 | 0.962 | 0.785 | 0.865 |
| Tail diagnostics | ||||||||
| Min ESS | 840 | 232 | 153 | 1107 | 158 | 651 | 1020 | 749 |
| Median ESS | 3311 | 3404 | 2892 | 3412 | 2397 | 3423 | 3157 | 3331 |
| Min ESS/s | 0.837 | 0.033 | 0.0599 | 0.0964 | 0.0248 | 0.0375 | 1.01 | 0.0644 |
| Median ESS/s | 3.3 | 0.484 | 1.13 | 0.297 | 0.376 | 0.197 | 3.11 | 0.286 |
| Min ESS/N | 0.21 | 0.058 | 0.0383 | 0.277 | 0.0396 | 0.163 | 0.255 | 0.187 |
| Median ESS/N | 0.828 | 0.851 | 0.723 | 0.853 | 0.599 | 0.856 | 0.789 | 0.833 |
Between the two options, explicit yields a lower ESS
for each model configuration, while kalman yields a lower
ESS per second. ESS per iteration is higher for kalman in the
non-drivers cases, so with explicit further
iterations may be necessary to achieve the same ESS. Regardless, the run
time for both the sampling and the warm-up is much faster for the
explicit sampler. The pattern is similar for
the bulk and tail ESS.
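The ESS/s figures above can be reproduced in outline. The helper `ess_simple` below is a rough base-R stand-in, using an assumed, simplified truncation rule, for Stan's bulk-ESS estimator; for a real fit one would divide the output of `posterior::ess_bulk()` by the sampling time from `rstan::get_elapsed_time(fit)`, excluding the warm-up column.

```r
# Crude ESS estimate: n / (1 + 2 * sum of autocorrelations),
# truncating the sum at the first negative autocorrelation.
ess_simple <- function(x) {
  rho <- acf(x, plot = FALSE, lag.max = length(x) - 1)$acf[-1]
  pos <- which(rho < 0)[1]
  if (!is.na(pos)) rho <- rho[seq_len(pos - 1)]
  length(x) / (1 + 2 * sum(rho))
}

# For a fitted rstan object (names assumed from the rstan API):
#   time_s <- sum(rstan::get_elapsed_time(fit)[, "sample"])
#   ess_per_second <- ess_simple(draws) / time_s

set.seed(7)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 4000))  # sticky chain
ess_simple(x)  # far below 4000: autocorrelation shrinks the effective sample
```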
The two options induce different posterior geometries, so the diagnostics of Stan's No-U-Turn Sampler (Stan Development Team 2020) can differ. The table below summarises the diagnostics for each model formulation.
| Diagnostic | E | K | E | K | E | K | E | K |
|---|---|---|---|---|---|---|---|---|
| Divergences | 1 | 57 | 4 | 13 | 0 | 9 | 8 | 37 |
| Treedepth hits | 0 | 0 | 3996 | 0 | 4000 | 0 | 0 | 0 |
| Max treedepth seen | 9 | 8 | 10 | 8 | 10 | 8 | 9 | 9 |
| Min E-BFMI | 0.828 | 0.825 | 0.726 | 0.777 | 0.769 | 0.797 | 0.847 | 0.82 |
| Median E-BFMI | 0.863 | 0.847 | 0.81 | 0.819 | 0.852 | 0.812 | 0.946 | 0.838 |
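The E-BFMI rows can be illustrated with a small sketch. The formula below, mean squared energy change over energy variance, is our reading of the diagnostic and is not taken from the package; for a fitted rstan object one would instead call `rstan::get_bfmi(fit)`.

```r
# Assumed E-BFMI formula: mean((E_t - E_{t-1})^2) / Var(E),
# computed per chain on the NUTS energy series.
ebfmi <- function(energy) {
  mean(diff(energy)^2) / var(energy)
}

set.seed(3)
ebfmi(rnorm(1e4))  # near 2 for an independently mixing energy series;
                   # Stan warns when E-BFMI falls below roughly 0.2 to 0.3
```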
In these examples, the explicit option frequently
reaches the maximum tree depth. This is because FishSUMS, one of the
simulators included in the model, does not have estimates for sole,
leading to a flat posterior geometry. Because this warning primarily
reflects sampling efficiency rather than invalid inference, it is
suppressed when it occurs in isolation; however, if Stan gives multiple
warnings, all of them are printed. The explicit
option results in a lower ESS, which may trigger some warnings, but
these warnings may not concern the variable of interest in your
model and therefore may have no bearing on your results. See this vignette for further information.