This document describes an option in EcoEnsemble V1.2.0
that samples the short-term discrepancies and variables of interest
explicitly alongside the model parameters, hereafter referred to as the
explicit approach, rather than treating them as latent
states and integrating them out of the likelihood using the Kalman filter.
The fitting algorithm is chosen via the sampler argument of
fit_ensemble_model. There are currently two options:
explicit, which is the default, and kalman,
which uses the Kalman filter.
```r
# Generate priors
priors <- EnsemblePrior(4)
# Run the model using the Kalman filter sampler
fit_sample <- fit_ensemble_model(observations = list(SSB_obs, Sigma_obs),
                                 simulators = list(list(SSB_ewe, Sigma_ewe, "EwE"),
                                                   list(SSB_lm, Sigma_lm, "LeMans"),
                                                   list(SSB_miz, Sigma_miz, "mizer"),
                                                   list(SSB_fs, Sigma_fs, "FishSUMS")),
                                 priors = priors,
                                 sampler = "kalman")
```

The main motivation for this change is improved computational
efficiency. Below, the bulk and tail effective sample size per second
(ESS/s) for the hierarchical model are compared between the
explicit and kalman options. The ESS shown
here is computed for the variable of interest.
The explicit option samples from the same distribution
as the kalman approach but does so more efficiently,
giving a larger ESS/s. However, since the ESS per iteration is generally
lower, more iterations may be required to reach a given ESS.
This does not mean that explicit is more efficient for
every model: there are cases where we found
kalman to be the better approach, so users should
evaluate both options before committing to long runs.
This document describes the new approach, shows that it samples from the same distribution, and then describes the computational changes.
Let \(\mathbf{\phi}\) be a vector
containing the fixed parameters, specifically the (vectorised)
covariance matrices, the (vectorised) AR(1) parameter matrices as well
as the shared long-term discrepancy \(\delta\): \[
\mathbf{\phi} =
(\Lambda_{1},\Lambda_{2},\ldots,\Lambda_{m},\Lambda_{\eta},\Lambda_{y},C_{\gamma},R_{1},
R_{2},\ldots,R_{m},R_{\eta},\mathbf{\delta})\,.
\] If we have a time series of length \(T\), the aim is to fit these parameters
given the data \(\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}\).
When the option sampler = "explicit" is chosen in
fit_ensemble_model, the sampler generates samples from the
distribution
\[ p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)})\,. \] A sample from the posterior of \(\mathbf{\phi}\) is then produced by simple marginalisation
\[ p(\mathbf{\phi}\,|\,\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}) = \int p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\,. \] Alternatively, directly sampling from the joint distribution of \(\mathbf{\phi}\) and \(\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\) can be avoided by noting that
\[ \begin{split} &p(\mathbf{\phi}\,|\,\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}) = \int p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}\,|\,\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &\propto \int p(\mathbf{w}^{(1)},\mathbf{w}^{(2)},\ldots,\mathbf{w}^{(T)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)})p(\mathbf{\phi},\mathbf{\theta}^{(1)},\ldots,\mathbf{\theta}^{(T)}) \,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &=p(\mathbf{\phi})\int \bigg(\prod_{t = 1}^{T}p(\mathbf{w}^{(t)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)\bigg(\prod_{t = 1}^{T - 1}p(\mathbf{\theta}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)p(\mathbf{\theta}^{(1)}\,|\,\mathbf{\phi})\,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\\ &= p(\mathbf{\phi})\int \bigg(\prod_{t = 1}^{T-1}p(\mathbf{w}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t + 1)})p(\mathbf{\theta}^{(t + 1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(t)})\bigg)p(\mathbf{\theta}^{(1)}\,|\,\mathbf{\phi})p(\mathbf{w}^{(1)}\,|\,\mathbf{\phi},\mathbf{\theta}^{(1)})\,d\mathbf{\theta}^{(1)}\cdots d\mathbf{\theta}^{(T)}\,.\\ \end{split} \] The integral can be evaluated iteratively using the Kalman filter and the fixed parameters \(\mathbf{\phi}\) can be sampled using MCMC.
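The iterative evaluation of this integral can be sketched in base R for a toy univariate AR(1) state-space model. The model, the parameter names `r`, `q` and `s`, and the function `kalman_loglik` below are illustrative assumptions, not EcoEnsemble internals; they only show how the Kalman filter accumulates the marginal likelihood one time step at a time.

```r
# Toy state-space model (illustrative, not the package's):
#   theta_t = r * theta_{t-1} + N(0, q)   (latent state)
#   w_t     = theta_t + N(0, s)           (observation)
kalman_loglik <- function(w, r, q, s) {
  # Prior for theta^(1): stationary distribution of the AR(1) state
  m <- 0
  P <- q / (1 - r^2)
  ll <- 0
  for (t in seq_along(w)) {
    # Predictive distribution of w^(t) given w^(1..t-1) is N(m, P + s);
    # this is the factor p(w^(t) | phi, w^(1..t-1)) in the product above
    S <- P + s
    ll <- ll + dnorm(w[t], mean = m, sd = sqrt(S), log = TRUE)
    # Filtering update: posterior of theta^(t) given w^(1..t)
    K <- P / S
    m <- m + K * (w[t] - m)
    P <- (1 - K) * P
    # Propagate to the prior of theta^(t+1)
    m <- r * m
    P <- r^2 * P + q
  }
  ll
}

set.seed(1)
w <- as.numeric(arima.sim(list(ar = 0.5), n = 50))
kalman_loglik(w, r = 0.5, q = 1, s = 0.1)
```

The returned value agrees with the log-density of the joint Gaussian distribution of \(\mathbf{w}^{(1)},\ldots,\mathbf{w}^{(T)}\), so the latent states have been integrated out exactly; an MCMC sampler over \(\mathbf{\phi}\) then only needs this marginal likelihood.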
The four currently available configurations of an
EcoEnsemble model were tested with the new
explicit option and compared with the kalman
option. The runs used the non-hierarchical model (specified by
lkj priors), the hierarchical model (specified by
hierarchical priors), and the drivers and
hierarchical drivers models (both specified by
drivers = TRUE). Code and data are taken from the IncludingDrivers and EcoEnsemble vignettes.
The plot below compares posterior summaries between the
explicit and kalman options. It shows the
difference in posterior means and distributions for the shared
state-space parameters used in posterior state reconstruction, across 14
chains. Since the chains are independent, persistent per-chain
discrepancies between explicit and kalman
would indicate that the posterior distributions differ.
The parameters with the greatest differences are plotted below.
There are no considerable differences in the posterior output between
the explicit and kalman samplers.
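This chain-wise check can be sketched as follows, with simulated draws standing in for the real sampler output; the 14-chain layout is the only detail taken from above, and all object names are illustrative.

```r
# Stand-in draws for one parameter: an iterations x chains matrix per sampler.
# If both samplers target the same posterior, per-chain mean differences
# should be small and centred on zero, with no persistent offset.
set.seed(11)
draws_explicit <- matrix(rnorm(1000 * 14), ncol = 14)  # 14 chains
draws_kalman   <- matrix(rnorm(1000 * 14), ncol = 14)
chain_diff <- colMeans(draws_explicit) - colMeans(draws_kalman)
range(chain_diff)  # small when the posteriors agree
```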
The table below summarises the speed increase of the
explicit option (E) over the kalman option
(K), with one pair of columns per model configuration, for the
example models found in the vignettes. ESS/s is computed per second
spent sampling, excluding the warm-up time.
| Diagnostic | E | K | E | K | E | K | E | K |
|---|---|---|---|---|---|---|---|---|
| Run time | ||||||||
| Sampling run time (s) | 1004 | 7031 | 2559 | 11482 | 6381 | 17373 | 1014 | 11632 |
| Convergence | ||||||||
| Max R-hat | 1.01 | 1.01 | 1.02 | 1 | 1.06 | 1.01 | 1.01 | 1.01 |
| Bulk diagnostics | ||||||||
| Min ESS | 415 | 735 | 181 | 982 | 97.6 | 1161 | 966 | 1251 |
| Median ESS | 3288 | 3590 | 2875 | 3794 | 2158 | 3847 | 3139 | 3460 |
| Min ESS/s | 0.414 | 0.104 | 0.0707 | 0.0856 | 0.0153 | 0.0668 | 0.953 | 0.108 |
| Median ESS/s | 3.28 | 0.511 | 1.12 | 0.33 | 0.338 | 0.221 | 3.1 | 0.297 |
| Min ESS/N | 0.104 | 0.184 | 0.0453 | 0.246 | 0.0244 | 0.29 | 0.242 | 0.313 |
| Median ESS/N | 0.822 | 0.898 | 0.719 | 0.948 | 0.539 | 0.962 | 0.785 | 0.865 |
| Tail diagnostics | ||||||||
| Min ESS | 840 | 232 | 153 | 1107 | 158 | 651 | 1020 | 749 |
| Median ESS | 3311 | 3404 | 2892 | 3412 | 2397 | 3423 | 3157 | 3331 |
| Min ESS/s | 0.837 | 0.033 | 0.0599 | 0.0964 | 0.0248 | 0.0375 | 1.01 | 0.0644 |
| Median ESS/s | 3.3 | 0.484 | 1.13 | 0.297 | 0.376 | 0.197 | 3.11 | 0.286 |
| Min ESS/N | 0.21 | 0.058 | 0.0383 | 0.277 | 0.0396 | 0.163 | 0.255 | 0.187 |
| Median ESS/N | 0.828 | 0.851 | 0.723 | 0.853 | 0.599 | 0.856 | 0.789 | 0.833 |
Between the two options, explicit yields a lower ESS
for each model configuration, while kalman yields a lower
ESS per second. ESS per iteration is higher for kalman in the
non-drivers cases, so with explicit further
iterations may be necessary to achieve the same ESS. Regardless, the run
time for both the sampling and the warm-up is much faster for the
explicit sampler. The pattern is similar for
the bulk and tail ESS.
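The ESS/s figures above can be reproduced in outline. The helper `ess_simple` below is a rough base-R stand-in, using an assumed, simplified truncation rule, for Stan's bulk-ESS estimator; for a real fit one would divide the output of `posterior::ess_bulk()` by the sampling time from `rstan::get_elapsed_time(fit)`, excluding the warm-up column.

```r
# Crude ESS estimate: n / (1 + 2 * sum of autocorrelations),
# truncating the sum at the first negative autocorrelation.
ess_simple <- function(x) {
  rho <- acf(x, plot = FALSE, lag.max = length(x) - 1)$acf[-1]
  pos <- which(rho < 0)[1]
  if (!is.na(pos)) rho <- rho[seq_len(pos - 1)]
  length(x) / (1 + 2 * sum(rho))
}

# For a fitted rstan object (names assumed from the rstan API):
#   time_s <- sum(rstan::get_elapsed_time(fit)[, "sample"])
#   ess_per_second <- ess_simple(draws) / time_s

set.seed(7)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 4000))  # sticky chain
ess_simple(x)  # far below 4000: autocorrelation shrinks the effective sample
```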
The two options induce different posterior geometries, so the diagnostics of Stan's No-U-Turn Sampler (Stan Development Team 2020) can differ. The table below summarises the diagnostics for each model formulation.
| Diagnostic | E | K | E | K | E | K | E | K |
|---|---|---|---|---|---|---|---|---|
| Divergences | 1 | 57 | 4 | 13 | 0 | 9 | 8 | 37 |
| Treedepth hits | 0 | 0 | 3996 | 0 | 4000 | 0 | 0 | 0 |
| Max treedepth seen | 9 | 8 | 10 | 8 | 10 | 8 | 9 | 9 |
| Min E-BFMI | 0.828 | 0.825 | 0.726 | 0.777 | 0.769 | 0.797 | 0.847 | 0.82 |
| Median E-BFMI | 0.863 | 0.847 | 0.81 | 0.819 | 0.852 | 0.812 | 0.946 | 0.838 |
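The E-BFMI rows can be illustrated with a small sketch. The formula below, mean squared energy change over energy variance, is our reading of the diagnostic and is not taken from the package; for a fitted rstan object one would instead call `rstan::get_bfmi(fit)`.

```r
# Assumed E-BFMI formula: mean((E_t - E_{t-1})^2) / Var(E),
# computed per chain on the NUTS energy series.
ebfmi <- function(energy) {
  mean(diff(energy)^2) / var(energy)
}

set.seed(3)
ebfmi(rnorm(1e4))  # near 2 for an independently mixing energy series;
                   # Stan warns when E-BFMI falls below roughly 0.2 to 0.3
```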
In these examples, the explicit option frequently
reaches the maximum tree depth. This is because FishSUMS, one of the
simulators included in the model, does not have estimates for sole,
leading to a flat posterior geometry. Because this warning primarily
reflects sampling efficiency rather than invalid inference, it is
suppressed when it occurs in isolation; however, if Stan gives multiple
warnings, all of them are printed. The explicit
option results in a lower ESS, which may trigger some warnings, but
these warnings may not concern the variable of interest in your
model and therefore may have no bearing on your results. See this vignette for further information.