Geostatistics refers to the study of a spatially distributed variable of interest, which in theory is defined at every point over a bounded study region of interest. Statistical modelling and analysis for spatially oriented point-referenced outcomes play a crucial role in diverse scientific applications such as earth and environmental sciences, ecology, epidemiology, and economics. With the advent of Markov chain Monte Carlo (MCMC) algorithms, Bayesian hierarchical models have gained massive popularity in analyzing such point-referenced or, geostatistical data. These models involve latent spatial processes characterized by spatial process parameters, which besides lacking substantive relevance in scientific contexts, are also weakly identified and hence, impedes convergence of MCMC algorithms. Thus, even for moderately large datasets (~\(10^3\) or higher), the computation for MCMC becomes too onerous for practical use.
We introduce the R package spStack
that implements
Bayesian inference for a class of geostatistical models, where we
obviate the issues mentioned by sampling from analytically available
posterior distributions conditional upon some candidate values of the
spatial process parameters and, subsequently assimilate inference from
these individual posterior distributions using Bayesian predictive
stacking. Besides delivering competitive predictive performance as
compared to fully Bayesian inference using MCMC, our proposed algorithm
is embarrassingly parallel, thus drastically improves runtime and
elevating the utility of the package for a diverse group of
practitioners with limited computational resources at their disposal.
This package, to the best of our knowledge, is the first to implement
stacking for Bayesian analysis of spatial data.
Technical details surrounding the methodology can be found in the
articles Zhang, Tang, and Banerjee (2025) which discuss the case
where the distribution of the point-referenced outcomes are Gaussian,
and, in Pan et al. (2025) where the case of
non-Gaussian outcomes is explored. The code for this package is written
primarily in C/C++ with additional calls to FORTRAN routines for
optimized linear algebra operations. We leverage the
F77_NAME
macro to interface with legacy FORTRAN functions
in conjunction with efficient matrix computation libraries such as BLAS (Basic Linear Algebra
Subprograms) and LAPACK (Linear
Algebra Package) to implement our stacking algorithm.
The remainder of the vignette evolves as follows - the next two sections discuss Bayesian hierarchical spatial models for Gaussian and non-Gaussian outcomes, which is followed by a section providing brief details on predictive stacking and a section dedicated for illustration of functions in the package.
where we fix the noise-to-spatial variance ratio \(\delta^2 = \tau^2 / \sigma^2\), the process parameters \({\theta_{\text{sp}}}\) and the hyperparameters \(\mu_\beta\), \(V_\beta\), \(a_\sigma\) and \(b_\sigma\). In this package, we use the Matern covariogram specified by spatial decay parameter \(\phi\) and smoothness parameter \(\nu\) i.e., \({\theta_{\text{sp}}}= \{\phi, \nu\}\), given by \[\begin{equation} R(s, s'; {\theta_{\text{sp}}}) = \frac{(\phi \lvert s - s' \rvert)^\nu}{2^{\nu - 1} \Gamma(\nu)} K_\nu (\phi \lvert s - s' \rvert)). \end{equation}\] We utilize a composition sampling strategy to sample the model parameters from their joint posterior distribution which can be written as \[\begin{equation} p(\sigma^2, \beta, z \mid y) = p(\sigma^2 \mid y) \times p(\beta \mid \sigma^2, y) \times p(z \mid \beta, \sigma^2, y). \end{equation}\] We proceed by first sampling \(\sigma^2\) from its marginal posterior, then given the samples of \(\sigma^2\), we sample \(\beta\) and subsequently, we sample \(z\) conditioned on the posterior samples of \(\beta\) and \(\sigma^2\) (Banerjee 2020). More details can be found in Zhang, Tang, and Banerjee (2025).
The function spLMexact()
delivers samples from this
posterior distribution conditional on fixed hyperparameters. For
predictive stacking, use the function spLMstack()
.
Analyzing non-Gaussian spatial data typically requires introducing spatial dependence in generalized linear models through the link function of an exponential family distribution. Let \(y(s)\) be the outcome at location \(s \in \mathcal{D}\) endowed with a probability law from the natural exponential family, which we denote by \[\begin{equation} y(s) \sim \mathsf{EF}(x(s)^{ \scriptstyle \top }\beta + z(s); b, \psi_y) \end{equation}\] for some positive parameter \(b > 0\) and unit log partition function \(\psi_y\). Fixed effects regression and spatial dependence, e.g., \(x(s)^{{ \scriptstyle \top }}\beta + z(s)\), is introduced in the natural parameter, where \(x(s)\) is a \(p \times 1\) vector of predictors referenced with respect to \(s\), \(\beta\) is a \(p \times 1\) vector of slopes measuring the trend, \(z(s)\) is a zero-centered spatial process on \(\mathcal{D}\) specified by a scale parameter \(\sigma_z\) and a spatial correlation function \(R(\cdot, \cdot ; {\theta_{\text{sp}}})\) with \({\theta_{\text{sp}}}\) consisting of spatial-temporal decay and smoothness parameters.
Unlike in Gaussian likelihoods, inference is considerably encumbered by the inability to analytically integrate out the random effects and reduce the dimension of the parameter space. Iterative algorithms such as Markov Chain Monte Carlo (MCMC), thus attempt to sample from a very high-dimensional posterior distribution, and convergence is often hampered by high auto-correlations and weakly identified spatial process parameters \({\theta_{\text{sp}}}\).
This model is implemented using the function
spGLMexact()
when using fixed hyperparameters, and
spGLMstack()
when using predictive stacking.
We consider the following three point-referenced data -
Poisson count data: Here \(b = 1\) and \(\psi_y(t) = e^t\). \[\begin{equation}
\begin{split}
y(s_i) &\sim \mathsf{Poisson}(\lambda(s_i)), \quad i = 1, \dots,
n.\\
\lambda(s_i) & = \exp(x(s_i)^{ \scriptstyle \top }\beta + z(s_i))
\end{split}
\end{equation}\] This is accessed by setting
family = "poisson"
in the above functions.
Binomial count data: Here \(b = m(s_i)\) for each \(i\) and \(\psi_y(t) = \log(1 + e^t)\). \[\begin{equation}
\begin{split}
y(s_i) &\sim \mathsf{Binomial}(m(s_i), \pi(s_i)), \quad i = 1,
\dots, n.\\
\pi(s_i) & = \mathrm{ilogit}(x(s_i)^{ \scriptstyle \top }\beta +
z(s_i))
\end{split}
\end{equation}\] This is accessed by setting
family = "binomial"
in the above functions.
Binary data: Here \(b
= 1\) and \(\psi_y(t) = \log(1 +
e^t)\). \[\begin{equation}
\begin{split}
y(s_i) &\sim \mathsf{Bernoulli}(\pi(s_i)), \quad i = 1, \dots, n.\\
\pi(s_i) & = \mathrm{ilogit}(x(s_i)^{ \scriptstyle \top }\beta +
z(s_i))
\end{split}
\end{equation}\] This is accessed by setting
family = "binary"
in the above functions.
Following Bradley and Clinch (2024), we introduce a Bayesian hierarchical spatial model as \[\begin{equation} \begin{split} y(s_i) \mid \beta, z, \xi & \sim \mathsf{EF}\left(x(s_i)^{ \scriptstyle \top }\beta + z(s_i) + \xi_i - \mu_i; b_i, \psi_y\right), i = 1, \ldots, n\\ \beta \mid \sigma^2_\beta &\sim \mathsf{N}(0, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \mathsf{IG}(\nu_\beta/2, \nu_\beta/2)\\ z \mid \sigma^2_z &\sim \mathsf{N}\left(0, \sigma^2_z R(\chi; {\theta_{\text{sp}}})\right), \quad \sigma^2_z \sim \mathsf{IG}(\nu_z/2, \nu_z/2),\\ \xi \mid \beta, z, \sigma^2_\xi, \alpha_\epsilon &\sim \mathsf{GCM_c}\left(\tilde{\mu}_\xi, H_\xi, \epsilon, \kappa_\xi; \psi_\xi\right), \end{split} \end{equation}\] where \(\mu = (\mu_1, \ldots, \mu_n)^{ \scriptstyle \top }\) denotes the discrepancy parameter. We fix the spatial process parameters \({\theta_{\text{sp}}}\), the boundary adjustment parameter \(\epsilon\) and the hyperparameters \(V_\beta\), \(\nu_\beta\), \(\nu_z\) and \(\sigma^2_\xi\). The term \(\xi\) is known as the fine-scale variation term which is given a conditional generalized conjugate multivariate distribution (\(\mathrm{GCM_c}\)) as prior. For details, see Pan et al. (2025).
We consider a rich family of Bayesian spatial-temporal model with spatially-temporally varying regression coefficients. Suppose \(\ell = (s, t)\), with location \(s \in \mathcal{D}\) and time \(t \in \mathcal{T}\), denote a spatial-temporal coordinate in \(\mathcal{L} = \mathcal{D} \times \mathcal{T}\).
Let \(\mathcal{L} = \{\ell_1, \ldots, \ell_n\}\) be a fixed set of \(n\) distinct space-time coordinates in \(\mathcal{D}\), where \(y(\mathcal{L}) = (y(\ell_1), \dots, y(\ell_n))^\top\), which we simply denote by \(y\), is the vector of observed outcomes, each distributed as a member of the natural exponential family with log partition function \(\psi_y\). Suppose, \(x(\ell_i)\) is a \(p\times 1\) vector of predictors, \(\beta\) is the corresponding \(p \times 1\) vector of slopes (fixed effects), \(\tilde{x}(\ell_i)\) is \(r\times 1\) (\(r \leq p\)) consisting of predictors in \(x(\ell_i)\) that are posited to have spatially-temporally varying regression coefficients \(z(\ell_i) = (z_1(\ell_i), \ldots, z_r(\ell_i))^\top\), where each \(z_j(\ell_i)\) is a spatially-temporally varying coefficient for the predictor \(\tilde{x}_j(\ell_i)\), \(\xi_i\) is a fine-scale variation term and \(\mu_i\) is the discrepancy parameter (see above). We introduce spatially-temporally varying coefficients in \(\eta(\ell)\) as \[\begin{equation} \begin{split} y(\ell_i) &\mid \beta, z(\ell_i), \xi_i, \mu_i \overset{\text{ind}}{\sim} \mathsf{EF}\left(\eta(\ell_i) + \xi_i - \mu_i; b_i, \psi_y \right), \ i=1,\ldots,n\;,\\ \eta(\ell) &= x(\ell)^{ \scriptstyle \top }\beta + \tilde{x}(\ell)^{{ \scriptstyle \top }}z(\ell), \quad \beta \mid \sigma^2_\beta, \mu_\beta, V_\beta \sim \mathsf{N}(\mu_\beta, \sigma^2_\beta V_\beta), \quad \sigma^2_\beta \sim \pi_\beta(\sigma^2_\beta) \;,\\ z(\ell) &\mid \theta_z, {\theta_{\text{sp}}}\sim \mathsf{GP}(0, C_z(\cdot, \cdot; {\theta_{\text{sp}}}, \theta_z))\;,\quad \theta_z \sim \pi_{z}(\theta_z)\;, \\ \xi &\mid \beta, z, \mu, \alpha_\epsilon, \kappa_\epsilon, \sigma^2_\xi \sim \mathsf{GCM_c}(\tilde{\mu}_\xi, H_\xi, \alpha_\xi, \kappa_\xi, D_\xi, \pi_\xi; \psi_\xi), \ \sigma^2_\xi \sim \pi_\xi(\sigma^2_\xi), \ p(\mu) \propto 1 \;, \end{split} \end{equation}\] where \(z(\ell) = (z_1(\ell), \ldots, z_r(\ell))^{ \scriptstyle \top }\) is a multivariate Gaussian process with a separable cross-covariance function \(C_z(\cdot, \cdot; {\theta_{\text{sp}}}, \theta_z)\), characterized by process parameters \({\theta_{\text{sp}}}\) which controls the within-process spatial-temporal correlation, and \(\theta_z\) which controls the between-process covariance matrix . Given \(\theta_z\), the \(nr \times 1\) vector \(z = (z_1^{ \scriptstyle \top }, \ldots, z_r^{ \scriptstyle \top })^{ \scriptstyle \top }\), where \(z_j = (z_j(\ell_1), \ldots, z_j(\ell_n))^{ \scriptstyle \top }\) for \(j = 1, \ldots, r\), follows a multivariate Gaussian distribution with mean \(0\) and \(nr \times nr\) covariance matrix \(C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z)\).
This model is implemented by the function stvcGLMexact()
under fixed hyperparameters, and stvcGLMstack()
when using
predictive stacking. We also implement different specifications for
\(C_z\) in this package as follows.
Independent spatial-temporal process: We
consider \(r\) Gaussian
spatial-temporal processes \[\begin{equation}\label{eq:z_ind}
\begin{split}
\text{Independent processes: } z_j(\ell) \mid\sigma_{z_j}^2,
{\theta_{\text{sp}}}_j & \overset{\text{ind}}{\sim} \mathsf{GP}(0,
\sigma_{z_j}^2 R_j(\cdot, \cdot; {\theta_{\text{sp}}}_j)),\\
\sigma_{z_j}^2 & \sim \mathsf{IG}(\nu_{z_j}/2, \nu_{z_j}/2), \quad j
= 1, \ldots, r,
\end{split}
\end{equation}\] where \(\sigma_{z_j}^2\) is the variance parameter
corresponding to process \(z_j(\ell)\).
This corresponds to the covariance matrix \(C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z) =
\oplus_{j = 1}^r \sigma_{z_j}^2 R_j({\theta_{\text{sp}}}_j)\)
with \({\theta_{\text{sp}}}= \{
{\theta_{\text{sp}}}_j : j = 1, \ldots, r\}\), where \({\theta_{\text{sp}}}_j\) denotes covariance
kernel parameters for the \(j\)th
process, and \(\theta_z = \{\sigma^2_{z_1},
\ldots, \sigma^2_{z_r}\}\). This is accessed by setting the
option process.type = "independent"
in the above
functions.
Independent shared spatial-temporal process:
This corresponds to the above with \({\theta_{\text{sp}}}_j =
{\theta_{\text{sp}}}\) and \(\sigma^2_{zj} = \sigma^2_z\) for all \(j = 1, \ldots, r\). This is accessed by
setting the option process.type = "independent.shared"
in
the above functions.
Multivariate spatial-temporal process: We can
introduce dependence among the elements of the \(r\times 1\) vector \(z(\ell)\) using \[\begin{equation}\label{eq:multi_z}
\text{Multivariate process: }z(\ell) \mid\Sigma \sim \mathsf{GP}(0,
R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma), \quad \Sigma \sim
\mathsf{IW}(\nu_z + 2r, \Psi)\;,
\end{equation}\] where \(\mathcal{GP}
(0, R(\cdot, \cdot; {\theta_{\text{sp}}})\Sigma)\) is an \(r\times 1\) multivariate Gaussian process
with matrix-valued cross-covariance function \(R(\cdot, \cdot;
{\theta_{\text{sp}}})\Sigma\) and \(\Sigma\) is an \(r \times r\) positive definite random
matrix. This corresponds to the spatial-temporal covariance matrix \(C_z(\mathcal{L}; {\theta_{\text{sp}}}, \theta_z) =
\Sigma \otimes R({\theta_{\text{sp}}})\) with \(\theta_z = \Sigma\). We place an
inverse-Wishart prior on the scale parameter with shape \(\nu_z + 2r\) and \(r\times r\) positive definite scale matrix
\(\Psi\), given by \(\pi(\theta_z) = \mathsf{IW}(\Sigma \mid\nu_z + 2r,
\Psi)\). This is accessed by setting the option
process.type = "multivariate"
in the above
functions.
Following Yao et al. (2018), we consider a set of candidate models based on a grid of values of the parameters in \(\{ {\theta_{\text{sp}}}, \delta^2 \}\) for the Gaussian case, and \(\{ {\theta_{\text{sp}}}, \epsilon \}\) for the non-Gaussian case, as will be supplied by the user. We build a set of candidate models based on the Cartesian product of the collection of values for each individual parameter as \(\mathcal{M} = \{M_1, \ldots, M_G\}\). Then, for each \(g = 1, \ldots, G\), we sample from the posterior distribution \(p(\sigma^2, \beta, z \mid y, M_g)\) under the model \(M_g\) and find leave-one-out predictive densities \(p(y_i \mid y_{-i}, M_g)\). Then we solve the optimization problem \[\begin{equation} \begin{split} \max_{w_1, \ldots, w_G}& \, \frac{1}{n} \sum_{i = 1}^n \log \sum_{g = 1}^G w_g p(y_i \mid y_{-i}, M_g) \\ \text{subject to} & \quad w_g \geq 0, \sum_{g = 1}^G w_g = 1 \end{split} \end{equation}\] to find the optimal stacking weights \(\hat{w}_1, \ldots, \hat{w}_G\). After obtaining the optimal stacking weights, posterior inference of any quantity of interest subsequently proceed from the stacked posterior, \[\begin{equation} \tilde{p}(\cdot \mid y) = \sum_{g = 1}^G \hat{w}_g p(\cdot \mid y, M_g). \end{equation}\]