Abstract
sjSDM is a scalable joint species distribution model based on Monte-Carlo intergration of the joint likelihood.
Load the package via
library(sjSDM)sjSDM depends on the Anaconda distribution of python and pytorch. You will get a warning if you don’t have python installed. sjSDM depends on Anaconda which will be installed automatically if it is not available.
To install the CPU version, run:
install_sjSDM()For the GPU version (NVIDIA GPUs are only supported), run:
install_sjSDM(method = "gpu")More details on and trouble-shooting for installing the necessary dependencies is explained in the vignette dependencies, call
vignette("Dependencies", package = "sjSDM")To cite sjSDM in a publication, use
citation("sjSDM")sjSDM has a function to create test data. Here, we create a dataset with 3 environmental predictors, 5 species and 100 sites.
com = simulate_SDM(env = 3L, species = 5L, sites = 100L)The model is then fit with the function sjSDM(). You have to provide predictors (can be also a data.frame) and response as matrices.
model = sjSDM(Y = com$response, env = com$env_weights, iter = 100L, se=TRUE)Things you can do with the model output
coef(model)
summary(model)
getCov(model)sjSDM supports formula description for the predictors.
E.g. interaction with intercept and calculate p-values:
model = sjSDM(Y = com$response,
env = linear(data = com$env_weights, formula = ~ X1*X2,), iter = 50L, se = TRUE)
summary(model)E.g. quadratic effect without intercept:
model = sjSDM(Y = com$response,
env = linear(data = com$env_weights, formula = ~0+ I(X1^2)), iter = 50L, se = TRUE)
summary(model)jSDMs account for correlation between species within communities (sites), in real datasets, however, communities (sites) are often additionally correlated (== spatial autocorrelation).
Let’s first simulate test data: 1) Simulate jSDM without a link (normal response)
com = simulate_SDM(env = 3L, species = 5L, sites = 100L,
link = "identical", response = "identical")
X = com$env_weights
Y = com$responseXYcoords = matrix(rnorm(200), 100, 2)+2
WW = as.matrix(dist(XYcoords))
spatialResiduals = mvtnorm::rmvnorm( 5L, sigma = exp(-WW))Ysp = Y + t(spatialResiduals)
Y = ifelse(Ysp < 0, 0, 1) # multivariate probit modelUsing spatial eigenvectors as additional predictors is a common approach:
SPeigen = generateSpatialEV(XYcoords)
model = sjSDM(Y, env = linear(X, ~.), spatial = linear(SPeigen, ~0+.), iter = 100L)
summary(model)Type I analysis of variance for the two/three different components (environment, biotic associations, and space (optional)).
an = anova(model)
plot(an)We can also partition the effects of the different components
imp = importance(model)
plot(imp)sjSDM supports l1 (lasso) and l2 (ridge) regularization: * alpha is the weighting between lasso and ridge * alpha = 0.0 corresponds to pure lasso * alpha = 1.0 corresponds to pure ridge
model = sjSDM(Y = com$response, env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5), iter = 50L)
summary(model)We can do the same for the species associations:
model = sjSDM(Y = com$response,
env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5),
biotic = bioticStruct(lambda =0.1),
iter = 50L)
summary(model)model = sjSDM(Y, env = linear(X, ~X1+X2), spatial = linear(XYcoords, ~0+X1:X2, lambda = 0.4))
summary(model)com = simulate_SDM(env = 3L, species = 5L, sites = 100L)
X = com$env_weights
Y = com$response
# three fully connected layers with relu as activation function
model = sjSDM(Y = Y,
env = DNN(data = X, formula = ~., hidden = c(10L, 10L, 10L), activation = "relu"),
iter = 50L, se = TRUE)
summary(model)The methods for sjSDM() work also for the non-linear model:
getCov(model) # species association matrix
pred = predict(model) # predict on fitted data
pred = predict(model, newdata = X) # predict on new dataExtract and set weights of model:
weights = getWeights(model) # get layer weights and sigma
setWeights(model, weights)Plot the training history:
plot(model)model = sjSDM(Y, env = linear(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
summary(model)model = sjSDM(Y, env = DNN(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
summary(model)sjSDM supports other responses than presence-absence data: Simulate non-presence-absence data:
com = simulate_SDM(env = 3L, species = 5L, sites = 100L,link = "identical", response = "count") # samples from a Poisson distribution
X = com$env_weights
Y = com$responsemodel = sjSDM(Y, env = linear(X, ~.), se = TRUE, iter = 50L, family = poisson("log"))
summary(model)model = sjSDM(log(Y+0.01), env = linear(X, ~.), se = TRUE, iter = 50L, family = gaussian("identity"))
summary(model)The log-normalization is one way to transform species counts into quasi-probabilities in the range of [0,1]:
zero_one = function(x) function(x) (x-min(x))/(max(x) -min(x))
Y_pa = zero_one(log(Y+0.001))We can use the quasi probabilities the same way as presence-absence data:
model = sjSDM(Y_pa, env = linear(X, ~.), se = TRUE, iter = 50L, family = binomial("logit"))
summary(model)