1. About the Example Dataset


Summary


Description

To keep the package vignettes self-contained, TemporalModelR ships a small synthetic dataset that the entire workflow can run against in seconds, with no need to download external occurrence or environmental data. The dataset is deliberately small but complete: it includes everything a real temporally explicit SDM workflow needs. It represents a simple but changing landscape, chosen to illustrate what the package does and the kinds of data it can handle.

This vignette describes the dataset in detail so that the workflow vignettes (Preprocessing temporally explicit data, Modeling, Post-processing) can refer back to a single source for what’s in inst/extdata/ and data(), rather than each re-explaining the dataset. If you’re working through the package for the first time, read this vignette first.


Overview

The included dataset is generated over the following spatial and temporal dimensions:

Spatial. A grid of 15 rows × 30 columns at 100 m resolution, giving a study area 3000 m wide × 1500 m tall in a custom synthetic local CRS (a Transverse Mercator projection anchored at the equator and prime meridian).

Temporal. Fifteen years (labeled 1 through 15) and four seasons (Spring, Summer, Autumn, Winter).
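
The grid geometry can be checked with a few lines of base R. This is a minimal sketch: the variable names are illustrative, and placing the lower-left corner at (0, 0) is an assumption taken from the plotting extent used later in this vignette, not read from the rasters themselves.

```r
# Grid dimensions as described above
n_rows    <- 15    # cells in the y direction
n_cols    <- 30    # cells in the x direction
cell_size <- 100   # metres

# Study area width and height in metres
width  <- n_cols * cell_size
height <- n_rows * cell_size

# Cell-centre coordinates, assuming the lower-left corner sits at (0, 0)
x_centres <- (seq_len(n_cols) - 0.5) * cell_size
y_centres <- (seq_len(n_rows) - 0.5) * cell_size

c(width = width, height = height, cells = n_rows * n_cols)
range(x_centres)
range(y_centres)
```

Under that assumption, cell centres fall on coordinates ending in 50 (50, 150, 250, and so on), which matches the x and y values in the bundled occurrence points.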

The example landscape has three primary environmental variables driving suitability for our example species: Elevation, Forest Cover, and Precipitation. Elevation represents a temporally static variable, one that does not change over the 15-year study period. Forest cover represents a temporally dynamic variable measured at a single time step (annually). Precipitation represents a temporally dynamic variable measured at compound time steps: measurements are made seasonally, so each one is associated with both a year and a season. We also include a simplified ‘annual precipitation’ dataset for simpler alternative examples.
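
The three temporal resolutions translate into different numbers of time steps per variable. The sketch below enumerates them in base R and builds file names following the patterns used by the examples later in this vignette. That every year and season has a file matching these patterns is an assumption, so check it against list.files() on the installed package.

```r
years   <- 1:15
seasons <- c("Spring", "Summer", "Autumn", "Winter")

# Static: one layer for the whole study period
elev_file <- "elevation.tif"

# Single time step: one layer per year
forest_files <- paste0("forest_cover_", years, ".tif")

# Compound time step: one layer per year-season combination
year_season  <- expand.grid(season = seasons, year = years,
                            stringsAsFactors = FALSE)
prseas_files <- paste0("prseas_", year_season$year, "_",
                       year_season$season, ".tif")

# Number of layers implied by each temporal resolution
lengths(list(static   = elev_file,
             annual   = forest_files,
             seasonal = prseas_files))
head(prseas_files, 4)
```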

Our ‘example species’ occurs at mid to high elevations, in areas of high forest cover and moderate to high precipitation.

Over the time period of the example dataset, the forest cover rasters deliberately include deforestation, and the precipitation rasters include interannual variability and noise. These signals let us visualize both areas of suitability loss and the year-to-year dynamics of suitability. They are placed intentionally to highlight TemporalModelR’s ability to capture this spatiotemporal variability on the landscape.


Landscape rasters

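The bundled raw rasters can be found in inst/extdata/rasters_raw/ and contain the following files, named as in the examples below:

elevation.tif. The static elevation surface.

forest_cover_<year>.tif. Annual forest cover.

pr_ann_<year>.tif. Annual precipitation.

prseas_<year>_<season>.tif. Seasonal precipitation.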

All of these can be located with system.file() for use in example analyses:

library(TemporalModelR)
library(terra)
#> terra 1.8.54
library(sf)
#> Linking to GEOS 3.13.1, GDAL 3.11.0, PROJ 9.6.0; sf_use_s2() is TRUE

raw_dir <- system.file("extdata/rasters_raw",
                       package = "TemporalModelR")
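
system.file() returns an empty string when the requested path is not found, which can cause confusing errors further down. A small guard makes that failure explicit. The helper name bundled_path() is hypothetical and is not part of TemporalModelR.

```r
# Hypothetical helper: look up a path inside an installed package and
# stop with a clear message if it does not exist.
bundled_path <- function(path, package) {
  found <- system.file(path, package = package)
  if (!nzchar(found)) {
    stop("'", path, "' not found in package '", package, "'", call. = FALSE)
  }
  found
}

# Demonstrated with a file that ships with every R installation
bundled_path("DESCRIPTION", package = "stats")
```

With TemporalModelR installed, the same call would be bundled_path("extdata/rasters_raw", package = "TemporalModelR").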

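Workflow vignettes typically use one of two predictor sets: an annual set (elevation, forest cover, and annual precipitation) or a seasonal set (elevation, forest cover, and seasonal precipitation).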

Elevation

The elevation surface is fully static across the time series and is the only purely static predictor:

elev <- rast(file.path(raw_dir, "elevation.tif"))

plot(elev, main = "Elevation (m)")

Forest cover and annual precipitation across years

Forest cover and annual precipitation are the two dynamic annual predictors. Plotting them side by side with each row representing one year makes the temporal change in each visible at the same time. We visualize every other year below:

years_to_plot <- seq(1, 15, by = 2)

forest_files  <- file.path(raw_dir,
                           paste0("forest_cover_", years_to_plot, ".tif"))
pr_ann_files  <- file.path(raw_dir,
                           paste0("pr_ann_",      years_to_plot, ".tif"))

### Interleave forest and precip so each row of the plot grid is one year
forest_pr_paths        <- c(rbind(forest_files, pr_ann_files))
forest_pr_stack        <- rast(forest_pr_paths)
names(forest_pr_stack) <- c(rbind(paste("Forest_yr", years_to_plot),
                                  paste("Pr_ann_yr", years_to_plot)))

plot(forest_pr_stack, nc = 2)

The left column shows forest cover thinning in two locations: a gradual loss on the northeast hill starting around year 4 and a faster loss in a southwest-central patch starting around year 7. The right column shows annual precipitation with a slight overall decline plus the wet (year 3 and year 9) and dry (year 11) years that stand out from their neighbors.

Seasonal precipitation within a year

Seasonal precipitation is the annual base multiplied by a seasonal factor: Spring and Autumn are the wettest times of year, Summer is the driest, and Winter is intermediate. The plot below shows year 1 across all four seasons:

season_names <- c("Spring", "Summer", "Autumn", "Winter")

prseas_y1_stack <- rast(file.path(raw_dir,
                                  paste0("prseas_1_",
                                         season_names, ".tif")))

names(prseas_y1_stack) <- season_names

plot(prseas_y1_stack,
     range = c(0, max(values(prseas_y1_stack), na.rm = TRUE)))

The spatial structure is preserved across seasons; the seasons differ in overall magnitude.
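
One way to picture this is a single annual surface scaled by a per-season multiplier. The sketch below uses made-up multipliers and a toy 2 × 3 matrix in place of a raster. Neither is taken from the package; only the ordering (Spring and Autumn wettest, Winter intermediate, Summer driest) follows the description above.

```r
# Toy annual precipitation surface (values are illustrative only)
annual_base <- matrix(c(800, 900, 1000,
                        850, 950, 1050), nrow = 2, byrow = TRUE)

# Hypothetical seasonal multipliers; only their ordering reflects the text
multipliers <- c(Spring = 1.2, Summer = 0.6, Autumn = 1.2, Winter = 0.9)

# One scaled copy of the annual surface per season
seasonal <- lapply(multipliers, function(m) annual_base * m)

# Magnitudes differ between seasons
season_means <- sapply(seasonal, mean)
season_means

# The spatial pattern is unchanged: each season correlates perfectly
# with the annual base
pattern_cor <- sapply(seasonal, function(s) {
  cor(as.vector(s), as.vector(annual_base))
})
pattern_cor
```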

Occurrence data

We also generated an example dataset of 150 ‘species occurrence locations’ across the 15-year, four-season time frame. The example points represent a high-elevation forest specialist with moderate to high moisture requirements.

First, points are generated for every location/year/season combination above a simple threshold for each variable of interest, with only combinations meeting all four environmental filters counting as a candidate occurrence site:

Winter is excluded from sampling entirely, so the filter is applied only across the three remaining seasons (Spring, Summer, Autumn) × 15 years = 45 candidate year-season slices.
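
The count of candidate slices, and the idea of keeping only combinations that pass every filter, can be sketched in base R. The column names, values, and thresholds below are placeholders chosen for illustration, and only three filters are shown. They are not the filters or thresholds used to build the bundled dataset.

```r
years           <- 1:15
sampled_seasons <- c("Spring", "Summer", "Autumn")  # Winter excluded

# Every year-season slice in which candidate sites may occur
slices <- expand.grid(year = years, season = sampled_seasons,
                      stringsAsFactors = FALSE)
nrow(slices)

# Toy environmental values for four cells in one slice (illustrative only)
cells <- data.frame(elevation = c(300, 900, 950, 1000),
                    forest    = c(0.9, 0.8, 0.2, 0.7),
                    precip    = c(600, 700, 800, 200))

# Placeholder thresholds; a cell is a candidate only if it passes all of them
passes <- cells$elevation >= 500 &
          cells$forest    >= 0.6 &
          cells$precip    >= 400
cells[passes, ]
```

In this toy example only the second cell passes, because each of the others fails exactly one filter.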

We then subset the candidate points to just 150 samples using a random sampling algorithm with spatial and temporal autocorrelation built in. The result is a clustered, ecologically plausible occurrence dataset distributed across space, year, and season, with realistic survey biases.

The final example points dataset can be located with system.file() and read as a CSV:

pts_file <- system.file("extdata/points/synthetic_occurrence_points.csv",
                        package = "TemporalModelR")
pts <- utils::read.csv(pts_file)

head(pts)
#>      x    y year season pres
#> 1 2250  350    1 Autumn    1
#> 2 2050  250    1 Autumn    1
#> 3 2350  450    1 Autumn    1
#> 4  250  850    1 Spring    1
#> 5  850 1050    1 Spring    1
#> 6   50 1150    1 Spring    1


nrow(pts)
#> [1] 150


table(pts$year, pts$season)
#>     
#>      Autumn Spring
#>   1       3      9
#>   2       5      4
#>   3       4      4
#>   4       5      7
#>   5       5      4
#>   6       0      3
#>   7      10      6
#>   8       5     11
#>   9       3      6
#>   10      5      7
#>   11      4     10
#>   12      3      1
#>   13      3      8
#>   14      0      4
#>   15      2      9
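
table() only reports values that are present in the data, which is why no Summer column appears above. The sketch below re-enters the printed counts to confirm the totals, then uses the seasons of the six rows shown by head(pts) to demonstrate how explicit factor levels keep an empty season visible. The object names are illustrative.

```r
# Counts copied from the table above (rows are years 1 to 15)
autumn <- c(3, 5, 4, 5, 5, 0, 10, 5, 3, 5, 4, 3, 3, 0, 2)
spring <- c(9, 4, 4, 7, 4, 3, 6, 11, 6, 7, 10, 1, 8, 4, 9)

totals <- c(Autumn = sum(autumn), Spring = sum(spring),
            total  = sum(autumn, spring))
totals

# Seasons of the six rows shown by head(pts)
season_head <- c("Autumn", "Autumn", "Autumn", "Spring", "Spring", "Spring")

# Declaring all sampled seasons as factor levels keeps Summer in the output
season_counts <- table(factor(season_head,
                              levels = c("Spring", "Summer", "Autumn")))
season_counts
```

With the full dataset, the same idea is table(pts$year, factor(pts$season, levels = c("Spring", "Summer", "Autumn"))).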

To see the distribution of points across both space and time, plot each year-season combination in its own panel. Each row of the grid corresponds to one of the 15 years; each column corresponds to one of the three sampled seasons (Spring, Summer, Autumn). Empty panels indicate year-season combinations with no points. As the table above shows, no points fall in Summer, so that column is empty throughout:

seasons <- c("Spring", "Summer", "Autumn")
study_extent <- ext(0, 3000, 0, 1500)

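# Save the current graphics settings so they can be restored afterwards
opar <- par(no.readonly = TRUE)

# One panel per year (rows) and season (columns)
par(mfrow = c(15, 3),
    mar   = c(1.5, 1.5, 1.5, 0.5),
    oma   = c(2, 2, 2, 1))

for (yr in 1:15) {
  for (sea in seasons) {
    # Points recorded in this year-season combination
    pts_sub <- pts[pts$year == yr & pts$season == sea, ]

    # Empty panel spanning the full study area
    plot(NULL,
         xlim = c(0, 3000), ylim = c(0, 1500),
         asp  = 1, xaxt = "n", yaxt = "n",
         xlab = "", ylab = "",
         main = paste0("Year ", yr, " - ", sea),
         cex.main = 0.9)

    # Outline of the study area
    rect(0, 0, 3000, 1500, border = "grey70")

    if (nrow(pts_sub) > 0) {
      points(pts_sub$x, pts_sub$y, pch = 19, cex = 0.7, col = "darkblue")
    }
  }
}

# Restore the original graphics settings
par(opar)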

Together, this points dataset and the rasters above make up the landscape and species occurrence data for all of the example applications presented in this package’s vignettes.


Pre-computed objects and other bundled files

Alongside the raw inputs, the package ships pre-computed outputs of the full preprocessing and modeling pipelines as data() objects that the vignettes load directly. Two sets exist, one for the annual workflow and one for the seasonal workflow. The package vignettes show how to generate these objects, but stable saved copies are included in the package data so users can jump straight to any phase of the workflow without re-running upstream steps.

Pre-computed data() objects

Intermediate raster and point files

Additionally, inst/extdata/ contains raster and point files corresponding to intermediate steps in the various vignettes. These are bundled so that users can load them directly instead of re-running earlier analyses to produce them. Each subdirectory can be located with system.file():

pred_dir <- system.file("extdata/predictions",
                        package = "TemporalModelR")
list.files(pred_dir, pattern = "\\.tif$")

The bundled subdirectories are: