Temporally Explicit Species Distribution Modeling in R

Connor Hughes, Mariana Castaneda-Guzman, Luis E. Escobar

Background
Package description
Installing the package
Workflow in TemporalModelR
Citation
Note on AI usage
Contributing
Getting help

Background

Ecological niche modeling (ENM) and species distribution modeling (SDM) are widely used to reconstruct and predict species environmental tolerances and geographic distributions, expanding ecological and biogeographic theory. Despite their widespread use, one aspect that has received relatively little attention is the temporally dynamic environment.

Environmental conditions shaping species distributions vary over time, including climate, resource availability, and habitat availability. As a result, the suitability of any given location may change as key environmental conditions become more or less favorable for a species. Most existing implementations of ENM assume a static relationship between species and environment: occurrences are pooled across years, predictors are time-averaged, and a single model is fit to the combined dataset. This works well when relationships are genuinely stable, but it carries three costs when they aren’t:

Temporal mismatch. Pairing each occurrence with long-term average conditions can misrepresent the environment the species actually experienced at the moment it was recorded, leading to over-generalized niche estimates and higher rates of both extrapolation and omission.
Lost training data. Standard spatial rarefaction retains one point per pixel, discarding temporally independent observations from the same location. When environments at a location have changed over time, those repeat observations carry unique information about the species niche.
No temporal dynamics in the output. Static models produce a single predictive surface and cannot directly answer questions about how suitable conditions for a species have changed over time, or what environmental drivers are responsible for those changes.

Temporally-explicit ENM offer an alternative practice which preserve the time stamp on each observation. Each occurrence is paired with the environmental conditions it experienced at the time step it was recorded, rather than against a long-term average or other static variable. The resulting models are sensitive to year-to-year (or season-to-season) variation in the species-environment relationship, and predictions are projected as time series rather than as a single static surface.

TemporalModelR is an R package for building temporally explicit SDMs end-to-end, from raw occurrence records to time-resolved trend analyses. It provides a consistent and replicable workflow for implementing temporally explicit ENM.

Package description

TemporalModelR operates across two complementary dimensions to support temporally explicit modeling:

E-space (Environmental Space) represents a location in terms of its environmental conditions, independent of geography. A species niche in E-space is its tolerance for a given set of environmental variables.
G-space (Geographic Space) represents a location in terms of where it physically sits on the landscape. Every point in G-space has a corresponding location in E-space.

Because environmental conditions change over time, the same point in G-space may move through E-space across years, decades, or seasons. Traditional, temporally-static workflows assume both G-space and E-space to be stable in time. TemporalModelR assumes only that the niche (i.e., the species tolerance in E-space) is stable across the study period, while the E-space coordinates of any G-space location may change over time. Species observations are therefore matched to E-space data correct in both space and time, the niche is modeled in time-independent E-space. The resulting niche estimate is projected back onto G-space at explicit time periods to produce temporally dynamic predictions across time rather than a single static surface.

The package structures the SDM workflow around three phases:

Preprocessing: aligning environmental rasters to a common grid, rarefying occurrence data in space and time, extracting environmental values matched to observation time, scaling rasters to a standardized range, partitioning into spatially and/or temporally independent cross-validation folds, and generating fold-stratified pseudoabsences for presence/absence models.
Modeling: fitting one of four supported algorithms (GLM, GAM, random forest, hypervolume) per cross-validation fold, with automatic threshold selection and per-fold evaluation metrics in both E-space (time-independent) and G-space (time-specific).
Postprocessing: combining per-fold predictions into a consensus binary stack for more straightforward assessments, classifying each pixel’s temporal trajectory via changepoint detection into never-suitable, always-suitable, no pattern, increasing, decreasing, or fluctuating in suitability, and aggregating patterns across user-defined spatial units like administrative regions or watersheds.

Every function in the package is built around the concept of time placeholders in raster filenames (forest_cover_2005.tif, pr_seas_2003_Spring.tif). Patterns map clean predictor names to filenames; time_cols defines which columns in your occurrence data correspond to those placeholders. For example, a column in species occurrence data which labeled year with subsequent values 2001, 2002, 2005, and a user defined variable name as forest_cover_year, would match to rasters forest_cover_2001, forest_cover_2002, and forest_cover_2005. Likewise, a databse with both columns year and month and a user defined variable name of pr_m_year_month would match to rasters like pr_m_1997_09.

This makes the workflow flexible to whether your time scale is annual, monthly, daily, or anything in between, so long as you have rasters data at the same temporal level. It also allows different predictors to vary at different time scales within the same model, flexibly allowing temporally static variables (e.g. elevation) to be processed in the same suite and used in the same models as temporally dynamic variables, which may themselves be measured at different time steps (e.g. annual forest cover estimates, monthly precipitation).

Installing the package

Note: Internet connection is required to install the package.

To install the latest release of TemporalModelR from CRAN use the following line of code:

install.packages("TemporalModelR")

The development version of TemporalModelR can be installed using the code below.

# install.packages("devtools")
devtools::install_github("CJHughes926/TemporalModelR")

Workflow in TemporalModelR

A typical TemporalModelR workflow follows these steps:

Preprocess — align rasters, rarefy occurrences, extract and optionally scale environmental values, partition into folds, and generate pseudoabsences (if needed)
Model — fit a GLM, GAM, random forest, or hypervolume model, assess time-independent model performance. Project the fitted model across space and time, assess temporally-explicit model performance
Postprocess — summarize predictions to consensus surfaces, identify temporal patterns, and aggregate by spatial unit

A brief description of each step is presented below. For full walkthrough of each phase, see the vignettes hosted on the TemporalModelR site under the Articles menu, also linked inline throughout this README.

The package ships a small synthetic dataset that every vignette runs against, so the full workflow can be reproduced end-to-end without downloading any external data. The dataset includes raw and aligned environmental rasters, an occurrence point database, intermediate point files, and pre-computed data() objects from each phase of the workflow. Reading the data set vignette first is recommended, as it provides the shared context every other vignette refers back to.

For a description of the synthetic data and the bundled data() objects used in the workflow vignettes, see the About the Example Dataset vignette.

Preprocessing

The preprocessing pipeline transforms raw occurrence records and environmental rasters into the structured inputs that downstream models expect. Six functions cover this phase. raster_align() standardizes rasters to a common projection, extent, and resolution to prevent misalignment in future analyses. spatiotemporal_rarefaction() subsets data to one point per pixel per time step, reducing pseudoreplication from your data but preserving truly unique temporally independent observations from the same location. temporally_explicit_extraction() extracts environmental values from the raster that matches each point’s observation time step, producing a table of observations with their relevant E-space conditions correct in both space and time, along with the means and standard deviations of each variable across the data set. scale_rasters() optionally uses those scaling parameters to z-score every raster, ensuring variables share a comparable range and contribute evenly to subsequent modeling algorithms sensitive to that. spatiotemporal_partition() builds cross-validation folds via one of four methods: purely spatial, purely temporal, spatiotemporal (combining both), or random as a baseline. Finally, generate_absences() produces fold-stratified pseudoabsences relevant for presence/absence models.

For a detailed workflow showcasing each preprocessing function, see the Preprocessing temporally explicit data vignette.

Modeling

TemporalModelR supports four modeling algorithms, all of which share the same data requirements (partition_result, pseudoabsence_result for presnce/absence models, a model_formula or model_vars). One model is fit per cross-validation fold; each is evaluated on its held-out test set, and for any continuous models a threshold is selected and applied to produce only binary results.

build_temporal_glm() fits one generalized linear model per fold with a binomial link function. Supports any combination of linear, polynomial, and interactive terms via standard R formula syntax.
build_temporal_gam() fits one generalized additive model per fold via mgcv::gam(). Allows for complex nonlinear relationships between environmental gradients and species presence.
build_temporal_rf() fits one random forest classifier per fold via randomForest::randomForest(). Uses ensembles of decision trees to capture complex nonlinear relationships and interactions.
build_temporal_hv() constructs n-dimensional hypervolumes per fold via Gaussian kernel density estimation or one-class support vector machine. The presence-only option; no pseudoabsences required.

Each model builder produces evaluation metrics across folds in E-space (time-independent), capturing overall niche estimation quality regardless of when the test points were observed. E-space evaluation gives a stable picture of model performance even in years where occurrence records are sparse.

Once a model is fit, generate_spatiotemporal_predictions() projects it across user-defined time steps. The function accepts any of the four model types via the model_result argument and writes one prediction raster per fold per time step alongside per-time-step evaluation metrics. The same call handles single-timestep and multi-timestep workflows given the relevant raster data is available: for example time_steps = data.frame(year = 1:15) produces 15 annual maps; expand.grid(year = 1:15, season = "Spring") produces 15 spring snapshots; expand.grid(year = 1:15, season = c("Spring", "Summer")) produces 30 maps spanning two seasons.

Once predictions are projected into G-space, time-specific assessment metrics become available, including cumulative binomial probability (CBP) tests assessing whether predictions in a given time step are better than random. G-space evaluation complements the E-space metrics from model fitting and both may be considered in model evaluation: G-space is robust and specific in years with substantial sample sizes but loses meaning when few records exist for a given time step, while E-space gives a more stable overall picture independent of time. Reporting both gives the most complete view of model performance.

For a full walkthrough of each modeling algorithm, including formula syntax, parameter choices, G-space projection, and diagnostic interpretation, see the four modeling vignettes: Modeling with a GLM, Modeling with a GAM, Modeling with a Random Forest, and Modeling with a Hypervolume.

Postprocessing

The postprocessing phase converts the per-fold prediction rasters into interpretable summaries. summarize_raster_outputs() collapses fold-level predictions into a binary consensus stack (where a user-specified number of folds agree on suitability) and a frequency raster (proportion of time steps each pixel was suitable) which explored the stability of G-space suitability projections across time. analyze_temporal_patterns() uses the fastcpd package to detect structured changes in each pixel’s suitability across the time series, classifying trajectories as never-suitable, always-suitable, no pattern, increasing, decreasing, or fluctuating. The changepoint analysis accounts for temporal autocorrelation by including each pixel’s previous-time-step value as a predictor in the change point detection model; users can optionally enable spatial autocorrelation, which adds the proportion of a pixel’s first-order (queen’s case) neighbors predicted as suitable as an additional covariate. analyze_trends_by_spatial_unit() aggregates pixel-level patterns across user-defined polygons (administrative units, watersheds, ecoregions), producing useful tables summarizing losses and gains in suitability over time across a these regions, and is useful for identifying regional variation in suitability trends.

For the full postprocessing workflow including consensus threshold selection, changepoint diagnostics, and zone aggregation, see the Postprocessing predictions vignette.

Citation

If you use TemporalModelR in your research, please cite:

Hughes, C., Castaneda-Guzman, M., & Escobar, L.E. (2026). TemporalModelR: An R Package for temporally-explicit species distribution modeling. [Journal information to be added]

Note on AI usage

To maintain high standards of code quality and documentation, we have used AI LLM tools while constructing code for this package. We used these tools for grammatical polishing and exploring technical implementation strategies for specialized functions. We manually checked and tested all code and documentation refined with these tools.

Contributing

We welcome contributions to improve TemporalModelR. To maintain the integrity and performance of the package, we follow a few core principles:

Quality over quantity: we prioritize well-thought-out, stable improvements over frequent, minor changes. Please ensure your code is well-documented and follows the existing style of the package.
Minimal dependencies: one of the goals of TemporalModelR is to remain efficient. We prefer solutions that use base R or existing dependencies. Proposals that introduce new package dependencies will be evaluated for their necessity.
AI-assisted code: if you use AI tools to generate code alternatives or improvements, please manually verify the logic and accuracy of the output and demonstrate the benefit in your Pull Request.
Testing: new features should include examples, and tests should be performed to ensure they work as intended and do not break existing workflows.

If you have an idea for a major change, please open an Issue first to discuss it with the maintainers.

Getting help

Report bugs or request features: GitHub Issues
Ask questions: GitHub Discussions