---
title: "Coarse-to-fine spatial downscaling (CF-DS): Application example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{spCF_downscale}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
references:
  - id: murakami2026cfds
    type: report
    author:
      - family: Murakami
        given: Daisuke
      - family: Chun
        given: Yongwan
      - family: Yoshida
        given: Takahiro
      - family: Seya
        given: Hajime
    title: "Scalable coarse-to-fine spatial downscaling"
    issued:
      year: 2026
    archive: ArXiv
---

```{=html}
<style>
figure, img {
  border: none !important;
  box-shadow: none !important;
}
</style>
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette demonstrates how to perform spatial downscaling (areal interpolation) using the spCF package. Spatial downscaling aims to predict disaggregate-level (fine-scale) responses from aggregate-level (coarse-scale) observations. CF-DS predicts the disaggregate-level spatial process through a coarse-to-fine approach that sequentially captures spatial patterns across increasingly finer scales, while imposing an aggregation constraint so that the disaggregate-level predictions aggregate exactly to the observed aggregate-level values. This framework is particularly suitable for moderate- to large-sized datasets. See @murakami2026cfds for further details.

## Data and setup

Let us load the required packages

```{r setup}
library(spCF)
library(sf)
library(CARBayesdata)
```

The *pollutionhealthdata* and *GGHB.IZ* datasets included in the *CARBayesdata* package are used in this example. The data comprise the observed respiratory hospitalisation counts (`observed`) across 271 intermediate zones in Greater Glasgow. The 271 zones are treated as disaggregate-level units and grouped into 30 artificial aggregate-level regions. The goal is to downscale the responses observed for these 30 regions to the 271 zones.

```{r}
set.seed(123)
data(GGHB.IZ)
data(pollutionhealthdata)
d  <- pollutionhealthdata[pollutionhealthdata$year == 2010, ]
ar <- merge(GGHB.IZ, d, by = "IZ")
```

The disaggregate-level data consist of center coordinates, covariates (`pm10`, `jsa`, and `price`), and a proportional allocation weight (`prop_weight`) used to distribute the aggregate-level response across disaggregate units. Here, expected number of hospitalisation counts (`expected`) is used.

```{r}
### Disaggregate-level data (271 units)
coords      <- st_coordinates(suppressWarnings(st_centroid(ar)))
x           <- data.frame(pm10 = ar$pm10, jsa = ar$jsa, price = ar$price)
prop_weight <- as.numeric(ar$expected)
```

For demonstration purposes, the aggregate-level data are generated by clustering the disaggregate-level units into 30 regions. The variable `agg_id` indicates the corresponding aggregate-level unit:

```{r}
### Aggregate-level ID (30 units)
agg_id      <- as.integer(stats::kmeans(coords, centers = 30)$cluster)
```

The aggregate-level response `Y` is then obtained by aggregating the observed counts within each aggregate-level unit. Two types of response are possible: `Y_type = "sum"` for downscaling extensive (count-like) data (e.g., population), where $Y_I$ is the sum over each aggregate-level unit, and `Y_type = "mean"` for intensive (density-like) data (e.g., population density), where $Y_I$ is the mean. Here the count is extensive, so `"sum"` is used:

```{r}
### Y_type = "sum"  : Y_I = sum(response for each aggregate-level unit)
### Y_type = "mean" : Y_I = mean(response for each aggregate-level unit)
Y_type      <- "sum"
Y           <- as.numeric(stats::aggregate(ar$observed, by = list(agg_id),
                          FUN = if (Y_type == "sum") sum else mean)[, 2])
```

## Coarse-to-fine spatial downscaling (CF-DS)

In CF-DS, the spatial process is defined as a sum of scale-wise processes, where the number of spatial scales, R, is optimized via holdout validation. A smaller R, corresponding to early stopping, allows the spatial process to capture only large-scale patterns, whereas a larger R enables the spatial process to represent small-scale patterns. The `cf_downscale_hv` function performs holdout validation as follows:

```{r}
mod_hv   <- cf_downscale_hv(Y = Y, Y_type = Y_type, x = x,
                            prop_weight = prop_weight,
                            coords = coords, agg_id = agg_id)
```

By default, 75% of the aggregate-level units are used for training and the remaining units are used for validation (`train_rat = 0.75`). Alternatively, the training units can be specified explicitly using the `id_train` argument. As shown in the output, the validation score is monitored as learning proceeds from the coarsest scale (Scale 1) to finer scales, and the learning terminates once the aggregation constraint is satisfied and the validation score no longer improves.

After holdout validation, the full model is trained using the `cf_downscale` function as follows:

```{r}
mod      <- cf_downscale(Y = Y, x = x, prop_weight = prop_weight,
                         coords = coords, agg_id = agg_id, mod_hv = mod_hv)
```

By default, a per-area multiplicative adjustment (`adj = TRUE`, the default) is applied to satisfy the aggregation constraint, so that the downscaled predictions aggregate exactly to the observed `Y`. Alternatively, if `adj = FALSE`, the constraint is satisfied only approximately, which may be preferable when `Y` contains noise.

The estimated regression coefficients, standard deviations of the scalewise components, and error statistics are displayed as follows:

```{r}
mod
```
Here, `validation_R2` and `validation_RMSE` are evaluated by aggregating the predictions to the aggregate-level validation units obtained from the holdout validation in `cf_downscale_hv`. The high R2 value in this example is mainly due to the small number of aggregate-level units: because there are only 30 units in total, the validation set contains only 8 units.

## Mapping

### Downscaling result

The aggregate-level response and the downscaled disaggregate-level predictive means are mapped as follows:

```{r, fig.height=3.5, fig.width=7.5}
### Aggregate-level polygons and response
ar$agg_id <- agg_id
agg_poly  <- stats::aggregate(ar["agg_id"], by = list(agg_id = agg_id),
                              FUN = function(z) z[1])
agg_poly$Y<- Y

### Disaggregate-level predictive mean
ar$pred   <- mod$pred$pred

plot(agg_poly["Y"], nbreaks = 20, key.pos = 4, axes = TRUE, lwd=0.2,
     main = "Aggregated data (30 units)")
plot(ar["pred"], nbreaks = 20, key.pos = 4, axes = TRUE, border = NA,
     main = "Downscaling result (271 units)")
```

The downscaled map recovers the disaggregate-level spatial pattern of respiratory hospitalisations while preserving consistency with the aggregate-level observations.

### Predictive standard deviations

The standard deviations of the disaggregate-level predictions, which quantify the downscaling uncertainty, are mapped as follows:

```{r, fig.height=4, fig.width=4.5}
ar$pred_sd <- mod$pred$pred_sd
plot(ar["pred_sd"], pal = function(n) hcl.colors(n, "Viridis"), nbreaks = 9, border = NA,
     key.pos = 4, axes = TRUE)
```

### Reference
