Global Historical Climatology Network (GHCN) provides daily climate
measures from stations across the world. prcp_aus extracts
daily precipitation and minimum temperature from GHCN for 639 stations
in Australia from 2016 to 2020. This is where these stations locate in
an Australia map:
state_map <- st_simplify(ozmaps::abs_ste, dTolerance = 4000)
ggplot() +
geom_sf(data = state_map, inherit.aes = FALSE,
color = "grey80", alpha = 0.4, linetype = 3) +
geom_point(data = prcp_aus , aes(x = long, y = lat)) +
theme_void()This is a lot of stations to look at for one climate variable and they can’t all fit into a glyph map. What we can do is to group stations into clusters and look at the aggregated series in the glyph map. In this vignette, I will introduce how to perform aggregation using a hierarchical data structure in cubble.
First, let’s summarise the daily data into weekly sum for each station:
prcp_aus
#> # cubble: id [663]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: wk [dbl], prcp [dbl]
#> id long lat elev name wmo_id ts
#> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <list>
#> 1 ASN00001006 128. -15.5 3.8 wyndham aero 95214 <tibble [53 × 2]>
#> 2 ASN00001007 126. -13.8 6 troughton island 94102 <tibble [53 × 2]>
#> 3 ASN00001013 128. -15.5 11 wyndham 94214 <tibble [53 × 2]>
#> 4 ASN00001018 126. -16.4 546 mount elizabeth 94211 <tibble [53 × 2]>
#> 5 ASN00001019 127. -14.3 23 kalumburu 94100 <tibble [53 × 2]>
#> 6 ASN00001020 126. -14.1 51 truscott 95101 <tibble [53 × 2]>
#> 7 ASN00001025 126. -15.4 385 doongan 94215 <tibble [53 × 2]>
#> 8 ASN00002012 128. -18.2 422 halls creek airport 94212 <tibble [53 × 2]>
#> 9 ASN00002032 128. -17.0 203 warmun 94213 <tibble [53 × 2]>
#> 10 ASN00002056 129. -15.8 44 kununurra aero 94216 <tibble [53 × 2]>
#> # … with 653 more rowsFirst we need to assign each station a cluster number. Here we use a
simple kmean algorithm based on the distance matrix to create 20
clusters. This creates station_nested as a station level
nested cubble with a cluster column indicating the group each station
belongs to.
station_nested
#> # cubble: id [663]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: wk [dbl], prcp [dbl]
#> id long lat elev name wmo_id ts cluster
#> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <list> <int>
#> 1 ASN00001006 128. -15.5 3.8 wyndham aero 95214 <tibble> 15
#> 2 ASN00001007 126. -13.8 6 troughton island 94102 <tibble> 15
#> 3 ASN00001013 128. -15.5 11 wyndham 94214 <tibble> 15
#> 4 ASN00001018 126. -16.4 546 mount elizabeth 94211 <tibble> 15
#> 5 ASN00001019 127. -14.3 23 kalumburu 94100 <tibble> 15
#> 6 ASN00001020 126. -14.1 51 truscott 95101 <tibble> 15
#> 7 ASN00001025 126. -15.4 385 doongan 94215 <tibble> 15
#> 8 ASN00002012 128. -18.2 422 halls creek airport 94212 <tibble> 15
#> 9 ASN00002032 128. -17.0 203 warmun 94213 <tibble> 15
#> 10 ASN00002056 129. -15.8 44 kununurra aero 94216 <tibble> 15
#> # … with 653 more rowsTo create a group level cubble, use switch_key() with
the new key variable, cluster:
cluster_nested <- station_nested %>% switch_key(cluster)
cluster_nested %>% head(5)
#> # cubble: cluster [5]: nested form
#> # bbox: [118.84, -43.66, 152.87, -10.05]
#> # temporal: id [chr], wk [dbl], prcp [dbl]
#> cluster .val ts
#> <int> <list> <list>
#> 1 1 <tibble [18 × 6]> <tibble [954 × 3]>
#> 2 2 <tibble [51 × 6]> <tibble [2,677 × 3]>
#> 3 3 <tibble [19 × 6]> <tibble [1,007 × 3]>
#> 4 4 <tibble [6 × 6]> <tibble [318 × 3]>
#> 5 5 <tibble [63 × 6]> <tibble [3,339 × 3]>The resulted cluster_nested now has cluster
as the key and all the station level time invariant variables are nested
inside .val. Currently, there is no cluster level time
invariant variables and we can add the centroid of each cluster by
get_centroid():
(cluster_nested <- cluster_nested %>% get_centroid())
#> # cubble: cluster [20]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: id [chr], wk [dbl], prcp [dbl]
#> cluster .val ts hull cent_long cent_lat
#> <int> <list> <list> <list> <dbl> <dbl>
#> 1 1 <tibble [18 × 6]> <tibble [954 × 3]> <tibble> 141. -22.9
#> 2 2 <tibble [51 × 6]> <tibble [2,677 × 3]> <tibble> 147. -41.8
#> 3 3 <tibble [19 × 6]> <tibble [1,007 × 3]> <tibble> 121. -30.6
#> 4 4 <tibble [6 × 6]> <tibble [318 × 3]> <tibble> 143. -11.9
#> 5 5 <tibble [63 × 6]> <tibble [3,339 × 3]> <tibble> 150. -32.3
#> 6 6 <tibble [56 × 6]> <tibble [2,951 × 3]> <tibble> 151. -28.2
#> 7 7 <tibble [21 × 6]> <tibble [1,113 × 3]> <tibble> 145. -29.5
#> 8 8 <tibble [61 × 6]> <tibble [3,233 × 3]> <tibble> 117. -31.2
#> 9 9 <tibble [58 × 6]> <tibble [3,074 × 3]> <tibble> 148. -35.3
#> 10 10 <tibble [55 × 6]> <tibble [2,915 × 3]> <tibble> 145. -37.8
#> 11 11 <tibble [37 × 6]> <tibble [1,961 × 3]> <tibble> 142. -36.1
#> 12 12 <tibble [29 × 6]> <tibble [1,537 × 3]> <tibble> 149. -23.6
#> 13 13 <tibble [16 × 6]> <tibble [848 × 3]> <tibble> 130. -27.3
#> 14 14 <tibble [26 × 6]> <tibble [1,378 × 3]> <tibble> 143. -17.7
#> 15 15 <tibble [19 × 6]> <tibble [1,007 × 3]> <tibble> 125. -17.7
#> 16 16 <tibble [48 × 6]> <tibble [2,544 × 3]> <tibble> 139. -33.9
#> 17 17 <tibble [20 × 6]> <tibble [1,060 × 3]> <tibble> 116. -23.7
#> 18 18 <tibble [17 × 6]> <tibble [901 × 3]> <tibble> 134. -17.3
#> 19 19 <tibble [24 × 6]> <tibble [1,272 × 3]> <tibble> 132. -13.0
#> 20 20 <tibble [19 × 6]> <tibble [1,007 × 3]> <tibble> 137. -30.3You can also use face_temporal() to get the cluster
level long cubble:
(cluster_long <- cluster_nested %>% face_temporal(ts))
#> # cubble: wk, cluster [20]: long form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # spatial: id [chr], long [dbl], lat [dbl], elev [dbl], name [chr], wmo_id
#> # [dbl], hull [list], cent_long [dbl], cent_lat [dbl]
#> cluster id wk prcp
#> <int> <chr> <dbl> <dbl>
#> 1 1 ASN00015602 1 256
#> 2 1 ASN00015602 2 1554
#> 3 1 ASN00015602 3 8
#> 4 1 ASN00015602 4 184
#> 5 1 ASN00015602 5 2022
#> 6 1 ASN00015602 6 80
#> 7 1 ASN00015602 7 44
#> 8 1 ASN00015602 8 28
#> 9 1 ASN00015602 9 102
#> 10 1 ASN00015602 10 1658
#> # … with 35,086 more rowsNow we should have access to both station and cluster level in the nested and long form. Let’s summarise them within a diagram:
With these data, we can make a glyph map to understand the precipitation pattern in Australia:
Or to inspect the station membership of each cluster:
ggplot() +
geom_sf(data = state_map, inherit.aes = FALSE,
color = "grey80", alpha = 0.4, linetype = 3) +
geom_point(data = station_nested, aes(x = long, y = lat), size = 0.5) +
ggforce::geom_mark_hull(
data = cluster_nested %>% tidyr::unnest(hull),
expand = 0, radius = 0,
aes(x = long, y = lat, group = cluster)) +
theme_void()