Many packages exist to visualize temporal data (e.g., geological or evolutionary biological data). deeptime similarly has a few novel ways to help you plot your temporal data in useful ways. We’ll first load some packages and example data so we can demonstrate some of this functionality.
# Load deeptime
library(deeptime)
# Load other packages
library(ggplot2)
library(dplyr)
# Load palaeoverse for tetrapod occurrence data
library(palaeoverse)
data(tetrapods)
Do you have a bunch of temporal occurrences of taxa or some sort of
geological phenomenon, but you don’t want to go through all of the pain
of figuring out how to visualize those occurrences and their temporal
ranges? And it needs to be customizable and have a pretty geological or
stratigraphic timescale on the side? Well, then
geom_points_range()
is your friend!
geom_points_range()
is like
ggplot2::geom_pointrange()
in that it plots points and
their ranges. However, the “raw” data that goes into
ggplot2::geom_pointrange()
is the lower, upper, and the
coordinates for an individual point for each group. First, we’re too
lazy to calculate our own ranges! Second, only one point per group? But
we have a whole bunch of occurrences for each group that need to be
plotted! The difference with geom_points_range()
is that
the raw data it takes in is all of your grouped temporal data. It then
does all of the work for you to plot those individual occurrences AND
the ranges of those occurrences for each group. Let’s try it out with
some Permian vertebrate occurrence data from the
palaeoverse
:
# sort the occurrences from most common genera to least common genera
# assume the age is just the mean of the max and min
occdf <- tetrapods %>%
filter(accepted_rank == "genus") %>%
select(occurrence_no, accepted_name, max_ma, min_ma) %>%
mutate(accepted_name = reorder(accepted_name, accepted_name, length)) %>%
arrange(desc(accepted_name)) %>%
mutate(age = (max_ma + min_ma) / 2)
# get a reasonable subset of those occurrences
occdf <- occdf[1:300, ]
# plot those occurrences
ggplot(data = occdf) +
geom_points_range(aes(x = age, y = accepted_name)) +
theme_classic()
And then, of course, we want to add a timescale:
ggplot(data = occdf) +
geom_points_range(aes(x = age, y = accepted_name)) +
scale_x_reverse() +
coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
theme_classic()
What if we aren’t certain about some of our data points? Maybe we don’t necessarily know if they are assigned to the correct genus or perhaps we are uncertain about their age? Fortunately, we have all of the tools of ggplot available to us! First we’ll simulate some binary “certainty” values, then we’ll plot certainty as additional aesthetics:
occdf$certainty <- factor(sample(0:1, nrow(occdf), replace = TRUE))
ggplot(data = occdf) +
geom_points_range(aes(x = age, y = accepted_name,
fill = certainty, linetype = certainty), shape = 21) +
scale_x_reverse() +
scale_fill_manual(values = c("white", "black")) +
scale_linetype_manual(values = c("dashed", "solid")) +
coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
theme_classic()
Finally, we can sort the taxa so that they are arranged in order of their earliest occurrence:
occdf$accepted_name <- reorder(occdf$accepted_name, occdf$age, max,
decreasing = TRUE)
ggplot(data = occdf) +
geom_points_range(aes(x = age, y = accepted_name,
fill = certainty, linetype = certainty), shape = 21) +
scale_x_reverse() +
scale_fill_manual(values = c("white", "black")) +
scale_linetype_manual(values = c("dashed", "solid")) +
coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
theme_classic()
Note that our uncertain lines (0) always overlap with our certain lines (1), so there is a continuous line spanning each taxon’s entire range. However, if we tweak some of the data, we can change that, causing a gap in the ranges. Note that in this engineered example, the uncertain and certain ranges for Diictodon no longer overlap, so there is a gap between them:
oldest_certain <- occdf %>%
filter(accepted_name == "Diictodon", certainty == 1) %>%
pull(age) %>%
max()
n_uncertain <- sum(occdf$accepted_name == "Diictodon" & occdf$certainty == 0)
# make the uncertain points all much older
occdf$age[occdf$accepted_name == "Diictodon" & occdf$certainty == 0] <-
oldest_certain + runif(n_uncertain, 15, 30)
ggplot(data = occdf) +
geom_points_range(aes(x = age, y = accepted_name,
fill = certainty, linetype = certainty), shape = 21) +
scale_x_reverse() +
scale_fill_manual(values = c("white", "black")) +
scale_linetype_manual(values = c("dashed", "solid")) +
coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
theme_classic()
However, if we want a line connecting these groups of points, we can
fix this by using the background_line
argument, which can
be a list of aesthetic values to use for the background line
segments:
ggplot(data = occdf) +
geom_points_range(aes(x = age, y = accepted_name,
fill = certainty, linetype = certainty), shape = 21,
background_line = list(linetype = "dashed")) +
scale_x_reverse() +
scale_fill_manual(values = c("white", "black")) +
scale_linetype_manual(values = c("dashed", "solid")) +
coord_geo(pos = list("bottom", "bottom"), dat = list("stages", "periods"),
abbrv = list(TRUE, FALSE), expand = TRUE, size = "auto") +
theme_classic()
Finally, while I’ve showcased this geom with the use case of plotting occurrence data, note that the potential usage for this function is much broader. Basically any set of data with a categorical and a continuous variable could be visualized like this (when appropriate).
You may also want to color your data based on its age.
deeptime has scale_color_geo()
and
scale_fill_geo()
for this very purpose! Note that currently
these scales only work with discrete data. The default behavior is for
the color/fill aesthetic values to match the names of the intervals in
dat
. Here, we’ll use the coral_div_dis data from the first
vignette tutorial:
ggplot(coral_div_dis, aes(x = n, y = diet, fill = period)) +
geom_col() +
scale_fill_geo(periods) +
xlab("Coral Genera") +
ylab("Diet") +
theme_classic()
Another way to split data by categories is by faceting.
deeptime has a suite of facetting functions that allow
you to facet by geological intervals (facet_grid_geo()
,
facet_wrap_geo()
, facet_nested_geo()
, and
facet_nested_wrap_geo()
). Here, we’ll use the same coral
diversity data as above, but facet wrap it by period. Note that we need
to convert the period
column to an ordered factor so the
facets appear in chronological order.
coral_div_dis <- coral_div_dis %>%
mutate(period = factor(period, levels = rev(periods$name)))
ggplot(coral_div_dis, aes(x = n, y = diet)) +
geom_col() +
scale_fill_geo(periods) +
xlab("Coral Genera") +
ylab("Diet") +
theme_classic() +
facet_wrap_geo(~period, colors = periods)
Just like with adding timescales to plots, you may want to show hierarchical timescale information. For example, we may want to show which of these periods are in the Mesozoic and which are in the Cenozoic. We can do this by adding a new column to our data that indicates the era of each period, then using both the period and era columns to make a nested facet:
coral_div_dis <- coral_div_dis %>%
mutate(era = ifelse(period %in% c("Paleogene", "Neogene", "Quaternary"),
"Cenozoic", "Mesozoic")) %>%
mutate(era = factor(era, levels = c("Cenozoic", "Mesozoic")))
ggplot(coral_div_dis, aes(x = n, y = diet)) +
geom_col() +
scale_fill_geo(periods) +
xlab("Coral Genera") +
ylab("Diet") +
theme_classic() +
facet_nested_wrap_geo(~era + period, colors = rbind(periods, eras), nrow = 2)