The functions listed in this vignette apply to linear regression models, linear mixed models, and GAMMs (i.e., the functions are tested with lm, glm, lmer, glmer, gam, and bam models).
suppressMessages(library(itsadug))
info('version')
## Package itsadug, version 1.0.1
The code below was used to fit a GAMM model m1 to the data set simdat from the package itsadug. The data set simdat is simulated time series data with arbitrary predictors.
data(simdat)
# For illustration purposes, we build a GAMM model
# with a nonlinear interaction, two groups, and
# random wiggly smooths for Subjects:
m1 <- bam(Y ~ Group + te(Time, Trial, by=Group)
+ s(Time, Subject, bs='fs', m=1),
data=simdat)
acf_resid
The function acf_resid is a wrapper around the functions acf_plot and acf_n_plots. It allows for different ways of checking the ACF.
The default acf function in R plots the autocorrelation function of the residuals as if the residuals were a single time series:
acf(resid(m1))
Alternatively, the function acf_resid of the package itsadug can be used. This function offers several options, as listed below:
acf_resid(m1)
Individual time series can be specified as a named list, or as a vector with the names of model predictors.
# Option A: include named list
acf_resid(m1, split_pred=list(simdat$Subject,simdat$Trial))
# Option B: include model predictors
# This method only works for predictors that are included in the model.
acf_resid(m1, split_pred=c("Subject","Trial"))
By default, the function acf_resid calls acf_plot to calculate the average ACF over the time series. However, a different summary measure can be provided with the argument fun of acf_plot:
# Minimum ACF per lag:
acf_resid(m1, split_pred=c("Subject","Trial"), fun=min)
# Maximum ACF per lag:
acf_resid(m1, split_pred=c("Subject","Trial"), fun=max)
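The idea of computing one ACF per time series and then summarizing per lag can be sketched in base R. This is only an illustration under stated assumptions, not the itsadug implementation: the data frame sim, its columns series and resid, and the simulated AR(1) data are all hypothetical.

```r
# Base-R sketch (not the itsadug code): one ACF per series,
# then a per-lag summary across series.
set.seed(1)

# Hypothetical data: 10 independent AR(1) series of 100 points each.
n_series <- 10
n_points <- 100
sim <- data.frame(
  series = rep(paste0("s", seq_len(n_series)), each = n_points),
  resid  = unlist(lapply(seq_len(n_series), function(i) {
    as.numeric(arima.sim(model = list(ar = 0.5), n = n_points))
  }))
)

max_lag <- 20

# Matrix with one row per lag (0..max_lag) and one column per series.
acf_by_series <- sapply(split(sim$resid, sim$series), function(x) {
  as.numeric(acf(x, lag.max = max_lag, plot = FALSE)$acf)
})
rownames(acf_by_series) <- 0:max_lag

# Summarize across series at each lag; this plays the role of `fun`.
acf_mean <- apply(acf_by_series, 1, mean)
acf_min  <- apply(acf_by_series, 1, min)
acf_max  <- apply(acf_by_series, 1, max)

# Inspect the first few lags.
round(head(cbind(mean = acf_mean, min = acf_min, max = acf_max), 4), 2)
```

Any function that reduces a numeric vector to a single value (median, a quantile, and so on) could be swapped in for mean in the same way.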
The function optionally returns the ACF values, which can be used to generate more advanced ACF plots:
# Median ACF per lag:
acf_resid(m1, split_pred="Subject", fun=median, lwd=3,
main="Distribution of ACF")
# Calculate 25% and 75% quantiles:
acf1 <- acf_resid(m1, split_pred="Subject",
fun=function(x){quantile(x, .25)}, plot=FALSE)
acf2 <- acf_resid(m1, split_pred="Subject",
fun=function(x){quantile(x, .75)}, plot=FALSE)
# Plot these as error bars in different colors:
len <- length(acf1)-1
fill_area(x=0:len, y=acf2, from=acf1, col=alpha(1))
addInterval(pos=0:len, acf1, acf2, horiz=FALSE, col=alpha(1))
# add legend:
legend('topright',
fill=alpha(1),
border=alpha(1),
legend='25-75%',
bty='n')
The function acf_resid uses the function acf_n_plots to plot individual time series when the argument n is specified.
By default, the n plotted time series represent \(N\) quantiles of the lag 1 autocorrelation values.
acf_resid(m1, split_pred=c("Subject","Trial"), n=6)
## Quantiles to be plotted:
## 0% 20% 40% 60% 80% 100%
## -0.31881507 -0.01289427 0.08513155 0.21909939 0.53231119 0.96714554
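How quantiles of the lag 1 values can be used to select representative time series can be sketched in base R. This is a hypothetical illustration only: the lag 1 values are simulated, the event names are made up, and picking the event closest to each quantile is an assumption about the general idea rather than a description of what acf_n_plots does internally.

```r
# Base-R sketch: select representative events via quantiles of lag 1.
set.seed(2)

# Hypothetical lag-1 autocorrelations for 50 events.
lag1 <- setNames(runif(50, min = -0.3, max = 0.9), paste0("event", 1:50))

n <- 6

# n evenly spaced quantiles from 0% to 100%.
q <- quantile(lag1, probs = seq(0, 1, length.out = n))

# Assumption: for each quantile, take the event whose lag-1 value is closest.
picked <- sapply(q, function(v) names(lag1)[which.min(abs(lag1 - v))])

# Quantile values and the event chosen for each.
data.frame(quantile = names(q), value = unname(q), event = unname(picked))
```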
Optionally, the function outputs the quantiles:
out <- acf_resid(m1, split_pred=c("Subject","Trial"), n=6, plot=FALSE)
## Quantiles to be plotted:
## 0% 20% 40% 60% 80% 100%
## -0.31881507 -0.01289427 0.08513155 0.21909939 0.53231119 0.96714554
# print the head of the elements in the first quantile:
head(out[[1]][['elements']])
## event lag1
## 1 c05.-10 -0.04423006
## 2 c11.-10 -0.10037337
## 3 a05.-9 -0.18734255
## 4 a09.-9 -0.17250376
## 5 a13.-9 -0.02298813
## 6 c02.-9 -0.20476818
# print the quantile:
out[[1]][['quantile']]
## 0% 20%
## -0.31881507 -0.01289427
When random=TRUE, \(N\) random events are plotted:
acf_resid(m1, split_pred=c("Subject","Trial"), n=6, random=TRUE)
With the argument cond (see help(acf_n_plots)), specific events can be plotted:
simdat$Event <- with(simdat, interaction(Subject, Trial))
acf_resid(m1, split_pred=list(Event=simdat$Event), n=6,
cond=list(Event=c('c05.-10', 'c11.-10', 'a05.-9', 'a09.-9', 'a13.-9', 'c02.-9')))
## Quantiles to be plotted:
## 0% 20% 40% 60% 80% 100%
## -0.20476818 -0.18734255 -0.17250376 -0.10037337 -0.04423006 -0.02298813
The function acf_resid optionally returns information about individual time series:
# default output is the acf values:
(out <- acf_resid(m1, split_pred=c("Subject","Trial"), plot=FALSE))
## 0 1 2 3 4 5
## 1.00000000 0.23712054 0.22873738 0.22579123 0.22026437 0.20615686
## 6 7 8 9 10 11
## 0.20717297 0.19301394 0.18658474 0.18870720 0.17914887 0.17146109
## 12 13 14 15 16 17
## 0.15874921 0.15707638 0.14718378 0.14104510 0.13889794 0.12968832
## 18 19 20
## 0.11796424 0.10745294 0.09860553
# Alternatively, more information could be retrieved:
out <- acf_resid(m1, split_pred=c("Subject","Trial"), plot=FALSE, return_all=TRUE)
# out is a list of info:
names(out)
## [1] "acf" "acftable" "dataframe" "n" "series" "FUN"
# 1. acf gives the acf values:
out[['acf']]
## 0 1 2 3 4 5
## 1.00000000 0.23712054 0.22873738 0.22579123 0.22026437 0.20615686
## 6 7 8 9 10 11
## 0.20717297 0.19301394 0.18658474 0.18870720 0.17914887 0.17146109
## 12 13 14 15 16 17
## 0.15874921 0.15707638 0.14718378 0.14104510 0.13889794 0.12968832
## 18 19 20
## 0.11796424 0.10745294 0.09860553
# 2. acftable provides the individual acf's in wide table format:
head(out[['acftable']], 3)
## 0 1 2 3 4 5 6
## a01.-10 1 0.2208330 0.1351628 0.2910060 0.1277496 0.2439650 0.2377283
## a02.-10 1 0.4069722 0.4542935 0.3196872 0.4171387 0.3109420 0.3262143
## a03.-10 1 0.1448119 0.2602716 0.2236864 0.1480399 0.3161058 0.1507453
## 7 8 9 10 11 12
## a01.-10 0.2283752 0.1663220 0.1378972 0.1784408 0.2712004 0.05067271
## a02.-10 0.3280109 0.3654185 0.3015577 0.2539762 0.2672726 0.25087072
## a03.-10 0.3021498 0.2405207 0.2565739 0.1676992 0.2306338 0.18238364
## 13 14 15 16 17 18
## a01.-10 0.01943445 0.2884304 0.13345619 0.1178192 0.2474626 0.1080486
## a02.-10 0.34361377 0.3056743 0.28053685 0.2541574 0.2225718 0.1528064
## a03.-10 0.21911029 0.2525816 0.04745451 0.2099518 0.2957930 0.1640577
## 19 20
## a01.-10 0.07749091 0.07787484
## a02.-10 0.19932699 0.13487574
## a03.-10 0.25687585 0.04938008
dim(out[['acftable']])
## [1] 756 21
# 3. dataframe provides a data frame with the acf, n, and ci information
# in long table format:
head(out[['dataframe']])
## event acf lag n ci Subject Trial
## 1 a01.-1 1.00000000 0 100 0.19 a01 -1
## 2 a01.-1 0.09764361 1 100 0.19 a01 -1
## 3 a01.-1 0.03373664 2 100 0.19 a01 -1
## 4 a01.-1 0.18912723 3 100 0.19 a01 -1
## 5 a01.-1 0.12477850 4 100 0.19 a01 -1
## 6 a01.-1 0.08529486 5 100 0.19 a01 -1
# 4. n provides the number of data points underlying each ACF:
head(out[['n']])
## n event
## 1 100 a01.-10
## 2 100 a02.-10
## 3 100 a03.-10
## 4 100 a04.-10
## 5 100 a05.-10
## 6 100 a06.-10
# 5. series and FUN provide info on input and function:
out[['series']]
## [1] "resid_gam(model)"
out[['FUN']]
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x7fe56001a8e8>
## <environment: namespace:base>
The data frames are useful for plotting the ACFs with other packages. The following example comes from the vignette accompanying the article by @BatesEtal:
# Plot individual participants with the package lattice:
library(lattice)
out <- acf_resid(m1, split_pred=c("Subject"), plot=FALSE, return_all=TRUE)$dataframe
civec = out[out$lag==0,]$ci
xyplot(acf ~ lag | event, type = "h", data = out, col.line = "black",
panel = function(...) {
panel.abline(h = civec[panel.number()], col.line = "grey")
panel.abline(h = -civec[panel.number()], col.line = "grey")
panel.abline(h = 0, col.line = "black")
panel.xyplot(...)
},
strip = strip.custom(bg = "grey90"),
par.strip.text = list(cex = 0.8),
xlab="lag", ylab="autocorrelation")
When an AR1 model is included in a gam or bam model, the function acf_resid automatically corrects for it:
# generate AR start column:
simdat <- start_event(simdat, column="Time", event="Event")
head(simdat)
# run GAMM with AR1 model:
m1 <- bam(Y ~ Group + te(Time, Trial, by=Group)
+ s(Time, Subject, bs='fs', m=1),
data=simdat, rho=.65, AR.start=simdat$start.event)
# plot normal acf, without correction for rho:
acf(resid(m1))
# plot corrected acf with acf_resid:
acf_resid(m1)
# plot normal acf with acf_plot:
acf_plot(resid(m1), split_by=list(simdat$Subject))
# plot corrected acf plot with acf_plot:
acf_plot(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject))
## Group Time Trial Condition Subject Y Event start.event
## 1 Adults 0.00000 -10 -1 a01 0.7554469 a01.-10 TRUE
## 2 Adults 20.20202 -10 -1 a01 2.7834759 a01.-10 FALSE
## 3 Adults 40.40404 -10 -1 a01 1.9696963 a01.-10 FALSE
## 4 Adults 60.60606 -10 -1 a01 0.6814298 a01.-10 FALSE
## 5 Adults 80.80808 -10 -1 a01 1.6939195 a01.-10 FALSE
## 6 Adults 101.01010 -10 -1 a01 2.3651969 a01.-10 FALSE
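What correcting residuals for an AR1 process means can be sketched in base R. This is a simplified illustration under assumptions, not the computation performed by mgcv or resid_gam: it uses a single simulated series, assumes that rho is known, and applies the common standardization in which every residual after the first has rho times the previous residual subtracted and is then rescaled. With several events, as marked by start.event above, the same step would restart at the first point of each event.

```r
# Base-R sketch (single series, known rho): remove AR(1) dependence.
set.seed(3)
rho <- 0.65

# Hypothetical residuals following an AR(1) process with coefficient rho.
e <- as.numeric(arima.sim(model = list(ar = rho), n = 2000))
n <- length(e)

# Keep the first value as is; for every later point, subtract rho times
# the previous residual and rescale by sqrt(1 - rho^2).
e_corr <- c(e[1], (e[-1] - rho * e[-n]) / sqrt(1 - rho^2))

# Lag-1 autocorrelation before and after the correction.
lag1_raw  <- acf(e, plot = FALSE)$acf[2]
lag1_corr <- acf(e_corr, plot = FALSE)$acf[2]
round(c(raw = lag1_raw, corrected = lag1_corr), 2)
```

If rho is well chosen, the corrected lag 1 value should be close to zero, which is the pattern to look for when comparing the uncorrected and corrected ACF plots.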
acf_plot
The function acf_plot is used for generating the ACF for individual time series, and may plot the averaged ACF. In contrast with acf_resid, the input needs to be a vector, and the grouping predictors are provided to the argument split_by as a list of vectors.
acf_plot(resid_gam(m1))
acf_plot(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject))
acf_n_plots
The function acf_n_plots is used for generating \(N\) ACF plots of individual time series. In contrast with acf_resid, the input needs to be a vector, and the grouping predictors are provided to the argument split_by as a list of vectors.
acf_n_plots(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject), n=6, random=TRUE)