This vignette details the structure of incidence objects, as produced by the incidence function.
# Structure of an incidence object
We generate a toy dataset of dates to examine the content of incidence objects.
library(incidence)
set.seed(1)
dat <- sample(1:50, 200, replace = TRUE, prob = 1 + exp(1:50 * 0.1))
sex <- sample(c("female", "male"), 200, replace = TRUE)

The incidence by 48h period is computed as:
i <- incidence(dat, interval = 2)
i
#> <incidence object>
#> [200 cases from days 5 to 49]
#>
#> $counts: matrix with 23 rows and 1 columns
#> $n: 200 cases in total
#> $dates: 23 dates marking the left-side of bins
#> $interval: 2 days
#> $timespan: 45 days
plot(i)

We also compute incidence by gender:
i.sex <- incidence(dat, interval = 2, groups = sex)
i.sex
#> <incidence object>
#> [200 cases from days 5 to 49]
#> [2 groups: female, male]
#>
#> $counts: matrix with 23 rows and 2 columns
#> $n: 200 cases in total
#> $dates: 23 dates marking the left-side of bins
#> $interval: 2 days
#> $timespan: 45 days
plot(i.sex)

The object i is a list with the class incidence:
class(i)
#> [1] "incidence"
is.list(i)
#> [1] TRUE
names(i)
#> [1] "dates" "counts" "timespan" "interval" "n"

Items in i can be accessed using the same indexing as for any list:
## use name
head(i$dates)
#> [1] 5 7 9 11 13 15
## use numeric indexing
head(i[[2]])
#> [,1]
#> [1,] 3
#> [2,] 0
#> [3,] 1
#> [4,] 0
#> [5,] 1
#> [6,] 0

In the following sections, we examine each of the components of the object.

## $dates
The $dates component contains all the dates for which incidence has been computed, in the format of the input dataset (e.g. Date, numeric, integer).
class(i$dates)
#> [1] "integer"
class(dat)
#> [1] "integer"
i$dates
#> [1] 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49

The dates correspond to the lower bounds of the time intervals used as bins for the incidence. Bins always include the lower bound and exclude the upper bound. In the example provided above, with an interval of 2 days, this means that the first bin counts events that happened on days 5-6, the second bin counts events from days 7-8, etc.
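
The left-closed, right-open binning rule can be sketched in base R. This is only an illustration of the rule, not the code used by incidence itself; the vector toy_days and the other variable names are made up for the example.

```r
# Hypothetical toy data (not the vignette's dataset): days of onset
toy_days <- c(5L, 6L, 7L, 8L, 8L, 10L)
bin_width <- 2L

# Lower bounds of the bins, starting at the first observed day
lower_bounds <- seq(min(toy_days), max(toy_days), by = bin_width)

# findInterval() assigns x to bin k when lower_bounds[k] <= x < lower_bounds[k + 1],
# i.e. the lower bound is included and the upper bound is excluded
bin_index <- findInterval(toy_days, lower_bounds)

# Count events per bin, keeping empty bins
toy_counts <- tabulate(bin_index, nbins = length(lower_bounds))

data.frame(dates = lower_bounds, counts = toy_counts)
```

Here days 5 and 6 fall in the bin starting at 5, days 7 and 8 in the bin starting at 7, and day 10 in the bin starting at 9.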

## $counts
The $counts component contains the actual incidence, i.e. counts of events for the defined bins. It is a matrix of integers where rows correspond to time intervals, with one column for each group for which incidence is computed (a single, unnamed column if no groups were provided). If groups were provided, columns are named after the groups. We illustrate the difference by comparing the two objects i and i.sex:
class(i$counts)
#> [1] "matrix"
storage.mode(i$counts)
#> [1] "integer"
i$counts
#> [,1]
#> [1,] 3
#> [2,] 0
#> [3,] 1
#> [4,] 0
#> [5,] 1
#> [6,] 0
#> [7,] 1
#> [8,] 0
#> [9,] 3
#> [10,] 3
#> [11,] 3
#> [12,] 5
#> [13,] 9
#> [14,] 4
#> [15,] 5
#> [16,] 12
#> [17,] 13
#> [18,] 16
#> [19,] 15
#> [20,] 25
#> [21,] 28
#> [22,] 28
#> [23,] 25
i.sex$counts
#> female male
#> [1,] 1 2
#> [2,] 0 0
#> [3,] 0 1
#> [4,] 0 0
#> [5,] 1 0
#> [6,] 0 0
#> [7,] 0 1
#> [8,] 0 0
#> [9,] 2 1
#> [10,] 2 1
#> [11,] 2 1
#> [12,] 3 2
#> [13,] 5 4
#> [14,] 3 1
#> [15,] 3 2
#> [16,] 6 6
#> [17,] 6 7
#> [18,] 7 9
#> [19,] 10 5
#> [20,] 15 10
#> [21,] 16 12
#> [22,] 19 9
#> [23,] 18 7

Note that a data.frame containing dates and counts can be obtained using as.data.frame:
## basic conversion
as.data.frame(i)
#> dates counts
#> 1 5 3
#> 2 7 0
#> 3 9 1
#> 4 11 0
#> 5 13 1
#> 6 15 0
#> 7 17 1
#> 8 19 0
#> 9 21 3
#> 10 23 3
#> 11 25 3
#> 12 27 5
#> 13 29 9
#> 14 31 4
#> 15 33 5
#> 16 35 12
#> 17 37 13
#> 18 39 16
#> 19 41 15
#> 20 43 25
#> 21 45 28
#> 22 47 28
#> 23 49 25
as.data.frame(i.sex)
#> dates female male
#> 1 5 1 2
#> 2 7 0 0
#> 3 9 0 1
#> 4 11 0 0
#> 5 13 1 0
#> 6 15 0 0
#> 7 17 0 1
#> 8 19 0 0
#> 9 21 2 1
#> 10 23 2 1
#> 11 25 2 1
#> 12 27 3 2
#> 13 29 5 4
#> 14 31 3 1
#> 15 33 3 2
#> 16 35 6 6
#> 17 37 6 7
#> 18 39 7 9
#> 19 41 10 5
#> 20 43 15 10
#> 21 45 16 12
#> 22 47 19 9
#> 23 49 18 7
## long format for ggplot2
as.data.frame(i.sex, long = TRUE)
#> dates counts groups
#> 1 5 1 female
#> 2 7 0 female
#> 3 9 0 female
#> 4 11 0 female
#> 5 13 1 female
#> 6 15 0 female
#> 7 17 0 female
#> 8 19 0 female
#> 9 21 2 female
#> 10 23 2 female
#> 11 25 2 female
#> 12 27 3 female
#> 13 29 5 female
#> 14 31 3 female
#> 15 33 3 female
#> 16 35 6 female
#> 17 37 6 female
#> 18 39 7 female
#> 19 41 10 female
#> 20 43 15 female
#> 21 45 16 female
#> 22 47 19 female
#> 23 49 18 female
#> 24 5 2 male
#> 25 7 0 male
#> 26 9 1 male
#> 27 11 0 male
#> 28 13 0 male
#> 29 15 0 male
#> 30 17 1 male
#> 31 19 0 male
#> 32 21 1 male
#> 33 23 1 male
#> 34 25 1 male
#> 35 27 2 male
#> 36 29 4 male
#> 37 31 1 male
#> 38 33 2 male
#> 39 35 6 male
#> 40 37 7 male
#> 41 39 9 male
#> 42 41 5 male
#> 43 43 10 male
#> 44 45 12 male
#> 45 47 9 male
#> 46 49 7 male

Note that incidence has an argument called na_as_group, which is TRUE by default. When it is TRUE, all observations with a missing group are pooled into a separate group, which then appears as its own column in $counts.
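
To show what a separate column for missing groups looks like, here is a small base R sketch using table(). It is an analogy only and makes no claim about how incidence handles this internally; toy_bins and toy_groups are hypothetical values invented for the example.

```r
# Hypothetical bin lower bounds and group labels, with two missing groups
toy_bins   <- c(5L, 5L, 7L, 7L, 9L)
toy_groups <- c("female", NA, "male", "female", NA)

# Missing groups pooled into their own column (analogous to na_as_group = TRUE)
counts_with_na <- table(toy_bins, toy_groups, useNA = "ifany")

# For contrast in base R: observations with a missing group are left out
counts_no_na <- table(toy_bins, toy_groups, useNA = "no")

counts_with_na
counts_no_na
```

In the first table, the two observations without a group label are counted in an extra column, so every event still contributes to the total.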

## $timespan
The $timespan component stores the length of the time period covered by the object:
i$timespan
#> [1] 45
range(i$dates)
#> [1] 5 49
diff(range(i$dates)) + 1
#> [1] 45

## $interval
The $interval component contains the length of the time interval for the bins:
i$interval
#> [1] 2
diff(i$dates)
#> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

## $n
The $n component stores the total number of events in the data:
i$n
#> [1] 200

Note that to obtain the number of cases per group, one can use:
apply(i.sex$counts, 2, sum)
#> female male
#> 119 81
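
For a plain column-wise sum, base R's colSums() gives the same totals as apply(x, 2, sum). The sketch below checks this on a small hypothetical matrix (toy_counts is made up and only mimics the shape of a grouped $counts component):

```r
# Hypothetical grouped counts: 3 time bins (rows), 2 groups (columns)
toy_counts <- matrix(
  c(1L, 0L, 2L,   # female column
    2L, 1L, 0L),  # male column
  ncol = 2,
  dimnames = list(NULL, c("female", "male"))
)

# Column totals computed two ways
by_apply   <- apply(toy_counts, 2, sum)
by_colsums <- colSums(toy_counts)

by_apply
by_colsums
```

Both calls return a named vector with one total per group; colSums() is usually the more direct choice for this operation.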