library(tdata)

In this vignette, I will introduce you to the main features of the
tdata package. I will use various datasets to demonstrate
how to perform common tasks, such as defining frequency types and
converting data between frequencies.
Please note that currently, only one section is provided in this vignette. Additional examples will be added in subsequent updates.
Let’s get started!
In the first example, I will use oil price data. The required data
can be downloaded with the Quandl package using the
following code (note that the end date in this example may differ from
yours):

oil_price <- Quandl::Quandl("OPEC/ORB", start_date="2010-01-01")

To manipulate data using the tdata package, we generally
need to create a variable. In this example, we’ll create a variable from
the oil price data. First, we’ll use the values in the first column to
define a frequency. Since the first column contains a list of dates,
we’ll use a ‘List-Date’ frequency:

start_freq <- f.list.date(oil_price$Date)

Now that we have defined the frequency, we can create a variable using the following code:

var_dl <- variable(oil_price$Value, start_freq, "Oil Price")

This creates an array where each element is labeled by a date. We can
print this variable using the print function:

print(var_dl)

## Variable:
## Name = Oil Price
## Length = 3466
## Frequency Class = List (Date): Ld
## Start Frequency = 20230608
## Fields: NULL
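
The Start Frequency of 20230608 in the output above, rather than a date in 2010, suggests that the downloaded rows are ordered newest first and that the list keeps its labels in the order given. The following base R check illustrates this with a hypothetical three-row stand-in for the Date column (the dates vector is an assumption, not the real download):

```r
# Hypothetical stand-in for oil_price$Date, ordered newest first
dates <- as.Date(c("2023-06-08", "2023-06-07", "2023-06-06"))

# TRUE means the dates are not in ascending order
not_ascending <- is.unsorted(dates)

# The first label, written as YYYYMMDD
first_label <- format(dates[1], "%Y%m%d")
```

If not_ascending is TRUE for your own download, the first label of the list is simply the most recent observation.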
We can also convert the variable back to a data.frame using the
as.data.frame function:

df_var_dl <- as.data.frame(var_dl)

In this section, we’ll convert var_dl to a daily
variable. This can be done by sorting the data and filling in any gaps.
The convert.to.daily function can do this for us:

var_daily <- convert.to.daily(var_dl)

Besides saving us from sorting the data and filling in gaps manually,
this gives a compact representation: var_daily, as a daily variable,
stores only a single date, the frequency of the first observation. The
other frequencies (or dates) are inferred from this first date. This is
true for every type of frequency in the tdata package except for
‘Lists’. We can print the starting frequency using the print function:

print(var_daily$startFrequency)

## Frequency: 20100104 (Daily: d)
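
To make "sorting the data and filling in any gaps" concrete, here is a minimal base R sketch of the idea. It is not how convert.to.daily is implemented; the three observations are hypothetical, and obs, all_days, values, and inferred are illustrative names only.

```r
# Hypothetical unsorted observations with a missing day (not the real data)
obs <- data.frame(
  Date  = as.Date(c("2010-01-07", "2010-01-04", "2010-01-05")),
  Value = c(77.5, 78.2, 79.0)
)

# Step 1: sort by date
obs <- obs[order(obs$Date), ]

# Step 2: build the full daily calendar and fill missing days with NA
all_days <- seq(min(obs$Date), max(obs$Date), by = "day")
values   <- obs$Value[match(all_days, obs$Date)]

# Once the series is regular, only the start date needs to be stored:
# every other date can be inferred from its position
start    <- all_days[1]
inferred <- start + (seq_along(values) - 1)
```

Here values has four elements, with NA for 2010-01-06, and inferred reproduces the full calendar from start alone.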
Each frequency in the tdata package has a string
representation and a class ID. We can get these values using the
following code:

class_id <- get.class.id(var_daily$startFrequency)
str_rep <- as.character(var_daily$startFrequency)

## [1] "class_id: d, str_rep: 20100104"
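
The string representation shown above looks like a date written as YYYYMMDD. Assuming that pattern holds for daily frequencies, it can be turned into a base R Date as sketched below; the literal string is copied from the output above rather than computed with tdata.

```r
# String representation copied from the output above
str_rep <- "20100104"

# Parse it as a date, assuming the YYYYMMDD layout
start_date <- as.Date(str_rep, format = "%Y%m%d")

# ISO weekday number as a string: "1" is Monday, "7" is Sunday
weekday_number <- format(start_date, "%u")
```

This can be handy when the starting date is needed for base R date arithmetic.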
Plotting the data is straightforward: we simply convert the data to a
data.frame using the as.data.frame function
and then plot it. However, I won’t plot the daily variable in this
example. Because the original data was a ‘List-Date’, the days that were
missing from it were filled in during the conversion, so the daily
variable contains many NA values. In the next section, I’ll aggregate the
data and plot it.

In this section, we’ll convert the daily variable to a weekly
variable. Unlike the previous conversion, this involves aggregating the
data rather than sorting and filling in gaps. To do this, we’ll need to
use an aggregator function that takes an array of data as an argument
and returns a scalar value. Summary statistic functions such as
mean and median are natural choices for this
(we’ll also need to handle NA values). In this example,
I’ll use a built-in function to get the last available data point in
each week as the representative value for that week. Here’s the
code:

var_weekly <- convert.to.weekly(var_daily, "mon", "last")

The second argument, "mon", specifies that the week
starts on Monday. Note that the weekly frequency points to the first day
of the week. We can now convert the variable to a
data.frame and plot it using the following code:

df_var_weekly <- as.data.frame(var_weekly)
par(las = 2, cex.axis = 0.8)
plot(factor(rownames(df_var_weekly)),
     df_var_weekly$`Oil Price`,
     xlab = NULL, ylab = "$",
     main = "Weekly Oil Price")

Figure: Plotting the generated weekly data
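
To illustrate what an aggregator that returns the last available data point in each week does, here is a minimal base R sketch. It is an assumption about the idea, not the implementation behind the built-in "last" option; last_available is a hypothetical helper and the 14 daily values are made up.

```r
# Aggregator: last non-NA value, or NA if the week has no observations
last_available <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[length(x)]
}

# Hypothetical daily series starting on Monday 2010-01-04 (not the real data)
days <- seq(as.Date("2010-01-04"), by = "day", length.out = 14)
vals <- c(78.2, 79.0, NA, 77.5, 77.1, NA, NA,
          76.4, NA, 75.9, 76.8, NA, NA, NA)

# Monday that starts each day's week ("%u": 1 = Monday, ..., 7 = Sunday)
week_start <- days - (as.integer(format(days, "%u")) - 1)

# One value per week, labeled by the first day of that week
weekly <- tapply(vals, format(week_start), last_available)
```

Each element of weekly is labeled by the Monday that starts its week, which mirrors the convention that a weekly frequency points to the first day of the week. Swapping last_available for a function such as mean or median (with na.rm = TRUE) gives a different summary of each week.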
There are other frequency types and conversion functions available in
the tdata package that you can explore on your own.