The getspanel package can be installed from CRAN using:
install.packages("getspanel")
The source code of the package is on GitHub, and the development version can be installed using:
# install.packages("devtools")
devtools::install_github("moritzpschwarz/getspanel", ref = "devel")
Once installed, we need to load the library:
library(getspanel)
Currently the package is called getspanel to align with the gets package, but its main function of course remains the isatpanel function.
The isatpanel function implements the empirical break detection algorithm described in a paper by Felix Pretis and Moritz Schwarz. It was applied in a study by Nico Koch and colleagues on EU road CO2 emissions, which was published in Nature Energy in 2022.
A quick overview of what has changed:
We can now use the formula approach as well as the traditional gets approach. This means that we can specify a model using y and mxreg as well as time and id as vectors, but we can now also simply supply a data.frame and a formula in the form y ~ x + z + I(x^2) to e.g. specify polynomials. We will then need an index argument, which specifies the group and time variables.
The ar argument now works.
We can now use the fixest package to speed up model estimation with large i (for short panels, the default method is still faster). The package can be activated using the new engine argument.
Using the fixest package also allows us to calculate clustered standard errors.
Unbalanced panels now work as intended, which was not the case before.
The mxbreak and break.method arguments have been removed. Instead, the function now produces the break matrix itself and implements the following saturation methods in a user-friendly way:
iis: Impulse Indicator Saturation
jsis: Joint Step Indicator Saturation (Common Breaks over time)
csis: Coefficient Step Indicator Saturation (Common Coefficient Breaks over time)
fesis: Fixed Effect Step Indicator Saturation (Breaks in the Group Fixed Effect over time)
cfesis: Coefficient Fixed Effect Step Indicator Saturation (Breaks in the coefficient for each individual)
We first load some data on EU CO2 emissions in the housing sector.
data("EUCO2residential")
head(EUCO2residential)
# A tibble: 6 × 9
country year lgdp lhdd lcdd urban av.rate pop agg.directem
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Austria 1969 25.6 NA NA 65.2 NA NA NA
2 Austria 1970 25.7 NA NA 65.3 NA NA NA
3 Austria 1971 25.8 NA NA 65.3 NA 7500482 NA
4 Austria 1972 25.8 NA NA 65.3 NA 7544201 NA
5 Austria 1973 25.9 NA NA 65.3 NA 7586115 NA
6 Austria 1974 25.9 NA NA 65.3 NA 7599038 NA
# let's subset this a little bit to speed this up
EUCO2residential <- EUCO2residential[EUCO2residential$year > 2000 &
                                       EUCO2residential$country %in% c("Germany", "Austria",
                                                                       "Belgium", "Italy",
                                                                       "Sweden", "Denmark"),]
# let's create a log emissions per capita variable
EUCO2residential$lagg.directem_pc <- log(EUCO2residential$agg.directem/EUCO2residential$pop)
# and let's also turn off printing the intermediate output from isatpanel
options(print.searchoutput = FALSE)
Let’s look at how we input what we want to model. Each isatpanel command takes its data specification in one of two ways:
In the gets package style, i.e. using vectors and matrices to specify y, mxreg, time and id
In a form that resembles the lm and plm specification, i.e. inputting a data.frame (or matrix or tibble), a formula argument, as well as a character vector for index (in the form c("group_variable_name", "time_variable_name"))
In both cases, the fixed effect specification is set using effect.
This means that a model can be specified with either of the following two styles of command:
Using the new method
is_lm <- isatpanel(data = EUCO2residential,
                   formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                   index = c("country","year"),
                   effect = "twoways",
                   fesis = TRUE)
Using the traditional method
# note: only lgdp is passed as a regressor here; to match the formula call
# exactly, mxreg would also need to contain lgdp^2 and pop
is_gets <- isatpanel(y = EUCO2residential$lagg.directem_pc,
                     mxreg = EUCO2residential$lgdp,
                     time = EUCO2residential$year,
                     id = EUCO2residential$country,
                     effect = "twoways",
                     fesis = TRUE)
From here onwards, I will use the lm notation.
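To see how the two notations relate, the following base-R sketch expands a formula and a data.frame into a response vector and a regressor matrix of the kind the gets-style arguments y and mxreg expect. The toy data and column names are made up for illustration, and this is not necessarily how isatpanel processes the formula internally.

```r
# Toy data; all names and values are hypothetical
df <- data.frame(
  id   = rep(c("A", "B"), each = 4),
  time = rep(2001:2004, times = 2),
  y    = c(1.0, 1.2, 1.1, 1.5, 2.0, 2.1, 2.4, 2.2),
  x    = c(0.5, 0.7, 0.6, 0.9, 1.1, 1.0, 1.3, 1.2),
  z    = c(10, 11, 12, 13, 20, 21, 22, 23)
)

f <- y ~ x + I(x^2) + z

# Response vector, as it would be passed to y
y_vec <- model.response(model.frame(f, data = df))

# Regressor matrix without the intercept column, as it would be passed to mxreg
mx <- model.matrix(f, data = df)[, -1, drop = FALSE]

colnames(mx)
```

The polynomial term becomes its own column of the matrix, which is the step that has to be done by hand when using the vector and matrix notation.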
We can plot the results using the default plotting methods (these rely on the ggplot2 package):
plot(is_lm)
plot_grid(is_lm)
plot_counterfactual(is_lm)
The iis argument works just as in the gets package. The method simply adds a 0/1 dummy (an impulse indicator) for each observation.
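As a rough illustration of what "one dummy per observation" means, the candidate set for a small toy panel can be built in base R as an identity matrix. The column labels are hypothetical, and the sketch only shows the idea; it is not the package's internal construction.

```r
# Toy panel: 2 units observed over 3 periods (6 observations)
id   <- rep(c("A", "B"), each = 3)
time <- rep(2001:2003, times = 2)
n    <- length(id)

# One 0/1 dummy per observation: an n x n identity matrix
iis_mat <- diag(n)
colnames(iis_mat) <- paste0("iis", id, ".", time)  # hypothetical labels

iis_mat
```

Each column is equal to 1 for exactly one unit-period combination and 0 elsewhere, so a retained indicator flags a single outlying observation.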
Simply set iis = TRUE.
iis_example <- isatpanel(data = EUCO2residential,
                         formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                         index = c("country","year"),
                         effect = "twoways",
                         iis = TRUE,
                         fesis = TRUE)
plot(iis_example)
Traditional Step Indicator Saturation does not make sense in a panel setting. Therefore, the sis option of the gets package is disabled.
It is possible, however, to consider Step Indicator Saturation with common breaks across individuals. Such indicators would be collinear if effect = "twoways" or effect = "time", i.e. if time fixed effects are included.
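The collinearity can be checked directly on a toy panel. The sketch below builds time dummies and common step indicators by hand in base R and compares matrix ranks; it does not use the package's own indicator construction.

```r
time    <- rep(2001:2005, times = 3)  # 3 units, 5 periods
periods <- sort(unique(time))

# Time fixed effects: one dummy per period
time_fe <- sapply(periods, function(t) as.numeric(time == t))

# Common step indicators: equal to 1 from period t onwards, for every unit
steps <- sapply(periods[-1], function(t) as.numeric(time >= t))

# If the steps added new information, the rank would increase
rank_fe   <- qr(time_fe)$rank
rank_both <- qr(cbind(time_fe, steps))$rank
c(time_fe_only = rank_fe, with_steps = rank_both)
```

Every common step indicator is a sum of time dummies, so adding the steps leaves the rank unchanged. This is why jsis is only meaningful without time fixed effects.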
If, however, effect = "individual" then we can use
jsis = TRUE to select over all individual time fixed
effects.
jsis_example <- isatpanel(data = EUCO2residential,
                          formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                          index = c("country","year"),
                          effect = "individual",
                          jsis = TRUE)
plot(jsis_example)
Note: This method has only been tested using the lm implementation (using data, formula, and index).
The csis method allows detection of coefficient breaks that are common across all groups. It is the interaction between jsis and the relevant coefficient.
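A minimal base-R sketch of this interaction, using made-up data and hypothetical column labels, could look as follows. It assumes the step for the first period is omitted, since that interaction would simply reproduce the regressor itself.

```r
time    <- rep(2001:2004, times = 2)
x       <- c(0.5, 0.7, 0.6, 0.9, 1.1, 1.0, 1.3, 1.2)  # made-up regressor
periods <- sort(unique(time))

# Common step indicator for each candidate break date
steps <- sapply(periods[-1], function(t) as.numeric(time >= t))

# Interact each step with the regressor: 0 before the break, x from the break on
csis_mat <- steps * x
colnames(csis_mat) <- paste0("x.csis", periods[-1])  # hypothetical labels

cbind(time, x, csis_mat)
```

A retained column then measures by how much the coefficient on x shifts, for all units at once, from the given date onwards.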
To illustrate this, as well as the advantages of using the
lm approach, we include a non-linear term of the lgdp
variable using I(lgdp^2):
csis_example <- isatpanel(data = EUCO2residential,
                          formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                          index = c("country","year"),
                          effect = "twoways",
                          csis = TRUE)
plot(csis_example)
By default, all coefficients will be interacted and added to the indicator list, but this can be controlled using the csis_var argument, which takes a character vector of column names, e.g. csis_var = "lgdp".
csis_example2 <- isatpanel(data = EUCO2residential,
                           formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                           index = c("country","year"),
                           effect = "twoways",
                           csis = TRUE,
                           csis_var = "lgdp")
The fesis method is equivalent to supplying a constant to the mxbreak argument in the old method. It essentially breaks the group-specific intercept, i.e. the individual fixed effect.
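To make the idea of a break in the individual fixed effect concrete, the sketch below builds two such step indicators by hand in base R. The helper fesis_indicator is hypothetical, and the labels merely follow the pattern of names such as fesisAustria.2004 that appear in the estimation output; this is an illustration, not the package's internal code.

```r
id   <- rep(c("Austria", "Denmark"), each = 4)
time <- rep(2001:2004, times = 2)

# Hypothetical helper: step indicator for one unit from a given break year
fesis_indicator <- function(unit, year) as.numeric(id == unit & time >= year)

fesis_mat <- cbind(
  fesisAustria.2003 = fesis_indicator("Austria", 2003),
  fesisDenmark.2002 = fesis_indicator("Denmark", 2002)
)

data.frame(id, time, fesis_mat)
```

Each indicator is 0 for all other units and switches to 1 for the chosen unit from the break year onwards, which shifts that unit's intercept.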
fesis_example <- isatpanel(data = EUCO2residential,
                           formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                           index = c("country","year"),
                           effect = "twoways",
                           fesis = TRUE)
plot(fesis_example)
Similar to the csis_var idea, we can specify the fesis method for a subset of individuals using the fesis_id argument, which takes a character vector of individuals. In this case we can use e.g. fesis_id = c("Austria","Denmark").
fesis_example2 <- isatpanel(data = EUCO2residential,
                            formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                            index = c("country","year"),
                            effect = "twoways",
                            fesis = TRUE,
                            fesis_id = c("Austria","Denmark"))
plot(fesis_example2)
The cfesis method combines the csis and the fesis approaches and detects whether coefficients for individual units break over time.
This means we can also combine the subsetting in both the variable and the individual units using cfesis_id and cfesis_var.
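The effect of restricting both the units and the variable can be sketched in base R. The example below uses made-up data and hypothetical labels; the vector units plays the role of cfesis_id and lgdp the role of cfesis_var. It only illustrates which candidate columns such a restriction would imply, under the assumption that the first period is omitted.

```r
id   <- rep(c("Belgium", "Germany", "Italy"), each = 4)
time <- rep(2001:2004, times = 3)
lgdp <- seq(25, 28.3, length.out = 12)  # made-up values

units   <- c("Belgium", "Germany")      # plays the role of cfesis_id
periods <- sort(unique(time))[-1]       # candidate break dates

# One candidate column per (unit, break year) combination
grid <- expand.grid(year = periods, unit = units, stringsAsFactors = FALSE)
cfesis_mat <- mapply(
  function(unit, year) lgdp * as.numeric(id == unit & time >= year),
  grid$unit, grid$year
)
colnames(cfesis_mat) <- paste0("lgdp.cfesis", grid$unit, ".", grid$year)

dim(cfesis_mat)
```

Italy is not among the selected units, so all of its rows are zero in every column, and the candidate set shrinks accordingly.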
cfesis_example <- isatpanel(data = EUCO2residential,
                            formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                            index = c("country","year"),
                            effect = "twoways",
                            cfesis = TRUE,
                            cfesis_id = c("Belgium","Germany"),
                            cfesis_var = "lgdp",
                            t.pval = 0.001)
plot(cfesis_example)
ar argument
It is now possible to include autoregressive coefficients using the ar argument.
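In a panel, an autoregressive term has to be lagged within each unit rather than across the stacked data. The following base-R sketch shows one way to build such a first lag by hand, assuming no gaps in the time index; the column name ar1 mirrors the name shown in the model output, but the package's own construction may differ.

```r
df <- data.frame(
  id   = rep(c("A", "B"), each = 4),
  time = rep(2001:2004, times = 2),
  y    = c(1.0, 1.2, 1.1, 1.5, 2.0, 2.1, 2.4, 2.2)  # made-up values
)

# Sort by unit and time so that "previous row" means "previous period"
df <- df[order(df$id, df$time), ]

# Lag y by one period within each unit; the first period of each unit is NA
df$ar1 <- ave(df$y, df$id, FUN = function(v) c(NA, head(v, -1)))

df
```

The first observation of unit B is NA rather than the last value of unit A, which is the point of lagging by group.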
fesis_ar1_example <- isatpanel(data = EUCO2residential,
                               formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                               index = c("country","year"),
                               effect = "twoways",
                               fesis = TRUE,
                               ar = 1)
The robust_isatpanel function offers HAC standard errors and a standard White standard error correction (with the option of clustering the standard errors within groups or time):
robust_isatpanel(fesis_ar1_example, HAC = TRUE, robust = TRUE, cluster = "group")
$plm_object
Model Formula: y ~ ar1 + lgdp + I.lgdp.2. + pop + indicators
<environment: 0x000001887980b6b0>
Coefficients:
ar1 indicators
0.79587 -0.18504
$robust
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
ar1 0.795872 0.011568 68.801 < 2.2e-16 ***
indicators -0.185045 0.015963 -11.592 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$HAC
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
idAustria -1.5389e+03 1.1792e+03 -1.3050 0.19522
idBelgium -1.5386e+03 1.1792e+03 -1.3048 0.19528
idDenmark -1.5390e+03 1.1792e+03 -1.3052 0.19517
idGermany -1.5387e+03 1.1792e+03 -1.3049 0.19526
idItaly -1.5388e+03 1.1792e+03 -1.3050 0.19522
idSweden -1.5391e+03 1.1792e+03 -1.3052 0.19514
time -1.7146e-02 8.3591e-03 -2.0512 0.04315 *
ar1 6.9527e-01 5.4987e-02 12.6443 < 2.2e-16 ***
lgdp 1.1814e+02 8.8929e+01 1.3285 0.18737
I.lgdp.2. -2.2249e+00 1.6687e+00 -1.3333 0.18578
pop 3.5077e-07 1.6777e-07 2.0908 0.03937 *
indicators -2.5696e-01 4.0278e-02 -6.3797 7.509e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
engine argument
Another new argument is the engine argument. This allows us to use an external package to estimate our models. At this stage, the fixest package can be used.
This also means that we can now cluster standard errors using the cluster argument.
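As background on why the choice of engine should not affect point estimates: a regression that estimates the fixed effects as dummy coefficients and one that removes them by demeaning give the same slope. The base-R sketch below checks this on simulated data for a balanced panel. It illustrates the general principle only and makes no claim about how either engine is implemented.

```r
set.seed(1)
id   <- rep(1:4, each = 5)   # 4 units
time <- rep(1:5, times = 4)  # 5 periods, balanced
x    <- rnorm(20)
y    <- 0.5 * x + id + 0.1 * time + rnorm(20, sd = 0.1)

# (a) Dummy-variable regression: fixed effects estimated as coefficients
b_lsdv <- coef(lm(y ~ x + factor(id) + factor(time)))["x"]

# (b) Two-way demeaning; this simple formula is valid for balanced panels only
demean2  <- function(v) v - ave(v, id) - ave(v, time) + mean(v)
b_within <- coef(lm(demean2(y) ~ demean2(x) - 1))

all.equal(unname(b_lsdv), unname(b_within))
```

With many individuals, approach (a) has to estimate one coefficient per unit, which is the setting in which an engine that absorbs the fixed effects can be faster.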
fixest_example <- isatpanel(data = EUCO2residential,
                            formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                            index = c("country","year"),
                            effect = "twoways",
                            fesis = TRUE,
                            engine = "fixest",
                            cluster = "none")
We can verify that, without any clustering of standard errors, using the fixest package does not change our estimates:
head(fixest_example$isatpanel.result$mean.results)
coef std.error t-stat p-value
lgdp -2.371189e+01 2.914939e+00 -8.13461036 6.615453e-12
I(lgdp^2) 4.088169e-01 5.271872e-02 7.75468191 3.491379e-11
pop -1.300282e-09 1.461405e-08 -0.08897482 9.293391e-01
fesisAustria.2004 4.972763e-02 4.597850e-02 1.08154087 2.829219e-01
fesisBelgium.2004 1.232366e-01 4.756271e-02 2.59103348 1.149348e-02
fesisBelgium.2009 1.010547e-01 5.105385e-02 1.97937523 5.144411e-02
Compared to the default estimator:
head(is_lm$isatpanel.result$mean.results)
coef std.error t-stat p-value
lgdp -2.371189e+01 2.914939e+00 -8.13461036 6.615453e-12
I(lgdp^2) 4.088169e-01 5.271872e-02 7.75468191 3.491379e-11
pop -1.300282e-09 1.461405e-08 -0.08897482 9.293391e-01
idBelgium 8.715540e-01 9.241906e-02 9.43045775 2.269246e-14
idDenmark -8.793057e-01 8.029376e-02 -10.95110838 3.232923e-17
idGermany 2.730805e+00 1.614286e+00 1.69164900 9.486428e-02
However, changing the cluster specification of course does. The standard error correction in its current implementation is not valid and therefore retains many more indicators than are truly present. Clustering is therefore currently not recommended.
fixest_example_cluster <- isatpanel(data = EUCO2residential,
                                    formula = lagg.directem_pc ~ lgdp + I(lgdp^2) + pop,
                                    index = c("country","year"),
                                    effect = "twoways",
                                    fesis = TRUE,
                                    engine = "fixest",
                                    cluster = "individual")
plot(fixest_example_cluster)