saeSim is a package for the R language. It was developed to make the data simulation process more compact, yet flexible enough for customization. It is designed for use in the context of small area estimation.
Consider a linear mixed model. It contains two components: a fixed effects part and an error component. The error component can be split into a random effects part and a model error. Each component to be simulated can simply be added to the data.
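In symbols, such a model can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the package: $i$ indexes domains and $j$ indexes units within a domain.

$$
y_{ij} = x_{ij}^\top \beta + v_i + e_{ij}
$$

Here $x_{ij}^\top \beta$ is the fixed effects part, $v_i$ is the random effect shared by all units in domain $i$, and $e_{ij}$ is the model error; $v_i + e_{ij}$ together form the error component.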
library(saeSim)
setup <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>%
sim_resp_eq(y = 100 + 2 * x + e) %>% sim_simName("Doku")
setup
##   idD idU        x        e      y
## 1   1   1 -1.25381  2.93784 100.43
## 2   1   2  0.54743 -2.12883  98.97
## 3   1   3 -1.40585  1.01607  98.20
## 4   1   4 -1.77577 -2.26078  94.19
## 5   1   5 -0.43167  0.07433  99.21
## 6   1   6  0.08791 -0.62434  99.55
sim_base() supplies a data.frame to which variables can be added. By default it creates a data.frame with the indicator variables idD and idU (a two-level model), which uniquely identify observations. sim_resp_eq() adds the variable y as the response.
To run the simulation, call sim. It returns a list containing data.frames as elements:
dataList <- sim(setup, R = 10)
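Because the result is an ordinary list of data frames, base R tools such as lapply and sapply can be applied to it directly. The sketch below does not call saeSim: it builds a stand-in list by hand (the helper make_one, the seed, and all sizes are hypothetical choices for this illustration) and fits the same regression to every element.

```r
# Stand-in for a list of simulated data frames; built by hand, not by saeSim.
set.seed(42)                               # assumed seed, for reproducibility only

make_one <- function(n = 50) {             # hypothetical helper, not part of saeSim
  x <- rnorm(n)                            # regressor
  e <- rnorm(n)                            # model error
  data.frame(x = x, e = e, y = 100 + 2 * x + e)
}

# Ten replications, kept as a list of data frames
dataList <- replicate(10, make_one(), simplify = FALSE)

# Fit y ~ x on every element and collect the coefficients row by row
coefs <- t(sapply(dataList, function(d) coef(lm(y ~ x, data = d))))

# Average of the estimates over the replications
colMeans(coefs)
```

Keeping the replications in a list means each element can be processed independently, which is convenient for summarising estimators over many runs.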
A set-up can also be converted directly into a single data.frame with as.data.frame:
simData <- sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% as.data.frame
For the simulation set-up you start with the 'base', which is just a data.frame.
base_id(nDomains, nUnits)
- nDomains specifies the number of domains (clusters, areas) in the data
- nUnits specifies the number of observations in each domain
This section gives an overview of data which can be generated.
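To make the idea of a base concrete, the following base R sketch builds a data frame of identifiers by hand. make_base_ids is a hypothetical helper written for this illustration, not the package's base_id, and it assumes that every domain contains the same number of units and that units are numbered within their domain.

```r
# Hypothetical helper that mimics the shape of a two-level base.
# This is not saeSim's base_id(); it only illustrates the idea.
make_base_ids <- function(nDomains, nUnits) {
  data.frame(
    idD = rep(seq_len(nDomains), each = nUnits),   # domain identifier
    idU = rep(seq_len(nUnits), times = nDomains)   # unit identifier within its domain
  )
}

ids <- make_base_ids(nDomains = 3, nUnits = 2)
ids

# Check that no (idD, idU) pair occurs more than once
anyDuplicated(ids) == 0
```

Further variables can then be attached to such a data frame as additional columns.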
There are several pre-configured set-ups you can use for an easy start.
- sim_base_lm() returns a simulation set-up for a linear model, i.e. one regressor and one error component
- sim_base_lmm() returns a set-up for a linear mixed model
- sim_base_lmc() returns a set-up for a linear model with contamination in the model error (5% outliers in each area)
- sim_base_lmmc() returns a set-up for a linear mixed model with contamination in the model error and in the random effect (5% outliers in each area, and 5% of the areas are outliers)
If you are interested in aggregated information, you can either draw directly from the model by specifying nUnits = 1 or use the aggregate component. Aggregation is a further component that can be applied to the population or to the sample. It is carried out after sampling; if you have not specified a sampling component, the population is aggregated, which makes sense if you draw samples directly from the model.
sim_base_lm() %>% sim_agg()
## Source: local data frame [6 x 4]
##
##   idD        x        e      y
## 1   1  0.69208 -0.87883  99.81
## 2   2  0.07763 -0.88760  99.19
## 3   3 -0.07306  0.53755 100.46
## 4   4 -0.38365 -0.07519  99.54
## 5   5 -0.12094 -0.47715  99.40
## 6   6 -0.82195  0.25876  99.44
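As a rough picture of what aggregating to the domain level means, the base R sketch below computes one row per domain with aggregate. It does not use saeSim, and it assumes that the aggregate is the mean within each domain; the data, the seed, and the sizes are made up for illustration.

```r
# Unit-level data built by hand; not produced by saeSim.
set.seed(7)                                # assumed seed, for reproducibility only
nDomains <- 6                              # assumed number of domains
nUnits <- 5                                # assumed number of units per domain

pop <- data.frame(idD = rep(seq_len(nDomains), each = nUnits))
pop$x <- rnorm(nrow(pop))                  # regressor
pop$e <- rnorm(nrow(pop))                  # model error
pop$y <- 100 + 2 * pop$x + pop$e           # response

# One row per domain: the mean of x, e and y within each idD
agg <- aggregate(cbind(x, e, y) ~ idD, data = pop, FUN = mean)
agg
```

The result has the same columns as the unit-level data, minus any unit identifier, and one row for each value of idD.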
You will want to check your results regularly to see how things work out. For sim_setup objects, some methods are supplied to do that without simulating redundant data every time:
- show: the print method for S4 classes; you don't have to call show explicitly
- plot: calls smoothScatter to visualize the data
- autoplot: imitates smoothScatter with ggplot2
setup <- sim_base_lmm()
plot(setup)
library(ggplot2)
autoplot(setup)
autoplot(setup, "e")
autoplot(setup %>% sim_gen_vc())