% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Unidiff.R
\name{uni_compare}
\alias{uni_compare}
\title{Compare Data Frames and Plot Differences}
\usage{
uni_compare(
  dfs,
  benchmarks,
  variables = NULL,
  nboots = 2000,
  n_bench = NULL,
  boot_all = FALSE,
  funct = "rel_mean",
  data = TRUE,
  type = "comparison",
  legendlabels = NULL,
  legendtitle = NULL,
  colors = NULL,
  shapes = NULL,
  summetric = "rmse2",
  label_x = NULL,
  label_y = NULL,
  plot_title = NULL,
  varlabels = NULL,
  name_dfs = NULL,
  name_benchmarks = NULL,
  summet_size = 4,
  silence = TRUE,
  conf_level = 0.95,
  conf_adjustment = NULL,
  percentile_ci = TRUE,
  weight = NULL,
  id = NULL,
  strata = NULL,
  weight_bench = NULL,
  id_bench = NULL,
  strata_bench = NULL,
  adjustment_weighting = "raking",
  adjustment_vars = NULL,
  raking_targets = NULL,
  post_targets = NULL,
  ndigits = 3,
  parallel = FALSE
)
}
\arguments{
\item{dfs}{A character vector containing the names of data frames to compare against the benchmarks.}

\item{benchmarks}{A character vector containing the names of the benchmarks to compare the data frames against.
The vector must either have the same length as \code{dfs} or length 1. If it has length 1, every
data frame is compared against the same benchmark. A benchmark can be the name of a data frame,
the name of a list of tables, or a named vector of means. The tables in the list must be named after the respective variables
in the data frame they are compared with. In a named vector of means, the means must be named after the respective variables
in the \code{dfs}.}

\item{variables}{A character vector containing the names of the variables for the comparison. If NULL,
all variables with the same name in both the \code{dfs} and the benchmarks are compared. Variables missing
from either the data frames or the benchmarks are excluded from the comparison.}

\item{nboots}{The number of bootstrap iterations used to calculate standard errors. Must be either >2 or 0.
If >2, standard errors are bootstrapped with \code{nboots} iterations. If 0, standard errors
are calculated analytically. We do not recommend \code{nboots = 0}, because the analytic method is not
yet suitable for every \code{funct} and every method. Depending on the size of the data and the
number of bootstraps, \code{uni_compare} can take a while.}

\item{n_bench}{A list of vectors containing the number of cases for every variable in the benchmark.
This is only needed if the benchmark is given as a vector. The list should be as long as the number of data frames.}

\item{boot_all}{If TRUE, both the \code{dfs} and the benchmarks are bootstrapped. Otherwise,
the benchmark estimate is assumed to be constant.}

\item{funct}{A character string, indicating the function to calculate the
difference between the data frames.

Predefined functions are:
\itemize{
\item \code{"d_mean"}, \code{"ad_mean"} A function to calculate the (absolute) difference in mean of
the variables in \code{dfs} and benchmarks with the same name. Only applicable for
metric variables.
\item \code{"d_prop"}, \code{"ad_prop"} A function to calculate the (absolute) difference in proportions of
the variables in \code{dfs} and benchmarks with the same name. Only applicable for dummy
variables.
\item \code{"rel_mean"}, \code{"abs_rel_mean"} A function to calculate the (absolute)
relative difference in mean of the variables in \code{dfs} and benchmarks with the same name.
For more information on the formula for difference and analytic variance, see Felderer
et al. (2019). Only applicable for metric variables.
\item \code{"rel_prop"}, \code{"abs_rel_prop"} A function to calculate the (absolute)
relative difference in proportions of the variables in \code{dfs} and benchmarks with
the same name. It is calculated in the same way as the relative difference in mean
(see Felderer et al., 2019), but the default label for the plot is different.
Only applicable for dummy variables.
}}

\item{data}{If TRUE, a \code{uni_compare_object} containing the results of the comparison is returned instead of the plot.}

\item{type}{Define the type of comparison. Can either be \code{"comparison"} or \code{"nonresponse"}.}

\item{legendlabels}{A character string or vector of strings containing a label for the
legend.}

\item{legendtitle}{A character string containing the title of the legend.}

\item{colors}{A vector of colors used in the plot for the
different comparisons.}

\item{shapes}{A vector of shapes applicable in \code{\link[ggplot2:ggplot2-package]{ggplot2::ggplot2()}}, used
in the plot for the different comparisons.}

\item{summetric}{If \code{"avg1"}, \code{"mse1"}, \code{"rmse1"}, or \code{"R"},
the respective measure is calculated for the biases of each survey. The values
\code{"mse1"} and \code{"rmse1"} lead to results similar to \code{"mse2"} and \code{"rmse2"},
with a slightly different visualization in the plot. If \code{summetric = NULL}, no \code{summetric}
is displayed in the plot. When \code{"R"} is chosen, \code{response_identificator}
is also needed.}

\item{label_x, label_y}{A character string or vector of character strings containing a label for
the x-axis and y-axis.}

\item{plot_title}{A character string containing the title of the plot.}

\item{varlabels}{A character string or vector of character strings containing the new names of the
variables, which are also used in the plot.}

\item{name_dfs, name_benchmarks}{A character string or vector of character strings containing the
new names of the \code{dfs} and \code{benchmarks}, which are also used in the plot.}

\item{summet_size}{A number to determine the size of the displayed \code{summetric} in the plot.}

\item{silence}{If \code{silence = FALSE}, a warning is displayed if variables are
excluded from either the data frame or the benchmark because they do not exist in both.}

\item{conf_level}{A numeric value between zero and one to determine the confidence level of the confidence
interval.}

\item{conf_adjustment}{If \code{conf_adjustment = TRUE} the confidence level of the confidence interval will be
adjusted with a Bonferroni adjustment, to account for the problem of multiple comparisons.}

\item{percentile_ci}{If TRUE, confidence intervals are calculated using the percentile method.
If FALSE, they are calculated using the normal method.}

\item{weight, weight_bench}{A character vector determining the variables used to weight the \code{dfs} or
\code{benchmarks}. They have to be part of the respective data frame. If only one character string is provided,
the same variable is used to weight every \code{df} or \code{benchmark}. If a
weight variable is provided, an \code{id} variable is also needed. For
weighting, the \code{survey} package is used.}

\item{id, id_bench}{A character vector determining the \code{id} variables used
to weight the \code{dfs} or \code{benchmarks} with the help of the
\code{survey} package. They have to be part of the respective data frame. If
only one character string is provided, the same variable is used to weight every
\code{df} or \code{benchmark}.}

\item{strata, strata_bench}{A character vector determining the strata variables
used to weight the \code{dfs} or \code{benchmarks} with the help of the
\code{survey} package. They have to be part of the respective data frame.
If only one character string is provided, the same variable is used to weight every
\code{df} or \code{benchmark}.}

\item{adjustment_weighting}{A character vector indicating if adjustment
weighting should be used. It can either be \code{"raking"} or \code{"post_start"}.}

\item{adjustment_vars}{Variables used to adjust the survey when using raking
or post stratification.}

\item{raking_targets}{A list of raking targets that can be passed to
\code{\link[survey]{rake}} to rake the \code{dfs}.}

\item{post_targets}{A list of post-stratification targets that can be given to the
\code{\link[survey]{postStratify}} function, to post-stratify the \code{dfs}.}

\item{ndigits}{The number of digits to round the numbers in the plot.}

\item{parallel}{Either \code{FALSE} or the number of cores that should
be used in the function. If it is \code{FALSE}, only one core is used;
otherwise, the given number of cores is used.}
}
\value{
A plot based on \code{\link[ggplot2:ggplot2-package]{ggplot2::ggplot2()}} (or the comparison results if \code{data = TRUE})
that shows the difference between two or more data frames on predetermined variables
that are named identically in both data frames.
}
\description{
Returns data or a plot showing the difference between two or more
data frames. The differences are calculated with the
metric chosen in the \code{funct} argument. All data frames used must
contain at least one column that is named identically in all data frames and has equal
values.
}
\examples{

## Get Data for comparison

data("card")

north<-card[card$south==0,]
white<-card[card$black==0,]

## use the function to plot the data 
univar_comp<-sampcompR::uni_compare(dfs = c("north","white"),
                                    benchmarks = c("card","card"),
                                    variables= c("age","educ","fatheduc","motheduc","wage","IQ"),
                                    funct = "abs_rel_mean",
                                    nboots=200,
                                    summetric="rmse2",
                                    data=FALSE)

 univar_comp
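
## A further sketch, not part of the original examples. It assumes that a
## single benchmark name is reused for every entry in dfs (as described for
## the benchmarks argument) and that data = TRUE returns the results object
## rather than the plot. The object name univar_data is only illustrative.
univar_data<-sampcompR::uni_compare(dfs = c("north","white"),
                                    benchmarks = "card",
                                    variables= c("age","educ","wage"),
                                    funct = "abs_rel_mean",
                                    nboots=200,
                                    data=TRUE)

## Inspect the top-level structure of the returned object
str(univar_data, max.level = 1)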
 
}
\references{
Felderer, B., Kirchner, A., & Kreuter, F. (2019). The Effect of Survey Mode on Data
Quality: Disentangling Nonresponse and Measurement Error Bias. Journal of Official
Statistics, 35(1), 93–115. https://doi.org/10.2478/jos-2019-0005
}
