datareportR consists of a small but powerful function, render_data_report(), which takes input data and outputs a “data report” as a pdf or HTML document. The generated report allows an analyst to:
View summary statistics on the entire dataset, such as the mean and quartiles of numeric variables, and number of unique values for categorical variables.
Compare the data with an old version and to see a printout of differences.
Both of these functions are explained in more detail below.
datareportR can generate a view of your data that summarizes
numerical and categorical columns (e.g., quantiles, mean, and standard
deviation for numeric columns and category frequency for categorical
columns), as well as presenting key metadata associated with the
dataset, such as number of rows, number of columns, and number of
numeric/categorical columns. Under the hood, it uses the powerful
skim::skimr() function.
A sample call on the iris dataset is as follows and
renders the following output:
datareportR also allows you to compare an input dataset with another
dataset and to inspect the differences. This provides you a readout of
(a) columns with differing values, along with the number of differing
values, and (b) the first 10 rows of differences for each column.Under
the hood, this uses the diffdf::diffdf() function.
Below is a sample call on the iris dataset and comparing
it to a permuted version:
set.seed(12345)
iris = datasets::iris
iris_permuted = iris
# Randomly switch the order of the Species column
iris_permuted$Species <- sample(iris$Species)
datareportR::render_data_report(
df_input = iris,
df_input_old = iris_permuted,
save_report_dir = tempdir()
)By default, both the data overview and data comparison sections of
the report are included. You can choose to not include one section by
specifying include_skim = FALSE or
include_diffdf = FALSE in the function all. Note that you
cannot set both options to be FALSE; datareportR will
complain and throw an error if so.
For example, to disable the data comparison section, use the following function call:
Currently supported output formats for the report are HTML and pdf.
output_format in the function call must be specified as
“html” (the default) or “pdf”. To save the report as a pdf, for example,
use this following function call:
datareportR allows you to control the save location for (a) the raw
data report markdown file and (b) the final data report. These are
specified via the save_rmd_dir/save_rmd_file
and save_report_dir/save_report_file
arguments, respectively. By default, all arguments are configured to
save the Markdown file and output report in the current working
directory.
If you’d like to save the underlying RMarkdown file for the data
report, specify a save_rmd_dir. This argument accepts the
following:
NULL: the underlying file will not be saved to
disk.
Folder path: e.g., output/: the underlying file will
be saved to that folder path.
Similarly, save_rmd_file accepts a full file path with
.Rmd extension: e.g., output/data_report.Rmd. In this case,
render_data_report() will save the underlying RMarkdown
file to the directory in the file name. This option allows you to change
the name of the .Rmd file saved. A .Rmd extension will be
appended automatically if it is not detected.
You can specify both save_rmd_dir and
save_rmd_file, but they must be consistent with each other
(i.e., have the same folder path). datareportR will complain and throw
an error otherwise.
Similarly, save_report_dir accepts the following:
NULL: the data report will be saved in the current
working directory.
Folder path: the report will be saved to that folder.
Specify save_report_file as a full file path e.g.,
output/data_report.html, to save the output report to that
file. A .html or .pdf extension will be appended automatically if it is
not detected. This option allows you to change the name of the saved
report.
For instance, to save the data report RMarkdown file in the
output directory and the report itself as
iris_report.html in the output directory, use
the following function call:
Note that datareportR will complain if any directories it is being asked to save to do not exist.