The package provides functions to tidy a summarised result into a
data frame that is easier to use in subsequent calculations.
To this end, the split and unite functions, described in
"split and unite functions", operate on the
name-level columns.
For the estimates there is the pivotEstimates function,
and for the settings pivotSettings. Finally, the
tidy method combines the split and pivot
functionalities in a single function.
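To make the idea of a name-level pair concrete, the following is a minimal base R sketch of what splitting does. It assumes the " &&& " separator and the "overall" filler that appear in the mock results shown later in this article. The helper splitPairSketch is hypothetical and is not part of visOmopResults, which provides its own split functions.

```r
# Toy table with one name-level pair, mimicking strata_name / strata_level.
toy <- data.frame(
  estimate_value = c("10", "4", "6"),
  strata_name    = c("overall", "age_group &&& sex", "sex"),
  strata_level   = c("overall", "<40 &&& Male", "Female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (illustration only): turn one name-level pair into
# one column per name, filling names absent from a row with `fill`.
splitPairSketch <- function(x, name, level, sep = " &&& ", fill = "overall") {
  namesList  <- strsplit(x[[name]], sep, fixed = TRUE)
  levelsList <- strsplit(x[[level]], sep, fixed = TRUE)
  # every distinct name becomes a new column, except the "overall" placeholder
  newCols <- setdiff(unique(unlist(namesList)), "overall")
  # drop the original pair, keep all other columns
  out <- x[, setdiff(colnames(x), c(name, level)), drop = FALSE]
  for (col in newCols) {
    out[[col]] <- vapply(seq_along(namesList), function(i) {
      pos <- match(col, namesList[[i]])
      if (is.na(pos)) fill else levelsList[[i]][pos]
    }, character(1))
  }
  out
}

tidyToy <- splitPairSketch(toy, "strata_name", "strata_level")
print(tidyToy)
```

Each name found in strata_name becomes its own column, and rows that do not carry that name receive "overall".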
First, let’s load relevant libraries and create a mock summarised result table.
library(visOmopResults)
library(dplyr)
result <- mockSummarisedResult()
result |> glimpse()
#> Rows: 126
#> Columns: 16
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "4899667", "1557137", "230180", "4279376", "341416", …
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
The function pivotEstimates adds one column of estimate
values for each combination of the columns given in
pivotEstimatesBy. For instance, in the following example we
use the columns variable_name, variable_level, and
estimate_name to pivot the estimates.
result |>
  pivotEstimates(pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")) |>
  glimpse()
#> Rows: 18
#> Columns: 18
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mo…
#> $ result_type <chr> "mock_summarised_result", "mock_sum…
#> $ package_name <chr> "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "coho…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "c…
#> $ strata_name <chr> "overall", "age_group &&& sex", "ag…
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&…
#> $ additional_name <chr> "overall", "overall", "overall", "o…
#> $ additional_level <chr> "overall", "overall", "overall", "o…
#> $ `number subjects_NA_count` <int> 4899667, 1557137, 230180, 4279376, …
#> $ age_NA_mean <dbl> 45.00919, 66.04876, 82.00216, 59.45…
#> $ age_NA_sd <dbl> 4.906876, 5.706327, 1.251627, 9.679…
#> $ Medications_Amoxiciline_count <int> 34243, 60972, 92885, 14238, 54725, …
#> $ Medications_Amoxiciline_percentage <dbl> 98.175316, 88.166081, 53.749952, 80…
#> $ Medications_Ibuprofen_count <int> 24412, 16138, 78225, 43763, 83941, …
#> $ Medications_Ibuprofen_percentage <dbl> 55.764227, 62.955280, 62.126432, 27…
The nameStyle argument customises the names of the
new columns. It uses the glue package syntax. For instance:
result |>
  pivotEstimates(pivotEstimatesBy = "estimate_name",
                 nameStyle = "{toupper(estimate_name)}") |>
  glimpse()
#> Rows: 72
#> Columns: 17
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ COUNT <int> 4899667, 1557137, 230180, 4279376, 341416, 6125243, 1…
#> $ MEAN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ SD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ PERCENTAGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
The function pivotSettings adds a new column for each of
the settings in the summarised result, if any:
mockSummarisedResult(settings = TRUE) |>
  pivotSettings() |>
  glimpse()
#> Rows: 126
#> Columns: 17
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1",…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "9715278", "8167184", "7595780", "8059089", "7683359"…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ mock_default <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
The function appendSettings does the inverse of pivotSettings.
It takes columns that hold settings and appends them as rows
(with variable_name equal to "settings") to the end of the
summarised result:
table <- mockSummarisedResult() |>
  mutate(mockSummarisedResult = TRUE, vignette = "tidy")
result <- table |> appendSettings(colsSettings = c("mockSummarisedResult", "vignette"))
result |> filter(variable_name == "settings") |> glimpse()
#> Rows: 2
#> Columns: 16
#> $ result_id <int> 1, 1
#> $ cdm_name <chr> "mock", "mock"
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result"
#> $ package_name <chr> "visOmopResults", "visOmopResults"
#> $ package_version <chr> "0.2.1", "0.2.1"
#> $ group_name <chr> "overall", "overall"
#> $ group_level <chr> "overall", "overall"
#> $ strata_name <chr> "overall", "overall"
#> $ strata_level <chr> "overall", "overall"
#> $ variable_name <chr> "settings", "settings"
#> $ variable_level <chr> NA, NA
#> $ estimate_name <chr> "mockSummarisedResult", "vignette"
#> $ estimate_type <chr> "logical", "character"
#> $ estimate_value <chr> "TRUE", "tidy"
#> $ additional_name <chr> "overall", "overall"
#> $ additional_level <chr> "overall", "overall"
Finally, the tidy method combines the splitting of
name-level columns with the pivoting of estimates and settings. By default,
it splits group, strata and additional, pivots estimates by the column
“estimate_name”, and also pivots the settings.
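As a rough mental model, the default behaviour can be approximated by chaining the individual helpers. The sketch below assumes that splitGroup(), splitStrata() and splitAdditional() are the split functions for the three name-level pairs. It is not taken from the implementation of tidy, so column order and other details may differ.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult(settings = TRUE)

# Step-by-step approximation of the default tidy() behaviour (assumption,
# not the package's actual implementation).
manual <- result |>
  pivotSettings() |>     # settings first: this step still needs estimate_name / estimate_value
  splitGroup() |>        # group_name / group_level           -> one column per group name
  splitStrata() |>       # strata_name / strata_level         -> one column per strata name
  splitAdditional() |>   # additional_name / additional_level -> one column per additional name
  pivotEstimates(pivotEstimatesBy = "estimate_name")

manual |> glimpse()
```

Settings are pivoted before the estimates in this sketch because pivoting the estimates removes the estimate_name and estimate_value columns in which the settings are stored.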
result <- mockSummarisedResult(settings = TRUE)
result |>
  tidy() |>
  glimpse()
#> Rows: 72
#> Columns: 15
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cdm_name <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock"…
#> $ result_type <chr> "mock_summarised_result", "mock_summarised_result", "m…
#> $ package_name <chr> "visOmopResults", "visOmopResults", "visOmopResults", …
#> $ package_version <chr> "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", "0.2.1", …
#> $ cohort_name <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1",…
#> $ age_group <chr> "overall", "<40", ">=40", "<40", ">=40", "overall", "o…
#> $ sex <chr> "overall", "Male", "Male", "Female", "Female", "Male",…
#> $ variable_name <chr> "number subjects", "number subjects", "number subjects…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ count <int> 819876, 9032951, 1567835, 937791, 1733838, 1055507, 53…
#> $ mean <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ percentage <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ mock_default <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
Which column pairs are split can be customised with the split
arguments, while pivotEstimatesBy and
nameStyle control how estimates are pivoted. If
pivotEstimatesBy is NULL or
character(), estimates will not be modified. Settings will
always be pivoted if present.
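The following sketch shows how these arguments might be combined. The argument name splitStrata is an assumption about how the split arguments are spelled (one logical flag per name-level pair); check the documentation of tidy for the exact signature in your installed version.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult(settings = TRUE)

# Keep the strata pair unsplit and build the new column names from both
# the estimate name and the variable name.
# Assumption: the split arguments are named splitGroup, splitStrata and
# splitAdditional.
custom <- result |>
  tidy(
    splitStrata = FALSE,
    pivotEstimatesBy = c("variable_name", "estimate_name"),
    nameStyle = "{estimate_name}_{variable_name}"
  )
custom |> glimpse()

# With pivotEstimatesBy = NULL the estimates stay in long format, so
# estimate_name, estimate_type and estimate_value are kept.
long <- result |>
  tidy(pivotEstimatesBy = NULL)
long |> glimpse()
```

Leaving the estimates in long format can be convenient when later steps, such as filtering on estimate_name, expect one estimate per row.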