2 Introduction

In this vignette, we explore the OmopSketch functions that provide an overview of the clinical tables within a CDM object (observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, and death). Specifically, we cover summariseClinicalRecords() and tableClinicalRecords() for summarising the content of a table, and summariseRecordCount(), plotRecordCount() and tableRecordCount() for record counts over time.

2.1 Create a mock cdm

Let’s see an example of its functionalities. To start with, we will load the essential packages and create a mock cdm using the mockOmopSketch() function.

library(dplyr)
library(OmopSketch)

# Connect to mock database
cdm <- mockOmopSketch()

3 Summarise clinical tables

Let’s now use summariseClinicalRecords() from the OmopSketch package to get an overview of one of the clinical tables of the cdm (here, condition_occurrence).

summarisedResult <- summariseClinicalRecords(cdm, "condition_occurrence")
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |> print()
#> # A tibble: 20 × 13
#>    result_id cdm_name       group_name group_level      strata_name strata_level
#>        <int> <chr>          <chr>      <chr>            <chr>       <chr>       
#>  1         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  2         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  3         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  4         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  5         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  6         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  7         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  8         1 mockOmopSketch omop_table condition_occur… overall     overall     
#>  9         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 10         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 11         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 12         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 13         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 14         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 15         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 16         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 17         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 18         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 19         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> 20         1 mockOmopSketch omop_table condition_occur… overall     overall     
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Notice that the output is in the summarised result format.

We can use the arguments to specify which statistics to compute. For example, use the recordsPerPerson argument to indicate which estimates you are interested in for the number of records per person.

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95")
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  filter(variable_name == "records_per_person") |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 4 × 3
#>   variable_name      estimate_name estimate_value
#>   <chr>              <chr>         <chr>         
#> 1 records_per_person mean          84            
#> 2 records_per_person q05           70            
#> 3 records_per_person q95           98            
#> 4 records_per_person sd            8.9736
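Because estimate_value is stored as character in the summarised result format, it usually needs converting before any arithmetic. The sketch below rebuilds the four rows printed above as a plain tibble (a stand-in for the real result, so it runs without a cdm) and turns them into a named numeric vector. The object names recordsPerPerson and estimates are illustrative only.

```r
library(dplyr)

# Stand-in for the filtered summarised result printed above
recordsPerPerson <- tibble(
  variable_name = "records_per_person",
  estimate_name = c("mean", "q05", "q95", "sd"),
  estimate_value = c("84", "70", "98", "8.9736")
)

# estimate_value is character, so convert it before using it numerically
estimates <- setNames(
  as.numeric(recordsPerPerson$estimate_value),
  recordsPerPerson$estimate_name
)

estimates
```

Individual estimates can then be accessed by name, for example estimates[["mean"]].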

You can further specify whether to include the number of records in observation (inObservation = TRUE), the number of concepts mapped (standardConcept = TRUE), the source vocabularies the table contains (sourceVocabulary = TRUE), the domains of its concepts (domainId = TRUE), and the concept types (typeConcept = TRUE).

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  select(variable_name, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 17
#> Columns: 3
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "q05", "q95", "…
#> $ estimate_value <chr> "100", "100", "8400", "84", "70", "98", "8.9736", "8400…

You can also stratify the previous results by sex and age group:

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  select(variable_name, strata_level, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 153
#> Columns: 4
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ strata_level   <chr> "overall", "overall", "overall", "overall", "overall", …
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "q05", "q95", "…
#> $ estimate_value <chr> "100", "100", "8400", "84", "70", "98.0500", "8.9736", …

Notice that, by default, the “overall” group is also included, as are the crossed strata (for example, sex == “Female” and ageGroup == “>=35”).
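To make the ageGroup definition concrete, here is a minimal base R sketch of how a named list of bounds maps ages to labels. It assumes both bounds are inclusive; assignAgeGroup() is a hypothetical helper written for this illustration and is not part of OmopSketch.

```r
# Same age-group definition as in the call above
ageGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Hypothetical helper (not part of OmopSketch): label each age with the
# first group whose closed interval [lower, upper] contains it
assignAgeGroup <- function(age, groups) {
  labels <- rep(NA_character_, length(age))
  for (name in names(groups)) {
    bounds <- groups[[name]]
    inGroup <- which(is.na(labels) & age >= bounds[1] & age <= bounds[2])
    labels[inGroup] <- name
  }
  labels
}

assignAgeGroup(c(0, 34, 35, 90), ageGroup)
```

Under this assumption, ages 34 and 35 fall on opposite sides of the cut, so the two groups do not overlap.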

The analysis can also be run on several OMOP tables at the same time:

summarisedResult <- summariseClinicalRecords(cdm,
  c("observation_period", "drug_exposure"),
  recordsPerPerson = c("mean", "sd"),
  inObservation = FALSE,
  standardConcept = FALSE,
  sourceVocabulary = FALSE,
  domainId = FALSE,
  typeConcept = FALSE
)
#> ℹ Adding variables of interest to observation_period.
#> ℹ Summarising records per person in observation_period.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.

summarisedResult |>
  select(group_level, variable_name, estimate_name, estimate_value) |>
  glimpse()
#> Rows: 10
#> Columns: 4
#> $ group_level    <chr> "observation_period", "observation_period", "observatio…
#> $ variable_name  <chr> "Number subjects", "Number subjects", "Number records",…
#> $ estimate_name  <chr> "count", "percentage", "count", "mean", "sd", "count", …
#> $ estimate_value <chr> "100", "100", "100", "1", "0", "100", "100", "21600", "…

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summarisedResult <- summariseClinicalRecords(cdm, "drug_exposure",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising drug_exposure: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  omopgenerics::settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_clinical_records"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "0.5.1"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> ""
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
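As a rough illustration of what restricting to a time window means, the base R sketch below filters a few toy dates to the same range. Whether OmopSketch applies the window to the record start date only, and whether both bounds are inclusive, are assumptions made for this sketch; the startDate values are invented for the example.

```r
# Same window as in the call above
dateRange <- as.Date(c("1990-01-01", "2010-01-01"))

# Toy start dates standing in for a record start date column (illustrative)
startDate <- as.Date(c("1989-12-31", "1990-01-01", "2005-06-15", "2010-01-02"))

# Keep only the dates inside the closed window [start, end] (assumed rule)
inWindow <- startDate >= dateRange[1] & startDate <= dateRange[2]

startDate[inWindow]
```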

3.1 Tidy the summarised object

tableClinicalRecords() will help you to tidy the previous results and create a gt table.

summarisedResult <- summariseClinicalRecords(cdm,
  "condition_occurrence",
  recordsPerPerson = c("mean", "sd", "q05", "q95"),
  inObservation = TRUE,
  standardConcept = TRUE,
  sourceVocabulary = TRUE,
  domainId = TRUE,
  typeConcept = TRUE,
  sex = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.

summarisedResult |>
  tableClinicalRecords()
Variable name       Variable level           Estimate name  Database name: mockOmopSketch
condition_occurrence; overall
Number records      -                        N              8,400.00
Number subjects     -                        N (%)          100 (100.00%)
Records per person  -                        Mean (SD)      84.00 (8.97)
                                             q05            70.00
                                             q95            98.05
In observation      Yes                      N (%)          8,400 (100.00%)
Domain              Condition                N (%)          8,400 (100.00%)
Source vocabulary   No matching concept      N (%)          8,400 (100.00%)
Standard concept    S                        N (%)          8,400 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          8,400 (100.00%)
condition_occurrence; Female
Number records      -                        N              4,424.00
Number subjects     -                        N (%)          52 (100.00%)
Records per person  -                        Mean (SD)      85.08 (8.33)
                                             q05            71.55
                                             q95            98.45
In observation      Yes                      N (%)          4,424 (100.00%)
Domain              Condition                N (%)          4,424 (100.00%)
Source vocabulary   No matching concept      N (%)          4,424 (100.00%)
Standard concept    S                        N (%)          4,424 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          4,424 (100.00%)
condition_occurrence; Male
Number records      -                        N              3,976.00
Number subjects     -                        N (%)          48 (100.00%)
Records per person  -                        Mean (SD)      82.83 (9.57)
                                             q05            70.00
                                             q95            96.65
In observation      Yes                      N (%)          3,976 (100.00%)
Domain              Condition                N (%)          3,976 (100.00%)
Source vocabulary   No matching concept      N (%)          3,976 (100.00%)
Standard concept    S                        N (%)          3,976 (100.00%)
Type concept id     Unknown type concept: 1  N (%)          3,976 (100.00%)

4 Summarise record counts

OmopSketch can also help you to summarise the trend of the records of an OMOP table. See the example below, where we use summariseRecordCount() to count the number of records within each year, and then, we use plotRecordCount() to create a ggplot with the trend. We can also use tableRecordCount() to display results in a table of type gt, reactable or datatable. By default it creates a gt table.

summarisedResult <- summariseRecordCount(cdm, "drug_exposure", interval = "years")

summarisedResult |> tableRecordCount(type = "gt")
OMOP table  Time interval  Number records (mockOmopSketch)
drug_exposure 1951-01-01 to 1951-12-31 11
1952-01-01 to 1952-12-31 7
1953-01-01 to 1953-12-31 19
1954-01-01 to 1954-12-31 19
1955-01-01 to 1955-12-31 50
1956-01-01 to 1956-12-31 45
1957-01-01 to 1957-12-31 68
1958-01-01 to 1958-12-31 75
1959-01-01 to 1959-12-31 91
1960-01-01 to 1960-12-31 92
1961-01-01 to 1961-12-31 111
1962-01-01 to 1962-12-31 99
1963-01-01 to 1963-12-31 92
1964-01-01 to 1964-12-31 108
1965-01-01 to 1965-12-31 113
1966-01-01 to 1966-12-31 337
1967-01-01 to 1967-12-31 317
1968-01-01 to 1968-12-31 159
1969-01-01 to 1969-12-31 119
1970-01-01 to 1970-12-31 133
1971-01-01 to 1971-12-31 163
1972-01-01 to 1972-12-31 193
1973-01-01 to 1973-12-31 194
1974-01-01 to 1974-12-31 186
1975-01-01 to 1975-12-31 150
1976-01-01 to 1976-12-31 192
1977-01-01 to 1977-12-31 266
1978-01-01 to 1978-12-31 395
1979-01-01 to 1979-12-31 229
1980-01-01 to 1980-12-31 244
1981-01-01 to 1981-12-31 240
1982-01-01 to 1982-12-31 211
1983-01-01 to 1983-12-31 176
1984-01-01 to 1984-12-31 130
1985-01-01 to 1985-12-31 125
1986-01-01 to 1986-12-31 144
1987-01-01 to 1987-12-31 359
1988-01-01 to 1988-12-31 546
1989-01-01 to 1989-12-31 377
1990-01-01 to 1990-12-31 505
1991-01-01 to 1991-12-31 829
1992-01-01 to 1992-12-31 515
1993-01-01 to 1993-12-31 342
1994-01-01 to 1994-12-31 282
1995-01-01 to 1995-12-31 282
1996-01-01 to 1996-12-31 272
1997-01-01 to 1997-12-31 528
1998-01-01 to 1998-12-31 390
1999-01-01 to 1999-12-31 611
2000-01-01 to 2000-12-31 608
2001-01-01 to 2001-12-31 687
2002-01-01 to 2002-12-31 869
2003-01-01 to 2003-12-31 601
2004-01-01 to 2004-12-31 1011
2005-01-01 to 2005-12-31 412
2006-01-01 to 2006-12-31 109
2007-01-01 to 2007-12-31 277
2008-01-01 to 2008-12-31 710
2009-01-01 to 2009-12-31 500
2010-01-01 to 2010-12-31 891
2011-01-01 to 2011-12-31 482
2012-01-01 to 2012-12-31 224
2013-01-01 to 2013-12-31 137
2014-01-01 to 2014-12-31 319
2015-01-01 to 2015-12-31 348
2016-01-01 to 2016-12-31 347
2017-01-01 to 2017-12-31 150
2018-01-01 to 2018-12-31 732
2019-01-01 to 2019-12-31 1045
overall 21600

Note that you can adjust the time interval using the interval argument, which can be set to “years”, “months” or “quarters”. The example below plots the number of records per quarter:

summariseRecordCount(cdm, "drug_exposure", interval = "quarters") |>
  plotRecordCount()
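To illustrate what the different interval values group by, the base R sketch below buckets a few toy dates by calendar year and by quarter. It is only a conceptual stand-in: the dates are invented, and counting by record start date is an assumption about how summariseRecordCount() assigns records to intervals.

```r
# Toy record start dates (illustrative, not taken from the mock cdm)
startDate <- as.Date(c("2001-01-15", "2001-03-31", "2001-04-01", "2002-11-30"))

# Yearly buckets: one count per calendar year
byYear <- table(format(startDate, "%Y"))

# Quarterly buckets: combine the year with base R's quarters()
byQuarter <- table(paste(format(startDate, "%Y"), quarters(startDate)))

byYear
byQuarter
```

Finer intervals spread the same records over more buckets, which is why quarterly or monthly plots look noisier than yearly ones.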

We can further stratify our counts by sex (setting argument sex = TRUE) or by age (providing an age group). Notice that in both cases, the function will automatically create a group called overall with all the sex groups and all the age groups.

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "<30" = c(0, 29),
    ">=30" = c(30, Inf)
  )
) |>
  plotRecordCount()

By default, plotRecordCount() does not facet or colour by any variable. This can be confusing when stratifying by several variables, as in the previous plot. The visOmopResults package can tell us which columns are available to colour or facet by:

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  visOmopResults::tidyColumns()
#> [1] "cdm_name"       "omop_table"     "age_group"      "sex"           
#> [5] "variable_name"  "variable_level" "count"          "time_interval" 
#> [9] "interval"

Then, we can pass these columns to the facet and colour arguments of plotRecordCount():

summariseRecordCount(cdm, "drug_exposure",
  interval = "months",
  sex = TRUE,
  ageGroup = list(
    "0-29" = c(0, 29),
    "30-Inf" = c(30, Inf)
  )
) |>
  plotRecordCount(facet = omop_table ~ age_group, colour = "sex")

As with summariseClinicalRecords(), we can restrict the record counts to a specific time window by setting the dateRange argument.

summariseRecordCount(cdm, "drug_exposure",
  interval = "years",
  sex = TRUE,
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
) |>
  tableRecordCount(type = "gt")
OMOP table  Time interval  Sex  Number records (mockOmopSketch)
drug_exposure 1990-01-01 to 1990-12-31 overall 505
Male 297
Female 208
1991-01-01 to 1991-12-31 overall 829
Male 581
Female 248
1992-01-01 to 1992-12-31 overall 515
Male 295
Female 220
1993-01-01 to 1993-12-31 overall 342
Female 259
Male 83
1994-01-01 to 1994-12-31 overall 282
Male 50
Female 232
1995-01-01 to 1995-12-31 overall 282
Female 162
Male 120
1996-01-01 to 1996-12-31 overall 272
Female 124
Male 148
1997-01-01 to 1997-12-31 overall 528
Female 169
Male 359
1998-01-01 to 1998-12-31 overall 390
Female 251
Male 139
1999-01-01 to 1999-12-31 overall 611
Male 232
Female 379
2000-01-01 to 2000-12-31 overall 608
Male 221
Female 387
2001-01-01 to 2001-12-31 overall 687
Female 540
Male 147
2002-01-01 to 2002-12-31 overall 869
Male 608
Female 261
2003-01-01 to 2003-12-31 overall 601
Male 449
Female 152
2004-01-01 to 2004-12-31 overall 1011
Female 301
Male 710
2005-01-01 to 2005-12-31 overall 412
Female 157
Male 255
2006-01-01 to 2006-12-31 overall 109
Male 63
Female 46
2007-01-01 to 2007-12-31 overall 277
Male 14
Female 263
2008-01-01 to 2008-12-31 overall 710
Female 404
Male 306
2009-01-01 to 2009-12-31 overall 500
Male 163
Female 337
overall overall 10340
Male 5240
Female 5100
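A quick sanity check on stratified output is that the sex strata add up to the overall count in each interval. The sketch below copies three rows from the table above into a plain data frame (so it runs without a cdm) and verifies this; the object name counts is illustrative.

```r
# A few rows copied from the table above
counts <- data.frame(
  interval = c("1990", "1991", "overall"),
  overall  = c(505, 829, 10340),
  male     = c(297, 581, 5240),
  female   = c(208, 248, 5100)
)

# The sex strata should add up to the overall count in every interval
counts$consistent <- counts$male + counts$female == counts$overall

counts
```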

Finally, disconnect from the cdm:

PatientProfiles::mockDisconnect(cdm = cdm)