2 Introduction

In this vignette, we explore how the OmopSketch functions databaseCharacteristics() and shinyCharacteristics() can be used to characterise databases containing electronic health records mapped to the OMOP Common Data Model (CDM).

2.1 Create a mock cdm

We begin by loading the necessary packages and creating a mock CDM using the mockOmopSketch() function:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(OmopSketch)

cdm <- mockOmopSketch()

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise database characteristics

The databaseCharacteristics() function runs a comprehensive characterisation of the CDM and returns the output as a single summarised result:

result <- databaseCharacteristics(cdm)

3.1 Selecting tables to characterise

By default, the following OMOP tables are included in the characterisation: person, observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.

You can customise which tables to include in the analysis by specifying them with the omopTableName argument.

result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"))

3.2 Stratifying by Sex

To stratify the characterisation results by sex, set the sex argument to TRUE:

result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  sex = TRUE)

3.3 Stratifying by Age Group

To stratify the characterisation by age group, pass a list of age ranges to the ageGroup argument. Each element is a numeric vector giving the lower and upper age (in years) of one group:

result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  ageGroup = list(c(0,50), c(51,100)))
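Writing out many contiguous age ranges by hand is error-prone. The sketch below uses a hypothetical base-R helper, makeAgeGroups() (not part of OmopSketch), that turns a vector of breakpoints into the list-of-ranges form shown above. It assumes integer ages and bounds that are inclusive at both ends, as the c(0,50), c(51,100) example suggests.

```r
# Hypothetical helper (not part of OmopSketch): build contiguous,
# non-overlapping age groups from a strictly increasing vector of breakpoints,
# in the list-of-ranges form accepted by the ageGroup argument.
makeAgeGroups <- function(breaks) {
  stopifnot(
    is.numeric(breaks),
    length(breaks) >= 2,
    !is.unsorted(breaks, strictly = TRUE)
  )
  n <- length(breaks)
  lapply(seq_len(n - 1), function(i) {
    # Every group except the last ends one year before the next breakpoint;
    # the last group ends at the final breakpoint itself
    upper <- if (i < n - 1) breaks[i + 1] - 1 else breaks[n]
    c(breaks[i], upper)
  })
}

makeAgeGroups(c(0, 51, 100))
# returns list(c(0, 50), c(51, 100))
```

The helper's output could then be passed directly, for example ageGroup = makeAgeGroups(c(0, 18, 65, 150)).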

3.4 Filtering by date range and time interval

Use the dateRange argument to restrict the analysis to a specific period. Combine it with the interval argument to stratify results by time. Valid values for interval are "overall" (default), "years", "quarters", and "months":

result <- databaseCharacteristics(cdm,
                                  interval = "years",
                                  dateRange = as.Date(c("2010-01-01", "2018-12-31")))
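The options in sections 3.1 to 3.4 are separate arguments, so they can presumably be combined in one call. The sketch below assumes this (the vignette does not show it explicitly) and characterises two tables stratified by sex and age group, by year, within a fixed date range.

```r
library(OmopSketch)

# Mock CDM, as created in section 2.1
cdm <- mockOmopSketch()

# Sketch: combine table selection, sex and age stratification,
# a date range, and a yearly time interval in a single call
result <- databaseCharacteristics(
  cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  sex = TRUE,
  ageGroup = list(c(0, 50), c(51, 100)),
  dateRange = as.Date(c("2010-01-01", "2018-12-31")),
  interval = "years"
)
```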

3.5 Including Concept Counts

To include concept counts in the characterisation, set conceptIdCounts = TRUE:

result <- databaseCharacteristics(cdm,
                                  conceptIdCounts = TRUE)
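Because databaseCharacteristics() bundles several analyses into one summarised result, it can help to check which analyses it contains before visualising them. A minimal sketch, assuming the result follows the omopgenerics summarised_result format, so that omopgenerics::settings() returns one row per analysis with a result_type column:

```r
library(OmopSketch)

cdm <- mockOmopSketch()
result <- databaseCharacteristics(cdm, conceptIdCounts = TRUE)

# settings() from omopgenerics returns one row per result_id, describing
# how each component of the summarised result was produced
resultSettings <- omopgenerics::settings(result)

# List the distinct analyses bundled in the result
unique(resultSettings$result_type)
```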

4 Visualise the characterisation results

To explore the characterisation results interactively, use the shinyCharacteristics() function. It generates a Shiny application in the specified directory that lets you browse, filter, and visualise the results.

shinyCharacteristics(result = result, directory = "path/to/your/shiny")

4.1 Customise the Shiny App

You can customise the title, logo, and theme of the Shiny app with the following arguments:

  • title: The title displayed at the top of the app

  • logo: Path to a custom logo (must be in SVG format)

  • theme: A custom Bootstrap theme, given as a character string containing a bslib::bs_theme() call (see the example below)

shinyCharacteristics(result = result, directory = "path/to/my/shiny",
                     title = "Characterisation of my data",
                     logo = "path/to/my/logo.svg",
                     theme = "bslib::bs_theme(bootswatch = 'flatly')")

An example of the Shiny application generated by shinyCharacteristics() can be explored here, where the characterisation of several synthetic datasets is available.