In this vignette, we explore how the OmopSketch functions databaseCharacteristics() and shinyCharacteristics() can serve as valuable tools for characterising databases containing electronic health records mapped to the OMOP Common Data Model.
We begin by loading the necessary packages and creating a mock CDM using the mockOmopSketch() function:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(OmopSketch)
cdm <- mockOmopSketch()
cdm
#>
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
The databaseCharacteristics() function provides a comprehensive summary of the CDM, returning a summarised result that includes:
A general database snapshot, using summariseOmopSnapshot()
A characterisation of the population in observation, built using the CohortConstructor and CohortCharacteristics packages
A summary of the observation period table, using summariseObservationPeriod() and summariseInObservation()
A data quality assessment of the clinical tables, using summariseMissingData()
A characterisation of the clinical tables, using summariseClinicalRecords() and summariseRecordCount()
result <- databaseCharacteristics(cdm)
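To get a feel for what the returned object contains, it can help to look at its settings. The sketch below assumes the result is a standard omopgenerics summarised_result, so that omopgenerics::settings() returns one row per analysis with a result_type column. The exact result types listed will depend on the installed package versions.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()
result <- databaseCharacteristics(cdm)

# Assumption: `result` is an omopgenerics summarised_result, so its settings
# table records which kind of analysis each result_id corresponds to.
omopgenerics::settings(result) |>
  distinct(result_type)

# The result itself is a long-format table of estimates.
glimpse(result)
```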
By default, the following OMOP tables are included in the characterisation: person, observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.
You can customise which tables to include in the analysis by specifying them with the omopTableName argument:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"))
To stratify the characterisation results by sex, set the sex argument to TRUE:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  sex = TRUE)
To stratify the results by age group, pass a list defining the age groups to the ageGroup argument; each element gives the lower and upper age bounds of one group:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  ageGroup = list(c(0, 50), c(51, 100)))
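As a quick check that the stratification was applied, the strata recorded in the result can be listed. This is a minimal sketch that assumes the standard summarised_result columns strata_name and strata_level. The labels used for the age groups are whatever the package generates and are not shown here.

```r
library(OmopSketch)
library(dplyr)

cdm <- mockOmopSketch()
result <- databaseCharacteristics(
  cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  ageGroup = list(c(0, 50), c(51, 100))
)

# Assumption: strata are stored in the standard summarised_result columns.
result |>
  distinct(strata_name, strata_level)
```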
Use the dateRange argument to limit the analysis to a specific period. Combine it with the interval argument to stratify results by time. Valid values for interval are "overall" (default), "years", "quarters", and "months":
result <- databaseCharacteristics(cdm,
                                  interval = "years",
                                  dateRange = as.Date(c("2010-01-01", "2018-12-31")))
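The arguments shown so far are all parameters of the same function, so they can presumably be combined in a single call. The sketch below assumes that they compose without restriction, which is not stated explicitly in this vignette:

```r
library(OmopSketch)

cdm <- mockOmopSketch()

# Assumption: table selection, sex and age stratification, time intervals and
# a date range can all be requested together in one call.
result <- databaseCharacteristics(
  cdm,
  omopTableName = c("drug_exposure", "condition_occurrence"),
  sex = TRUE,
  ageGroup = list(c(0, 50), c(51, 100)),
  interval = "years",
  dateRange = as.Date(c("2010-01-01", "2018-12-31"))
)
```

Restricting omopTableName in this way also keeps the run short when several stratifications are requested at once.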
To include concept counts in the characterisation, set conceptIdCounts = TRUE:
result <- databaseCharacteristics(cdm,
                                  conceptIdCounts = TRUE)
To explore the characterisation results interactively, you can use the shinyCharacteristics() function. This function generates a Shiny application in the specified directory, allowing you to browse, filter, and visualise the results through an intuitive user interface.
shinyCharacteristics(result = result, directory = "path/to/your/shiny")
You can customise the title, logo, and theme of the Shiny app by setting the appropriate arguments:
title: The title displayed at the top of the app
logo: Path to a custom logo (must be in SVG format)
theme: A custom Bootstrap theme (e.g., using bslib::bs_theme()), passed as a character string as in the example that follows
shinyCharacteristics(result = result, directory = "path/to/my/shiny",
                     title = "Characterisation of my data",
                     logo = "path/to/my/logo.svg",
                     theme = "bslib::bs_theme(bootswatch = 'flatly')")
An example of the Shiny application generated by shinyCharacteristics() can be explored here, where the characterisation of several synthetic datasets is available.