---
title: "Custom CovariateData Builder — Eunomia Demo"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom CovariateData Builder — Eunomia Demo}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  eval = FALSE
)
```

This vignette demonstrates how to use **OdysseusCharacterizationModule** as a
custom covariate builder for the
[FeatureExtraction](https://ohdsi.github.io/FeatureExtraction/) package.
When you pass an OCM covariate settings object to
`FeatureExtraction::getDbCovariateData()`, it returns a standard
`CovariateData` (Andromeda) object that plugs directly into
[CohortMethod](https://ohdsi.github.io/CohortMethod/),
[PatientLevelPrediction](https://ohdsi.github.io/PatientLevelPrediction/),
or any other HADES package that consumes covariates.

## Prerequisites

```{r prerequisites}
for (pkg in c("DatabaseConnector", "Eunomia", "Andromeda")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}

library(OdysseusCharacterizationModule)
library(DatabaseConnector)
library(Eunomia)
```

## 1. Connect to Eunomia

```{r connect}
connectionDetails <- getEunomiaConnectionDetails()
Eunomia::createCohorts(connectionDetails)
connection <- connect(connectionDetails)
```

```{r common-params}
COHORT_ID  <- 1L          # Celecoxib new users
CDM_SCHEMA <- "main"
```


## 2. Create covariate settings

`createOcmCovariateSettings()` accepts the same parameters as
`planAnalysis()` — analysis windows, base features, cohort features, and
concept-set features.  The object it returns carries an attribute `fun` that
tells FeatureExtraction which builder function to call.

```{r settings-basic}
ocmSettings <- createOcmCovariateSettings(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365),
    endDays   = c(-1)
  ),
  useBaseFeatures = list(
    condition_occurrence = list(include = TRUE, type = "start"),
    drug_exposure        = list(include = TRUE, atc = FALSE),
    condition_era        = list(include = FALSE),
    drug_era             = list(include = FALSE),
    procedure_occurrence = list(include = FALSE),
    observation          = list(include = FALSE),
    device_exposure      = list(include = FALSE),
    visit_occurrence     = list(include = FALSE),
    measurement          = list(include = FALSE)
  )
)

class(ocmSettings)
#> [1] "covariateSettings"

attr(ocmSettings, "fun")
#> [1] "getDbOcmCovariateData"
```


## 3. Standalone usage — call the builder directly

You do not need FeatureExtraction installed to use the builder.
`getDbOcmCovariateData()` runs the full OCM pipeline and assembles a
CovariateData Andromeda object.

```{r standalone}
covData <- getDbOcmCovariateData(
  connection        = connection,
  cdmDatabaseSchema = CDM_SCHEMA,
  cohortTable       = "main.cohort",
  cohortIds         = c(COHORT_ID),
  rowIdField        = "subject_id",
  covariateSettings = ocmSettings,
  aggregated        = FALSE
)
```


### Inspect the covariates table

Each row is a sparse (rowId, covariateId, covariateValue) triple.
`covariateValue` is 1 for binary features.

```{r covariates}
covDf <- covData$covariates |> as.data.frame()
cat("Total covariate entries:", nrow(covDf), "\n")
cat("Unique patients:",        length(unique(covDf$rowId)), "\n")
cat("Unique covariates:",      length(unique(covDf$covariateId)), "\n")
head(covDf)
```


### Inspect the covariate reference

Maps each `covariateId` to a human-readable name, concept ID, and analysis
ID.

```{r covariate-ref}
refDf <- covData$covariateRef |> as.data.frame()
head(refDf, 10)
```


### Inspect the analysis reference

One row per analysis describing the domain, time window, and whether the
covariate is binary.

```{r analysis-ref}
analysisDf <- covData$analysisRef |> as.data.frame()
analysisDf
```

```{r close-standalone}
Andromeda::close(covData)
```


## 4. FeatureExtraction integration

When FeatureExtraction is available, pass `ocmSettings` as
`covariateSettings`.  FeatureExtraction reads the `fun` attribute, calls
`getDbOcmCovariateData()` internally, and returns the result as a standard
`CovariateData` object.

```{r fe-integration}
if (requireNamespace("FeatureExtraction", quietly = TRUE)) {

  covDataFE <- FeatureExtraction::getDbCovariateData(
    connection            = connection,
    cdmDatabaseSchema     = CDM_SCHEMA,
    cohortDatabaseSchema  = CDM_SCHEMA,
    cohortTable           = "cohort",
    cohortIds             = c(COHORT_ID),
    covariateSettings     = ocmSettings,
    aggregated            = FALSE
  )

  cat("Covariates (via FE):", nrow(as.data.frame(covDataFE$covariates)), "\n")
  Andromeda::close(covDataFE)

} else {
  message("FeatureExtraction not installed — skipping integration demo.")
}
```


## 5. Combining with standard FeatureExtraction covariates

A key benefit of the custom builder pattern is that you can combine OCM
covariates with FeatureExtraction's built-in covariates.  Pass a **list** of
settings objects:

```{r combined}
if (requireNamespace("FeatureExtraction", quietly = TRUE)) {

  feSettings <- FeatureExtraction::createCovariateSettings(
    useDemographicsGender    = TRUE,
    useDemographicsAge       = TRUE,
    useDemographicsIndexYear = TRUE
  )

  combinedCovData <- FeatureExtraction::getDbCovariateData(
    connection            = connection,
    cdmDatabaseSchema     = CDM_SCHEMA,
    cohortDatabaseSchema  = CDM_SCHEMA,
    cohortTable           = "cohort",
    cohortIds             = c(COHORT_ID),
    covariateSettings     = list(feSettings, ocmSettings),
    aggregated            = FALSE
  )

  covDf <- as.data.frame(combinedCovData$covariates)
  cat("Total covariate entries (combined):", nrow(covDf), "\n")
  cat("Unique covariates (combined):",      length(unique(covDf$covariateId)), "\n")

  Andromeda::close(combinedCovData)

} else {
  message("FeatureExtraction not installed — skipping combined demo.")
}
```


## 6. Multiple domains and time windows

```{r multi-domain}
ocmSettingsWide <- createOcmCovariateSettings(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365, -30, 1),
    endDays   = c(-1,   -1, 30)
  ),
  useBaseFeatures = list(
    condition_occurrence = list(include = TRUE, type = "start"),
    drug_exposure        = list(include = TRUE, atc = FALSE),
    procedure_occurrence = list(include = TRUE),
    condition_era        = list(include = FALSE),
    drug_era             = list(include = FALSE),
    observation          = list(include = FALSE),
    device_exposure      = list(include = FALSE),
    visit_occurrence     = list(include = TRUE, type = "start"),
    measurement          = list(include = TRUE)
  )
)

covDataWide <- getDbOcmCovariateData(
  connection        = connection,
  cdmDatabaseSchema = CDM_SCHEMA,
  cohortTable       = "main.cohort",
  cohortIds         = c(COHORT_ID),
  covariateSettings = ocmSettingsWide,
  aggregated        = FALSE
)

cat("Analyses:",   nrow(as.data.frame(covDataWide$analysisRef)), "\n")
cat("Covariates:", nrow(as.data.frame(covDataWide$covariates)), "\n")

Andromeda::close(covDataWide)
```


## 7. Using concept-set features

Define a custom concept set (e.g. hypertension-related conditions) and
extract it as a binary covariate:

```{r concept-set}
ocmConceptSet <- createOcmCovariateSettings(
  analysisWindows = defineAnalysisWindows(
    startDays = c(-365),
    endDays   = c(-1)
  ),
  useBaseFeatures = list(
    condition_occurrence = list(include = FALSE),
    condition_era        = list(include = FALSE),
    drug_exposure        = list(include = FALSE),
    drug_era             = list(include = FALSE),
    procedure_occurrence = list(include = FALSE),
    observation          = list(include = FALSE),
    device_exposure      = list(include = FALSE),
    visit_occurrence     = list(include = FALSE),
    measurement          = list(include = FALSE)
  ),
  useConceptSetFeatures = list(
    include = TRUE,
    type    = "binary",
    conceptSets = list(
      hypertension = list(
        items = list(
          list(
            concept            = list(CONCEPT_ID = 316866L),
            includeDescendants = TRUE,
            includeMapped      = FALSE,
            isExcluded         = FALSE
          )
        ),
        tables = c("condition_occurrence")
      )
    )
  )
)

covDataCS <- getDbOcmCovariateData(
  connection        = connection,
  cdmDatabaseSchema = CDM_SCHEMA,
  cohortTable       = "main.cohort",
  cohortIds         = c(COHORT_ID),
  covariateSettings = ocmConceptSet,
  aggregated        = FALSE
)

cat("Concept-set covariates:", nrow(as.data.frame(covDataCS$covariates)), "\n")
as.data.frame(covDataCS$covariateRef)

Andromeda::close(covDataCS)
```


## Cleanup

```{r disconnect}
disconnect(connection)
```


## Summary

| Function | Purpose |
|----------|---------|
| `createOcmCovariateSettings()` | Configure OCM features as a `covariateSettings` object |
| `getDbOcmCovariateData()` | Execute the pipeline and return a `CovariateData` Andromeda object |
| `FeatureExtraction::getDbCovariateData(..., covariateSettings = ocmSettings)` | Use OCM as a plug-in builder inside FeatureExtraction |

The returned `CovariateData` object contains:

- **`covariates`** — sparse (rowId, covariateId, covariateValue) table
- **`covariateRef`** — covariate metadata (name, concept ID, analysis ID)
- **`analysisRef`** — analysis metadata (domain, time window, binary flag)
