Getting Started with CIMEHR
Introduction
Electronic Health Record (EHR) data are longitudinal where patients
visit at irregular times, and even when a visit occurs, an outcome or
biomarker may not be measured. CIMEHR
(Clinical Informative
Missingness for Electronic
Health Record data) provides methods
for longitudinal EHR analyses where patients visit irregularly and
outcomes may be selectively measured at visits. The primary methods
jointly characterize three linked processes:
- the visiting process: when a patient has an EHR
visit;
- the observation process: whether the
outcome/biomarker is recorded at a visit;
- the longitudinal outcome process: the outcome
trajectory among the target population.
The main estimator is CIMEHR(). Two extensions,
CIMEHR_timevarying_integral() and
CIMEHR_timevarying_ou(), allow more flexible
time-varying/serially correlated observation-process structures.
In addition, the R package CIMEHR provides simple
benchmark methods and semiparametric joint models for analyses affected
by informative visiting and informative observation.
This vignette starts with the data format, then moves from simpler
baseline methods to the more complex CIMEHR estimators. Examples are
kept small (a 500-subject subset of the bundled dataset) so each chunk
runs in a few seconds.
2. Example dataset
The package includes sim_ehr_data, a long-format
simulated EHR cohort designed to look like an outpatient HbA1c extract.
The bundled dataset is one replicate from the package’s Monte Carlo
simulation framework (see Section 8, Simulation study), specifically the
continuous_phi0 scenario (focal Z = NSES, phi = 0). The
columns are listed below. Latent quantities used internally by the
data-generating process (the shared factor, the visiting frailty, and
per-process random effects) are not included, since they are not
observable in real EHR data.
id |
integer |
Subject identifier |
Age |
continuous |
Standardized age |
Gender |
binary |
1 = male, 0 = female |
Marital |
binary |
1 = married, 0 = not |
Black |
binary |
1 = Black race, 0 = otherwise |
Hispanic |
binary |
1 = Hispanic ethnicity, 0 = otherwise |
NSES |
continuous |
Standardized neighborhood median household income |
time |
continuous |
Visit time on [0, 120] months; NA for
subjects with no visits |
R |
binary |
1 if HbA1c was recorded at the visit, 0 otherwise |
log_HbA1c |
continuous |
Natural log of HbA1c; recommended outcome for analysis.
NA when R = 0 |
C |
continuous |
End-of-follow-up time (120 months for all
subjects) |
data("sim_ehr_data")
nrow(sim_ehr_data)
#> [1] 16945
head(sim_ehr_data)
#> id Age Gender Marital Black Hispanic NSES time R log_HbA1c
#> 1 1 0.7868359 1 1 0 1 -0.7088425 19.80177 1 1.749098
#> 2 1 0.7868359 1 1 0 1 -0.7088425 36.48679 1 1.739825
#> 3 1 0.7868359 1 1 0 1 -0.7088425 46.89528 1 1.765329
#> 4 1 0.7868359 1 1 0 1 -0.7088425 50.41513 1 1.804394
#> 5 1 0.7868359 1 1 0 1 -0.7088425 53.87571 0 NA
#> 6 1 0.7868359 1 1 0 1 -0.7088425 58.62913 1 1.759420
#> C
#> 1 120
#> 2 120
#> 3 120
#> 4 120
#> 5 120
#> 6 120
For the rest of this vignette, we work with a 500-subject subset to
keep the executable examples quick:
set.seed(1)
sub_ids <- sample(unique(sim_ehr_data$id), 500)
ehr500 <- sim_ehr_data[sim_ehr_data$id %in% sub_ids, ]
nrow(ehr500)
#> [1] 4497
length(unique(ehr500$id))
#> [1] 500
The true simulation parameters are stored as an attribute and can be
passed to method_comparisons() to populate true-value,
bias, and RMSE columns:
true_params <- attr(sim_ehr_data, "true_params")
true_params$beta
#> intercept Age Gender Marital Black Hispanic NSES
#> -2.00 0.50 0.10 -0.03 0.15 0.05 -0.50
true_params$gamma
#> intercept Age Gender Marital Black Hispanic NSES
#> -2.80 0.50 -0.10 -0.05 0.15 -0.05 0.30
3. Method comparison
This section compares the statistical methods for longitudinal EHR
data implemented in (or referenced by) the package. Methods are
differentiated by whether they account for informative visiting (IP) and
informative observation (IO), the specific scope of the shared latent
structure linking these processes to the outcome, and the adjustment
mechanism applied. Symbols: ✓ = explicitly handled or modeled; ✗ =
ignored or not handled.
Abbreviations: LMM, linear mixed model; VA/OA,
visit-adjusted / observation-adjusted; JMVL, joint modeling of visiting
and longitudinal data; IIRR, inverse intensity rate ratio; MAR, missing
at random; MI, multiple imputation; LI, linear increment.
Methods implemented in CIMEHR
The following methods have function implementations in the
package:
| Summary statistics |
Summary_stat() |
| Standard LMM (Laird and Ware, 1982) |
Linear_mixed_model() |
| Liang (Liang et al., 2009) |
Joint_modeling_visiting_and_longitudinal_Liang() |
| IIRR-Weighting (Bůžková and Lumley, 2007) |
Inverse_intensity_rate_ratio_weighting() |
| IIRR-Balanced (Yiu and Su, 2025) |
Inverse_intensity_rate_ratio_balancing() |
| Pairwise Likelihood (Chen et al., 2015) |
Pairwise_likelihood() |
| Multiple Imputation (Rubin, 1987) |
Multiple_imputation_IP() |
| Linear Increment (Aalen and Gunnes, 2010) |
Linear_increment_IP() |
| EHRJoint (Du et al., 2025) |
EHRJoint() |
| Proposed (Yang, Shi, and Mukherjee, 2026) |
CIMEHR(),
CIMEHR_timevarying_integral(),
CIMEHR_timevarying_ou() |
VA-LMM, OA-LMM, JMVL-G, and Adapted-Liang appear in the comparison
tables below for context but are not implemented as separate functions;
VA-LMM and OA-LMM are available through
Linear_mixed_model() via the visit_adjust
argument.
Outcome-only
| Summary statistics |
✗ |
✗ |
None |
Reduces the longitudinal series to a single scalar
(e.g., min, mean, median, max) for regression. |
— |
| Standard LMM (Laird and Ware, 1982) |
✗ |
✗ |
None |
Conditions on observed covariates (valid only under
MAR). |
lme4 |
| VA-LMM (Goldstein et al., 2016) |
✓ |
✗ |
None |
Includes visit count as a fixed covariate
proxy. |
lme4 |
| OA-LMM |
✗ |
✓ |
None |
Includes observation count as a fixed
covariate proxy. |
lme4 |
IP-only
| Liang (Liang et al., 2009) |
✓ |
✗ |
Visiting & Outcome |
Joint likelihood estimation via shared frailty linking
visiting and outcome. |
IrregLong |
| JMVL-G (Gasparini et al., 2020) |
✓ |
✗ |
Visiting & Outcome |
Joint likelihood modeling inter-visit times via shared
random effects. |
merlin |
| IIRR-Weighting (Bůžková and Lumley, 2007) |
✓ |
✗ |
None |
Inverse intensity weighting based on visiting
intensity. |
— |
| IIRR-Balanced (Yiu and Su, 2025) |
✓ |
✗ |
None |
Inverse intensity weighting adjusted for covariate
balance. |
— |
| Pairwise Likelihood (Chen et al., 2015) |
✓ |
✗ |
None |
Constructs a likelihood from all within-subject pairs
of observations; conditioning on observed visit times eliminates the
visit intensity from the likelihood. |
— |
Imputation + IP-only
| Multiple Imputation (MI) (Rubin, 1987) |
✓ |
✓ |
Determined by IP-only |
Multiple imputation based on posterior draws, followed
by IP-only methods. |
mice |
| Linear Increment Methods (LI) (Aalen and Gunnes,
2010) |
✓ |
✓ |
Determined by IP-only |
Deterministic linear interpolation between observed
visits to impute missing values, followed by IP-only methods. |
slim |
IP+IO
| Adapted-Liang (Liang et al., 2009) |
✓ |
✓ |
Visiting & Outcome |
Joint model structure defined, but observation
coefficients are constrained to zero. |
— |
| EHRJoint (Du et al., 2025) |
✓ |
✓ |
Visiting & Outcome |
Tripartite model; observation process modeled via
covariates only (no latent link). |
— |
| Proposed (this package) |
✓ |
✓ |
All three processes |
Tripartite model; full shared latent structure links
Visiting, Observation, and Outcome. |
CIMEHR |
IP = informative visiting / informative presence adjustment; IO =
informative observation adjustment. The “Proposed” row corresponds to
CIMEHR(); CIMEHR_timevarying_integral() and
CIMEHR_timevarying_ou() extend this framework with more
flexible observation-process modeling.
4. Simpler benchmark methods first
We define a covariate vector once and reuse it. The focal coefficient
for the bundled dataset is NSES.
covars_all <- c("Age", "Gender", "Marital", "Black", "Hispanic", "NSES")
out_form <- log_HbA1c ~ Age + Gender + Marital + Black + Hispanic + NSES + time
4.1 Summary-statistic regression
Summary_stat() is a naive baseline that collapses each
subject’s observed outcomes to a summary statistic and fits a
cross-sectional regression.
fit_ss <- Summary_stat(
ehr500,
outcome = "log_HbA1c",
formula = ~ Age + Gender + Marital + Black + Hispanic + NSES
)
summary(fit_ss)
#>
#> Summary-Statistic Regression -- Summary
#> ============================================================
#>
#> Fixed Effect (Coefficients):
#> ------------------------------------------------------------
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.6524 0.0057 291.3986 0.0000
#> Age 0.0214 0.0031 6.8069 0.0000
#> Gender 0.0055 0.0062 0.8802 0.3792
#> Marital -0.0056 0.0064 -0.8751 0.3820
#> Black 0.0145 0.0066 2.2113 0.0275
#> Hispanic 0.0005 0.0075 0.0679 0.9459
#> NSES -0.0183 0.0033 -5.4568 0.0000
4.2 Linear mixed model
Linear_mixed_model() wraps nlme::lme() and
supports standard, observation-adjusted, and visit-adjusted
variants.
fit_lmm <- Linear_mixed_model(
ehr500,
fixed = out_form,
random = ~ 1 | id,
obs_indicator = "R"
)
summary(fit_lmm)
#>
#> Linear Mixed Model -- Summary
#> ============================================================
#>
#> Fixed Effect
#> ------------------------------------------------------------
#> Value Std.Error DF t-value p-value CI.low CI.high
#> (Intercept) 1.6270 0.0058 1386 282.0957 0.0000 1.6157 1.6383
#> Age 0.0218 0.0031 440 7.0345 0.0000 0.0157 0.0279
#> Gender 0.0044 0.0061 440 0.7168 0.4739 -0.0076 0.0163
#> Marital -0.0049 0.0063 440 -0.7897 0.4301 -0.0172 0.0073
#> Black 0.0129 0.0064 440 2.0076 0.0453 0.0003 0.0255
#> Hispanic 0.0006 0.0074 440 0.0822 0.9345 -0.0138 0.0150
#> NSES -0.0191 0.0033 440 -5.7985 0.0000 -0.0255 -0.0126
#> time 0.0004 0.0000 1386 16.4099 0.0000 0.0004 0.0005
A visit-adjusted variant adds the running visit count as a fixed
covariate before observation-indicator filtering:
fit_va <- Linear_mixed_model(
ehr500,
fixed = out_form,
random = ~ 1 | id,
obs_indicator = "R",
visit_adjust = "VA"
)
summary(fit_va)
#>
#> Linear Mixed Model -- Summary
#> ============================================================
#>
#> Fixed Effect
#> ------------------------------------------------------------
#> Value Std.Error DF t-value p-value CI.low CI.high
#> (Intercept) 1.6274 0.0058 1385 282.9958 0.0000 1.6161 1.6387
#> Age 0.0208 0.0032 440 6.5825 0.0000 0.0146 0.0269
#> Gender 0.0047 0.0061 440 0.7818 0.4348 -0.0071 0.0166
#> Marital -0.0047 0.0062 440 -0.7498 0.4538 -0.0169 0.0076
#> Black 0.0128 0.0064 440 2.0029 0.0458 0.0003 0.0254
#> Hispanic 0.0010 0.0073 440 0.1355 0.8923 -0.0134 0.0153
#> NSES -0.0196 0.0033 440 -5.9581 0.0000 -0.0261 -0.0132
#> time 0.0004 0.0000 1385 8.9270 0.0000 0.0003 0.0005
#> n_visits 0.0005 0.0003 1385 1.5831 0.1136 -0.0001 0.0012
4.3 Pairwise likelihood
Pairwise_likelihood() fits a pairwise composite
likelihood model for observed outcomes. The outcome column is read from
y_col, so we set y_col = "log_HbA1c"
explicitly here (the default is "Y").
obs_data <- ehr500[!is.na(ehr500$R) & ehr500$R == 1, ]
fit_pl <- Pairwise_likelihood(
data = obs_data,
formula = out_form,
y_col = "log_HbA1c",
pair_sample = 1000
)
summary(fit_pl)
#>
#> Pairwise Likelihood -- Summary
#> ============================================================
#>
#> Observations: 1834 | Clusters: 447 | Pairs: 1000
#> Converged: TRUE | Log-likelihood: -74.13
#>
#> Fixed Effect (Coefficients)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.0237 0.0059 0.0121 0.0352
#> Gender -0.0017 0.0113 -0.0238 0.0204
#> Marital -0.0132 0.0122 -0.0372 0.0107
#> Black 0.0159 0.0120 -0.0076 0.0395
#> Hispanic 0.0162 0.0145 -0.0122 0.0446
#> NSES -0.0236 0.0060 -0.0353 -0.0119
#> time 0.0004 0.0002 0.0001 0.0007
5. Visiting-process and joint methods
5.1 Inverse intensity rate ratio weighting
fit_iirr <- Inverse_intensity_rate_ratio_weighting(
ehr500,
outcome = "log_HbA1c",
visit_covs = covars_all
)
summary(fit_iirr)
#>
#> IIRR Weighting Estimator -- Summary
#> ============================================================
#>
#> Note: Standard errors are not available for this method.
#> Consider bootstrap for inference.
#>
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#> Estimate
#> Age 0.2879
#> Gender -0.1568
#> Marital -0.0318
#> Black 0.1063
#> Hispanic -0.2235
#> NSES -0.1132
#>
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#> Estimate
#> (Intercept) 1.6311
#> Age 0.0214
#> Gender 0.0000
#> Marital -0.0095
#> Black 0.0143
#> Hispanic 0.0022
#> NSES -0.0210
#> time 0.0005
5.2 Inverse intensity rate ratio balancing
Inverse_intensity_rate_ratio_balancing() extends IIRR
with covariate-balancing weights. The balance_covs argument
defaults to visit_covs when omitted. The phi
argument controls a sensitivity tuning parameter.
fit_iirr_bal <- Inverse_intensity_rate_ratio_balancing(
ehr500,
outcome = "log_HbA1c",
visit_covs = covars_all
)
#> Note: initial value of fn function contains non-finite values (starting at index=2)
#> Check initial x and/or correctness of function. Check initial x and/or correctness of function. Falling back to Maximum Likelihood Estimation (MLE) weights.
summary(fit_iirr_bal)
#>
#> IIRR Balancing-Weights Estimator -- Summary
#> ============================================================
#>
#> Note: Standard errors are not available for this method.
#> Consider bootstrap for inference.
#>
#> Sensitivity parameter (phi): 0.0000
#>
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#> Estimate
#> Age 0
#> Gender 0
#> Marital 0
#> Black 0
#> Hispanic 0
#> NSES 0
#>
#> Outcome -- Fixed Effect (beta) -- Balancing Weights
#> ------------------------------------------------------------
#> Estimate
#> (Intercept) 1.6326
#> Age 0.0205
#> Gender -0.0003
#> Marital -0.0137
#> Black 0.0132
#> Hispanic 0.0048
#> NSES -0.0204
#> time 0.0005
#>
#> Outcome -- Fixed Effect (beta) -- MLE Weights
#> ------------------------------------------------------------
#> Estimate
#> (Intercept) 1.6326
#> Age 0.0205
#> Gender -0.0003
#> Marital -0.0137
#> Black 0.0132
#> Hispanic 0.0048
#> NSES -0.0204
#> time 0.0005
5.3 Liang–Lu–Ying joint visiting/outcome model
Joint_modeling_visiting_and_longitudinal_Liang() fits a
joint model for the visiting process and the longitudinal outcome,
returning model-based standard errors for the visiting stage and a
clustered sandwich estimator for the outcome stage (conditional on the
Stage 1 nuisance estimates).
fit_liang <- Joint_modeling_visiting_and_longitudinal_Liang(
ehr500,
outcome = "log_HbA1c",
visit_covs = covars_all,
long_covs = covars_all,
random_covs = "NSES"
)
summary(fit_liang)
#>
#> JMVL (Liang) Joint Model -- Summary
#> ============================================================
#>
#> Standard errors: model-based (visiting) and clustered sandwich (outcome).
#> The outcome sandwich is conditional on the Stage 1 nuisance estimates.
#>
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.2879 0.0245 0.2398 0.3360
#> Gender -0.1568 0.0468 -0.2486 -0.0650
#> Marital -0.0318 0.0482 -0.1263 0.0628
#> Black 0.1063 0.0485 0.0112 0.2014
#> Hispanic -0.2235 0.0595 -0.3402 -0.1069
#> NSES -0.1132 0.0245 -0.1613 -0.0652
#>
#> Frailty variance: 0.2331
#>
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.0363 0.0968 -0.1535 0.2261
#> Gender -0.0109 0.1953 -0.3936 0.3718
#> Marital -0.0182 0.2022 -0.4144 0.3781
#> Black 0.0209 0.2011 -0.3734 0.4151
#> Hispanic 0.0074 0.2298 -0.4429 0.4577
#> NSES -0.0134 0.0960 -0.2015 0.1746
#>
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Intercept 0.0386 0.3162 -0.5811 0.6583
#> NSES -0.0032 0.3501 -0.6893 0.6829
5.4 EHRJoint
EHRJoint() is a tripartite joint model. Its observation
process uses a Generalized Linear Model (GLM) with a probit link by
default (set obs_link = "logit" to switch). The fitted
object returns model-based standard errors for the visiting and
observation stages and a clustered sandwich estimator for the outcome
stage.
fit_ej <- EHRJoint(
ehr500,
outcome = "log_HbA1c",
visit_covs = covars_all,
long_covs = covars_all,
random_covs = "NSES"
)
summary(fit_ej)
#>
#> EHRJoint Model -- Summary
#> ============================================================
#>
#> Model note: visiting process uses a visit-count/frailty-compensation component.
#> Observation process uses a Generalized Linear Model (GLM) with a probit link.
#> Standard errors: model-based (visiting, observation) and clustered sandwich (outcome).
#>
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.4469 0.0160 0.4155 0.4783
#> Gender -0.1552 0.0301 -0.2142 -0.0962
#> Marital -0.0739 0.0310 -0.1347 -0.0131
#> Black 0.0731 0.0318 0.0107 0.1354
#> Hispanic -0.1013 0.0361 -0.1720 -0.0306
#> NSES 0.2861 0.0150 0.2567 0.3154
#>
#> Frailty variance: 0.2564
#>
#> Observation Process -- Fixed Effect (alpha)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> (Intercept) -0.0406 0.0377 -0.1144 0.0333
#> Age -0.1599 0.0218 -0.2026 -0.1173
#> Gender 0.0171 0.0405 -0.0622 0.0964
#> Marital 0.0412 0.0417 -0.0405 0.1229
#> Black 0.1058 0.0423 0.0229 0.1888
#> Hispanic -0.1295 0.0487 -0.2249 -0.0340
#> NSES -0.5511 0.0222 -0.5947 -0.5076
#>
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age -0.0090 0.1150 -0.2344 0.2164
#> Gender -0.0418 0.2215 -0.4759 0.3923
#> Marital -0.0759 0.2316 -0.5299 0.3781
#> Black 0.0782 0.2260 -0.3649 0.5212
#> Hispanic 0.1220 0.2706 -0.4083 0.6524
#> NSES 0.0248 0.1000 -0.1712 0.2208
#>
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> (Intercept) -0.6611 0.4011 -1.4473 0.1251
#> NSES -0.9458 0.3824 -1.6953 -0.1964
6. Primary CIMEHR methods
6.1 Base CIMEHR
CIMEHR() jointly models informative visiting,
informative observation, and the longitudinal outcome. Its visiting
process uses Andersen–Gill partial likelihood with log-normal frailty,
and its observation process uses a probit model with latent-random-link
random effects when specified.
fit_cimehr <- CIMEHR(
data = ehr500,
y_col = "log_HbA1c",
covars_visit_XV = covars_all,
covars_outcome_fixed_XY = covars_all,
covars_outcome_random_link_ZY = "NSES",
covars_obs_fixed_XO = covars_all,
covars_obs_random_link_ZO = "NSES"
)
print(fit_cimehr) # Print function only prinnts the outcome process estimates
#>
#> CIMEHR -- Outcome Fixed Effect (beta)
#> ==================================================
#> Estimate Std.Error CI.low CI.high
#> Age 0.0182 0.0984 -0.1747 0.2111
#> Gender 0.0040 0.1954 -0.3790 0.3869
#> Marital -0.0034 0.2040 -0.4031 0.3964
#> Black 0.0096 0.2038 -0.3899 0.4091
#> Hispanic 0.0050 0.2320 -0.4497 0.4597
#> NSES -0.0135 0.1027 -0.2147 0.1877
#>
#> Use summary() for all process estimates, or
#> summary(x, specify = "visiting" / "observation" / "outcome") for one layer.
summary(fit_cimehr)
#>
#> CIMEHR Joint Model -- Summary
#> ============================================================
#>
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#>
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.4469 0.0306 0.3870 0.5068
#> Gender -0.1552 0.0602 -0.2733 -0.0371
#> Marital -0.0739 0.0627 -0.1968 0.0490
#> Black 0.0731 0.0653 -0.0550 0.2011
#> Hispanic -0.1013 0.0779 -0.2539 0.0514
#> NSES 0.2861 0.0305 0.2264 0.3458
#>
#> Frailty variance (sigma^2_zeta): 0.2283
#>
#> Observation Process -- Fixed Effect (alpha)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> (Intercept) 0.1167 0.0872 -0.0543 0.2876
#> Age -0.2577 0.0469 -0.3496 -0.1658
#> Gender -0.0118 0.0917 -0.1915 0.1678
#> Marital 0.0423 0.0996 -0.1529 0.2375
#> Black 0.1705 0.1032 -0.0318 0.3727
#> Hispanic -0.1320 0.1163 -0.3598 0.0959
#> NSES -0.6846 0.0450 -0.7727 -0.5964
#>
#> Observation Process -- Random Link (delta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> (Intercept) -0.3224 0.0700 -0.4597 -0.1851
#> NSES -0.2488 0.0755 -0.3969 -0.1008
#>
#> Random-effect SD (Sigma_q):
#> (Intercept) NSES
#> 0.8768 0.0000
#>
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.0182 0.0984 -0.1747 0.2111
#> Gender 0.0040 0.1954 -0.3790 0.3869
#> Marital -0.0034 0.2040 -0.4031 0.3964
#> Black 0.0096 0.2038 -0.3899 0.4091
#> Hispanic 0.0050 0.2320 -0.4497 0.4597
#> NSES -0.0135 0.1027 -0.2147 0.1877
#>
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Intercept 0.0187 0.1395 -0.2547 0.2921
#> NSES 0.0054 0.1512 -0.2908 0.3017
6.1.1 Stage-specific summaries and coefficient extraction
You can print one process at a time and extract a particular
coefficient programmatically:
summary_visiting(fit_cimehr)
#>
#> CIMEHR Joint Model -- Summary
#> ============================================================
#>
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> Layer requested: visiting
#>
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.4469 0.0306 0.3870 0.5068
#> Gender -0.1552 0.0602 -0.2733 -0.0371
#> Marital -0.0739 0.0627 -0.1968 0.0490
#> Black 0.0731 0.0653 -0.0550 0.2011
#> Hispanic -0.1013 0.0779 -0.2539 0.0514
#> NSES 0.2861 0.0305 0.2264 0.3458
#>
#> Frailty variance (sigma^2_zeta): 0.2283
summary_observation(fit_cimehr)
#>
#> CIMEHR Joint Model -- Summary
#> ============================================================
#>
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> Layer requested: observation
#>
#> Observation Process -- Fixed Effect (alpha)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> (Intercept) 0.1167 0.0872 -0.0543 0.2876
#> Age -0.2577 0.0469 -0.3496 -0.1658
#> Gender -0.0118 0.0917 -0.1915 0.1678
#> Marital 0.0423 0.0996 -0.1529 0.2375
#> Black 0.1705 0.1032 -0.0318 0.3727
#> Hispanic -0.1320 0.1163 -0.3598 0.0959
#> NSES -0.6846 0.0450 -0.7727 -0.5964
#>
#> Observation Process -- Random Link (delta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> (Intercept) -0.3224 0.0700 -0.4597 -0.1851
#> NSES -0.2488 0.0755 -0.3969 -0.1008
#>
#> Random-effect SD (Sigma_q):
#> (Intercept) NSES
#> 0.8768 0.0000
summary_outcome(fit_cimehr)
#>
#> CIMEHR Joint Model -- Summary
#> ============================================================
#>
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> Layer requested: outcome
#>
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Age 0.0182 0.0984 -0.1747 0.2111
#> Gender 0.0040 0.1954 -0.3790 0.3869
#> Marital -0.0034 0.2040 -0.4031 0.3964
#> Black 0.0096 0.2038 -0.3899 0.4091
#> Hispanic 0.0050 0.2320 -0.4497 0.4597
#> NSES -0.0135 0.1027 -0.2147 0.1877
#>
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#> Estimate Std.Error CI.low CI.high
#> Intercept 0.0187 0.1395 -0.2547 0.2921
#> NSES 0.0054 0.1512 -0.2908 0.3017
extract_coefficient(fit_cimehr, stage = "outcome", parameter = "NSES")
#> NSES
#> -0.01349089
extract_coefficient(fit_cimehr, stage = "visiting", parameter = "NSES")
#> NSES
#> 0.2860714
extract_coefficient(fit_cimehr, stage = "observation", parameter = "NSES")
#> NSES
#> -0.6845686
6.2 CIMEHR with Gauss–Hermite quadrature
CIMEHR_timevarying_integral() uses Gauss–Hermite
quadrature and online filtering for a more flexible observation process.
Analytic standard errors are not currently exposed for this variant, so
bootstrap inference is recommended (see Section 9).
fit_integral <- CIMEHR_timevarying_integral(
data = ehr500,
y_col = "log_HbA1c",
covars_visit_XV = covars_all,
covars_outcome_fixed_XY = covars_all,
covars_outcome_random_link_ZY = "NSES",
covars_obs_fixed_XO = covars_all,
covars_obs_random_link_ZO = "NSES",
gh_points_U = 7,
gh_points_q = 5
)
6.3 CIMEHR with OU pairwise composite likelihood
CIMEHR_timevarying_ou() models serial correlation in the
observation process using an Ornstein–Uhlenbeck process and pairwise
composite likelihood.
fit_ou <- CIMEHR_timevarying_ou(
data = ehr500,
y_col = "log_HbA1c",
covars_visit_XV = covars_all,
covars_outcome_fixed_XY = covars_all,
covars_outcome_random_link_ZY = "NSES",
covars_obs_fixed_XO = covars_all,
covars_obs_random_link_ZO = "NSES",
pair_type = "adjacent"
)
7. Comparing methods with method_comparisons()
The helper method_comparisons() fits any user-specified
combination of methods and returns separate tables for the visiting,
observation, and outcome processes. If a method does not estimate a
given process, that entry is shown as NA.
available_comparison_methods()
#> [1] "CIMEHR"
#> [2] "CIMEHR_timevarying_integral"
#> [3] "CIMEHR_timevarying_ou"
#> [4] "EHRJoint"
#> [5] "Inverse_intensity_rate_ratio_balancing"
#> [6] "Inverse_intensity_rate_ratio_weighting"
#> [7] "Joint_modeling_visiting_and_longitudinal_Liang"
#> [8] "Linear_increment"
#> [9] "Linear_mixed_model"
#> [10] "Multiple_imputation"
#> [11] "Pairwise_likelihood"
#> [12] "Summary_stat"
The full call below fits five methods at once on the 500-subject
subset. It takes a minute or two to run; if you are building the
vignette and want to skip the runtime, change eval = TRUE
to eval = FALSE in the chunk header.
cmp_subset <- method_comparisons(
data = ehr500,
methods = c("Summary_stat", "Linear_mixed_model",
"Inverse_intensity_rate_ratio_weighting",
"Joint_modeling_visiting_and_longitudinal_Liang",
"CIMEHR"),
method_args = list(
Summary_stat = list(
outcome = "log_HbA1c",
formula = ~ Age + Gender + Marital + Black + Hispanic + NSES
),
Linear_mixed_model = list(
fixed = out_form, random = ~ 1 | id, obs_indicator = "R"
),
Inverse_intensity_rate_ratio_weighting = list(
outcome = "log_HbA1c", visit_covs = covars_all
),
Joint_modeling_visiting_and_longitudinal_Liang = list(
outcome = "log_HbA1c",
visit_covs = covars_all, long_covs = covars_all, random_covs = "NSES"
),
CIMEHR = list(
y_col = "log_HbA1c",
covars_visit_XV = covars_all,
covars_outcome_fixed_XY = covars_all,
covars_outcome_random_link_ZY = "NSES",
covars_obs_fixed_XO = covars_all,
covars_obs_random_link_ZO = "NSES"
)
),
printSE = TRUE,
print95CI = TRUE,
true_value_verbose = TRUE
)
print(cmp_subset)
#>
#> ---------------------------------
#> visiting process
#> ---------------------------------
#> Method Parameter of interest Estimate
#> Summary_stat <NA> NA
#> Linear_mixed_model <NA> NA
#> Inverse_intensity_rate_ratio_weighting Age 0.2879
#> Inverse_intensity_rate_ratio_weighting Gender -0.1568
#> Inverse_intensity_rate_ratio_weighting Marital -0.0318
#> Inverse_intensity_rate_ratio_weighting Black 0.1063
#> Inverse_intensity_rate_ratio_weighting Hispanic -0.2235
#> Inverse_intensity_rate_ratio_weighting NSES -0.1132
#> Joint_modeling_visiting_and_longitudinal_Liang Age 0.2879
#> Joint_modeling_visiting_and_longitudinal_Liang Gender -0.1568
#> Joint_modeling_visiting_and_longitudinal_Liang Marital -0.0318
#> Joint_modeling_visiting_and_longitudinal_Liang Black 0.1063
#> Joint_modeling_visiting_and_longitudinal_Liang Hispanic -0.2235
#> Joint_modeling_visiting_and_longitudinal_Liang NSES -0.1132
#> CIMEHR Age 0.4469
#> CIMEHR Gender -0.1552
#> CIMEHR Marital -0.0739
#> CIMEHR Black 0.0731
#> CIMEHR Hispanic -0.1013
#> CIMEHR NSES 0.2861
#> SE 95% CI True value Bias RMSE
#> NA <NA> NA NA NA
#> NA <NA> NA NA NA
#> NA <NA> 0.50 -0.2121 0.2121
#> NA <NA> -0.10 -0.0568 0.0568
#> NA <NA> -0.05 0.0182 0.0182
#> NA <NA> 0.15 -0.0437 0.0437
#> NA <NA> -0.05 -0.1735 0.1735
#> NA <NA> 0.30 -0.4132 0.4132
#> 0.0245 (0.2398, 0.336) 0.50 -0.2121 0.2121
#> 0.0468 (-0.2486, -0.065) -0.10 -0.0568 0.0568
#> 0.0482 (-0.1263, 0.0628) -0.05 0.0182 0.0182
#> 0.0485 (0.0112, 0.2014) 0.15 -0.0437 0.0437
#> 0.0595 (-0.3402, -0.1069) -0.05 -0.1735 0.1735
#> 0.0245 (-0.1613, -0.0652) 0.30 -0.4132 0.4132
#> 0.0306 (0.387, 0.5068) 0.50 -0.0531 0.0531
#> 0.0602 (-0.2733, -0.0371) -0.10 -0.0552 0.0552
#> 0.0627 (-0.1968, 0.049) -0.05 -0.0239 0.0239
#> 0.0653 (-0.055, 0.2011) 0.15 -0.0769 0.0769
#> 0.0779 (-0.2539, 0.0514) -0.05 -0.0513 0.0513
#> 0.0305 (0.2264, 0.3458) 0.30 -0.0139 0.0139
#>
#> ---------------------------------
#> observation process
#> ---------------------------------
#> Method Parameter of interest Estimate
#> Summary_stat <NA> NA
#> Linear_mixed_model <NA> NA
#> Inverse_intensity_rate_ratio_weighting <NA> NA
#> Joint_modeling_visiting_and_longitudinal_Liang <NA> NA
#> CIMEHR (Intercept) 0.1167
#> CIMEHR Age -0.2577
#> CIMEHR Gender -0.0118
#> CIMEHR Marital 0.0423
#> CIMEHR Black 0.1705
#> CIMEHR Hispanic -0.1320
#> CIMEHR NSES -0.6846
#> CIMEHR (Intercept) -0.3224
#> CIMEHR NSES -0.2488
#> SE 95% CI True value Bias RMSE
#> NA <NA> NA NA NA
#> NA <NA> NA NA NA
#> NA <NA> NA NA NA
#> NA <NA> NA NA NA
#> 0.0872 (-0.0543, 0.2876) NA NA NA
#> 0.0469 (-0.3496, -0.1658) -0.30 0.0423 0.0423
#> 0.0917 (-0.1915, 0.1678) -0.10 0.0882 0.0882
#> 0.0996 (-0.1529, 0.2375) 0.00 0.0423 0.0423
#> 0.1032 (-0.0318, 0.3727) 0.05 0.1205 0.1205
#> 0.1163 (-0.3598, 0.0959) -0.05 -0.0820 0.0820
#> 0.0450 (-0.7727, -0.5964) -0.60 -0.0846 0.0846
#> NA <NA> NA NA NA
#> NA <NA> NA NA NA
#>
#> ---------------------------------
#> outcome process
#> ---------------------------------
#> Method Parameter of interest Estimate
#> Summary_stat (Intercept) 1.6524
#> Summary_stat Age 0.0214
#> Summary_stat Gender 0.0055
#> Summary_stat Marital -0.0056
#> Summary_stat Black 0.0145
#> Summary_stat Hispanic 0.0005
#> Summary_stat NSES -0.0183
#> Linear_mixed_model (Intercept) 1.6270
#> Linear_mixed_model Age 0.0218
#> Linear_mixed_model Gender 0.0044
#> Linear_mixed_model Marital -0.0049
#> Linear_mixed_model Black 0.0129
#> Linear_mixed_model Hispanic 0.0006
#> Linear_mixed_model NSES -0.0191
#> Linear_mixed_model time 0.0004
#> Inverse_intensity_rate_ratio_weighting (Intercept) 1.6311
#> Inverse_intensity_rate_ratio_weighting Age 0.0214
#> Inverse_intensity_rate_ratio_weighting Gender 0.0000
#> Inverse_intensity_rate_ratio_weighting Marital -0.0095
#> Inverse_intensity_rate_ratio_weighting Black 0.0143
#> Inverse_intensity_rate_ratio_weighting Hispanic 0.0022
#> Inverse_intensity_rate_ratio_weighting NSES -0.0210
#> Inverse_intensity_rate_ratio_weighting time 0.0005
#> Joint_modeling_visiting_and_longitudinal_Liang Age 0.0363
#> Joint_modeling_visiting_and_longitudinal_Liang Gender -0.0109
#> Joint_modeling_visiting_and_longitudinal_Liang Marital -0.0182
#> Joint_modeling_visiting_and_longitudinal_Liang Black 0.0209
#> Joint_modeling_visiting_and_longitudinal_Liang Hispanic 0.0074
#> Joint_modeling_visiting_and_longitudinal_Liang NSES -0.0134
#> CIMEHR Age 0.0182
#> CIMEHR Gender 0.0040
#> CIMEHR Marital -0.0034
#> CIMEHR Black 0.0096
#> CIMEHR Hispanic 0.0050
#> CIMEHR NSES -0.0135
#> CIMEHR Intercept 0.0187
#> CIMEHR NSES 0.0054
#> SE 95% CI True value Bias RMSE
#> 0.0057 (1.6413, 1.6635) NA NA NA
#> 0.0031 (0.0152, 0.0276) 0.50 -0.4786 0.4786
#> 0.0062 (-0.0067, 0.0176) 0.10 -0.0945 0.0945
#> 0.0064 (-0.0181, 0.0069) -0.03 0.0244 0.0244
#> 0.0066 (0.0017, 0.0274) 0.15 -0.1355 0.1355
#> 0.0075 (-0.0141, 0.0151) 0.05 -0.0495 0.0495
#> 0.0033 (-0.0248, -0.0117) -0.50 0.4817 0.4817
#> 0.0058 (1.6157, 1.6383) NA NA NA
#> 0.0031 (0.0157, 0.0279) 0.50 -0.4782 0.4782
#> 0.0061 (-0.0076, 0.0163) 0.10 -0.0956 0.0956
#> 0.0063 (-0.0172, 0.0073) -0.03 0.0251 0.0251
#> 0.0064 (3e-04, 0.0255) 0.15 -0.1371 0.1371
#> 0.0074 (-0.0138, 0.015) 0.05 -0.0494 0.0494
#> 0.0033 (-0.0255, -0.0126) -0.50 0.4809 0.4809
#> 0.0000 (4e-04, 5e-04) NA NA NA
#> NA <NA> NA NA NA
#> NA <NA> 0.50 -0.4786 0.4786
#> NA <NA> 0.10 -0.1000 0.1000
#> NA <NA> -0.03 0.0205 0.0205
#> NA <NA> 0.15 -0.1357 0.1357
#> NA <NA> 0.05 -0.0478 0.0478
#> NA <NA> -0.50 0.4790 0.4790
#> NA <NA> NA NA NA
#> 0.0968 (-0.1535, 0.2261) 0.50 -0.4637 0.4637
#> 0.1953 (-0.3936, 0.3718) 0.10 -0.1109 0.1109
#> 0.2022 (-0.4144, 0.3781) -0.03 0.0118 0.0118
#> 0.2011 (-0.3734, 0.4151) 0.15 -0.1291 0.1291
#> 0.2298 (-0.4429, 0.4577) 0.05 -0.0426 0.0426
#> 0.0960 (-0.2015, 0.1746) -0.50 0.4866 0.4866
#> 0.0984 (-0.1747, 0.2111) 0.50 -0.4818 0.4818
#> 0.1954 (-0.379, 0.3869) 0.10 -0.0960 0.0960
#> 0.2040 (-0.4031, 0.3964) -0.03 0.0266 0.0266
#> 0.2038 (-0.3899, 0.4091) 0.15 -0.1404 0.1404
#> 0.2320 (-0.4497, 0.4597) 0.05 -0.0450 0.0450
#> 0.1027 (-0.2147, 0.1877) -0.50 0.4865 0.4865
#> 0.1395 (-0.2547, 0.2921) NA NA NA
#> NA <NA> NA NA NA
cmp_subset$tables$outcome
#> Method Parameter of interest
#> 1 Summary_stat (Intercept)
#> 2 Summary_stat Age
#> 3 Summary_stat Gender
#> 4 Summary_stat Marital
#> 5 Summary_stat Black
#> 6 Summary_stat Hispanic
#> 7 Summary_stat NSES
#> 8 Linear_mixed_model (Intercept)
#> 9 Linear_mixed_model Age
#> 10 Linear_mixed_model Gender
#> 11 Linear_mixed_model Marital
#> 12 Linear_mixed_model Black
#> 13 Linear_mixed_model Hispanic
#> 14 Linear_mixed_model NSES
#> 15 Linear_mixed_model time
#> 16 Inverse_intensity_rate_ratio_weighting (Intercept)
#> 17 Inverse_intensity_rate_ratio_weighting Age
#> 18 Inverse_intensity_rate_ratio_weighting Gender
#> 19 Inverse_intensity_rate_ratio_weighting Marital
#> 20 Inverse_intensity_rate_ratio_weighting Black
#> 21 Inverse_intensity_rate_ratio_weighting Hispanic
#> 22 Inverse_intensity_rate_ratio_weighting NSES
#> 23 Inverse_intensity_rate_ratio_weighting time
#> 24 Joint_modeling_visiting_and_longitudinal_Liang Age
#> 25 Joint_modeling_visiting_and_longitudinal_Liang Gender
#> 26 Joint_modeling_visiting_and_longitudinal_Liang Marital
#> 27 Joint_modeling_visiting_and_longitudinal_Liang Black
#> 28 Joint_modeling_visiting_and_longitudinal_Liang Hispanic
#> 29 Joint_modeling_visiting_and_longitudinal_Liang NSES
#> 30 CIMEHR Age
#> 31 CIMEHR Gender
#> 32 CIMEHR Marital
#> 33 CIMEHR Black
#> 34 CIMEHR Hispanic
#> 35 CIMEHR NSES
#> 36 CIMEHR Intercept
#> 37 CIMEHR NSES
#> Estimate SE 95% CI True value Bias
#> 1 1.652435e+00 0.0056707035 (1.6413, 1.6635) NA NA
#> 2 2.140045e-02 0.0031439222 (0.0152, 0.0276) 0.50 -0.47859955
#> 3 5.459117e-03 0.0062018680 (-0.0067, 0.0176) 0.10 -0.09454088
#> 4 -5.582944e-03 0.0063798400 (-0.0181, 0.0069) -0.03 0.02441706
#> 5 1.452324e-02 0.0065677511 (0.0017, 0.0274) 0.15 -0.13547676
#> 6 5.072482e-04 0.0074696992 (-0.0141, 0.0151) 0.05 -0.04949275
#> 7 -1.827476e-02 0.0033489993 (-0.0248, -0.0117) -0.50 0.48172524
#> 8 1.627023e+00 0.0057676288 (1.6157, 1.6383) NA NA
#> 9 2.179495e-02 0.0030983099 (0.0157, 0.0279) 0.50 -0.47820505
#> 10 4.361431e-03 0.0060842291 (-0.0076, 0.0163) 0.10 -0.09563857
#> 11 -4.946489e-03 0.0062635374 (-0.0172, 0.0073) -0.03 0.02505351
#> 12 1.290570e-02 0.0064283184 (3e-04, 0.0255) 0.15 -0.13709430
#> 13 6.044320e-04 0.0073507684 (-0.0138, 0.015) 0.05 -0.04939557
#> 14 -1.907900e-02 0.0032903401 (-0.0255, -0.0126) -0.50 0.48092100
#> 15 4.431722e-04 0.0000270064 (4e-04, 5e-04) NA NA
#> 16 1.631125e+00 NA <NA> NA NA
#> 17 2.137436e-02 NA <NA> 0.50 -0.47862564
#> 18 -2.736955e-05 NA <NA> 0.10 -0.10002737
#> 19 -9.474384e-03 NA <NA> -0.03 0.02052562
#> 20 1.428874e-02 NA <NA> 0.15 -0.13571126
#> 21 2.197782e-03 NA <NA> 0.05 -0.04780222
#> 22 -2.097999e-02 NA <NA> -0.50 0.47902001
#> 23 4.634290e-04 NA <NA> NA NA
#> 24 3.630713e-02 0.0968361196 (-0.1535, 0.2261) 0.50 -0.46369287
#> 25 -1.089798e-02 0.1952628475 (-0.3936, 0.3718) 0.10 -0.11089798
#> 26 -1.815188e-02 0.2021551009 (-0.4144, 0.3781) -0.03 0.01184812
#> 27 2.085107e-02 0.2011319982 (-0.3734, 0.4151) 0.15 -0.12914893
#> 28 7.409450e-03 0.2297644846 (-0.4429, 0.4577) 0.05 -0.04259055
#> 29 -1.344890e-02 0.0959530255 (-0.2015, 0.1746) -0.50 0.48655110
#> 30 1.818674e-02 0.0984255929 (-0.1747, 0.2111) 0.50 -0.48181326
#> 31 3.986490e-03 0.1953825128 (-0.379, 0.3869) 0.10 -0.09601351
#> 32 -3.362864e-03 0.2039689000 (-0.4031, 0.3964) -0.03 0.02663714
#> 33 9.584645e-03 0.2038442431 (-0.3899, 0.4091) 0.15 -0.14041536
#> 34 5.007985e-03 0.2319961318 (-0.4497, 0.4597) 0.05 -0.04499202
#> 35 -1.349089e-02 0.1026605627 (-0.2147, 0.1877) -0.50 0.48650911
#> 36 1.871638e-02 0.1394998001 (-0.2547, 0.2921) NA NA
#> 37 5.422782e-03 NA <NA> NA NA
#> RMSE
#> 1 NA
#> 2 0.47859955
#> 3 0.09454088
#> 4 0.02441706
#> 5 0.13547676
#> 6 0.04949275
#> 7 0.48172524
#> 8 NA
#> 9 0.47820505
#> 10 0.09563857
#> 11 0.02505351
#> 12 0.13709430
#> 13 0.04939557
#> 14 0.48092100
#> 15 NA
#> 16 NA
#> 17 0.47862564
#> 18 0.10002737
#> 19 0.02052562
#> 20 0.13571126
#> 21 0.04780222
#> 22 0.47902001
#> 23 NA
#> 24 0.46369287
#> 25 0.11089798
#> 26 0.01184812
#> 27 0.12914893
#> 28 0.04259055
#> 29 0.48655110
#> 30 0.48181326
#> 31 0.09601351
#> 32 0.02663714
#> 33 0.14041536
#> 34 0.04499202
#> 35 0.48650911
#> 36 NA
#> 37 NA
If a method fails, method_comparisons() keeps going by
default, stores the error in cmp$errors, and prints
NA rows for that method.
8. Simulation study
A Monte Carlo simulation study is provided in
simulations/simulation_study.R. The script crosses two
focal-covariate types (continuous NSES and binary
Gender) with four values of the Ornstein-Uhlenbeck
serial-correlation parameter (phi \(\in\) {0, 0.001, 0.1, 0.3}), producing
eight scenarios. Each scenario runs N_SIM = 1000 Monte
Carlo replications; each replication generates a fresh dataset of
n = 2000 participants on the [0, 120] window via
sim_data_gen() and fits four methods:
CIMEHR()
Joint_modeling_visiting_and_longitudinal_Liang()
Inverse_intensity_rate_ratio_weighting()
IrregLong::iiwgee() (external comparator)
For each method × scenario × parameter (beta on each of Age, Gender,
Marital, Black, Hispanic, NSES), the script reports bias, empirical SE,
mean analytic SE, SE ratio, RMSE, and 95% CI coverage. Coverage is
reported as --- for
Inverse_intensity_rate_ratio_weighting() because that
method does not return analytic standard errors; bootstrap inference
would be needed there.
The bundled sim_ehr_data used throughout this vignette
is itself one replicate from this framework (the
continuous_phi0 scenario, sim_id = 1), so the
data-generating spec used by the study and the spec of the example
dataset are identical aside from phi and the focal
covariate.
To run the study:
Rscript simulations/simulation_study.R
Rscript simulations/simulation_study.R --cores 8 --nsim 100
Rscript simulations/simulation_study.R --scenarios continuous_phi0,binary_phi0.3
9. Bootstrap inference
CIMEHR(), CIMEHR_timevarying_ou(),
EHRJoint(), and
Joint_modeling_visiting_and_longitudinal_Liang() already
return analytic standard errors (sandwich or model-based).
bootstrap() is recommended for the methods that do not
(CIMEHR_timevarying_integral(),
Inverse_intensity_rate_ratio_weighting(),
Inverse_intensity_rate_ratio_balancing()), and as an
optional robustness check or for propagating Stage 1 nuisance
uncertainty for the methods that do. The bootstrap resamples subjects
with replacement and refits the selected method on each replicate.
bs <- bootstrap(
fit_iirr,
data = ehr500,
B = 200,
seed = 1
)
print(bs)
summary(bs)
For CIMEHR-family methods, forward covariate specifications through
covars:
bs_cimehr <- bootstrap(
fit_cimehr,
data = ehr500,
B = 200,
seed = 1,
covars = list(
covars_visit_XV = covars_all,
covars_outcome_fixed_XY = covars_all,
covars_outcome_random_link_ZY = "NSES",
covars_obs_fixed_XO = covars_all,
covars_obs_random_link_ZO = "NSES"
)
)
10. Simulating custom data
sim_data_gen() accepts a covariates list
spec; per-process coefficient vectors are keyed by the names declared
there. The default spec is the generic two-covariate
Z/X form used in the package’s internal tests,
but the EHR-style spec used to build sim_ehr_data is an
instructive example:
ehr_covs <- list(
Age = list(type = "continuous", dist = "uniform",
min = 18, max = 99, standardize = TRUE),
Gender = list(type = "binary", prob = 0.5),
Marital = list(type = "binary", prob = 0.4),
Black = list(type = "binary", prob = 0.30),
Hispanic = list(type = "binary", prob = 0.20),
NSES = list(type = "continuous", dist = "normal",
mean = 0, sd = 1, standardize = FALSE)
)
dat <- sim_data_gen(
n = 500,
time.start = 0,
time.end = 120,
seed = 42,
covariates = ehr_covs,
gamma = c(intercept = -2.8, Age = 0.5, Gender = -0.10, Marital = -0.05,
Black = 0.15, Hispanic = -0.05, NSES = 0.3),
sigma_zeta = 0.5,
alpha = c(intercept = 0.2, Age = -0.3, Gender = -0.10, Marital = 0.00,
Black = 0.05, Hispanic = -0.05, NSES = -0.6),
obs_random_covs = "NSES",
delta = c(intercept = -0.3, NSES = -0.3),
sigma_q = c(intercept = 0.5, NSES = 0.2),
phi = 0,
beta = c(intercept = -2.0, Age = 0.5, Gender = 0.10,
Marital = -0.03, Black = 0.15, Hispanic = 0.05,
NSES = -0.5),
beta_t = 0.01,
outcome_random_covs = "NSES",
theta = c(intercept = 0.4, NSES = 0.0),
Sigma_b = matrix(c(0.5, 0, 0, 0.8), 2, 2),
sigma_y = 0.7,
verbose = FALSE
)
head(dat$long_data)
#> id Age Gender Marital Black Hispanic NSES Age_raw time R
#> 1 1 1.450938 0 1 0 1 -1.5515833 92.09929 21.61704 1
#> 2 1 1.450938 0 1 0 1 -1.5515833 92.09929 76.51161 1
#> 3 1 1.450938 0 1 0 1 -1.5515833 92.09929 83.24414 1
#> 4 2 1.526923 0 0 1 0 -0.3881794 93.90311 21.83727 0
#> 5 2 1.526923 0 0 1 0 -0.3881794 93.90311 43.17168 0
#> 6 2 1.526923 0 0 1 0 -0.3881794 93.90311 78.57681 0
#> Y C
#> 1 -0.6228701 120
#> 2 -1.2704788 120
#> 3 -0.5537626 120
#> 4 NA 120
#> 5 NA 120
#> 6 NA 120
References
Aalen, O. O., Borgan, Ø., and Gjessing, H. K. (2008). Survival
and event history analysis: A process point of view. Springer.
Aalen, O. O. and Gunnes, N. (2010). A dynamic approach for
reconstructing missing longitudinal data using the linear increments
model. Biostatistics, 11, 453–472.
Bůžková, P. and Lumley, T. (2007). Longitudinal data analysis for
generalized linear models with follow-up dependent on outcome-related
variables. Canadian Journal of Statistics, 35, 485–500.
Chen, Y., Ning, J., and Cai, C. (2015). Regression analysis of
longitudinal data with irregular and informative observation times.
Biostatistics, 16, 727–739.
Du, J., Shi, X., and Mukherjee, B. (2025). A new statistical approach
for joint modeling of longitudinal outcomes measured in electronic
health records with clinically informative presence and observation
processes.
Gasparini, A., Abrams, K. R., Barrett, J. K., Major, R. W., Sweeting,
M. J., Brunskill, N. J., and Crowther, M. J. (2020). Mixed-effects
models for health care longitudinal data with an informative visiting
process: A Monte Carlo simulation study. Statistica
Neerlandica, 74, 5–23.
Goldstein, B. A., Bhavsar, N. A., Phelan, M., and Pencina, M. J.
(2016). Controlling for informed presence bias due to the number of
health encounters in an electronic health record. American Journal
of Epidemiology, 184, 847–855.
Laird, N. M. and Ware, J. H. (1982). Random-effects models for
longitudinal data. Biometrics, 38, 963–974.
Liang, Y., Lu, W., and Ying, Z. (2009). Joint modeling and analysis
of longitudinal data with informative observation times.
Biometrics, 65, 377–384.
Pullenayegum, E. M. (2026). IrregLong: Analysis of Longitudinal Data
with Irregular Observation Times. R package version 0.4.1.
Rubin, D. B. (1987). Multiple imputation for nonresponse in
surveys. John Wiley & Sons, New York.
Yiu, S. and Su, L. (2025). Accommodating informative visit times for
analysing irregular longitudinal data: A sensitivity analysis approach
with balancing weights estimators. Journal of the Royal Statistical
Society Series C: Applied Statistics, 74, 824–843.
Yang, C.-H., Shi, X., and Mukherjee, B. (2026). Joint modeling of
longitudinal EHR data with shared random effects for informative
visiting and observation processes. arXiv:2602.15374.