Robust test statistics with semTests

Introduction

semTests computes robust p-values for structural equation models fit with lavaan. When the data are non-normal, the standard chi-square test of model fit can reject correctly specified models too often. Familiar corrections such as Satorra–Bentler help, but they adjust only the mean (and sometimes the variance) of the reference distribution. The methods in this package keep more information about its shape by using its estimated eigenvalues.

For the mathematically curious, the reference distribution is a weighted sum of independent chi-square variables, \(\sum_j \lambda_j Z_j^2\), where the \(Z_j\) are independent standard normal variables and the weights \(\lambda_j\) are the eigenvalues. You do not need to calculate any eigenvalues by hand. Fit the model in lavaan, then let pvalues() do that part. The same basic idea covers maximum likelihood, least squares, categorical models, and FIML.
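To make the weighted sum concrete, the sketch below approximates a p-value from such a distribution by simulation, using base R only. The eigenvalues in lambda and the statistic t_obs are made-up numbers, and weighted_chisq_pvalue() is a hypothetical helper written for this illustration, not a semTests function. The package's internal computation may well differ.

```r
# Hypothetical helper (not part of semTests): Monte Carlo p-value for
# T ~ sum_j lambda_j * Z_j^2, with Z_j independent standard normals.
weighted_chisq_pvalue <- function(t_obs, lambda, n_sim = 1e5) {
  # Each row of z2 holds one draw of the independent chi-square(1) variables.
  z2 <- matrix(rchisq(n_sim * length(lambda), df = 1), nrow = n_sim)
  sims <- as.vector(z2 %*% lambda)
  # Share of simulated statistics at least as large as the observed one.
  mean(sims >= t_obs)
}

set.seed(123)
lambda <- c(2.1, 1.6, 1.2, 0.9, 0.7)  # made-up eigenvalues
t_obs <- 15                            # made-up test statistic

p_weighted <- weighted_chisq_pvalue(t_obs, lambda)

# Mean scaling in the Satorra-Bentler style replaces every eigenvalue by
# their average, so the reference distribution becomes a scaled chi-square.
p_scaled <- pchisq(t_obs / mean(lambda), df = length(lambda), lower.tail = FALSE)

c(weighted = p_weighted, scaled = p_scaled)
```

When all eigenvalues are equal, the two p-values agree up to simulation error. The more the eigenvalues spread out, the more the shape of the weighted sum departs from a scaled chi-square.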

The penalized eigenvalue block-averaging (pEBA) and penalized regression (pOLS) methods were introduced by Foldnes, Moss, and Grønneberg (2025). Foldnes, Grønneberg, and Moss (2026) extended them to nested model comparison.

library("semTests")
library("lavaan")

Choose a focused guide

This vignette is a quick tour. Pick the focused guide that matches your data when you want more depth:

vignette("continuous-data", package = "semTests") covers complete continuous data, the classical ML workflow, GLS/ULS, and nested tests.
vignette("categorical-data", package = "semTests") covers ordered and mixed indicators, pairwise missingness, and categorical nested comparison.
vignette("fiml-missing-data", package = "semTests") covers continuous FIML, the finite-sample evidence for the observed-information default, the lavaan compatibility convention, and nested restriction maps.
vignette("latent-growth", package = "semTests") gives FIML a more applied outing in a latent growth model.
vignette("measurement-invariance", package = "semTests") walks through threshold and loading invariance with ordinal indicators.

Basic usage: goodness of fit

Fit a model with lavaan, then call pvalues(). For ML models, request a robust lavaan test such as estimator = "MLM" or "MLR". This makes lavaan store the extra sample information that semTests needs.

model <- "visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9"
fit <- cfa(model, HolzingerSwineford1939, estimator = "MLM")
pvalues(fit)
#>    peba4_rls 
#> 5.165712e-07 
#> estimator: ML (MLM) | data: continuous | information: expected | df: 24

The result is a named vector of p-values. When printed, it ends with a short footer showing the estimator, data type, information choice, and degrees of freedom actually used. The complete record is available with attr(x, "semtests"), where x is the object returned by pvalues().
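As a small sketch of working with the result, the snippet below stores it and inspects it with base R tools. Only attr(x, "semtests") is taken from the description above. The internal layout of that record is not assumed here, which is why str() is used to look at it.

```r
library("semTests")
library("lavaan")

model <- "visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9"
fit <- cfa(model, HolzingerSwineford1939, estimator = "MLM")

res <- pvalues(fit)

# The names identify the tests that were actually computed.
names(res)

# Strip the attributes to get plain numbers for further processing.
as.numeric(res) < 0.05

# The complete record travels along as an attribute.
str(attr(res, "semtests"))
```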

The test-name grammar

Each requested test uses one of TEST, TEST_UG, TEST_ML, TEST_RLS, TEST_UG_ML, or TEST_UG_RLS:

TEST gives the family. Choose pEBA<j> for penalized block averaging, where j is usually between 2 and 6. Other choices include pOLS<gamma>, pall, all, eba<j>, std, sb, ss, and sf.
UG asks for the unbiased Du–Bentler (2022) gamma. Without UG, the standard biased gamma is used.
ML / RLS chooses the base chi-square. ML uses the normal-theory discrepancy and RLS uses Browne’s reweighted least squares statistic. Leave the suffix out to use the default for the fitted model.
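The grammar can be read as TEST, an optional UG, and an optional ML or RLS. The sketch below spells that out with a hypothetical base R helper, parse_test_name(), which is not part of semTests and says nothing about how the package parses names internally. It upper-cases the name for convenience and does not check which families exist.

```r
# Hypothetical helper, not part of semTests: split a test name into the
# three parts of the grammar TEST[_UG][_ML|_RLS].
parse_test_name <- function(name) {
  parts <- strsplit(toupper(name), "_", fixed = TRUE)[[1]]
  family <- parts[1]
  suffixes <- parts[-1]

  unknown <- setdiff(suffixes, c("UG", "ML", "RLS"))
  if (length(unknown) > 0) {
    stop("unknown suffix: ", paste(unknown, collapse = ", "))
  }
  base <- intersect(suffixes, c("ML", "RLS"))
  if (length(base) > 1) {
    stop("choose either ML or RLS, not both")
  }

  list(
    family = family,
    unbiased_gamma = "UG" %in% suffixes,
    # NA stands for "use the default for the fitted model".
    base = if (length(base) == 1) base else NA_character_
  )
}

parse_test_name("pEBA4_UG_RLS")  # family PEBA4, unbiased gamma, RLS base
parse_test_name("SB")            # family SB, biased gamma, default base
```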

Several tests can be requested at once:

pvalues(fit, c("SB_ML", "pEBA4_ML", "pOLS2_ML"))
#>        sb_ml     peba4_ml     pols2_ml 
#> 4.416190e-08 1.529609e-07 1.519741e-07 
#> estimator: ML (MLM) | data: continuous | information: expected | df: 24

The UG gamma and the RLS statistic are available only for continuous, complete-data fits from lavaan’s ML family. That includes fits requested as MLM or MLR. For any other model family, requesting either option raises an error.

Nested model comparison

To compare two nested models, fit both and pass them to pvalues_nested() with the constrained model first. Here we ask whether the textual loadings can be held equal:

constrained <- "visual  =~ x1 + x2 + x3
                textual =~ a*x4 + a*x5 + a*x6
                speed   =~ x7 + x8 + x9"
m1 <- cfa(model, HolzingerSwineford1939, estimator = "MLM")
m0 <- cfa(constrained, HolzingerSwineford1939, estimator = "MLM")
pvalues_nested(m0, m1)
#> pall_ug_ml 
#>  0.0166158 
#> estimator: ML (MLM) | data: continuous | information: expected | df: 2 | nested (method 2000)

PALL is the recommended default for nested comparison (Foldnes, Grønneberg, and Moss, 2026). semTests uses the construction of Satorra (2000). The alternative 2001 method has been withdrawn because it performs poorly. Both fits must use the same estimator, options, variables, groups, and observations. If the inputs arrive in reverse order, semTests warns and swaps them.
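For orientation, lavaan's own lavTestLRT() can run a robust difference test on the same pair of fits. The sketch below is only a cross-check and assumes both packages are installed. It is not expected to reproduce the PALL p-value, because lavaan refers a corrected statistic to a chi-square distribution rather than to the eigenvalue-based one.

```r
library("semTests")
library("lavaan")

model <- "visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9"
constrained <- "visual  =~ x1 + x2 + x3
                textual =~ a*x4 + a*x5 + a*x6
                speed   =~ x7 + x8 + x9"
m1 <- cfa(model, HolzingerSwineford1939, estimator = "MLM")
m0 <- cfa(constrained, HolzingerSwineford1939, estimator = "MLM")

# Quick sanity checks before comparing: same sample size, and more
# degrees of freedom in the constrained model.
stopifnot(lavInspect(m0, "ntotal") == lavInspect(m1, "ntotal"))
stopifnot(fitMeasures(m0, "df") > fitMeasures(m1, "df"))

# lavaan's robust difference test with the Satorra (2000) method.
lrt <- lavTestLRT(m0, m1, method = "satorra.2000")
lrt

# semTests on the same pair, constrained model first.
pvalues_nested(m0, m1)
```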

More data families

The package also covers several non-ML estimators. These paths have been checked against a separate implementation. ?"semTests-support" gives the exact list of supported combinations; the quotes are needed because the topic name contains a hyphen.

GLS works directly. ULS is also supported, but it needs a robust lavaan test so that the fit contains the required sample information:

gls <- cfa(model, HolzingerSwineford1939, estimator = "GLS")
pvalues(gls, "pEBA4")
#>     peba4_ml 
#> 8.854857e-07 
#> estimator: GLS | data: continuous | information: expected | df: 24

uls <- cfa(model, HolzingerSwineford1939,
  estimator = "ULS",
  test = "satorra.bentler"
)
pvalues(uls, "pEBA4")
#>     peba4_ml 
#> 2.527237e-08 
#> estimator: ULS | data: continuous | information: expected | df: 24

Missing data is handled through full-information maximum likelihood. Fit with missing = "fiml" and pass the fit to pvalues() as usual. FIML uses the usual biased gamma and the standard statistic, so suffix-free test names are the right choice.

HS <- HolzingerSwineford1939
set.seed(1)
HS$x1[sample(nrow(HS), 60)] <- NA
fit_fiml <- cfa(model, HS, missing = "fiml", estimator = "MLR")
pvalues(fit_fiml)
#>     peba4_ml 
#> 2.845778e-08 
#> estimator: ML (MLR) (FIML) | data: continuous | information: observed | df: 24 | FIML convention: observed

Observed information is the default, following Savalei (2010). The choice and the optional lavaan-compatibility convention are explained in vignette("fiml-missing-data", package = "semTests"), along with nested FIML examples.

Ordered and mixed indicators

Categorical models use the UGamma matrix and the unscaled statistic that lavaan reports for the fit. DWLS, including WLSMV/WLSM/WLSMVS, and ULS, including ULSMV, are supported for single-group and multigroup ordered or mixed-indicator models. Full WLS/ADF is refused because its correction reduces to the ordinary chi-square.

HSord <- HolzingerSwineford1939
ordered_names <- paste0("x", 1:9)
HSord[ordered_names] <- lapply(
  HSord[ordered_names],
  function(x) ordered(cut(x, 3))
)
fit_ordinal <- cfa(model, HSord, ordered = ordered_names)
pvalues(fit_ordinal, c("SB", "SS", "pEBA4"))
#>        sb_ml        ss_ml     peba4_ml 
#> 1.330014e-06 1.966414e-05 7.156389e-06 
#> estimator: DWLS (WLSMV) | data: categorical | information: expected | df: 24

The artificial cut only keeps this tour reproducible. In a real analysis, the decision to treat an indicator as ordered should come from how it was measured.

Categorical fits may use listwise or pairwise missingness. Pairwise support uses lavaan’s pairwise sample statistics and UGamma. Pairwise deletion has its own missingness assumptions and does not provide FIML inference. Nested categorical comparison uses method = "2000" with A.method = "delta".
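A pairwise fit might look like the sketch below. It reuses the artificial three-category cut from above, deletes some values of x1 at random purely for illustration, and relies on lavaan's missing = "pairwise" option. No output is shown, and whether this particular fit converges cleanly on your installation is an assumption.

```r
library("semTests")
library("lavaan")

model <- "visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9"

# Artificial ordered indicators, as in the example above.
HSpw <- HolzingerSwineford1939
ordered_names <- paste0("x", 1:9)
HSpw[ordered_names] <- lapply(
  HSpw[ordered_names],
  function(x) ordered(cut(x, 3))
)

# Remove 60 values of x1 at random, for illustration only.
set.seed(1)
HSpw$x1[sample(nrow(HSpw), 60)] <- NA

# Pairwise deletion keeps every case that is complete for a given pair.
fit_pairwise <- cfa(model, HSpw,
  ordered = ordered_names,
  missing = "pairwise"
)
pvalues(fit_pairwise, "pEBA4")
```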

References

Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8, 1–24.

Du, H., & Bentler, P. M. (2022). 40-year old unbiased distribution free estimator reliably improves SEM statistics for nonnormal data. Structural Equation Modeling: A Multidisciplinary Journal, 29(6), 872–887. https://doi.org/10.1080/10705511.2022.2063870

Foldnes, N., Moss, J., & Grønneberg, S. (2025). Improved goodness of fit procedures for structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 32(1), 1–13. https://doi.org/10.1080/10705511.2024.2372028

Foldnes, N., Grønneberg, S., & Moss, J. (2026). Penalized eigenvalue block averaging: Extension to nested model comparison and Monte Carlo evaluations. Behavior Research Methods, 58, article 107. https://doi.org/10.3758/s13428-026-02968-4

Satorra, A. (2000). Scaled and adjusted restricted tests in multi-sample analysis of moment structures. In R. D. H. Heijmans, D. S. G. Pollock, & A. Satorra (Eds.), Innovations in Multivariate Statistical Analysis (pp. 233–247). Kluwer Academic. https://doi.org/10.1007/978-1-4615-4603-0_17

Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent Variables Analysis: Applications for Developmental Research (pp. 399–419). Sage.

Savalei, V. (2010). Expected versus observed information in SEM with incomplete normal and nonnormal data. Psychological Methods, 15(4), 352–367. https://doi.org/10.1037/a0020143