---
title: "Robust test statistics with semTests"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Robust test statistics with semTests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
options(warn = -1)
```

## Introduction

`semTests` computes robust *p*-values for structural equation models fit with
[`lavaan`](https://lavaan.ugent.be/). The standard chi-square test of model fit
is anti-conservative under non-normality, and the usual corrections
(Satorra–Bentler and friends) only partly fix it. The eigenvalue-based tests in
this package target the *limiting null law* of the test statistic directly: that
law is a weighted sum of independent chi-squares,
\(\sum_j \lambda_j \chi^2_1\), whose weights \(\lambda_j\) are the eigenvalues of
a matrix built from the model and the asymptotic covariance of the sample
moments. Because this holds for any minimum-discrepancy estimator, the tests are
not specific to normal-theory maximum likelihood.

The penalized eigenvalue block-averaging (`pEBA`) and penalized regression
(`pOLS`) methods were introduced by Foldnes, Moss, & Grønneberg (2024) and
extended to nested model comparison by Foldnes, Grønneberg, & Moss (2026).

```{r setup}
library("semTests")
library("lavaan")
```

## Basic usage: goodness of fit

Fit a model with `lavaan`, then call `pvalues()`. Fit with a robust test (e.g.
`estimator = "MLM"` or `"MLR"`) so that lavaan exposes the asymptotic moment
covariance the correction needs.

```{r basic}
model <- "visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9"
fit <- cfa(model, HolzingerSwineford1939, estimator = "MLM")
pvalues(fit)
```

The result is a named vector that also prints a one-line provenance footer
recording the estimator, data type, information matrix, and degrees of freedom
actually used. The full option record is available with
`attr(x, "semtests")`.

### The test-name grammar

Each requested test is a string of the form `(test)_(ug?)_(ml|rls)`:

* **test** — the family: `pEBA<j>` (penalized block-averaging, the recommended
  default; `j` is the number of blocks, typically 2–6), `pOLS<gamma>` (penalized
  regression), `pall`/`all`, `eba<j>`, or a traditional statistic `std`, `sb`
  (Satorra–Bentler), `ss` (scaled-and-shifted), `sf` (scaled *F*).
* **ug** — include `UG` to use the unbiased Du–Bentler (2022) gamma instead of
  the standard biased one.
* **ml / rls** — the base chi-square: `ML` (normal-theory discrepancy) or `RLS`
  (Browne's reweighted least squares). When omitted, a fit-appropriate default
  is chosen.

Several tests can be requested at once:

```{r grammar}
pvalues(fit, c("SB_ML", "pEBA4_ML", "pOLS2_ML"))
```

The `UG` gamma and the `RLS` statistic are defined only for the classical case
(continuous, complete data, `estimator = "ML"`); off that case they are refused
with an informative error rather than silently degrading.

## Nested model comparison

To compare two nested models, fit both and pass them to `pvalues_nested()` with
the **constrained** model first. Here we test weak invariance of the textual
loadings:

```{r nested}
constrained <- "visual  =~ x1 + x2 + x3
                textual =~ a*x4 + a*x5 + a*x6
                speed   =~ x7 + x8 + x9"
m1 <- cfa(model, HolzingerSwineford1939, estimator = "MLM")
m0 <- cfa(constrained, HolzingerSwineford1939, estimator = "MLM")
pvalues_nested(m0, m1)
```

`pall` is the recommended default for nested comparison (Foldnes, Grønneberg, &
Moss, 2026). The reduction `method` defaults to `"2000"` (Satorra 2000);
`"2001"` is also available for complete-data fits.

## Other estimators and missing data (experimental)

The same machinery applies to other estimators. The non-ML paths below are
**experimental** as of this release; the classical ML path is stable. See
`?semTests-support` for the authoritative matrix of what is supported.

GLS and ULS work directly (ULS needs a robust test for the asymptotic
covariance):

```{r estimators}
gls <- cfa(model, HolzingerSwineford1939, estimator = "GLS")
pvalues(gls, "pEBA4")

uls <- cfa(model, HolzingerSwineford1939, estimator = "ULS",
           test = "satorra.bentler")
pvalues(uls, "pEBA4")
```

Missing data is handled through full-information maximum likelihood. Fit with
`missing = "fiml"` and pass the fit to `pvalues()` as usual. FIML uses the
biased gamma and the standard statistic, so the `UG`/`RLS` suffixes do not apply;
the default suffix-free tests are the right choice.

```{r fiml}
HS <- HolzingerSwineford1939
set.seed(1)
HS$x1[sample(nrow(HS), 60)] <- NA
fit_fiml <- cfa(model, HS, missing = "fiml", estimator = "MLR")
pvalues(fit_fiml)
```

FIML inference rests on the observed information under MAR (lavaan's default for
`missing = "fiml"`); a fit using expected information triggers a warning, since
that is valid only under the stronger MCAR assumption.

## References

Du, H., & Bentler, P. M. (2022). 40-Year Old Unbiased Distribution Free
Estimator Reliably Improves SEM Statistics for Nonnormal Data. *Structural
Equation Modeling*, 29(6), 872–887.
<https://doi.org/10.1080/10705511.2022.2063870>

Foldnes, N., Moss, J., & Grønneberg, S. (2024). Improved goodness of fit
procedures for structural equation models. *Structural Equation Modeling: A
Multidisciplinary Journal*, 1–13.
<https://doi.org/10.1080/10705511.2024.2372028>

Foldnes, N., Grønneberg, S., & Moss, J. (2026). Penalized eigenvalue block
averaging: Extension to nested model comparison and Monte Carlo evaluations.
*Behavior Research Methods*. <https://doi.org/10.3758/s13428-026-02968-4>
