Scale reliability and validity

Measurement quality is part of the research design. This vignette works through the reliability and validity evidence from the published Thailand digital marketing study (Sharafuddin, Madhavan, and Wangtueai 2024). The reliability and item reports read scale definitions from an instrument, so they are shown on the bundled tourism demo, a synthetic companion to the study. The validity summary runs on the loadings the paper reported, which needs no raw data.

Reliability from the instrument

reliability_report() reads the scale definitions and reports Cronbach’s alpha and, with the optional psych package, McDonald’s omega. In the output below, omega appears only for the three-item scales. The published study reported strong reliability for every construct. At the higher-order level, alpha and composite reliability were, respectively, 0.828 and 0.883 for digital marketing effectiveness, 0.792 and 0.906 for service quality, 0.769 and 0.843 for sustainability quality, 0.897 and 0.936 for satisfaction, and 0.921 and 0.950 for behavioural intention. The demo data are synthetic, so the values below are not expected to match these published figures.

demo      <- sframe_demo_data()
instr     <- demo$instrument
responses <- demo$responses

reliability_report(responses, instr, omega = TRUE)
#> Reliability Report
#> 
#> Scale: digital_marketing (Digital marketing effectiveness)
#>   Items: 3   N: 120
#>   Alpha:   0.837
#>   Omega h: 0.837
#>   Omega t: 0.837
#> 
#> Scale: service_quality (Service quality)
#>   Items: 3   N: 120
#>   Alpha:   0.844
#>   Omega h: 0.845
#>   Omega t: 0.845
#> 
#> Scale: sustainability (Sustainability perception)
#>   Items: 2   N: 120
#>   Alpha:   0.772
#> 
#> Scale: satisfaction (Tourist satisfaction)
#>   Items: 2   N: 120
#>   Alpha:   0.816
#> 
#> Scale: behavioural_intention (Behavioural intention)
#>   Items: 2   N: 120
#>   Alpha:   0.844
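
For readers who want to see what sits behind an alpha value, the following is a minimal base R sketch of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). It is not the package's implementation. The simulated data, the to_likert() helper, and the cronbach_alpha() function are all hypothetical names introduced for illustration.

```r
# Simulated 5-point responses for a hypothetical three-item scale.
set.seed(1)
n         <- 200
latent    <- rnorm(n)
to_likert <- function(x) pmin(5, pmax(1, round(3 + x)))  # clamp to 1..5
items <- data.frame(
  x1 = to_likert(latent + rnorm(n, sd = 0.7)),
  x2 = to_likert(latent + rnorm(n, sd = 0.7)),
  x3 = to_likert(latent + rnorm(n, sd = 0.7))
)

# Cronbach's alpha from item variances and the variance of the sum score.
cronbach_alpha <- function(df) {
  df        <- df[stats::complete.cases(df), , drop = FALSE]
  k         <- ncol(df)
  item_var  <- sum(apply(df, 2, stats::var))
  total_var <- stats::var(rowSums(df))
  (k / (k - 1)) * (1 - item_var / total_var)
}

alpha <- cronbach_alpha(items)
alpha
```

The same value follows from the item covariance matrix, because the variance of the sum score equals the sum of all entries of that matrix.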

Item diagnostics

Item diagnostics identify sparse items, weak item-total relationships, and floor or ceiling effects. These are the item-level facts behind a retention decision.

demo      <- sframe_demo_data()
instr     <- demo$instrument
responses <- demo$responses

items <- item_report(responses, instr)
names(items)
#> [1] "digital_marketing"     "service_quality"       "sustainability"       
#> [4] "satisfaction"          "behavioural_intention"
items[[1]]
#> $scale_id
#> [1] "digital_marketing"
#> 
#> $label
#> [1] "Digital marketing effectiveness"
#> 
#> $diagnostics
#>   item_id     mean        sd item_rest_r  floor_pct ceiling_pct n_missing
#> 1    dm_1 3.141667 0.9982828  -0.5118417 0.05000000  0.10000000         0
#> 2    dm_2 3.125000 0.9663455  -0.4619588 0.05000000  0.07500000         0
#> 3    dm_3 3.191667 0.9982828  -0.5122414 0.05833333  0.08333333         0
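
The columns above can be approximated by hand. The sketch below assumes that item_rest_r is the usual corrected item-total correlation (each item against the sum of the remaining items) and that floor_pct and ceiling_pct are proportions at the scale endpoints. These are assumptions about the report, and the simulated data and the item_diagnostics() helper are hypothetical.

```r
# Simulated 5-point responses for a hypothetical three-item scale.
set.seed(2)
n         <- 150
latent    <- rnorm(n)
to_likert <- function(x) pmin(5, pmax(1, round(3 + x)))  # clamp to 1..5
items <- data.frame(
  q1 = to_likert(latent + rnorm(n, sd = 0.7)),
  q2 = to_likert(latent + rnorm(n, sd = 0.7)),
  q3 = to_likert(latent + rnorm(n, sd = 0.7))
)

# One row of diagnostics per item.
item_diagnostics <- function(df, min_value = 1, max_value = 5) {
  do.call(rbind, lapply(names(df), function(id) {
    x    <- df[[id]]
    rest <- rowSums(df[setdiff(names(df), id)])  # sum of the other items
    data.frame(
      item_id     = id,
      mean        = mean(x, na.rm = TRUE),
      sd          = stats::sd(x, na.rm = TRUE),
      item_rest_r = stats::cor(x, rest, use = "complete.obs"),
      floor_pct   = mean(x == min_value, na.rm = TRUE),
      ceiling_pct = mean(x == max_value, na.rm = TRUE),
      n_missing   = sum(is.na(x))
    )
  }))
}

diag_tbl <- item_diagnostics(items)
diag_tbl
```

For items that share a common construct and are keyed in the same direction, the item-rest correlation is positive. A negative value usually points to a reverse-keyed item or a scoring problem and deserves a check before any retention decision.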

EFA readiness

efa_report() reports the Kaiser-Meyer-Olkin measure, Bartlett’s test, and a parallel-analysis suggestion for the number of factors as a screening step before a confirmatory model.

efa_report(responses, instr)
#> R was not square, finding R from data
#> Parallel analysis suggests that the number of factors =  4  and the number of components =  4
#> EFA Readiness Diagnostics
#> 
#>   Items:          12
#>   Complete cases: 120
#>   KMO overall:    0.761
#>   Bartlett chi-sq: 665.84  df: 66  p: 0.0000
#>   Suggested factors (parallel analysis): 4
#>   Planned rotation: oblimin
#> 
#> Note: estimate the EFA solution with a dedicated modelling package.
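
Both screening statistics come from the item correlation matrix. The sketch below computes them in base R on simulated data using the textbook formulas: Bartlett's chi-square is -(n - 1 - (2p + 5) / 6) * log(det(R)) with p(p - 1) / 2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations. It is an illustration under those assumptions, not the package's code, and all object names are hypothetical.

```r
# Simulated data: six items driven by two hypothetical factors.
set.seed(3)
n  <- 120
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(
  a1 = f1 + rnorm(n, sd = 0.8), a2 = f1 + rnorm(n, sd = 0.8),
  a3 = f1 + rnorm(n, sd = 0.8), b1 = f2 + rnorm(n, sd = 0.8),
  b2 = f2 + rnorm(n, sd = 0.8), b3 = f2 + rnorm(n, sd = 0.8)
)
R <- stats::cor(X)
p <- ncol(R)

# Bartlett's test of sphericity: is R distinguishable from an identity matrix?
chi_sq  <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- stats::pchisq(chi_sq, df, lower.tail = FALSE)

# Overall KMO: partial correlations come from the inverse correlation matrix.
R_inv   <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
off     <- row(R) != col(R)  # off-diagonal cells only
kmo     <- sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))

c(chi_sq = chi_sq, df = df, p_value = p_value, kmo = kmo)
```

With 12 items the same degrees-of-freedom formula gives 12 * 11 / 2 = 66, which matches the df shown in the report above.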

Convergent validity from the published loadings

This step needs no raw data. validity_report() computes composite reliability and average variance extracted from standardised loadings. Feeding the outer loadings reported in the paper reproduces its convergent validity, which is a direct check of the measurement model.

published_loadings <- list(
  DMRE = c(dmre_1 = .815, dmre_2 = .899, dmre_3 = .668, dmre_4 = .838),
  DMAU = c(dmau_1 = .843, dmau_2 = .915, dmau_3 = .920),
  DMEU = c(dmeu_1 = .897, dmeu_2 = .929, dmeu_3 = .932),
  DMPV = c(dmpv_1 = .818, dmpv_2 = .916, dmpv_3 = .900, dmpv_4 = .863),
  DSQA = c(dsqa_1 = .904, dsqa_2 = .920, dsqa_3 = .869),
  DSQT = c(dsqt_1 = .778, dsqt_2 = .883, dsqt_3 = .879, dsqt_4 = .811, dsqt_5 = .713),
  DSUQ = c(dsuq_1 = .780, dsuq_2 = .855, dsuq_3 = .845, dsuq_4 = .551, dsuq_5 = .529),
  TS   = c(ts_1 = .912, ts_2 = .950, ts_3 = .869),
  BI   = c(bi_1 = .937, bi_2 = .911, bi_3 = .940)
)

validity <- validity_report(published_loadings)
validity$reliability
#>      construct composite_reliability       AVE n_items
#> BI          BI             0.9500688 0.8638300       3
#> DMAU      DMAU             0.9221173 0.7980913       3
#> DMEU      DMEU             0.9425391 0.8454247       3
#> DMPV      DMPV             0.9288283 0.7657373       4
#> DMRE      DMRE             0.8826044 0.6552235       4
#> DSQA      DSQA             0.9258026 0.8062590       3
#> DSQT      DSQT             0.9078573 0.6647408       5
#> DSUQ      DSUQ             0.8428441 0.5273784       5
#> TS          TS             0.9359270 0.8298017       3

The average variance extracted exceeds 0.5 for every construct, so each construct explains more than half the variance in its items, and the composite reliability exceeds 0.7 throughout, which indicates adequate internal consistency. The sustainability construct sits lowest on both measures, in line with the two weaker items the paper flagged (DSUQ4 and DSUQ5).
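
The two statistics are simple functions of the standardised loadings. As a hand check, the sketch below applies the conventional formulas to the DMRE loadings listed earlier: composite reliability is the squared sum of loadings divided by that quantity plus the summed error variances (1 minus each squared loading), and AVE is the mean squared loading. The helper names composite_reliability() and ave() are hypothetical and are not part of the package.

```r
# DMRE outer loadings, as listed in published_loadings.
dmre <- c(dmre_1 = .815, dmre_2 = .899, dmre_3 = .668, dmre_4 = .838)

# Composite reliability from standardised loadings.
composite_reliability <- function(lambda) {
  sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))
}

# Average variance extracted: mean of the squared loadings.
ave <- function(lambda) mean(lambda^2)

cr_dmre  <- composite_reliability(dmre)
ave_dmre <- ave(dmre)
round(c(CR = cr_dmre, AVE = ave_dmre), 4)
```

Rounded to four decimals these come to 0.8826 and 0.6552, which agree with the DMRE row of the table.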

Discriminant validity

The study assessed discriminant validity in two ways. The Fornell-Larcker criterion compares the square root of a construct’s AVE with its correlations with the others. The square roots, 0.810 for digital marketing effectiveness, 0.910 for service quality, 0.726 for sustainability quality, 0.911 for satisfaction, and 0.930 for behavioural intention, each exceeded the off-diagonal correlations. The Heterotrait-Monotrait ratios were all below the 0.90 threshold, the highest being 0.766 between ease of use and accessibility. Both results support discriminant validity.

When construct scores are available, validity_report() returns the Fornell-Larcker matrix and the HTMT matrix directly.

# scored_constructs: construct scores supplied by the user (not created in this vignette)
validity_report(published_loadings, construct_scores = scored_constructs)
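
The Fornell-Larcker comparison itself is easy to sketch by hand. The example below takes three AVE values from the convergent validity table and compares their square roots with a correlation matrix. The correlations in phi are hypothetical values chosen for illustration only. They are not the study's correlations, and the layout of the result is not that of validity_report().

```r
# AVE values copied from the convergent validity table above.
ave <- c(DSUQ = 0.5273784, TS = 0.8298017, BI = 0.8638300)

# HYPOTHETICAL construct correlations, for illustration only.
phi <- matrix(
  c(1.00, 0.55, 0.50,
    0.55, 1.00, 0.70,
    0.50, 0.70, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(names(ave), names(ave))
)

sqrt_ave <- sqrt(ave)

# Largest absolute correlation of each construct with any other construct.
max_r <- sapply(names(ave), function(k) max(abs(phi[k, names(ave) != k])))

fornell_larcker <- data.frame(
  construct = names(ave),
  sqrt_ave  = round(sqrt_ave, 3),
  max_r     = max_r,
  passes    = sqrt_ave > max_r
)
fornell_larcker
```

A construct passes when the square root of its AVE is larger than its strongest correlation with any other construct.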

Collinearity

The study reported variance inflation factors below 5 for every indicator, which indicates no severe multicollinearity. assumption_report() returns variance inflation factors when a regression is specified, which is shown in the analysing-responses vignette.
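
As a reminder of what a variance inflation factor measures, the sketch below computes it two ways in base R on simulated data: as 1 / (1 - R-squared) from regressing each variable on the others, and as the diagonal of the inverse correlation matrix. It does not use assumption_report(), and the variable names are hypothetical.

```r
# Simulated predictors; x2 is moderately correlated with x1.
set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.8)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing variable j on the rest.
vif_by_regression <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit    <- stats::lm(stats::reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix.
vif_by_inverse <- diag(solve(stats::cor(predictors)))

rbind(vif_by_regression, vif_by_inverse)
```

Both routes give the same values. A VIF of 1 means a variable is uncorrelated with the others, and the value grows as the shared variance increases.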

Cautious interpretation

Reliability and validity summaries are diagnostics, not automatic decisions. Read them with the questionnaire wording, the sampling context, the construct definitions, and the planned model in view.