The binest package estimates group-level means and
standard deviations from counts of observations falling in ordered bins,
when individual observations are not available. Three estimators are
provided:
bin_means(): a fast per-group estimator under
within-group normality. Runs in time linear in the number of groups
times the number of bins.mle_hetop(): maximum-likelihood fit of the
heteroskedastic ordered probit model (HETOP; Reardon, Shear, Castellano
& Ho 2017).fh_hetop(): Bayesian variant of HETOP via Gibbs
sampling (Lockwood, Castellano & Shear 2018).bin_means() is the recommended function. It runs in
milliseconds per group and produces estimates practically identical to
HETOP. The HETOP commands mle_hetop() and
fh_hetop() are superseded by bin_means() but
remain in the package for comparison; they are harder to use and at
least 100 times slower.
For a detailed empirical comparison and the theoretical case for preferring bin means, see the accompanying paper.
This vignette compares all three estimators on a bundled dataset: district-level bin counts and reported mean scale scores from the 2017-18 administration of the State of Texas Assessments of Academic Readiness (STAAR) Grade 6 mathematics test.
library(binest)
#> Loading required package: splines
data(tx_g6_math_2018)
dim(tx_g6_math_2018)
#> [1] 1151 8
head(tx_g6_math_2018, 3)
#> district_id district_name n_tested unsatisfactory approaches meets masters
#> 1 1 CAYUGA ISD 40 8 13 13 6
#> 2 2 ELKHART ISD 85 12 30 35 8
#> 3 3 FRANKSTON ISD 63 3 26 23 11
#> reported_mean
#> 1 1657
#> 2 1653
#> 3 1675The dataset has 1,151 districts, each with bin counts in four proficiency categories and a reported average scale score that serves as ground truth.
The published cut scores defining the bin boundaries are 1,536, 1,653, and 1,772:
bin_means() with known cutpointsThis uses the published cut scores directly. Output
est_raw$group_mean_mle is on the test-score scale;
est_std$group_mean_mle is on a standardized scale
(population-weighted state mean 0, total state SD 1).
t0 <- Sys.time()
fit_bm_known <- bin_means(ngk, cutpoints = cuts)
t_bm_known <- as.numeric(Sys.time() - t0, units = "secs")
cor(fit_bm_known$est_raw$group_mean_mle, truth)
#> [1] NAplot(truth, fit_bm_known$est_raw$group_mean_mle,
pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
xlab = "True district mean",
ylab = "Estimated mean (test-score scale)",
main = "bin_means (known cuts)")
abline(0, 1, col = "red", lty = 2)bin_means() with cutpoints estimated from
dataWhen cutpoints are not known, bin_means() derives them
from pooled state bin proportions via
qnorm(cumsum(pooled_props)). Output is on the standardized
scale.
t0 <- Sys.time()
fit_bm_null <- bin_means(ngk)
t_bm_null <- as.numeric(Sys.time() - t0, units = "secs")plot(truth, fit_bm_null$est_std$group_mean_mle,
pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
xlab = "True district mean",
ylab = "Estimated mean (standardized)",
main = "bin_means (cuts from data)")mle_hetop() fits a joint maximum-likelihood model across
all groups. On the full 1,151-district dataset, it does not converge
within a reasonable runtime. We illustrate the method on a
representative random subsample of 50 districts.
set.seed(1)
sub <- sample(nrow(ngk), 50)
t0 <- Sys.time()
fit_mle <- mle_hetop(ngk[sub, ], iterlim = 200)
#> Warning in mle_hetop(ngk[sub, ], iterlim = 200): optimization algorithm may not
#> have converged properly; see 'nlmdetails' element of object
t_mle <- as.numeric(Sys.time() - t0, units = "secs")
cor(fit_mle$est_star$mug, truth[sub])
#> [1] 0.9716946plot(truth[sub], fit_mle$est_star$mug,
pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
xlab = "True district mean",
ylab = "Estimated mean (standardized)",
main = "HETOP MLE (50-district subsample)")fh_hetop() is the Bayesian variant. Each Gibbs iteration
is linear in the number of groups; on 1,151 districts the model fits in
about nine minutes. The runtime in this vignette is deliberately short
to keep the package build tractable; for production use, longer chains
are recommended.
state_props <- colSums(ngk) / sum(ngk)
state_cum <- cumsum(state_props)
t0 <- Sys.time()
fit_fh <- fh_hetop(
ngk = ngk,
fixedcuts = qnorm(state_cum[1:2]),
p = c(10, 10),
m = c(100, 100),
gridL = c(-5.0, log(0.10)),
gridU = c( 5.0, log(5.0)),
n.iter = 2000,
n.burnin = 1000,
seed = 3142
)
t_fh <- as.numeric(Sys.time() - t0, units = "secs")
cor(fit_fh$fh_hetop_extras$est_star_mug$theta_pm, truth)Because this chunk takes about nine minutes, the package ships with
the per-district HETOP-Bayes posterior means pre-computed and stored in
inst/extdata/fh_hetop_means_tx_g6_math_2018.rds. The
scatterplot below uses those cached values.
fh_means_file <- system.file("extdata",
"fh_hetop_means_tx_g6_math_2018.rds",
package = "binest")
fh_means <- readRDS(fh_means_file)
plot(truth, fh_means,
pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
xlab = "True district mean",
ylab = "Estimated mean (standardized)",
main = "HETOP Bayes (full data, cached)")| Method | Runtime | Districts |
|---|---|---|
bin_means() (known cutpoints) |
1 s | 1134 / 1151 |
bin_means() (cutpoints from data) |
1 s | 1134 / 1151 |
mle_hetop() (50-district subsample) |
3 s | 50 / 50 |
| fh_hetop (full data, not run above) | 450 s | 1151 / 1151 |
Note 1: bin_means() ran on all 1,151 districts in a
small fraction of a second — roughly 500 times faster than
fh_hetop(), and twice as fast as mle_hetop()
ran on a 50-district subsample.
Note 2: bin_means() returned an estimate for 1,134 of
the 1,151 districts. It returned NA for the remaining 17 districts,
where the mean and SD wer unidentified because 2 of the 4 bins were
empty. These districts contained fewer than 20 students each, and the
Stanford Education Data Archive does not publish estimates for districts
with fewer than 20 students (Fahle et al. 2021). Note that the HETOP
functions return estimates for 2-bin districts, but they do so using
fallback rules that are not part of the main estimation logic.
Fahle, E. M., Chavez, B., Kalogrides, D., Shear, B. R., Reardon, S. F., & Ho, A. D. (2021). Stanford Education Data Archive: Technical Documentation, Version 4.1. Stanford University.
Lockwood, J. R., Castellano, K. E., & Shear, B. R. (2018). Flexible Bayesian models for inferences from coarsened, group-level achievement data. Journal of Educational and Behavioral Statistics, 43(6), 663–692. https://doi.org/10.3102/1076998618795124
Reardon, S. F., Shear, B. R., Castellano, K. E., & Ho, A. D. (2017). Using heteroskedastic ordered probit models to recover moments of continuous test score distributions from coarsened data. Journal of Educational and Behavioral Statistics, 42(1), 3–45. https://doi.org/10.3102/1076998616666279