---
title: "Estimating District Means from Binned Test Scores with binest"
author: "Paul T. von Hippel and David J. Hunter"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Estimating District Means from Binned Test Scores with binest}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
```

# Overview

The `binest` package estimates group-level means and standard deviations
from counts of observations falling in ordered bins, when individual
observations are not available. Three estimators are provided:

* `bin_means()`: a fast per-group estimator under within-group
  normality. Runs in time linear in the number of groups times the
  number of bins.
* `mle_hetop()`: maximum-likelihood fit of the heteroskedastic ordered
  probit model (HETOP; Reardon, Shear, Castellano & Ho 2017).
* `fh_hetop()`: Bayesian variant of HETOP via Gibbs sampling (Lockwood,
  Castellano & Shear 2018).

`bin_means()` is the recommended function. It runs in milliseconds per
group and produces estimates practically identical to HETOP. The HETOP
commands `mle_hetop()` and `fh_hetop()` are superseded by `bin_means()`
but remain in the package for comparison; they are harder to use and at
least 100 times slower.

For a detailed empirical comparison and the theoretical case for
preferring bin means, see the accompanying paper.

This vignette compares all three estimators on a bundled dataset:
district-level bin counts and reported mean scale scores from the
2017-18 administration of the State of Texas Assessments of Academic
Readiness (STAAR) Grade 6 mathematics test.

# The Texas data

```{r}
library(binest)
data(tx_g6_math_2018)
dim(tx_g6_math_2018)
head(tx_g6_math_2018, 3)
```

The dataset has 1,151 districts, each with bin counts in four
proficiency categories and a reported average scale score that
serves as ground truth.

The published cut scores defining the bin boundaries are 1,536,
1,653, and 1,772:

```{r}
ngk  <- with(tx_g6_math_2018,
             cbind(unsatisfactory, approaches, meets, masters))
cuts <- c(1536, 1653, 1772)
truth <- tx_g6_math_2018$reported_mean
```

# Method 1: `bin_means()` with known cutpoints

This uses the published cut scores directly. Output `est_raw$group_mean_mle` is on the test-score scale; `est_std$group_mean_mle`
is on a standardized scale (population-weighted state mean 0, total
state SD 1).

```{r}
t0 <- Sys.time()
fit_bm_known <- bin_means(ngk, cutpoints = cuts)
t_bm_known   <- as.numeric(Sys.time() - t0, units = "secs")
cor(fit_bm_known$est_raw$group_mean_mle, truth)
```

```{r, fig.width = 4.5, fig.height = 4.5}
plot(truth, fit_bm_known$est_raw$group_mean_mle,
     pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
     xlab = "True district mean",
     ylab = "Estimated mean (test-score scale)",
     main = "bin_means (known cuts)")
abline(0, 1, col = "red", lty = 2)
```

# Method 2: `bin_means()` with cutpoints estimated from data

When cutpoints are not known, `bin_means()` derives them from pooled
state bin proportions via `qnorm(cumsum(pooled_props))`. Output is on
the standardized scale.

```{r}
t0 <- Sys.time()
fit_bm_null <- bin_means(ngk)
t_bm_null   <- as.numeric(Sys.time() - t0, units = "secs")
```

```{r, fig.width = 4.5, fig.height = 4.5}
plot(truth, fit_bm_null$est_std$group_mean_mle,
     pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
     xlab = "True district mean",
     ylab = "Estimated mean (standardized)",
     main = "bin_means (cuts from data)")
```

# Method 3: HETOP MLE on a 50-district subsample

`mle_hetop()` fits a joint maximum-likelihood model across all groups.
On the full 1,151-district dataset, it does not converge within a
reasonable runtime. We illustrate the method on a
representative random subsample of 50 districts.

```{r}
set.seed(1)
sub <- sample(nrow(ngk), 50)
t0 <- Sys.time()
fit_mle <- mle_hetop(ngk[sub, ], iterlim = 200)
t_mle <- as.numeric(Sys.time() - t0, units = "secs")
cor(fit_mle$est_star$mug, truth[sub])
```

```{r, fig.width = 4.5, fig.height = 4.5}
plot(truth[sub], fit_mle$est_star$mug,
     pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
     xlab = "True district mean",
     ylab = "Estimated mean (standardized)",
     main = "HETOP MLE (50-district subsample)")
```

# Method 4: HETOP Bayes on the full dataset

`fh_hetop()` is the Bayesian variant. Each Gibbs iteration is linear in
the number of groups; on 1,151 districts the model fits in about nine
minutes. The runtime in this vignette is deliberately short to keep
the package build tractable; for production use, longer chains are
recommended.

```{r, eval = FALSE}
state_props <- colSums(ngk) / sum(ngk)
state_cum   <- cumsum(state_props)

t0 <- Sys.time()
fit_fh <- fh_hetop(
  ngk       = ngk,
  fixedcuts = qnorm(state_cum[1:2]),
  p         = c(10, 10),
  m         = c(100, 100),
  gridL     = c(-5.0, log(0.10)),
  gridU     = c( 5.0, log(5.0)),
  n.iter    = 2000,
  n.burnin  = 1000,
  seed      = 3142
)
t_fh <- as.numeric(Sys.time() - t0, units = "secs")
cor(fit_fh$fh_hetop_extras$est_star_mug$theta_pm, truth)
```

Because this chunk takes about nine minutes, the package ships with
the per-district HETOP-Bayes posterior means pre-computed and stored
in `inst/extdata/fh_hetop_means_tx_g6_math_2018.rds`. The scatterplot
below uses those cached values.

```{r, fig.width = 4.5, fig.height = 4.5}
fh_means_file <- system.file("extdata",
                             "fh_hetop_means_tx_g6_math_2018.rds",
                             package = "binest")
fh_means <- readRDS(fh_means_file)

plot(truth, fh_means,
     pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
     xlab = "True district mean",
     ylab = "Estimated mean (standardized)",
     main = "HETOP Bayes (full data, cached)")
```

# Summary

| Method | Runtime | Districts |
|--------|---------|-----------|
| `bin_means()` (known cutpoints) | `r sprintf("%.0f s", t_bm_known)` | `r sum(!is.na(fit_bm_known$est_raw$group_mean_mle))` / 1151 |
| `bin_means()` (cutpoints from data) | `r sprintf("%.0f s", t_bm_null)` | `r sum(!is.na(fit_bm_null$est_raw$group_mean_mle))` / 1151 |
| `mle_hetop()` (50-district subsample) | `r sprintf("%.0f s", t_mle)` | 50 / 50 |
| fh_hetop (full data, not run above) | 450 s | 1151 / 1151 |

Note 1: `bin_means()` ran on all 1,151 districts in a small fraction of a second --- roughly 500 times faster than `fh_hetop()`, and twice as fast as `mle_hetop()` ran on a 50-district subsample.

Note 2: `bin_means()` returned an estimate for 1,134 of the 1,151 districts. It returned NA for the remaining 17 districts, where the mean and SD wer unidentified because 2 of the 4 bins were empty. These districts contained fewer than 20 students each, and the Stanford Education Data Archive does not publish estimates for districts with fewer than 20 students (Fahle et al. 2021). Note that the HETOP functions return estimates for 2-bin districts, but they do so using fallback rules that are not part of the main estimation logic.

# References

Fahle, E. M., Chavez, B., Kalogrides, D., Shear, B. R., Reardon, S. F., & Ho, A. D. (2021). *Stanford Education Data Archive: Technical Documentation, Version 4.1.* Stanford University.

Lockwood, J. R., Castellano, K. E., & Shear, B. R. (2018). Flexible Bayesian models for inferences from coarsened, group-level achievement data. *Journal of Educational and Behavioral Statistics, 43*(6), 663–692. https://doi.org/10.3102/1076998618795124

Reardon, S. F., Shear, B. R., Castellano, K. E., & Ho, A. D. (2017). Using heteroskedastic ordered probit models to recover moments of continuous test score distributions from coarsened data. *Journal of Educational and Behavioral Statistics, 42*(1), 3–45. https://doi.org/10.3102/1076998616666279
