---
title: "Large-Scale Poisson Regression with lsReg"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Large-Scale Poisson Regression with lsReg}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

This vignette demonstrates large-scale Poisson regression using `lsReg` with
the score test. The score test is computed entirely from the base model fit and
does not require fitting the full model for each candidate covariate, making it
especially efficient for large-scale screening. Because the full model is never
fit, no parameter estimate is produced — only the test statistic.

## Example data

We use the same simulated dataset as the other vignettes. The count outcome
`ypois` was generated from a Poisson model with `x1`, `x2`, `z5`, and `z9` as
predictors. The variables `z1` through `z10` are the candidates to be screened.

```{r load-data}
library(lsReg)

datafile <- system.file("extdata", "simulated_data.rds", package = "lsReg")
dat <- readRDS(datafile)
head(dat[, c("ypois", "x1", "x2", "z1", "z2", "z5", "z9")])
```

## Step 1: Fit the base model

```{r base-model}
basemdl <- glm(ypois ~ x1 + x2, data = dat, family = poisson)
summary(basemdl)
```

## Step 2: Allocate memory

```{r allocate}
mem <- lsReg(basemdl, colstoadd = 1, testtype = "score")
```

## Step 3: Test each candidate covariate

After each call to `addcovar()` the score test statistic is in
`mem$testvalue`. The score test returns a z-score, so p-values are computed
from the standard normal distribution.

```{r run-tests}
zvars <- paste0("z", 1:10)

results <- data.frame(
  variable  = zvars,
  statistic = NA_real_
)

for (i in seq_along(zvars)) {
  xr <- as.matrix(dat[, zvars[i], drop = FALSE])
  addcovar(mem, xr)
  results$statistic[i] <- mem$testvalue[1, 1]
}
```

## Results

```{r results}
results$pvalue <- 2 * pnorm(-abs(results$statistic))
print(results, digits = 4)
```

The variables `z5` and `z9` have the largest score statistics and the smallest
p-values, consistent with the data-generating model.

## Verification against statmod

The `glm.scoretest()` function from the `statmod` package computes the score
test statistic for a single candidate covariate added to a fitted GLM. We use
it to verify the `lsReg` results for `z5` and `z9`.

```{r verify}
library(statmod)

statmod_stat_z5 <- glm.scoretest(basemdl, dat[, "z5"])
statmod_stat_z9 <- glm.scoretest(basemdl, dat[, "z9"])

lsreg_stat_z5 <- results$statistic[results$variable == "z5"]
lsreg_stat_z9 <- results$statistic[results$variable == "z9"]

comparison <- data.frame(
  variable         = c("z5", "z9"),
  statmod_statistic = c(statmod_stat_z5, statmod_stat_z9),
  lsreg_statistic  = c(lsreg_stat_z5,   lsreg_stat_z9)
)
print(comparison, digits = 6)
```

The score statistics from `lsReg` match those from `statmod::glm.scoretest()`
to numerical precision.
