---
title: "Publication-Ready Visualisation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Publication-Ready Visualisation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE
)
```

## Overview

Two functions produce publication-ready figures and tables with minimal
post-processing:

| Function | Output | Typical use |
|---|---|---|
| `plot_forest()` | Forest plot (PNG / PDF / JPG / TIFF) | Regression results from `assoc_*()` |
| `plot_tableone()` | Table 1 (DOCX / HTML / PDF / PNG) | Baseline characteristics |

When `save = TRUE`, both functions write all supported formats in a single call
and return the plot/table object invisibly for further customisation.

---

## `plot_forest()` — Forest Plot

### Minimal example

`plot_forest()` takes a data frame whose **first column** is the row label,
followed by any additional display columns. The CI graphic and the formatted
`OR (95% CI)` text column are inserted automatically at position `ci_column`.

```{r forest-minimal}
library(ukbflow)

df <- data.frame(
  item      = c("Exposure vs. control", "Unadjusted", "Fully adjusted"),
  `Cases/N` = c("", "89 / 4 521", "89 / 4 521"),
  p_value   = c(NA_real_, 0.001, 0.006),
  check.names = FALSE
)

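# Effect estimates and CI bounds, one value per row (NA for the header row).
# Stored as variables so that later chunks can reuse them.
est   <- c(NA, 1.52, 1.43)
lower <- c(NA, 1.18, 1.11)
upper <- c(NA, 1.96, 1.85)

p <- plot_forest(
  data      = df,
  est       = est,
  lower     = lower,
  upper     = upper,
  ci_column = 3L,
  indent    = c(0L, 1L, 1L),
  p_cols    = "p_value",
  xlim      = c(0.5, 3.0)
)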
plot(p)
```
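
The auto-generated text column can be pictured as one formatted string per row. The chunk below is a base-R illustration only, not the package's internal code: `format_ci()` is a hypothetical helper, and the two-decimal format, the hyphen separator, and the blank label for header rows are all assumptions.

```{r forest-ci-label-sketch}
# Hypothetical helper (not part of ukbflow): builds an
# "estimate (lower-upper)" label for each row.
format_ci <- function(est, lower, upper, digits = 2L) {
  fmt <- paste0("%.", digits, "f (%.", digits, "f-%.", digits, "f)")
  out <- sprintf(fmt, as.numeric(est), as.numeric(lower), as.numeric(upper))
  out[is.na(est)] <- ""   # assumed: header rows without an estimate stay blank
  out
}

ci_labels <- format_ci(
  est   = c(NA, 1.52, 1.43),
  lower = c(NA, 1.18, 1.11),
  upper = c(NA, 1.96, 1.85)
)
ci_labels
```

A label built this way can also be added to `data` by hand as an extra display column, for example when a second estimate has to appear alongside the plotted one.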

### Building the input data frame from `assoc_*()` results

The output of `assoc_coxph()` (and siblings) can be reshaped directly into
the format expected by `plot_forest()`:

```{r forest-from-assoc}
dt  <- ops_toy(scenario = "association")
dt  <- dt[dm_timing != 1L]

res <- assoc_coxph(
  data         = dt,
  outcome_col  = "dm_status",
  time_col     = "dm_followup_years",
  exposure_col = "p20116_i0",
  covariates   = c("bmi_cat", "tdi_cat", "p1558_i0")
)
res <- as.data.frame(res)

# Reshape: one row per model, label column first
df2 <- data.frame(
  item    = c("Smoking status", as.character(res$model)),
  `N / Events` = c("", paste0(res$n, " / ", res$n_events)),
  p_value = c(NA_real_, res$p_value),
  check.names = FALSE
)

p <- plot_forest(
  data      = df2,
  est       = c(NA,   res$HR),
  lower     = c(NA,   res$CI_lower),
  upper     = c(NA,   res$CI_upper),
  ci_column = 3L,
  indent    = c(0L,   rep(1L, nrow(res))),
  p_cols    = "p_value",
  xlim      = c(0.5,  2.5)
)
plot(p)
```

### Key parameters

**CI appearance**

```{r forest-ci}
# uses df, est, lower, upper from the minimal example above
p <- plot_forest(
  data      = df,
  est       = est, lower = lower, upper = upper,
  ci_column = 3L,
  ci_col    = c("grey50", "steelblue", "steelblue"),  # per-row colours
  ci_sizes  = 0.5,       # point size
  ci_Theight = 0.15,     # cap height
  ref_line  = 1,         # reference line (use 0 for beta coefficients)
  xlim      = c(0.2, 5), ticks_at = c(0.5, 1, 2, 3)
)
```

**Row labels and indentation**

```{r forest-indent}
# indent = 0 → bold parent row; indent >= 1 → indented sub-row (plain)
p <- plot_forest(
  data       = df,
  est        = est, lower = lower, upper = upper,
  ci_column  = 3L,
  indent     = c(0L, 1L, 1L),        # parent + 2 sub-rows
  bold_label = c(TRUE, FALSE, FALSE)  # explicit control (overrides indent default)
)
```

**P-value formatting**

```{r forest-pval}
# p_cols: column names in data that contain raw numeric p-values.
# Values < 10^(-p_digits) are displayed as e.g. "<0.001".
# bold_p = TRUE bolds all p < p_threshold (default 0.05).
p <- plot_forest(
  data        = df,
  est         = est, lower = lower, upper = upper,
  ci_column   = 3L,
  p_cols      = "p_value",
  p_digits    = 3L,
  bold_p      = TRUE,
  p_threshold = 0.05
)
```
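
The threshold rule in the comments above can be sketched in base R. `fmt_p()` is a hypothetical helper written for this illustration; it assumes fixed-decimal formatting with `p_digits` places and a blank cell for missing p-values, which may differ from what `plot_forest()` does internally.

```{r forest-pval-sketch}
# Hypothetical helper (not part of ukbflow) illustrating the p_digits rule.
fmt_p <- function(p, p_digits = 3L) {
  cutoff <- 10^(-p_digits)
  out <- formatC(p, format = "f", digits = p_digits)
  # Values below the cutoff collapse to "<cutoff", e.g. "<0.001"
  small <- !is.na(p) & p < cutoff
  out[small] <- paste0("<", formatC(cutoff, format = "f", digits = p_digits))
  out[is.na(p)] <- ""   # assumed: header rows show an empty cell
  out
}

p_raw   <- c(NA, 0.0004, 0.006, 0.2)
p_text  <- fmt_p(p_raw, p_digits = 3L)
is_bold <- !is.na(p_raw) & p_raw < 0.05   # rows that bold_p = TRUE would bold

data.frame(p_raw, p_text, is_bold)
```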

**Column headers and alignment**

`header` renames all columns in the *final* rendered table, so it needs one
entry per final column. The final table always has `ncol(data) + 2` columns:
the original columns, plus the `gap_ci` graphic column and the auto-generated
`OR (95% CI)` text column. Pass `""` for the gap column position.

```{r forest-header}
# data has 3 columns → final table has 5 columns (original 3 + gap_ci + OR label)
# Layout with ci_column = 3L: item | Cases/N | gap_ci | OR (95% CI) | p_value
p <- plot_forest(
  data      = df,
  est       = est, lower = lower, upper = upper,
  ci_column = 3L,
  header    = c("Comparison", "Cases / N", "", "HR (95% CI)", "P-value")
  #             col 1          col 2        gap  OR label       col 5
)
```
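
When `data` has many columns it can help to derive the final column order before writing `header` by hand. The sketch below uses a hypothetical helper, `final_columns()`, and assumes that the two generated columns are always inserted at position `ci_column`, as in the documented `ci_column = 3L` layout.

```{r forest-header-sketch}
# Hypothetical helper (not part of ukbflow): final column names in
# display order, assuming gap_ci and the label are inserted at ci_column.
final_columns <- function(data, ci_column, ci_label = "OR (95% CI)") {
  append(names(data), c("gap_ci", ci_label), after = ci_column - 1L)
}

df_demo <- data.frame(
  item      = "Unadjusted",
  `Cases/N` = "89 / 4 521",
  p_value   = 0.001,
  check.names = FALSE
)

cols <- final_columns(df_demo, ci_column = 3L)
cols

# Start from the derived names, then overwrite only what should change
header <- cols
header[cols == "gap_ci"] <- ""
header[cols == "item"]   <- "Comparison"
stopifnot(length(header) == ncol(df_demo) + 2L)
```

Deriving the vector this way keeps `header` at the required length when display columns are added or removed.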

`align` controls per-column text alignment across all `ncol(data) + 2`
columns: `-1` = left, `0` = centre, `1` = right.  `NULL` (default) left-aligns
column 1 and centres the rest.

```{r forest-align}
p <- plot_forest(
  data      = df,
  est       = est, lower = lower, upper = upper,
  ci_column = 3L,
  align     = c(-1L, 0L, 0L, 0L, 1L)   # label left | Cases/N centre | gap | OR centre | p right
)
```

**Background and borders**

```{r forest-style}
p <- plot_forest(
  data       = df,
  est        = est, lower = lower, upper = upper,
  ci_column  = 3L,
  background = "zebra",       # "zebra" | "bold_label" | "none"
  bg_col     = "#F0F0F0",     # shading colour
  border     = "three_line",  # "three_line" | "none"
  border_width = 3            # scalar or length-3 vector (top / mid / bottom)
)
```

**Layout and saving**

```{r forest-save}
# uses df, est, lower, upper from the minimal example above
p <- plot_forest(
  data        = df,
  est         = est, lower = lower, upper = upper,
  ci_column   = 3L,
  row_height  = NULL,   # auto (8 / 12 / 10 / 15 mm); or scalar/vector
  col_width   = NULL,   # auto (rounds up to nearest 5 mm)
  save        = TRUE,
  dest        = "forest_main",   # extension ignored; all 4 formats saved
  save_width  = 20,              # cm
  save_height = NULL             # auto: nrow(data) * 0.9 + 3 cm
)
```

> All four formats (PNG, PDF, JPG, TIFF) are written at **300 dpi** with a
> white background. The function returns the plot object invisibly; display
> with `plot(p)` or `grid::grid.draw(p)`.
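
As a worked example of the sizing rule, the arithmetic below applies the documented auto height (`nrow(data) * 0.9 + 3` cm) to the three-row `df` and converts the result to pixels at 300 dpi. The pixel figures are an approximation that assumes plain rounding; the graphics device may round differently.

```{r forest-save-size-sketch}
n_rows      <- 3                  # rows in df from the minimal example
save_width  <- 20                 # cm, as passed in the call above
save_height <- n_rows * 0.9 + 3   # cm, the documented auto rule
dpi         <- 300

# Convert cm to inches (1 in = 2.54 cm), then to pixels
px <- round(c(width = save_width, height = save_height) / 2.54 * dpi)

save_height
px
```

For this table the saved raster files come out at roughly 2362 x 673 pixels, which is a quick way to check a journal's minimum-resolution requirement before exporting.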

---

## `plot_tableone()` — Baseline Characteristics Table

### Minimal example

```{r tableone-minimal}
library(gtsummary)
data(trial)   # built-in gtsummary dataset

plot_tableone(
  data   = trial,
  vars   = c("age", "marker", "grade"),
  strata = "trt",
  save   = FALSE
)
```

### With SMD, custom labels, and export

```{r tableone-full}
plot_tableone(
  data    = trial,
  vars    = c("age", "marker", "grade", "stage"),
  strata  = "trt",
  label   = list(age ~ "Age (years)", marker ~ "Marker level (ng/mL)"),
  add_p   = TRUE,    # Wilcoxon / chi-squared p-values; very small values print as <0.001
  add_smd = TRUE,
  overall = TRUE,
  dest    = "table1",
  save    = TRUE
)
```

### Key parameters

**Variable types and statistics**

```{r tableone-types}
dt <- as.data.frame(ops_toy(scenario = "association"))

plot_tableone(
  data      = dt,
  vars      = c("p21022", "p21001_i0", "p31", "p20116_i0"),
  strata    = "dm_status",
  type      = list(p21022 = "continuous2"),   # multi-row continuous summary
  statistic = list(
    all_continuous()  ~ "{mean} ({sd})",
    all_categorical() ~ "{n} ({p}%)"
  ),
  digits    = list(p21022 ~ 1, p21001_i0 ~ 1),
  missing   = "ifany",   # show missing counts when present
  save      = FALSE
)
```

**SMD column**

The SMD column summarises covariate balance between groups:

- Continuous variables: Cohen's *d* (pooled-SD formula)
- Categorical variables: root-mean-square deviation (RMSD) of group proportions

```{r tableone-smd}
plot_tableone(
  data    = dt,
  vars    = c("p21022", "p21001_i0", "p31"),
  strata  = "dm_status",
  add_smd = TRUE,
  save    = FALSE
)
```
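
The two balance measures can be sketched in base R for a two-group comparison. Both helpers are hypothetical and rest on assumptions: `cohen_d()` uses the sample-size-weighted pooled SD (one of several pooled-SD variants), and `rmsd_props()` takes the root-mean-square of the per-level differences in proportions. The exact formulas used by `plot_tableone()` may differ, and neither helper handles missing values.

```{r tableone-smd-sketch}
# Hypothetical helpers (not part of ukbflow), two groups only.

# Cohen's d with a sample-size-weighted pooled SD (assumed variant)
cohen_d <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  pooled_sd <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / pooled_sd
}

# Root-mean-square deviation between the groups' level proportions
rmsd_props <- function(g1, g2) {
  lv <- union(unique(g1), unique(g2))
  p1 <- as.numeric(prop.table(table(factor(g1, levels = lv))))
  p2 <- as.numeric(prop.table(table(factor(g2, levels = lv))))
  sqrt(mean((p1 - p2)^2))
}

d    <- cohen_d(c(1, 2, 3), c(2, 3, 4))
rmsd <- rmsd_props(c("a", "a", "b", "b"), c("a", "b", "b", "b"))
c(d = d, rmsd = rmsd)
```

Balance tables usually report the absolute value, so `abs(d)` is the figure to compare against a chosen balance threshold.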

**Excluding rows**

Use `exclude_labels` to remove specific level rows from the rendered table
(e.g. a redundant reference category or an "Unknown" level):

```{r tableone-exclude}
plot_tableone(
  data           = dt,
  vars           = c("p31", "p20116_i0"),
  strata         = "dm_status",
  exclude_labels = "Never",   # e.g. remove reference category from display
  save           = FALSE
)
```

**Export formats**

When `save = TRUE`, four files are written in a single call:

| Format | Tool | Notes |
|---|---|---|
| `.docx` | `gt::gtsave()` | Ready for Word submission |
| `.html` | `gt::gtsave()` | Interactive preview |
| `.pdf` | `pagedown::chrome_print()` | Requires Chrome / Chromium |
| `.png` | `webshot2::webshot()` | 2x zoom, table element only |

> PDF and PNG rendering requires `pagedown` and `webshot2` respectively.
> Install with `install.packages(c("pagedown", "webshot2"))`.
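
A quick pre-flight check along these lines reports which optional packages are missing before a long export run. It uses only base R; the variable names are illustrative, and the check does not verify that Chrome or Chromium itself is available.

```{r tableone-deps-sketch}
# Optional packages, keyed by the export format that needs them
optional_pkgs <- c(pdf = "pagedown", png = "webshot2")

# TRUE where the package can be loaded, FALSE otherwise
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- unname(optional_pkgs[!installed])
if (length(missing_pkgs) > 0L) {
  message("Install before saving: ", paste(missing_pkgs, collapse = ", "))
}
```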

---

## Getting Help

- `?plot_forest`, `?plot_tableone`
- `vignette("assoc")` — association analysis producing forest plot inputs
- [GitHub Issues](https://github.com/evanbio/ukbflow/issues)