This release incorporates feedback from the PhD dissertation reviewers: dr hab. Andrzej Dudek, dr hab. Joanna Landmesser-Rusek, and dr hab. Paweł Andrzej Strzelecki.

occup_panel dataset - a rotational panel covering
2009Q1–2010Q4 - used to illustrate id_var direct matching
and to exercise the panel-data code path in tests.

Naive Bayes ("nb") added to the supported ML methods,
via e1071 (Suggests).

summary_c2c() now supports both lm and
glm, adds input/shape validation
(df_old/df_new, coefficient table checks), and
reports the reference distribution used for p-values (t or
normal). Added edge-case tests and aligned regression/panel vignette
examples with current summary_c2c() output and
fixed-effects inference guidance.

cat2cat_ml_run() now reports the Brier score and mean
P(true class) in addition to accuracy. A proper scoring rule matters
because cat2cat weights are probabilities, not
classifications - a model can be accurate and still be poorly
calibrated.

stopifnot() assertions now carry descriptive
messages, so failures point the user at the offending argument instead
of printing the raw expression.

cat2cat() ML fallback is now configurable via
ml$on_fail ("freq", "naive",
"na", "error") with optional warning control
via ml$fail_warn. Failed ML weights are now explicitly
handled according to this policy instead of always silently falling back
to frequency weights.

cat2cat() ML now accepts factor features
in addition to numeric/logical: factor columns listed in
ml$features are automatically one-hot encoded using the
union of levels observed in ml$data and the target
period.

id_var, aggregated-data
workflows, hierarchical-code mappings, and regression/inference after
harmonisation into one better-structured advanced reference.

cat2cat() and cat2cat_agg()
trimmed - argument documentation kept, conceptual material moved to the
vignettes where it belongs.

nomnoml diagram that previously mislabelled
the base/target sides under forward mapping.

get_freqs() for
r-devel/R CMD check: replaced
as.data.frame(table(...)) conversion with direct
data.frame(input = names(tab), Freq = as.integer(tab))
construction to avoid failures when NA appears in table
names (row names contain missing values).

cat2cat_ml_run function to check the performance of the ML models
before cat2cat is run with the ml option. The ML models are now
more transparent.

ropensci standards, such as a CONTRIBUTING file and
testthat version 3.

cat2cat_agg: the cat_var argument is replaced by two new ones,
cat_var_old and cat_var_new.

master to main branch.

freqs_df argument in the cat2cat
function is moved from the data part to the mappings part; the change is backward compatible.
It is now consistent with the Python cat2cat implementation.

pkgcheck related fixes, such as 80 chars per line.

data and library calls style.

cat2cat function.

dummy_c2c to be backward compatible.

cat_apply_freq function performance.

ml argument in the dummy_c2c function
is redefined, with shorter names for simpler usage.

cat2cat ml part is using direct
cat_var for target (for an update) dataset now, not the one
from the ml argument list.

cat2cat validation, checking whether the
trans table covers all needed levels.

ml and data argument in
the cat2cat::cat2cat function, two additional arguments
each.

prune_c2c scales the weights now, so they still sum to one
for each subject.

dummy_c2c to add default
cat2cat columns to a data.frame.

occup and occup_small datasets have 4
periods now.

pkgdown reference.

pkgdown website.

deparse instead of deparse1.

tinyverse world, even fewer dependencies.

cat2cat function, the ml part is assuming
that the categorical variable is always named “code”.

cat2cat function.

randomForest packages to
Suggests, they are loaded on demand now.

occup_small dataset to pass checks in terms of
computation time of examples.

cat2cat function - “knn”, “rf”,
“lda”.

prune_c2c and cross_c2c to improve
processing of results.
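
The cat2cat_ml_run() entry in this changelog contrasts accuracy with the Brier score and the mean P(true class). The base-R sketch below illustrates that distinction on made-up numbers. It does not call cat2cat_ml_run() and is not taken from the package: the objects probs and truth are hypothetical, and the multiclass Brier score is computed as the mean over observations of the summed squared errors, which is one common definition and may differ from the package's normalisation.

```r
# Hypothetical toy data (not from the package): predicted probabilities
# for 4 observations over 3 candidate categories, plus the true category.
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.4, 0.5, 0.1,
    0.1, 0.1, 0.8,
    0.5, 0.3, 0.2),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
truth <- c("a", "a", "c", "b")

# One-hot encode the truth so it can be compared with the probabilities.
onehot <- outer(truth, colnames(probs), "==") * 1

# Accuracy: share of rows where the most probable class is the true one.
pred <- colnames(probs)[max.col(probs, ties.method = "first")]
accuracy <- mean(pred == truth)

# Multiclass Brier score (assumed definition): mean over rows of the
# summed squared differences between probabilities and the one-hot truth.
brier <- mean(rowSums((probs - onehot)^2))

# Mean probability assigned to the true class of each observation.
true_idx <- cbind(seq_along(truth), match(truth, colnames(probs)))
p_true <- mean(probs[true_idx])

print(c(accuracy = accuracy, brier = brier, p_true = p_true))
```

Two of the four toy observations are classified correctly, yet the misclassified rows still put 0.4 and 0.3 on the true category. Because cat2cat uses these probabilities as weights, that residual mass is exactly what the Brier score and mean P(true class) capture and accuracy ignores.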