calculateSRDValues,
calculateSRDDistribution, utilsRankingMatrix,
and calculateCrossValidation now stop with an informative
error message when the input data frame contains NA values,
non-numeric columns, fewer than 2 columns, or fewer than 2 rows.
Previously, NA values could cause an infinite loop in the
C++ layer.
calculateCrossValidation now stops with an
informative error message when number_of_folds is 0 or 1.
Previously, these values caused an immediate R session crash.
calculateSRDValues,
calculateSRDDistribution, utilsRankingMatrix,
and calculateCrossValidation now stop with an informative
error message when the input data frame contains one or more constant
columns. Constant columns carry no information for SRD analysis
and previously caused undefined behaviour in the C++ layer.
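
The input checks described in the entries above can be sketched in plain R. This is only an illustration: `validate_srd_input()` and `validate_folds()` are hypothetical helpers, not functions exported by rSRD, and the error messages and the exact fold-count rule are assumptions.

```r
# Hypothetical helper; not part of rSRD's exported API.
validate_srd_input <- function(df) {
  if (!is.data.frame(df)) {
    stop("Input must be a data frame.")
  }
  if (ncol(df) < 2 || nrow(df) < 2) {
    stop("Input must have at least 2 columns and 2 rows.")
  }
  is_num <- vapply(df, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric column(s): ", paste(names(df)[!is_num], collapse = ", "))
  }
  if (anyNA(df)) {
    stop("Input contains NA values.")
  }
  # A column is constant when it holds a single distinct value.
  is_const <- vapply(df, function(col) length(unique(col)) == 1L, logical(1))
  if (any(is_const)) {
    stop("Constant column(s): ", paste(names(df)[is_const], collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical companion check; the changelog only states that 0 and 1 are
# rejected, so treating everything below 2 as invalid is an assumption.
validate_folds <- function(number_of_folds) {
  if (!is.numeric(number_of_folds) || length(number_of_folds) != 1L ||
      is.na(number_of_folds) || number_of_folds < 2) {
    stop("number_of_folds must be a single number >= 2.")
  }
  invisible(TRUE)
}

ok  <- data.frame(A = c(1, 2, 3), B = c(3, 1, 2))
bad <- data.frame(A = c(1, 2, 3), B = c(5, 5, 5))  # B is constant

validate_srd_input(ok)
msg <- tryCatch(validate_srd_input(bad), error = function(e) conditionMessage(e))
print(msg)
```

Failing early in R, before any data reaches compiled code, is what turns a hang or crash into a readable error.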
plotCrossValidation no longer row-binds the
precomputed summary statistics (min, xx1, Q1, median, Q3, xx19, max)
with the fold-wise SRD values into a single sample from which ggplot
recomputes quartiles and whiskers. Previously, this mixture of
real and derived values distorted the boxplot, especially when
number_of_folds was small. The box geometry (min, Q1,
median, Q3, max) is now taken directly from boxplot_values,
and the mean is computed directly from the fold-wise SRD values; neither
is derived from a contaminated sample (see also the related ordering
simplification under Improvements).
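
As a rough illustration of drawing a box from precomputed statistics rather than from a raw sample, the sketch below uses `geom_boxplot(stat = "identity")` from ggplot2. The data frames, their column names, and all values are made up for the example; this is not the code of `plotCrossValidation`.

```r
library(ggplot2)

# Made-up summary values for two hypothetical solutions.
box_stats <- data.frame(
  solution = c("S1", "S2"),
  min      = c(0.10, 0.20),
  Q1       = c(0.15, 0.28),
  median   = c(0.20, 0.35),
  Q3       = c(0.30, 0.42),
  max      = c(0.45, 0.60)
)

# Made-up fold-wise SRD values, used only to compute the mean.
fold_values <- data.frame(
  solution = rep(c("S1", "S2"), each = 4),
  srd      = c(0.12, 0.18, 0.25, 0.40, 0.22, 0.30, 0.38, 0.55)
)
fold_means <- aggregate(srd ~ solution, data = fold_values, FUN = mean)

# stat = "identity" tells ggplot2 to use the supplied five numbers as-is
# instead of recomputing quartiles from a sample.
p <- ggplot(box_stats, aes(x = solution)) +
  geom_boxplot(
    aes(ymin = min, lower = Q1, middle = median, upper = Q3, ymax = max),
    stat = "identity"
  ) +
  geom_point(data = fold_means, aes(y = srd), shape = 4, size = 3) +
  labs(x = "Solution", y = "SRD value")
```

Keeping the box geometry and the mean in separate data frames means neither can contaminate the other.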
calculateSRDDistribution() and
calculateCrossValidation() gain a new seed
parameter for reproducible results. When seed = NULL (the
default), the original stochastic behaviour is preserved, ensuring full
backward compatibility.
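
The `seed = NULL` convention can be sketched in plain R as follows. `shuffle_ranks()` is a hypothetical stand-in for a stochastic routine, and seeding through `set.seed()` is an assumption; how rSRD passes the seed to its C++ layer is not shown here.

```r
# Hypothetical stand-in for a stochastic routine; not rSRD code.
shuffle_ranks <- function(n, seed = NULL) {
  # Only touch the RNG state when the caller supplies a seed, so the
  # default call behaves exactly as it did before the parameter existed.
  if (!is.null(seed)) {
    set.seed(seed)
  }
  sample.int(n)
}

a <- shuffle_ranks(10, seed = 42)
b <- shuffle_ranks(10, seed = 42)
identical(a, b)  # same seed, same permutation
```

With the package itself, the corresponding usage would be to pass `seed` to `calculateSRDDistribution()` or `calculateCrossValidation()`.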
Three example datasets are now bundled with the package and
accessible via
system.file("extdata", "<filename>", package = "rSRD").
The following three files use a semicolon separator and include a header
row (read.csv(..., header = TRUE, sep = ";")).
mep_profiles.csv: voting profiles of Members of the
European Parliament.
bundesliga20_21.csv: team performance data from the
2020/21 Bundesliga season.
movies1994.csv: ratings and rankings of films released
in 1994.
Added a vignette, “Getting Started with rSRD”, covering the full
analysis workflow: data preprocessing, computing SRD values, the
permutation test for significance, cross-validation, and pairwise
visualisation via heatmaps. Access it with
vignette("rSRD-introduction", package = "rSRD").
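
A minimal sketch of loading one of the bundled datasets, combining the `system.file()` and `read.csv()` calls quoted above. The guard for a missing installation is an addition for the example.

```r
# Locate a bundled file; system.file() returns "" when the package or the
# file is not installed.
path <- system.file("extdata", "movies1994.csv", package = "rSRD")

if (nzchar(path)) {
  # Semicolon-separated file with a header row, as described above.
  movies <- read.csv(path, header = TRUE, sep = ";")
  str(movies)
} else {
  message("rSRD (or the example file) is not installed.")
}
```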
plotCrossValidation no longer re-derives a column
ordering from boxplot_values. The columns of
SRD_values_of_different_folds already arrive ordered by
median, Q1, Q3, min, and max, since this ordering is performed in the
C++ layer
(Cross_Validation::Wilcoxon/Alpaydin/
Dietterich) and preserved unchanged by
calculateCrossValidation(), which only attaches solution
names to the already-ordered columns. The redundant re-ordering step has
been removed, simplifying the function and removing its dependency on
boxplot_values for this purpose
(plotCrossValidation still uses boxplot_values
directly for box geometry; see Bug fixes).
ggplot2::aes_string() calls in
plotPermTest replaced with ggplot2::aes()
using the .data pronoun.
plotPermTest now places solution labels at the
interpolated y-value of the distribution curve at each solution’s SRD
value, replacing a random placement that changed on every call. The x-
and y-axes now carry informative labels (“Normalised SRD value” and
“Relative frequency” or “Cumulative relative frequency” depending on the
densityToDistr argument), replacing auto-generated labels
that exposed internal implementation details.
import() directives for dplyr,
ggplot2, tibble, janitor,
rlang, and stringr replaced with specific
importFrom() calls. stringr removed as an
unused dependency.
utilsColorPalette is now generated via
grDevices::colorRampPalette() instead of a 250-element
hard-coded vector.
?rSRD.
%>% pipe operators replaced with the native R
pipe |>.
testthat
(edition 3), covering calculateSRDValues(),
calculateCrossValidation(), utilsMaxSRD(),
utilsTieProbability(), utilsCalculateRank(),
utilsCreateReference(), utilsPreprocessDF()
and many other functions.
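
Two of the improvements above lend themselves to a short base-R sketch: placing labels at an interpolated point on a curve, and generating a palette with `grDevices::colorRampPalette()`. The curve, the solution values, and the anchor colours are all made up; the use of `approx()` for the interpolation is an assumption about one way to do it, not a description of `plotPermTest`.

```r
# Made-up distribution curve: x is the normalised SRD value,
# y the relative frequency at that value.
curve_x <- c(0, 25, 50, 75, 100)
curve_y <- c(0.00, 0.10, 0.40, 0.10, 0.00)

# Hypothetical solutions and their SRD values.
solution_srd <- c(S1 = 12.5, S2 = 50, S3 = 62.5)

# Linear interpolation yields the same label height on every call,
# unlike a randomly drawn position.
label_y <- approx(curve_x, curve_y, xout = solution_srd)$y
names(label_y) <- names(solution_srd)
print(label_y)

# A 250-colour palette generated from a few anchor colours
# (these anchors are arbitrary, not the package's own).
palette_250 <- grDevices::colorRampPalette(c("blue", "white", "red"))(250)
length(palette_250)
```

Generating the palette from a handful of anchor colours keeps the source short and makes the palette length easy to change.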