Package 'CKNNRLD'


Title: Clustering-Based K-Nearest Neighbor Regression for Longitudinal Data
Version: 0.1.4
Description: Implements the 'CKNNRLD' algorithm (Clustering-Based K-Nearest Neighbor Regression for Longitudinal Data), which improves K-Nearest Neighbor ('KNN') regression on longitudinal data through cluster-based partitioning and localized prediction. It is intended to improve computational efficiency and accuracy on high-volume longitudinal datasets. Reference: Loeloe MS, Tabatabaei SM, Sefidkar R, Mehrparvar AH, Jambarsang S (2025). "Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach." BMC Bioinformatics, 26, 232. <doi:10.1186/s12859-025-06205-1>.
License: GPL-3
Encoding: UTF-8
Imports: Directional, graphics, Rfast
Depends: R (≥ 3.5.0)
NeedsCompilation: no
Language: en-US
Config/roxygen2/version: 8.0.0
Packaged: 2026-06-09 16:08:47 UTC; sadegh-pc
Author: Mohammad Sadegh Loeloe [aut, cre], Seyyed Mohammad Tabatabaei [aut], Reyhane Sefidkar [aut], Amir Houshang Mehrparvar [aut], Sara Jambarsang [aut, ths]
Maintainer: Mohammad Sadegh Loeloe <mslbiostat@gmail.com>
Repository: CRAN
Date/Publication: 2026-06-09 16:20:02 UTC

Find Optimal Number of Clusters for Longitudinal Data

Description

This function determines the best number of clusters (C) for clustering longitudinal data, using the elbow method applied to the within-cluster sum of squares (WCSS).

Usage

BestC(Y, range_clusters = 2:4, method = "kmeans")

Arguments

Y

A matrix or data frame of longitudinal outcomes (subjects x timepoints).

range_clusters

A numeric vector of cluster numbers to evaluate (e.g., 2:4).

method

Clustering method to use (currently only "kmeans").

Value

A list with best_c, criteria, and criteria_best.

Examples

set.seed(123)
n <- 20        # number of subjects
n_time <- 3    # number of timepoints (avoids masking T, the alias for TRUE)
y <- matrix(rnorm(n * n_time), nrow = n)
best_c_info <- BestC(Y = y, range_clusters = 2:3)
print(best_c_info$best_c)
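
The elbow idea named in the Description can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it computes the WCSS with stats::kmeans() for several candidate cluster numbers and picks the candidate where the WCSS curve bends most sharply. The "largest second difference" rule and the names candidates, wcss, and elbow are assumptions made for illustration; BestC() may use a different criterion.

```r
set.seed(1)
n_time <- 4  # number of timepoints

# Toy longitudinal outcomes: two groups of 15 subjects with different levels
Y <- rbind(
  matrix(rnorm(15 * n_time, mean = 0), ncol = n_time),
  matrix(rnorm(15 * n_time, mean = 6), ncol = n_time)
)

# Total within-cluster sum of squares (WCSS) for each candidate cluster number
candidates <- 1:5
wcss <- sapply(candidates, function(k) {
  kmeans(Y, centers = k, nstart = 20)$tot.withinss
})

# Hypothetical elbow rule (an assumption, not taken from the package):
# choose the candidate with the largest second difference of the WCSS curve.
# Element i of second_diff is centred on candidates[i + 1].
second_diff <- diff(wcss, differences = 2)
elbow <- candidates[which.max(second_diff) + 1]

print(round(wcss, 1))
print(elbow)
```

A second-difference rule needs at least three candidate values and cannot select the first or last one, so a wider range than the one of interest is usually evaluated.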


Cluster-based KNN Regression for Longitudinal Data (CKNNRLD)

Description

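This function implements clustering-based KNN regression for longitudinal data: the training data are partitioned into c clusters, and predictions for new observations are made locally from k nearest neighbors.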

Usage

CKNNRLD(xnew, y, x, k = 5, c = 4, cluster_method = "kmeans")

Arguments

xnew

A matrix of predictor values for test data.

y

A matrix or data frame of longitudinal responses (subjects x timepoints).

x

A matrix or data frame of predictors for training data.

k

Number of nearest neighbors to use.

c

Number of clusters.

cluster_method

Clustering method. Currently supports "kmeans".

Value

A data frame with predicted values and cluster assignment.

Examples

set.seed(123)
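n <- 20        # number of subjects
n_time <- 3    # number of timepoints
d <- 2         # number of predictors
x <- matrix(runif(n * d), nrow = n)       # predictors
y <- matrix(rnorm(n * n_time), nrow = n)  # longitudinal responses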
train_idx <- sample(1:n, 14)
test_idx <- setdiff(1:n, train_idx)
result <- CKNNRLD(
  x = x[train_idx, ],
  y = y[train_idx, ],
  xnew = x[test_idx, ],
  k = 3,
  c = 2
)
head(result)
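
To make "cluster-based partitioning and localized prediction" concrete, here is a base R sketch of one possible reading of the idea. It is an illustration under stated assumptions, not the code behind CKNNRLD(): the training subjects are clustered on their predictors with stats::kmeans(), each new subject is assigned to the nearest centroid, and its trajectory is predicted as the mean trajectory of its k nearest neighbors inside that cluster. The function cluster_knn_sketch() and its argument n_clusters are hypothetical names, and the package may cluster on different variables or assign clusters differently.

```r
# Hypothetical helper (not part of the package): cluster-then-KNN prediction
cluster_knn_sketch <- function(xnew, y, x, k = 3, n_clusters = 2) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  xnew <- as.matrix(xnew)

  # Assumption: clusters are formed on the training predictors
  km <- kmeans(x, centers = n_clusters, nstart = 10)

  pred <- matrix(NA_real_, nrow(xnew), ncol(y))
  cl <- integer(nrow(xnew))
  for (i in seq_len(nrow(xnew))) {
    # Assign the new subject to the cluster with the nearest centroid
    d_cent <- colSums((t(km$centers) - xnew[i, ])^2)
    cl[i] <- which.min(d_cent)

    # Search for neighbors only among training subjects of that cluster
    idx <- which(km$cluster == cl[i])
    d2 <- colSums((t(x[idx, , drop = FALSE]) - xnew[i, ])^2)
    nn <- idx[order(d2)[seq_len(min(k, length(idx)))]]

    # Predicted trajectory: mean of the neighbors' trajectories
    pred[i, ] <- colMeans(y[nn, , drop = FALSE])
  }
  data.frame(cluster = cl, pred)
}

set.seed(123)
x <- matrix(runif(20 * 2), nrow = 20)
y <- matrix(rnorm(20 * 3), nrow = 20)
train_idx <- sample(20, 14)
test_idx <- setdiff(1:20, train_idx)
res <- cluster_knn_sketch(x[test_idx, ], y[train_idx, ], x[train_idx, ],
                          k = 3, n_clusters = 2)
head(res)
```

Restricting the neighbor search to one cluster is what reduces the number of distance computations per prediction compared with a search over all training subjects.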


Tune CKNNRLD Model with Automatic Cluster Selection

Description

Selects the best number of clusters (C) from C_range and tunes CKNNRLD by cross-validation, evaluating up to A neighbors.

Usage

CKNNRLD.tune(
  y,
  x,
  nfolds = 10,
  folds = NULL,
  seed = NULL,
  A = 10,
  C_range = 2:4,
  cluster_method = "kmeans"
)

Arguments

y

Matrix of longitudinal outcomes.

x

Matrix of predictor variables.

nfolds

Number of folds for cross-validation.

folds

Optional list of pre-specified fold indices.

seed

Random seed for reproducibility.

A

Maximum number of neighbors to evaluate.

C_range

Range of cluster numbers to evaluate.

cluster_method

Clustering method to use (currently only "kmeans").

Value

A list whose components include best_c, cluster_results, and cluster_sizes.

Examples

set.seed(123)
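n <- 20        # number of subjects
n_time <- 3    # number of timepoints
d <- 2         # number of predictors
x <- matrix(runif(n * d), nrow = n)       # predictors
y <- matrix(rnorm(n * n_time), nrow = n)  # longitudinal outcomes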
tune_result <- CKNNRLD.tune(
  y = y,
  x = x,
  nfolds = 3,
  A = 4,
  C_range = 2:3
)
print(tune_result$best_c)


Standard K-Nearest Neighbor Regression for Longitudinal Data

Description

This function performs KNN regression for longitudinal data without clustering. It predicts longitudinal outcomes for new observations based on the average of their k nearest neighbors in the predictor space.

Usage

KNNRLD(xnew, y, x, k = 5)

Arguments

xnew

A matrix of predictor values for prediction (test set).

y

A matrix or data frame of longitudinal responses (training set).

x

A matrix or data frame of training predictor values.

k

Number of nearest neighbors to use. Can be a scalar or a vector.

Value

A list of matrices with predicted values for each value of k. Each matrix has dimensions nrow(xnew) x ncol(y).

Examples

set.seed(123)
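n <- 20        # number of subjects
n_time <- 3    # number of timepoints
d <- 2         # number of predictors
x <- matrix(runif(n * d), nrow = n)       # predictors
y <- matrix(rnorm(n * n_time), nrow = n)  # longitudinal responses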
train_idx <- sample(1:n, 14)
test_idx <- setdiff(1:n, train_idx)
pred <- KNNRLD(
  xnew = x[test_idx, ],
  y = y[train_idx, ],
  x = x[train_idx, ],
  k = 3
)
head(pred[[1]])
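
The prediction rule described above (averaging the trajectories of the k nearest neighbors in predictor space) can be written in a few lines of base R. This is a minimal sketch for illustration, assuming Euclidean distance; knn_sketch() is a hypothetical name and the code is not the package's implementation. It mirrors the documented return shape: one matrix of dimension nrow(xnew) x ncol(y) per value of k.

```r
# Hypothetical helper (not part of the package): plain KNN for trajectories
knn_sketch <- function(xnew, y, x, k = 5) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  xnew <- as.matrix(xnew)

  # Column i of ord holds the training rows sorted by distance to xnew[i, ]
  # (assumption: squared Euclidean distance on the raw predictors)
  ord <- apply(xnew, 1, function(z) order(colSums((t(x) - z)^2)))

  # One prediction matrix per requested value of k
  lapply(k, function(kk) {
    out <- matrix(NA_real_, nrow(xnew), ncol(y))
    for (i in seq_len(nrow(xnew))) {
      nn <- ord[seq_len(kk), i]
      out[i, ] <- colMeans(y[nn, , drop = FALSE])
    }
    out
  })
}

set.seed(123)
x <- matrix(runif(20 * 2), nrow = 20)
y <- matrix(rnorm(20 * 3), nrow = 20)
train_idx <- sample(20, 14)
test_idx <- setdiff(1:20, train_idx)
pred <- knn_sketch(x[test_idx, ], y[train_idx, ], x[train_idx, ], k = c(1, 3))
dim(pred[[2]])
```

Sorting the distances once and reusing the ordering for every k keeps the cost of evaluating several values of k close to the cost of evaluating one.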


Tune k in KNNRLD using Cross-Validation

Description

Finds the optimal number of neighbors (k) for KNN regression on longitudinal data using cross-validation with nfolds folds.

Usage

KNNRLD.tune(
  y,
  x,
  nfolds = 10,
  folds = NULL,
  seed = NULL,
  A = 10,
  graph = FALSE
)

Arguments

y

Matrix of longitudinal outcomes.

x

Matrix of predictor variables.

nfolds

Number of cross-validation folds.

folds

Optional list of pre-specified fold indices.

seed

Optional random seed.

A

Maximum number of neighbors to evaluate.

graph

Logical; if TRUE, plots the mean squared prediction error (MSPE) against k.

Value

A list containing crit, best_k, performance, and runtime.

Examples


set.seed(123)
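n <- 20        # number of subjects
n_time <- 3    # number of timepoints
d <- 2         # number of predictors
x <- matrix(runif(n * d), nrow = n)       # predictors
y <- matrix(rnorm(n * n_time), nrow = n)  # longitudinal outcomes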
tune_result <- KNNRLD.tune(
  y = y,
  x = x,
  nfolds = 3,
  A = 4
)
str(tune_result)
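
Cross-validated selection of k can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: folds are drawn at random, k runs from 1 to A, and the criterion is the mean squared prediction error averaged over subjects, timepoints, and folds. The names fold_id, mspe, crit, and best_k are chosen for this sketch only; they echo, but are not guaranteed to match, the components returned by KNNRLD.tune().

```r
set.seed(123)
n <- 30; n_time <- 3; d <- 2
x <- matrix(runif(n * d), nrow = n)       # predictors
y <- matrix(rnorm(n * n_time), nrow = n)  # longitudinal outcomes

nfolds <- 3
A <- 4  # assumption: evaluate k = 1, ..., A

# Randomly assign each subject to one of nfolds folds of near-equal size
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

mspe <- matrix(NA_real_, nfolds, A)
for (f in seq_len(nfolds)) {
  test <- which(fold_id == f)
  train <- setdiff(seq_len(n), test)
  sq_err <- matrix(0, length(test), A)
  for (i in seq_along(test)) {
    # Training subjects sorted by squared Euclidean distance to the test subject
    d2 <- colSums((t(x[train, , drop = FALSE]) - x[test[i], ])^2)
    nn <- train[order(d2)]
    for (k in seq_len(A)) {
      pred <- colMeans(y[nn[seq_len(k)], , drop = FALSE])
      sq_err[i, k] <- mean((y[test[i], ] - pred)^2)
    }
  }
  mspe[f, ] <- colMeans(sq_err)  # fold-level MSPE for each k
}

crit <- colMeans(mspe)     # average MSPE across folds, one value per k
best_k <- which.min(crit)  # k with the smallest cross-validated MSPE
print(round(crit, 3))
print(best_k)
```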