The package evclass contains methods for evidential
classification. An evidential classifier quantifies the uncertainty
about the class of a pattern by a Dempster-Shafer mass function. In
evidential distance-based classifiers, the mass functions are
computed from distances between the test pattern and either training
patterns or prototypes. Classical classifiers such as logistic
regression, radial basis function (RBF) neural networks and multilayer
perceptrons with a softmax output layer can also be seen as evidential
classifiers. The user is invited to read the papers cited in this
vignette to get familiar with the main concepts underlying evidential
classification. These papers can be downloaded from the author’s web
site, at https://www.hds.utc.fr/~tdenoeux/. Here, we provide a
short guided tour of the main functions in the evclass
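package. The classification methods implemented to date are the evidential K-nearest neighbor (EK-NN) classifier (Denœux 1995; Zouhal and Denœux 1998), the evidential neural network classifier, and the RBF classifier with a weights-of-evidence interpretation (Denœux 2019; Huang et al. 2022).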
Also included in this package are functions to express the outputs of already trained logistic regression or multilayer perceptron classifiers in the belief function framework, using the approach described in (Denœux 2019).
You first need to install this package, and then load it:
library(evclass)
The following sections give a brief introduction to the main functions
in the package evclass for evidential classification.
The principle of the evidential K-nearest neighbor (EK-NN) classifier
is explained in (Denœux 1995), and the
optimization of the parameters of this model is presented in (Zouhal and Denœux 1998). The reader is referred
to these references. Here, we focus on the practical application of this
method using the functions implemented in evclass.
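To give a rough idea of what such a classifier computes, the following base R sketch combines the evidence of a few neighbors by Dempster's rule, in the spirit of (Denœux 1995). It is a simplified illustration and not the code used in evclass: the function name eknn_mass, the fixed value alpha=0.95 and the single scale parameter gamma are assumptions made for this example.

```r
# Hypothetical helper (not part of evclass): mass function obtained by
# combining the evidence of K neighbors with Dempster's rule.
# d: distances to the K neighbors, cl: their class labels in 1..M.
eknn_mass <- function(d, cl, M, alpha = 0.95, gamma = 1) {
  # Neighbor i supports its own class with mass s[i]; the rest goes to Omega
  s <- alpha * exp(-gamma * d^2)
  # A[q]: product of the masses left on Omega by the neighbors of class q
  # (equal to 1 when no neighbor belongs to class q)
  A <- sapply(seq_len(M), function(q) prod(1 - s[cl == q]))
  # Unnormalized combined masses: singletons first, Omega last
  m_single <- sapply(seq_len(M), function(q) (1 - A[q]) * prod(A[-q]))
  m <- c(m_single, prod(A))
  m / sum(m)  # normalization removes the conflict between neighbors
}

# Two close neighbors of class 1 and one distant neighbor of class 2
m <- eknn_mass(d = c(0.5, 0.8, 1.5), cl = c(1, 1, 2), M = 2)
names(m) <- c("class 1", "class 2", "Omega")
print(round(m, 3))
```

The mass left on Omega reflects how little evidence the neighbors provide: in this sketch, it grows when all neighbors are far from the test pattern.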
Consider, for instance, the ionosphere data. This
dataset consists of 351 instances grouped in two classes and described
by 34 numeric attributes. The first 176 instances are training data, the
rest are test data. Let us first load the data, and split them into a
training set and a test set.
data(ionosphere)
xtr<-ionosphere$x[1:176,-2]
ytr<-ionosphere$y[1:176]
xtst<-ionosphere$x[177:351,-2]
ytst<-ionosphere$y[177:351]
The EK-NN classifier is implemented as three functions:
EkNNinit for initialization, EkNNfit for
training, and EkNNval for testing. Let us initialize the
classifier and train it on the ionosphere data, with \(K=5\) neighbors. (If the argument
param is not passed to EkNNfit, the function
EkNNinit is called inside EkNNfit; here, we
make the call explicit for clarity.)
set.seed(20221229)
param0<- EkNNinit(xtr,ytr)
options=list(maxiter=300,eta=0.1,gain_min=1e-5,disp=FALSE)
fit<-EkNNfit(xtr,ytr,param=param0,K=5,options=options)
The list fit contains the optimized parameters, the
final value of the cost function, the leave-one-out (LOO) error rate,
the LOO predicted class labels and the LOO predicted mass functions.
Here the LOO error rate and confusion matrix are:
print(fit$err)
## [1] 0.1079545
table(fit$ypred,ytr)
##    ytr
##       1   2
##   1 108  14
##   2   5  49
We can then evaluate the classifier on the test data:
val<-EkNNval(xtrain=xtr,ytrain=ytr,xtst=xtst,K=5,ytst=ytst,param=fit$param)
print(val$err)
## [1] 0.1142857
table(val$ypred,ytst)
##    ytst
##       1   2
##   1 107  15
##   2   5  48
To determine the best value of \(K\), we may compute the LOO error for different candidate values. Here, we try all values between 1 and 15:
err<-rep(0,15)  # LOO error rate for each candidate value of K
for(K in 1:15){
  fit<-EkNNfit(xtr,ytr,K,options=list(maxiter=100,eta=0.1,gain_min=1e-5,disp=FALSE))
  err[K]<-fit$err
}
plot(1:15,err,type="b",xlab='K',ylab='LOO error rate')
The minimum LOO error rate is obtained for \(K=8\). The test error rate and confusion matrix for that value of \(K\) are obtained as follows:
fit<-EkNNfit(xtr,ytr,K=8,options=list(maxiter=100,eta=0.1,gain_min=1e-5,disp=FALSE))
val<-EkNNval(xtrain=xtr,ytrain=ytr,xtst=xtst,K=8,ytst=ytst,param=fit$param)
print(val$err)
## [1] 0.09142857
table(val$ypred,ytst)
##    ytst
##       1   2
##   1 106  10
##   2   6  53
In the evidential neural network classifier, the output mass
functions are based on distances to prototypes, which allows for faster
classification. The prototypes and their class-membership degrees are
learnt by minimizing a cost function. This function is defined as the
sum of an error term and, optionally, a regularization term. As for the
EK-NN classifier, the evidential neural network classifier is
implemented as three functions: proDSinit for
initialization, proDSfit for training and
proDSval for evaluation.
Let us demonstrate this method on the glass dataset.
This data set contains 185 instances, which can be split into 89
training instances and 96 test instances.
data(glass)
xtr<-glass$x[1:89,]
ytr<-glass$y[1:89]
xtst<-glass$x[90:185,]
ytst<-glass$y[90:185]
We then initialize a network with 7 prototypes:
param0<-proDSinit(xtr,ytr,nproto=7,nprotoPerClass=FALSE,crisp=FALSE)
and train this network without regularization:
options<-list(maxiter=500,eta=0.1,gain_min=1e-5,disp=20)
fit<-proDSfit(x=xtr,y=ytr,param=param0,options=options)
## [1]  1.0000000  0.3582862 10.0000000
## [1] 21.0000000  0.2933648  1.2426284
## [1] 41.0000000  0.2278790  0.2197497
## [1] 61.0000000  0.1925287  0.0397398
## [1] 81.000000000  0.188994512  0.006524067
## [1] 1.010000e+02 1.883259e-01 1.274291e-03
## [1] 1.210000e+02 1.881550e-01 2.842364e-04
Finally, we evaluate the performance of the network on the test set:
val<-proDSval(xtst,fit$param,ytst)
print(val$err)
## [1] 0.3020833
table(ytst,val$ypred)
##     
## ytst  1  2  4
##    1 31  9  0
##    2  5 28  4
##    3  5  3  0
##    4  0  3  8
If the training is done with regularization, the hyperparameter
mu needs to be determined by cross-validation.
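A grid search by K-fold cross-validation could be organized along the following lines. This is only a sketch: cv_grid, fit_fun and err_fun are hypothetical names introduced here, the runnable part uses a toy threshold rule so that it does not depend on evclass, and the commented lines assume that proDSfit accepts the hyperparameter as an argument named mu.

```r
# Hypothetical helper (not part of evclass): K-fold cross-validation error
# for each candidate value of a hyperparameter.
# fit_fun(xtr, ytr, value) returns a fitted model;
# err_fun(model, xte, yte) returns an error rate on held-out data.
cv_grid <- function(x, y, values, fit_fun, err_fun, nfold = 5) {
  fold <- sample(rep(seq_len(nfold), length.out = nrow(x)))  # random fold labels
  sapply(values, function(v) {
    mean(sapply(seq_len(nfold), function(k) {
      held_out <- fold == k
      model <- fit_fun(x[!held_out, , drop = FALSE], y[!held_out], v)
      err_fun(model, x[held_out, , drop = FALSE], y[held_out])
    }))
  })
}

# Toy check with a one-parameter threshold rule (made-up data)
set.seed(1)
x_toy <- matrix(1:20, ncol = 1)
y_toy <- as.numeric(x_toy[, 1] > 10) + 1
candidates <- c(5, 10, 15)
cv_err <- cv_grid(x_toy, y_toy, values = candidates,
                  fit_fun = function(xtr, ytr, v) v,
                  err_fun = function(v, xte, yte) mean((xte[, 1] > v) + 1 != yte))
best <- candidates[which.min(cv_err)]

# With evclass, the callbacks might look as follows (argument names assumed):
# fit_fun <- function(xtr, ytr, mu) {
#   p0 <- proDSinit(xtr, ytr, nproto = 7)
#   proDSfit(xtr, ytr, p0, mu = mu)$param
# }
# err_fun <- function(param, xte, yte) proDSval(xte, param, yte)$err
```

Keeping the fold labels fixed across candidate values, as done here, makes the error rates directly comparable.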
In the belief function framework, there are several definitions of
expectation. Each of these definitions results in a different decision
rule that can be used for classification. The reader is referred to
(Denœux 1997) for a detailed description
of these rules and their application to classification. Here, we will
illustrate the use of the function decision for generating
decisions.
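To see how the definitions of expectation can lead to different decisions, consider the following base R sketch. It is not the implementation used by the function decision: the helper expected_loss, the masses and the loss matrix are made up for this illustration. For each act, the loss is summarized within each focal set by its minimum (lower expectation), its maximum (upper expectation) or its average (pignistic expectation), and then weighted by the mass of that focal set.

```r
# Hypothetical helper (not the evclass implementation): expected loss of each
# act under a mass function m whose focal sets are the rows of the 0/1 matrix focal.
expected_loss <- function(m, focal, L, rule = c("lower", "upper", "pignistic")) {
  rule <- match.arg(rule)
  # How the losses inside a focal set are summarized under each rule
  agg <- switch(rule, lower = min, upper = max, pignistic = mean)
  sapply(seq_len(ncol(L)), function(a) {
    sum(sapply(seq_along(m), function(j) m[j] * agg(L[focal[j, ] == 1, a])))
  })
}

focal <- rbind(diag(3), rep(1, 3))    # three singletons, then the whole frame
m <- c(0.5, 0.2, 0, 0.3)              # illustrative masses (made up)
L <- cbind(1 - diag(3), rep(0.3, 3))  # 0-1 losses plus a rejection act costing 0.3

d_lower <- which.min(expected_loss(m, focal, L, "lower"))      # 1: assign to class 1
d_upper <- which.min(expected_loss(m, focal, L, "upper"))      # 4: reject
d_pig   <- which.min(expected_loss(m, focal, L, "pignistic"))  # 4: reject
```

In this example, the lower expectation treats the mass on the whole frame in the most favorable way and selects class 1, whereas the two other rules select rejection.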
We consider the Iris dataset from the package datasets.
To plot the decision regions, we will only use two input attributes:
‘Petal.Length’ and ‘Petal.Width’. The following code plots the data and trains
an evidential neural network classifier with six prototypes:
data("iris")
x<- iris[,3:4]
y<-as.numeric(iris[,5])
c<-max(y)
plot(x[,1],x[,2],pch=y,xlab="Petal Length",ylab="Petal Width")
param0<-proDSinit(x,y,6)
fit<-proDSfit(x,y,param0)
## [1]  1.0000000  0.3677901 10.0000000
## [1] 11.0000000  0.2188797  3.5760950
## [1] 21.00000000  0.08071952  1.33620828
## [1] 31.00000000  0.04385881  0.49005208
## [1] 41.00000000  0.03274671  0.21950743
## [1] 51.00000000  0.02745491  0.08000766
## [1] 61.00000000  0.02540228  0.03600120
## [1] 71.00000000  0.02354846  0.01519283
## [1] 81.00000000  0.02203566  0.00642574
## [1] 91.000000000  0.021082124  0.003074791
## [1] 1.010000e+02 2.090683e-02 1.323356e-03
Let us assume that we have the following loss matrix:
L=cbind(1-diag(c),rep(0.3,c))
print(L)
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    1  0.3
## [2,]    1    0    1  0.3
## [3,]    1    1    0  0.3
This matrix has four columns, one for each possible act (decision). The first three decisions correspond to the assignment to each of the three classes. The losses are 0 for a correct classification, and 1 for a misclassification. The fourth decision is rejection. For this act, the loss is 0.3, whatever the true class. The following code draws the decision regions for this loss matrix, and three decision rules: the minimization of the lower, upper and pignistic expectations (see Denœux 1997 for details about these rules).
xx<-seq(-1,9,0.01)
yy<-seq(-2,4.5,0.01)
nx<-length(xx)
ny<-length(yy)
Dlower<-matrix(0,nrow=nx,ncol=ny)
Dupper<-Dlower
Dpig<-Dlower
for(i in 1:nx){
  X<-matrix(c(rep(xx[i],ny),yy),ny,2)
  val<-proDSval(X,fit$param)
  Dupper[i,]<-decision(val$m,L=L,rule='upper')
  Dlower[i,]<-decision(val$m,L=L,rule='lower')
  Dpig[i,]<-decision(val$m,L=L,rule='pignistic')
}
contour(xx,yy,Dlower,xlab="Petal.Length",ylab="Petal.Width",drawlabels=FALSE)
for(k in 1:c) points(x[y==k,1],x[y==k,2],pch=k)
contour(xx,yy,Dupper,xlab="Petal.Length",ylab="Petal.Width",drawlabels=FALSE,add=TRUE,lty=2)
contour(xx,yy,Dpig,xlab="Petal.Length",ylab="Petal.Width",drawlabels=FALSE,add=TRUE,lty=3)
As suggested in (Denœux 1997), we can also consider the case where there is an unknown class, not represented in the learning set. We can then construct a loss matrix with four rows (the last row corresponds to the unknown class) and five columns (the last column corresponds to the assignment to the unknown class). Assume that the losses are defined as follows:
L<-cbind(1-diag(c),rep(0.2,c),rep(0.22,c))
L<-rbind(L,c(1,1,1,0.2,0))
print(L)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    1    1  0.2 0.22
## [2,]    1    0    1  0.2 0.22
## [3,]    1    1    0  0.2 0.22
## [4,]    1    1    1  0.2 0.00
We can now plot the decision regions for the pignistic decision rule:
for(i in 1:nx){
  X<-matrix(c(rep(xx[i],ny),yy),ny,2)
  val<-proDSval(X,fit$param,rep(0,ny))
  Dlower[i,]<-decision(val$m,L=L,rule='lower')
  Dpig[i,]<-decision(val$m,L=L,rule='pignistic')
}
contour(xx,yy,Dpig,xlab="Petal.Length",ylab="Petal.Width",drawlabels=FALSE)
for(k in 1:c) points(x[y==k,1],x[y==k,2],pch=k)
The outer region corresponds to the assignment to the unknown class: this hypothesis becomes more plausible when the test vector is far from all the prototypes representing the training data.
As shown in (Denœux 2019), the calculations performed in the softmax layer of a feedforward neural network can be interpreted in terms of combination of evidence by Dempster’s rule. The output class probabilities can be seen as normalized plausibilities according to an underlying belief function. Applying these ideas to a radial basis function (RBF) network, it is possible to derive an alternative evidential classifier with properties similar to those of the evidential neural network classifier described above (Huang et al. 2022).
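The link between Dempster's rule and a normalized product can be checked numerically on a small example. The base R sketch below is an illustration written for this purpose and is independent of evclass: the functions dempster and contour_fn and the two mass functions are made up. It combines two mass functions on three classes and verifies that the normalized plausibilities of the classes after combination equal the normalized product of the plausibilities before combination.

```r
# Mass functions on M classes are stored as vectors of length 2^M:
# element k+1 is the mass of the subset whose binary code is k (class 1 = lowest bit).

# Dempster's rule: multiply masses, intersect focal sets, discard the conflict
dempster <- function(m1, m2) {
  n <- length(m1)
  out <- numeric(n)
  for (a in 0:(n - 1)) {
    for (b in 0:(n - 1)) {
      k <- bitwAnd(a, b)  # binary code of the intersection
      out[k + 1] <- out[k + 1] + m1[a + 1] * m2[b + 1]
    }
  }
  out[1] <- 0  # mass sent to the empty set (conflict)
  out / sum(out)
}

# Contour function: plausibility of each single class
contour_fn <- function(m, M) {
  codes <- seq_along(m) - 1
  sapply(seq_len(M), function(q) sum(m[(codes %/% 2^(q - 1)) %% 2 == 1]))
}

M <- 3
m1 <- numeric(2^M); m1[c(2, 8)] <- c(0.6, 0.4)  # m({1}) = 0.6, m(Omega) = 0.4
m2 <- numeric(2^M); m2[c(7, 8)] <- c(0.5, 0.5)  # m({2,3}) = 0.5, m(Omega) = 0.5

pl12 <- contour_fn(dempster(m1, m2), M)
pl_prod <- contour_fn(m1, M) * contour_fn(m2, M)
print(pl12 / sum(pl12))        # normalized plausibilities after combination
print(pl_prod / sum(pl_prod))  # normalized product of the two contour functions
```

Both printed vectors coincide, which is the property that allows a product of exponential terms, as computed by a softmax layer, to be read as the result of a combination by Dempster's rule.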
As for the EK-NN and evidential neural network classifiers, the RBF
classifier is implemented as three functions: RBFinit for
initialization, RBFfit for training and RBFval
for evaluation.
Let us demonstrate this method on the glass dataset. We
initialize a network with 7 prototypes:
param0<-RBFinit(xtr,ytr,nproto=7)
and train this network with regularization coefficient \(\lambda=0.001\):
fit<-RBFfit(xtr,ytr,param0,lambda=0.001,control=list(fnscale=-1,maxit=1000))
Finally, we evaluate the performance of the network on the test set:
val<-RBFval(xtst,fit$param,ytst)
print(val$err)
## [1] 0.2916667
table(ytst,val$ypred)
##     
## ytst  1  2  4
##    1 33  7  0
##    2  5 28  4
##    3  5  3  0
##    4  0  4  7
The output mass functions are contained in
val$Belief:
val$Belief$mass[1:5,]
##      [,1]       [,2]      [,3]        [,4]       [,5]        [,6]         [,7]
## [1,]    0 0.58304740 0.1744757 0.004657424 0.17745937 0.013651693 0.0009615845
## [2,]    0 0.07344411 0.8300411 0.006973935 0.05988434 0.004677578 0.0026116437
## [3,]    0 0.22191723 0.5970105 0.002397081 0.16344224 0.005569297 0.0011821047
## [4,]    0 0.27630118 0.5888964 0.004433872 0.10283884 0.007023508 0.0016679922
## [5,]    0 0.10489662 0.8272373 0.003132097 0.05281318 0.003176378 0.0015020620
##              [,8]        [,9]        [,10]        [,11]        [,12]
## [1,] 0.0010738384 0.025956608 0.0115497752 0.0008135317 9.085022e-04
## [2,] 0.0015483677 0.014082603 0.0025104558 0.0014016689 8.310088e-04
## [3,] 0.0005151679 0.006003336 0.0009445386 0.0002004820 8.737116e-05
## [4,] 0.0009316628 0.012820662 0.0025001596 0.0005937556 3.316442e-04
## [5,] 0.0006466034 0.004806531 0.0007150677 0.0003381449 1.455637e-04
##             [,13]        [,14]        [,15]        [,16]
## [1,] 0.0023845984 0.0026629728 1.875719e-04 2.094687e-04
## [2,] 0.0009401315 0.0005573766 3.112015e-04 1.845023e-04
## [3,] 0.0004657929 0.0002029952 4.308651e-05 1.877734e-05
## [4,] 0.0009405428 0.0005253434 1.247623e-04 6.968639e-05
## [5,] 0.0003429255 0.0001476216 6.980806e-05 3.005077e-05
val$Belief$pl[1:5,]
##            [,1]      [,2]       [,3]        [,4]
## [1,] 0.61776107 0.1832876 0.19859110 0.044673029
## [2,] 0.09072733 0.8439034 0.07071515 0.020818948
## [3,] 0.23165246 0.6014546 0.17143946 0.007966380
## [4,] 0.29211706 0.5970498 0.11412234 0.017906557
## [5,] 0.11289000 0.8331017 0.05872863 0.006595713
val$Belief$bel[1:5,]
##            [,1]      [,2]       [,3]        [,4]
## [1,] 0.58304740 0.1744757 0.17745937 0.025956608
## [2,] 0.07344411 0.8300411 0.05988434 0.014082603
## [3,] 0.22191723 0.5970105 0.16344224 0.006003336
## [4,] 0.27630118 0.5888964 0.10283884 0.012820662
## [5,] 0.10489662 0.8272373 0.05281318 0.004806531
We note that, in contrast with the outputs from the EK-NN and evidential neural network classifiers, the output mass functions based on weights of evidence have \(2^M-1\) focal sets, where \(M\) is the number of classes. The focal sets are obtained as
val$Belief$focal
##       [,1] [,2] [,3] [,4]
##  [1,]    0    0    0    0
##  [2,]    1    0    0    0
##  [3,]    0    1    0    0
##  [4,]    1    1    0    0
##  [5,]    0    0    1    0
##  [6,]    1    0    1    0
##  [7,]    0    1    1    0
##  [8,]    1    1    1    0
##  [9,]    0    0    0    1
## [10,]    1    0    0    1
## [11,]    0    1    0    1
## [12,]    1    1    0    1
## [13,]    0    0    1    1
## [14,]    1    0    1    1
## [15,]    0    1    1    1
## [16,]    1    1    1    1
As shown in (Denœux 2019), logistic regression and multilayer feedforward neural networks can be viewed as converting input or higher-level features into Dempster-Shafer mass functions and aggregating them by Dempster’s rule of combination. The probabilistic outputs of these classifiers are the normalized plausibilities corresponding to the underlying combined mass function. This mass function is more informative than the output probability distribution.
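The relation between a mass function, its plausibilities and the normalized plausibilities can be made concrete with a few lines of base R. This is a sketch for illustration, not package code: the variable names are ours, and the masses are copied from the first row of the two-class ionosphere output printed later in this vignette. Focal sets are coded as rows of a 0/1 matrix, using the binary coding that appears in the package output.

```r
M <- 2
# Row k+1 is the binary code of k: empty set, {1}, {2}, {1,2}
focal <- t(sapply(0:(2^M - 1), function(k) (k %/% 2^(0:(M - 1))) %% 2))
# Masses of the four subsets, in the same order (copied from the printed output)
m <- c(0, 0.035494939, 0.9645048, 2.975586e-07)

# Plausibility of a class: total mass of the focal sets that contain it
pl <- as.vector(m %*% focal)
# Belief of a class: mass of the corresponding singleton
single <- rowSums(focal) == 1
bel <- as.vector(m[single] %*% focal[single, ])
# Normalized plausibilities
p <- pl / sum(pl)
print(rbind(bel = bel, pl = pl, p = p))
```

Here, the interval between bel and pl is very narrow for each class because almost no mass is assigned to the whole frame. A wider interval would signal a lack of evidence that the probabilities p alone cannot express.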
Here, we illustrate the computation of mass functions for trained logistic regression and multilayer perceptron classifiers.
Let us first consider again the ionosphere dataset:
data(ionosphere)
x<-ionosphere$x[,-2]
y<-ionosphere$y-1
Let us fit a logistic regression classifier on these data:
fit<-glm(y ~ x,family='binomial')
The optimal coefficients (denoted as \(\alpha\) and \(\beta\) in (Denœux 2019)) can be computed using function calcAB:
AB<-calcAB(fit$coefficients,colMeans(x))
Using these coefficients, we can then compute the output mass functions and the corresponding contour functions:
Bel<-calcm(x,AB$A,AB$B)
Bel$focal
##      [,1] [,2]
## [1,]    0    0
## [2,]    1    0
## [3,]    0    1
## [4,]    1    1
Bel$mass[1:5,]
##      [,1]        [,2]      [,3]         [,4]
## [1,]    0 0.035494939 0.9645048 2.975586e-07
## [2,]    0 0.022556633 0.9774395 3.869694e-06
## [3,]    0 0.114317329 0.8856825 1.519187e-07
## [4,]    0 0.015056402 0.9849436 4.161507e-10
## [5,]    0 0.007307175 0.9926928 6.313876e-10
Bel$pl[1:5,]
##             [,1]      [,2]
## [1,] 0.035495236 0.9645051
## [2,] 0.022560503 0.9774434
## [3,] 0.114317481 0.8856827
## [4,] 0.015056402 0.9849436
## [5,] 0.007307176 0.9926928
Let us now consider again the glass dataset:
data(glass)
M<-max(glass$y)
d<-ncol(glass$x)
n<-nrow(glass$x)
x<-scale(glass$x)
y<-as.factor(glass$y)
We train a neural network with \(J=5\) hidden units, using function
nnet of package nnet, with a regularization
coefficient decay=0.01:
library(nnet)
J<-5
fit<-nnet(y~x,size=J,decay=0.01)
## # weights:  74
## initial  value 323.067396 
## iter  10 value 135.727407
## iter  20 value 104.137133
## iter  30 value 89.804624
## iter  40 value 78.869460
## iter  50 value 75.572605
## iter  60 value 74.986837
## iter  70 value 74.877776
## iter  80 value 73.539686
## iter  90 value 70.775342
## iter 100 value 68.769386
## final  value 68.769386 
## stopped after 100 iterations
We first need to extract the two layers of weights and to recompute the outputs from the hidden units:
W1<-matrix(fit$wts[1:(J*(d+1))],d+1,J)
W2<-matrix(fit$wts[(J*(d+1)+1):(J*(d+1) + M*(J+1))],J+1,M)
a1<-cbind(rep(1,n),x)%*%W1
o1<-1/(1+exp(-a1))
We can then compute the output mass functions as
AB<-calcAB(W2,colMeans(o1))
Bel<-calcm(o1,AB$A,AB$B)
Bel$mass[1:5,]
##      [,1]       [,2]        [,3]         [,4]       [,5]         [,6]
## [1,]    0 0.09201752 0.006033288 5.452724e-06 0.90145019 1.490901e-04
## [2,]    0 0.87396005 0.003553544 3.599587e-08 0.12247590 1.010041e-05
## [3,]    0 0.06424810 0.905965884 1.310345e-05 0.02699388 2.601519e-05
## [4,]    0 0.96877896 0.016366645 1.307094e-04 0.01259386 1.460422e-04
## [5,]    0 0.72270383 0.067037475 6.110541e-07 0.21020081 1.983050e-05
##              [,7]         [,8]         [,9]        [,10]        [,11]
## [1,] 1.993250e-04 4.307527e-06 1.359320e-04 1.199772e-07 1.604026e-07
## [2,] 2.083657e-07 5.745201e-09 1.335618e-07 1.984252e-09 4.093400e-11
## [3,] 4.704915e-05 6.213525e-07 2.680469e-03 3.792644e-06 6.859096e-06
## [4,] 2.113669e-04 1.467222e-05 1.727117e-03 6.880557e-06 9.958229e-06
## [5,] 7.718707e-06 6.693631e-08 2.938967e-05 1.877425e-08 7.307576e-09
##             [,12]        [,13]        [,14]        [,15]        [,16]
## [1,] 3.466392e-09 4.385780e-06 9.477922e-08 1.267144e-07 2.738371e-09
## [2,] 1.128660e-12 1.148604e-08 3.167010e-10 6.533361e-12 1.801423e-13
## [3,] 9.058434e-08 1.361784e-05 1.798433e-07 3.252514e-07 4.295418e-09
## [4,] 6.912593e-07 1.112637e-05 7.723470e-07 1.117818e-06 7.759431e-08
## [5,] 6.337100e-11 2.371524e-07 2.056576e-09 8.004892e-10 6.941810e-12
Bel$pl[1:5,]
##            [,1]        [,2]       [,3]         [,4]
## [1,] 0.09217660 0.006242666 0.90180752 1.408259e-04
## [2,] 0.87397020 0.003553794 0.12248623 1.473975e-07
## [3,] 0.06429191 0.906033937 0.02708170 2.705339e-03
## [4,] 0.96907881 0.016735239 0.01297904 1.757741e-03
## [5,] 0.72272436 0.067045880 0.21022867 2.965583e-05