This dataset consists of features of handwritten digits ('0'-'9') extracted from a collection of Dutch utility maps. The patterns were digitised into binary images. The digits are represented by the following six feature sets:
There are no missing values. The problem: by analysing the 6 groups separately, we will keep in each group the variables that most influence the class assignment, and we will look for the best algorithm to predict the classes.
We load the data_train file and the associated libraries.
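The loading step is not shown in the code below; a minimal sketch, assuming the train and test sets are stored as CSV files with a column named class (file names and format are assumptions):
# loading sketch (file names and format are assumptions)
data_train <- read.csv("data_train.csv", stringsAsFactors = FALSE)
data_train$class <- as.factor(data_train$class)
data_test <- read.csv("data_test.csv", stringsAsFactors = FALSE)
library(dplyr)   # %>% pipe used later in the report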
table(sapply(data_train,class))
nrow(data_train)
ncol(data_train)
length(data_train$class) # nrow() returns NULL on a single column
sum(data_train$class == "0")
sum(data_train$class == "1")
data_train[1002,]$class
We have 650 variables and 1500 patterns. One factor variable corresponds to "class" and the other 649 variables are numeric. There are 150 patterns per class (0-9).
We will explore and preprocess each group separately, then transform them in order to reduce the number of variables used for prediction.
# quantitative variables
var.numeric<-which(sapply(data_train,class)=="numeric")
names(var.numeric)
# qualitative variables
var.factor<-which(sapply(data_train,class)=="factor")
names(var.factor)
data_fou = data_train[, c(217: 292,650)]
data_fou_pred = data_test[1, c(217: 292)]
var1.numeric<-which(sapply(data_fou,class)=="numeric")
var1.factor<-which(sapply(data_fou,class)=="factor")
names(var1.numeric)
names(var1.factor)
# load the library
library(stargazer)
# summary statistics, including the standard deviation
stargazer(data_fou,summary.stat=c("n","min","p25","median","mean","p75","max","sd"),type = "text")
We find 1500 patterns. For each variable, 25% of the observations lie below the 1st quartile, which ranges from 0.046 to 0.275 across the variables. If a distribution is symmetric, the median equals the mean; here the two values are close. The standard deviation (the square root of the variance), which measures the spread of the data around the mean, is small. Similarly, 25% of the patterns lie above the 3rd quartile, which ranges from 0.094 to 0.527.
data_fac = data_train[, c(1: 216,650)]
var2.numeric<-which(sapply(data_fac,class)=="numeric")
var2.factor<-which(sapply(data_fac,class)=="factor")
names(var2.numeric)
names(var2.factor)
# summary statistics, including the standard deviation
stargazer(data_fac,summary.stat=c("n","min","p25","median","mean","p75","max","sd"),type = "text")
We can see that none of the variables is constant. The means and medians are close, so the distributions are mostly symmetric. The standard deviations are high, which corresponds to a large spread of the values.
Few variables appear to follow a normal distribution, and we identify some variables with an asymmetric distribution, such as fou_2, fou_3, fou_8, fou_73, fou_74 and fou_76.
We then look for potential outliers using boxplots.
It is difficult to make a decision about outliers, but looking at the variables fou_2, fou_3, fou_8, fou_73, fou_74 and fou_76, we do not find values far above the 3rd quartile.
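The boxplots themselves are not reproduced here; a minimal sketch of how they may have been drawn for the variables discussed above (the selection of variables is an assumption):
# boxplots of a few fou variables by class, to spot potential outliers (sketch)
par(mfrow=c(2,3))
for (v in c("fou_2","fou_3","fou_8","fou_73","fou_74","fou_76")) {
  boxplot(data_fou[[v]]~data_fou$class,main=v,xlab="class",ylab=v,col="lightblue")
}
par(mfrow=c(1,1))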
Few variables appear to follow a normal distribution, and we identify some variables with an asymmetric distribution, such as fac_168, fac_207 and fac_117. We then look for potential outliers using boxplots.
It is difficult to make a decision about outliers, but looking at the variables fac_1, fac_2, fac_177, fac_190, fac_200, fac_204, fac_201, fac_199, fac_198, fac_192 to fac_187, fac_181 to fac_177, fac_167, fac_165, fac_164, fac_157, 155, 153, 151, 140, 139, 132, 127, 119, 103, 93, 78, 68, 55, 44, 43, 35, 23, 21, 19 and 8 to 13, we do not find values beyond the whisker extremes.
Given our prediction objective, the priority is to study the relationship between the explanatory variables and the response. To identify the nature of the link between the quantitative variables and the response, we look at which variables are most strongly related to the classification of the 1500 patterns of the dataset.
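The object res.eta2 used below is not created in the code shown. A minimal sketch of one way it may have been computed, as the correlation ratio (eta²) between each fou variable and the class; res2.eta2 for the fac group can be built the same way (this reconstruction is an assumption):
# correlation ratio eta^2 = between-class sum of squares / total sum of squares
eta2 <- function(x, g) {
  m <- tapply(x, g, mean)
  n <- tapply(x, g, length)
  sum(n*(m-mean(x))^2) / sum((x-mean(x))^2)
}
res.eta2 <- sapply(data_fou[,var1.numeric], eta2, g=data_fou$class)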
head(sort(res.eta2,decreasing = TRUE),30)
## fou_2 fou_7 fou_76 fou_74 fou_73 fou_5 fou_1 fou_3
## 0.7038799 0.6961865 0.6836988 0.6821754 0.6683027 0.5640135 0.5555146 0.5437522
## fou_8 fou_6 fou_14 fou_72 fou_71 fou_9 fou_69 fou_10
## 0.5179851 0.5152077 0.5121813 0.5001308 0.4807067 0.4386385 0.4037635 0.3347831
## fou_13 fou_18 fou_75 fou_11 fou_70 fou_65 fou_12 fou_67
## 0.3328472 0.3296812 0.3256250 0.3227679 0.3051005 0.2785351 0.2370006 0.2359260
## fou_4 fou_63 fou_16 fou_61 fou_26 fou_23
## 0.2339438 0.2289425 0.2278498 0.2201945 0.2109669 0.2091307
Among the quantitative variables, the 10 strongest associations are observed for fou_2, fou_7, fou_76, fou_74, fou_73, fou_5, fou_1, fou_3, fou_8 and fou_6. Conversely, variables such as fou_22, fou_32 and fou_25 do not appear discriminant.
head(sort(res2.eta2,decreasing = TRUE),60)
## fac_181 fac_29 fac_1 fac_133 fac_65 fac_97 fac_109 fac_185
## 0.8235833 0.7640065 0.7623640 0.7588678 0.7496635 0.7459444 0.7428700 0.7419308
## fac_55 fac_53 fac_113 fac_67 fac_125 fac_19 fac_7 fac_108
## 0.7304271 0.7191773 0.7106324 0.7099867 0.7066622 0.6915551 0.6875124 0.6866042
## fac_157 fac_84 fac_37 fac_207 fac_111 fac_135 fac_2 fac_199
## 0.6742395 0.6646775 0.6593644 0.6565910 0.6547879 0.6546994 0.6534518 0.6492029
## fac_146 fac_94 fac_193 fac_123 fac_13 fac_43 fac_195 fac_183
## 0.6490177 0.6474979 0.6438690 0.6358080 0.6351870 0.6322438 0.6237683 0.6228601
## fac_12 fac_147 fac_3 fac_184 fac_198 fac_41 fac_50 fac_132
## 0.6217602 0.6183220 0.6154372 0.6146831 0.6129402 0.6092013 0.6052495 0.6017710
## fac_22 fac_91 fac_194 fac_177 fac_120 fac_154 fac_186 fac_204
## 0.6014057 0.5948239 0.5939674 0.5935456 0.5910816 0.5875155 0.5869117 0.5868520
## fac_165 fac_38 fac_86 fac_112 fac_26 fac_10 fac_115 fac_190
## 0.5858476 0.5830233 0.5806618 0.5767439 0.5763151 0.5733795 0.5728362 0.5722956
## fac_106 fac_144 fac_33 fac_57
## 0.5665942 0.5650814 0.5646108 0.5646108
Among these quantitative variables, the 10 strongest associations are observed for fac_181, fac_29, fac_1, fac_133, fac_65, fac_97, fac_109, fac_185, fac_55 and fac_53. Conversely, the variables fac_60, fac_17 and fac_28 do not appear discriminant.
The distribution of the continuous variables conditionally on the response is sometimes decisive for the use of certain models, notably linear discriminant analysis. We therefore examine the nature of these distributions.
library(lattice);library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# conditional distribution of fou_2
plot1<-lattice::histogram(~fou_2|class,data=data_fou,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fou_7
plot2<-lattice::histogram(~fou_7|class,data=data_fou,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fou_76
plot3<-lattice::histogram(~fou_76|class,data=data_fou,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fou_74
plot4<-lattice::histogram(~fou_74|class,data=data_fou,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fou_73
plot5<-lattice::histogram(~fou_73|class,data=data_fou,type="density",col="lightblue",ylab="Densité")
# display
grid.arrange(plot1,plot2,plot3,plot4,plot5,nrow=2,ncol=3)
The conditional distributions of fou_2, fou_7 and fou_76 look approximately normal. However, for class 1 of fou_74 and for classes 0 and 1 of fou_73, the distributions do not look normal.
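As a complement to this visual assessment (not part of the original analysis), a per-class Shapiro-Wilk test can quantify departures from normality, sketched here for fou_74; small p-values flag classes whose conditional distribution departs from normality:
# Shapiro-Wilk normality test of fou_74 within each class (sketch)
by(data_fou$fou_74, data_fou$class, shapiro.test)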
library(lattice);library(gridExtra)
# conditional distribution of fac_29
plot1<-lattice::histogram(~fac_29|class,data=data_fac,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fac_1
plot2<-lattice::histogram(~fac_1|class,data=data_fac,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fac_65
plot3<-lattice::histogram(~fac_65|class,data=data_fac,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fac_55
plot4<-lattice::histogram(~fac_55|class,data=data_fac,type="density",col="lightblue",ylab="Densité")
# conditional distribution of fac_53
plot5<-lattice::histogram(~fac_53|class,data=data_fac,type="density",col="lightblue",ylab="Densité")
# display
grid.arrange(plot1,plot2,plot3,plot4,plot5,nrow=2,ncol=3)
The conditional distributions of fac_29, fac_1, fac_65, fac_55 and fac_53 look approximately normal. However, for classes 6 and 7 we clearly identify a right or left tail.
Excessively strong relationships between explanatory variables can lead to great instability in the models. Analysing the relationships between the explanatory variables will allow us to detect the most strongly related pairs of variables.
We compute the Pearson (linear) and Spearman correlation coefficients between the quantitative variables.
library(DescTools)
matcram<-cor(data_fou[,var1.numeric])
PlotCorr(matcram,
breaks=seq(0, 1, length=21),cex.axis = 0.7,
args.colorlegend = list(labels=sprintf("%.1f", seq(0, 1, length = 15)), frame=TRUE))
text(x=rep(1:ncol(matcram),ncol(matcram)), y=rep(1:ncol(matcram),each=ncol(matcram)),
label=sprintf("%0.2f", matcram[,ncol(matcram):1]), cex=0.2, xpd=TRUE)
We do not observe any very strong relationships.
Here is the correlation matrix with the Spearman coefficient:
matcor<-cor(data_fou[,var1.numeric],method = "spearman")
PlotCorr(matcor,
breaks=seq(0, 1, length=21),cex.axis = 0.7,
args.colorlegend = list(labels=sprintf("%.1f", seq(0, 1, length = 15)), frame=TRUE))
text(x=rep(1:ncol(matcor),ncol(matcor)), y=rep(1:ncol(matcor),each=ncol(matcor)),
label=sprintf("%0.2f", matcor[,ncol(matcor):1]), cex=0.3, xpd=TRUE)
We note that the Pearson and Spearman coefficients are fairly close and that the relationships are not very strong. We therefore do not expect collinearity problems between these variables.
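To go beyond the visual check, a short sketch (an addition, not in the original code) listing the pairs of fou variables whose Spearman correlation exceeds 0.8 in absolute value; the 0.8 threshold is an arbitrary choice:
# list strongly correlated pairs from the Spearman matrix (sketch)
high <- which(abs(matcor)>0.8 & upper.tri(matcor), arr.ind=TRUE)
data.frame(var1=rownames(matcor)[high[,1]], var2=colnames(matcor)[high[,2]], rho=matcor[high])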
library(DescTools)
matcram<-cor(data_fac[,var2.numeric])
PlotCorr(matcram,
breaks=seq(0, 1, length=21),cex.axis = 0.7,
args.colorlegend = list(labels=sprintf("%.1f", seq(0, 1, length = 15)), frame=TRUE))
text(x=rep(1:ncol(matcram),ncol(matcram)), y=rep(1:ncol(matcram),each=ncol(matcram)),
label=sprintf("%0.2f", matcram[,ncol(matcram):1]), cex=0.2, xpd=TRUE)
We do not observe any very strong relationships.
Here is the correlation matrix with the Spearman coefficient:
matcor<-cor(data_fac[,var2.numeric],method = "spearman")
PlotCorr(matcor,
breaks=seq(0, 1, length=21),cex.axis = 0.7,
args.colorlegend = list(labels=sprintf("%.1f", seq(0, 1, length = 15)), frame=TRUE))
text(x=rep(1:ncol(matcor),ncol(matcor)), y=rep(1:ncol(matcor),each=ncol(matcor)),
label=sprintf("%0.2f", matcor[,ncol(matcor):1]), cex=0.3, xpd=TRUE)
We note that the Pearson and Spearman coefficients are fairly close and that the relationships are not very strong. We therefore do not expect collinearity problems between these variables.
Multivariate analysis will in particular allow us to summarise the relationships between the explanatory variables and to identify groups of patterns with similar profiles. Since the variables are of different natures, we first carry out a PCA, then an MCA after discretising the quantitative variables, which we will complete with a hierarchical clustering (HAC) performed on the first components.
We perform a PCA on the quantitative data, treating class as a supplementary variable.
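The object res1.pca used below is not created in the code shown. A minimal sketch of how it may have been obtained, assuming FactoMineR::PCA on the scaled fou variables with class as a supplementary qualitative variable:
# PCA on the fou group, class as supplementary qualitative variable (sketch)
library(FactoMineR)
library(factoextra)
res1.pca <- PCA(data_fou, quali.sup=which(names(data_fou)=="class"), scale.unit=TRUE, graph=FALSE)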
Distribution of the inertia
The inertia of the factorial axes indicates, on the one hand, whether the variables are structured and, on the other hand, suggests a sensible number of principal components to study.
par(mar = c(2.6, 4.1, 1.1, 2.1))
#barplot(res.pca$eig[,2], names.arg = 1:nrow(res.pca$eig))
fviz_eig(res1.pca, addlabels = TRUE, ylim = c(0, 18))
To reduce the volume of data we require a given quality of approximation of the original data: with 35 dimensions we retain 82% of the total inertia. For the choice of the number of PCA axes, the differences between successive eigenvalues become small after the 4th dimension. We therefore keep 4 PCA axes, as the following principal axes are no longer significant; this retains 32.363% of the initial inertia.
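The percentages quoted above can be checked directly on the eigenvalue table of the PCA result (a quick check, assuming res1.pca comes from FactoMineR):
# eigenvalue, % of variance and cumulative % for the first dimensions (sketch)
round(res1.pca$eig[1:6,], 3)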
                 Dim.1  Dim.2  Dim.3  Dim.4  Dim.5  Dim.6
% of variance   15.421  6.795  6.458  3.688  3.316  3.205
cumulative %    15.421 22.217 28.675 32.363 35.679 38.884
Figure 1 - Decomposition of the total inertia
An estimate of the relevant number of axes to interpret suggests restricting the analysis to the description of the first 2 axes.
Description of dimensions 1:4
drawn <- integer(0)
par(mar = c(4.1, 4.1, 1.1, 2.1))
#plot.PCA(res.pca, select = drawn, axes = 1:2, choix = 'ind', invisible = 'quali', title = '', cex = cex)
fviz_contrib(res1.pca, choice = "ind", axes = 1:4, top = 10)+
theme(axis.text = element_text(size = 7.5))
Description of plane 1:2
drawn <- integer(0)
par(mar = c(4.1, 4.1, 1.1, 2.1))
#plot.PCA(res.pca, select = drawn, axes = 1:2, choix = 'ind', invisible = 'quali', title = '', cex = cex)
fviz_contrib(res1.pca, choice = "ind", axes = 1:2, top = 10)+
theme(axis.text = element_text(size = 7.5))
Description of plane 3:4
drawn <- integer(0)
par(mar = c(4.1, 4.1, 1.1, 2.1))
#plot.PCA(res.pca, select = drawn, axes = 1:2, choix = 'ind', invisible = 'quali', title = '', cex = cex)
fviz_contrib(res1.pca, choice = "ind", axes = 3:4, top = 10)+
theme(axis.text = element_text(size = 7.5))
data_fou[644,]$class
data_fou[828,]$class
data_fou[504,]$class
data_fou[555,]$class
data_fou[119,]$class
data_fou[585,]$class
data_fou[67,]$class
data_fou[570,]$class
data_fou[810,]$class
data_fou[868,]$class
summary(res1.pca)
res1.pca$var
## $coord
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## fou_1 0.51308897 -3.325406e-02 -0.4554217415 -0.008136542 0.098278620
## fou_2 0.40339360 -3.386300e-01 -0.1618096419 0.290845462 -0.233603813
## fou_3 0.56166486 -4.262304e-02 -0.4584340103 0.135870923 -0.006262507
## fou_4 -0.14945267 -2.460082e-01 0.0008672515 0.114063915 -0.261061336
## fou_5 0.14262144 6.598985e-01 0.2151071236 -0.035466310 0.125405161
## fou_6 0.39105883 -3.855206e-01 0.0960059008 0.190132541 -0.148252850
## fou_7 0.18039382 2.354610e-01 0.6029209284 -0.297439947 -0.003902100
## fou_8 0.55963257 -1.704291e-02 -0.4692068478 0.110808057 0.033169533
## fou_9 -0.01195126 2.996149e-01 0.4831509253 -0.159837930 -0.149451458
## fou_10 0.36716213 -2.032773e-01 0.3674191049 0.138392925 -0.411474141
## fou_11 0.64671677 -1.313877e-02 0.0453676940 0.101652111 0.135855078
## fou_12 0.14168140 4.029960e-01 0.1762817485 -0.007265882 0.427223344
## fou_13 0.60241067 5.403838e-02 -0.0799120619 -0.096168395 -0.011780224
## fou_14 -0.13676622 2.756649e-01 0.4745632143 -0.091453974 0.090723634
## fou_15 0.34484140 1.967171e-01 0.0602584510 -0.007548785 0.005578676
## fou_16 0.40233637 -3.894088e-01 -0.0159851823 0.232234229 0.251785426
## fou_17 0.25925887 1.536251e-01 0.4130865033 0.143999156 -0.097440174
## fou_18 0.55714681 1.888086e-01 0.1530265169 -0.061537750 -0.019979004
## fou_19 0.23142880 1.368423e-01 0.2478804627 -0.159501265 0.121958365
## fou_20 0.31693796 -3.755119e-01 0.1857883211 -0.135271339 0.169519071
## fou_21 0.43170388 -2.463715e-01 0.1276721351 0.159815656 0.140472896
## fou_22 0.24783983 2.256100e-01 0.2369248633 0.172132094 0.170989206
## fou_23 0.52009313 1.307413e-02 0.3256969698 0.085884752 -0.144884160
## fou_24 0.25590895 3.639108e-01 0.3579806375 -0.107728749 0.109511219
## fou_25 0.24317061 -2.451494e-01 0.0108382483 -0.027388540 0.233210481
## fou_26 0.42051226 -3.493462e-01 0.1500635101 0.060551198 0.216860733
## fou_27 0.31262026 1.946877e-02 0.3216531828 0.116375378 -0.150289327
## fou_28 0.44614240 1.153322e-01 0.2870505746 0.224639123 -0.043564232
## fou_29 0.33168445 2.280802e-01 0.2778300926 0.111935227 0.196856844
## fou_30 0.27346776 -4.290733e-01 0.0909610489 -0.252678972 0.276454198
## fou_31 0.38861155 -3.062153e-01 0.1345768346 -0.113141884 0.148084265
## fou_32 0.32464929 4.615968e-02 0.2732616185 0.213933175 0.023053990
## fou_33 0.47443907 -7.861116e-05 0.2427913104 0.306310937 -0.005061858
## fou_34 0.31539062 2.240853e-01 0.3545233103 0.137043139 0.051652977
## fou_35 0.29204051 -3.497455e-01 0.0566963983 -0.331411098 0.239917765
## fou_36 0.35604409 -3.322872e-01 0.1001470234 -0.060392312 0.125615894
## fou_37 0.36695766 -7.230124e-02 0.2097075962 0.211251030 0.012452726
## fou_38 0.42595276 2.832655e-02 0.3121526953 0.287754669 0.089170587
## fou_39 0.37231754 1.586715e-01 0.3247883488 0.205277761 0.050168486
## fou_40 0.25987864 -3.991988e-01 0.0435908726 -0.285672627 0.161000361
## fou_41 0.32095111 -3.405320e-01 0.0914073722 -0.104452749 0.150876760
## fou_42 0.35840879 -8.860169e-02 0.2425054330 0.175820932 0.099876962
## fou_43 0.44467529 -1.190243e-02 0.2390641536 0.314464434 0.057562685
## fou_44 0.37481185 1.141344e-01 0.3662226002 0.237767229 0.059268422
## fou_45 0.39035977 -2.661393e-01 -0.0996292498 -0.321886746 0.103651733
## fou_46 0.35420042 -4.941285e-02 -0.0588537997 -0.225731461 0.026525746
## fou_47 0.42579495 1.615082e-01 -0.1714408172 -0.020072628 -0.178424285
## fou_48 0.34941628 3.412338e-01 -0.1718788524 -0.064138243 -0.210092106
## fou_49 0.35537523 -3.391155e-01 -0.0480583449 -0.268429623 0.126487798
## fou_50 0.37426039 3.844150e-02 -0.1417111903 -0.048580189 -0.116415918
## fou_51 0.43321784 2.418431e-01 -0.1886157251 -0.090228286 -0.197589953
## fou_52 0.35453811 3.409579e-01 -0.1918510021 -0.083320532 -0.129481399
## fou_53 0.42820572 -2.354555e-01 -0.1428426535 -0.138019562 0.039911281
## fou_54 0.38577928 1.383506e-01 -0.0969555298 -0.195444708 -0.111986302
## fou_55 0.44376819 2.442853e-01 -0.1942280751 -0.148132365 -0.224884342
## fou_56 0.25233070 3.646830e-01 -0.1329736303 0.029203776 -0.119523411
## fou_57 0.45967328 -2.753365e-01 -0.0076349372 -0.239159064 -0.065151397
## fou_58 0.34415169 2.366203e-01 -0.1654894702 -0.268136964 -0.281732400
## fou_59 0.41539668 2.846738e-01 -0.2788643979 0.094382700 -0.233386324
## fou_60 0.31882722 3.389526e-01 -0.2012179931 0.122514728 0.079398366
## fou_61 0.54556257 -3.262915e-02 -0.1698936595 -0.194626357 -0.223407840
## fou_62 0.30401456 2.984146e-01 -0.1302286438 0.065214371 -0.167583832
## fou_63 0.48863974 2.460544e-01 -0.2566208624 0.058625554 -0.066774306
## fou_64 0.03398352 4.547847e-01 -0.0200340930 -0.152385668 0.266538901
## fou_65 0.50188952 -2.131200e-01 -0.0772634163 0.030329226 -0.345745156
## fou_66 0.23887970 3.970221e-01 0.0086992697 -0.254833472 -0.149885216
## fou_67 0.44113880 2.346315e-01 -0.3367839804 -0.096595512 0.266844842
## fou_68 0.29843457 1.548233e-01 -0.3164033368 0.139635797 0.427230382
## fou_69 0.67835787 7.809180e-02 0.0551113814 -0.339900038 -0.062788139
## fou_70 0.25243792 4.305550e-01 -0.3442686962 -0.071511650 0.414283663
## fou_71 0.40953596 9.237418e-02 -0.5148284166 0.317408173 0.251187658
## fou_72 -0.44553598 3.056135e-01 0.0056058158 0.286004741 0.197081262
## fou_73 0.56949180 1.156848e-02 -0.4693936379 0.281302908 0.028846568
## fou_74 -0.49977455 1.262194e-01 -0.3674627383 0.308950099 0.252325660
## fou_75 0.21208752 -3.618401e-01 -0.0334697842 0.463996329 -0.212290320
## fou_76 -0.62801597 1.341398e-02 -0.1647116860 0.251433900 -0.028186852
##
## $cor
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## fou_1 0.51308897 -3.325406e-02 -0.4554217415 -0.008136542 0.098278620
## fou_2 0.40339360 -3.386300e-01 -0.1618096419 0.290845462 -0.233603813
## fou_3 0.56166486 -4.262304e-02 -0.4584340103 0.135870923 -0.006262507
## fou_4 -0.14945267 -2.460082e-01 0.0008672515 0.114063915 -0.261061336
## fou_5 0.14262144 6.598985e-01 0.2151071236 -0.035466310 0.125405161
## fou_6 0.39105883 -3.855206e-01 0.0960059008 0.190132541 -0.148252850
## fou_7 0.18039382 2.354610e-01 0.6029209284 -0.297439947 -0.003902100
## fou_8 0.55963257 -1.704291e-02 -0.4692068478 0.110808057 0.033169533
## fou_9 -0.01195126 2.996149e-01 0.4831509253 -0.159837930 -0.149451458
## fou_10 0.36716213 -2.032773e-01 0.3674191049 0.138392925 -0.411474141
## fou_11 0.64671677 -1.313877e-02 0.0453676940 0.101652111 0.135855078
## fou_12 0.14168140 4.029960e-01 0.1762817485 -0.007265882 0.427223344
## fou_13 0.60241067 5.403838e-02 -0.0799120619 -0.096168395 -0.011780224
## fou_14 -0.13676622 2.756649e-01 0.4745632143 -0.091453974 0.090723634
## fou_15 0.34484140 1.967171e-01 0.0602584510 -0.007548785 0.005578676
## fou_16 0.40233637 -3.894088e-01 -0.0159851823 0.232234229 0.251785426
## fou_17 0.25925887 1.536251e-01 0.4130865033 0.143999156 -0.097440174
## fou_18 0.55714681 1.888086e-01 0.1530265169 -0.061537750 -0.019979004
## fou_19 0.23142880 1.368423e-01 0.2478804627 -0.159501265 0.121958365
## fou_20 0.31693796 -3.755119e-01 0.1857883211 -0.135271339 0.169519071
## fou_21 0.43170388 -2.463715e-01 0.1276721351 0.159815656 0.140472896
## fou_22 0.24783983 2.256100e-01 0.2369248633 0.172132094 0.170989206
## fou_23 0.52009313 1.307413e-02 0.3256969698 0.085884752 -0.144884160
## fou_24 0.25590895 3.639108e-01 0.3579806375 -0.107728749 0.109511219
## fou_25 0.24317061 -2.451494e-01 0.0108382483 -0.027388540 0.233210481
## fou_26 0.42051226 -3.493462e-01 0.1500635101 0.060551198 0.216860733
## fou_27 0.31262026 1.946877e-02 0.3216531828 0.116375378 -0.150289327
## fou_28 0.44614240 1.153322e-01 0.2870505746 0.224639123 -0.043564232
## fou_29 0.33168445 2.280802e-01 0.2778300926 0.111935227 0.196856844
## fou_30 0.27346776 -4.290733e-01 0.0909610489 -0.252678972 0.276454198
## fou_31 0.38861155 -3.062153e-01 0.1345768346 -0.113141884 0.148084265
## fou_32 0.32464929 4.615968e-02 0.2732616185 0.213933175 0.023053990
## fou_33 0.47443907 -7.861116e-05 0.2427913104 0.306310937 -0.005061858
## fou_34 0.31539062 2.240853e-01 0.3545233103 0.137043139 0.051652977
## fou_35 0.29204051 -3.497455e-01 0.0566963983 -0.331411098 0.239917765
## fou_36 0.35604409 -3.322872e-01 0.1001470234 -0.060392312 0.125615894
## fou_37 0.36695766 -7.230124e-02 0.2097075962 0.211251030 0.012452726
## fou_38 0.42595276 2.832655e-02 0.3121526953 0.287754669 0.089170587
## fou_39 0.37231754 1.586715e-01 0.3247883488 0.205277761 0.050168486
## fou_40 0.25987864 -3.991988e-01 0.0435908726 -0.285672627 0.161000361
## fou_41 0.32095111 -3.405320e-01 0.0914073722 -0.104452749 0.150876760
## fou_42 0.35840879 -8.860169e-02 0.2425054330 0.175820932 0.099876962
## fou_43 0.44467529 -1.190243e-02 0.2390641536 0.314464434 0.057562685
## fou_44 0.37481185 1.141344e-01 0.3662226002 0.237767229 0.059268422
## fou_45 0.39035977 -2.661393e-01 -0.0996292498 -0.321886746 0.103651733
## fou_46 0.35420042 -4.941285e-02 -0.0588537997 -0.225731461 0.026525746
## fou_47 0.42579495 1.615082e-01 -0.1714408172 -0.020072628 -0.178424285
## fou_48 0.34941628 3.412338e-01 -0.1718788524 -0.064138243 -0.210092106
## fou_49 0.35537523 -3.391155e-01 -0.0480583449 -0.268429623 0.126487798
## fou_50 0.37426039 3.844150e-02 -0.1417111903 -0.048580189 -0.116415918
## fou_51 0.43321784 2.418431e-01 -0.1886157251 -0.090228286 -0.197589953
## fou_52 0.35453811 3.409579e-01 -0.1918510021 -0.083320532 -0.129481399
## fou_53 0.42820572 -2.354555e-01 -0.1428426535 -0.138019562 0.039911281
## fou_54 0.38577928 1.383506e-01 -0.0969555298 -0.195444708 -0.111986302
## fou_55 0.44376819 2.442853e-01 -0.1942280751 -0.148132365 -0.224884342
## fou_56 0.25233070 3.646830e-01 -0.1329736303 0.029203776 -0.119523411
## fou_57 0.45967328 -2.753365e-01 -0.0076349372 -0.239159064 -0.065151397
## fou_58 0.34415169 2.366203e-01 -0.1654894702 -0.268136964 -0.281732400
## fou_59 0.41539668 2.846738e-01 -0.2788643979 0.094382700 -0.233386324
## fou_60 0.31882722 3.389526e-01 -0.2012179931 0.122514728 0.079398366
## fou_61 0.54556257 -3.262915e-02 -0.1698936595 -0.194626357 -0.223407840
## fou_62 0.30401456 2.984146e-01 -0.1302286438 0.065214371 -0.167583832
## fou_63 0.48863974 2.460544e-01 -0.2566208624 0.058625554 -0.066774306
## fou_64 0.03398352 4.547847e-01 -0.0200340930 -0.152385668 0.266538901
## fou_65 0.50188952 -2.131200e-01 -0.0772634163 0.030329226 -0.345745156
## fou_66 0.23887970 3.970221e-01 0.0086992697 -0.254833472 -0.149885216
## fou_67 0.44113880 2.346315e-01 -0.3367839804 -0.096595512 0.266844842
## fou_68 0.29843457 1.548233e-01 -0.3164033368 0.139635797 0.427230382
## fou_69 0.67835787 7.809180e-02 0.0551113814 -0.339900038 -0.062788139
## fou_70 0.25243792 4.305550e-01 -0.3442686962 -0.071511650 0.414283663
## fou_71 0.40953596 9.237418e-02 -0.5148284166 0.317408173 0.251187658
## fou_72 -0.44553598 3.056135e-01 0.0056058158 0.286004741 0.197081262
## fou_73 0.56949180 1.156848e-02 -0.4693936379 0.281302908 0.028846568
## fou_74 -0.49977455 1.262194e-01 -0.3674627383 0.308950099 0.252325660
## fou_75 0.21208752 -3.618401e-01 -0.0334697842 0.463996329 -0.212290320
## fou_76 -0.62801597 1.341398e-02 -0.1647116860 0.251433900 -0.028186852
##
## $cos2
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## fou_1 0.2632602921 1.105833e-03 2.074090e-01 6.620331e-05 9.658687e-03
## fou_2 0.1627263955 1.146703e-01 2.618236e-02 8.459108e-02 5.457074e-02
## fou_3 0.3154674099 1.816723e-03 2.101617e-01 1.846091e-02 3.921900e-05
## fou_4 0.0223361005 6.052005e-02 7.521252e-07 1.301058e-02 6.815302e-02
## fou_5 0.0203408760 4.354661e-01 4.627107e-02 1.257859e-03 1.572645e-02
## fou_6 0.1529270051 1.486262e-01 9.217133e-03 3.615038e-02 2.197891e-02
## fou_7 0.0325419292 5.544186e-02 3.635136e-01 8.847052e-02 1.522638e-05
## fou_8 0.3131886179 2.904606e-04 2.201551e-01 1.227843e-02 1.100218e-03
## fou_9 0.0001428326 8.976910e-02 2.334348e-01 2.554816e-02 2.233574e-02
## fou_10 0.1348080277 4.132168e-02 1.349968e-01 1.915260e-02 1.693110e-01
## fou_11 0.4182425780 1.726274e-04 2.058228e-03 1.033315e-02 1.845660e-02
## fou_12 0.0200736188 1.624058e-01 3.107525e-02 5.279305e-05 1.825198e-01
## fou_13 0.3628986161 2.920146e-03 6.385938e-03 9.248360e-03 1.387737e-04
## fou_14 0.0187049992 7.599112e-02 2.252102e-01 8.363829e-03 8.230778e-03
## fou_15 0.1189155900 3.869762e-02 3.631081e-03 5.698415e-05 3.112162e-05
## fou_16 0.1618745569 1.516392e-01 2.555261e-04 5.393274e-02 6.339590e-02
## fou_17 0.0672151629 2.360067e-02 1.706405e-01 2.073576e-02 9.494588e-03
## fou_18 0.3104125651 3.564868e-02 2.341711e-02 3.786895e-03 3.991606e-04
## fou_19 0.0535592874 1.872581e-02 6.144472e-02 2.544065e-02 1.487384e-02
## fou_20 0.1004496696 1.410092e-01 3.451730e-02 1.829834e-02 2.873672e-02
## fou_21 0.1863682422 6.069891e-02 1.630017e-02 2.554104e-02 1.973263e-02
## fou_22 0.0614245802 5.089989e-02 5.613339e-02 2.962946e-02 2.923731e-02
## fou_23 0.2704968610 1.709328e-04 1.060785e-01 7.376191e-03 2.099142e-02
## fou_24 0.0654893912 1.324311e-01 1.281501e-01 1.160548e-02 1.199271e-02
## fou_25 0.0591319447 6.009822e-02 1.174676e-04 7.501321e-04 5.438713e-02
## fou_26 0.1768305587 1.220428e-01 2.251906e-02 3.666448e-03 4.702858e-02
## fou_27 0.0977314281 3.790328e-04 1.034608e-01 1.354323e-02 2.258688e-02
## fou_28 0.1990430419 1.330151e-02 8.239803e-02 5.046274e-02 1.897842e-03
## fou_29 0.1100145739 5.202056e-02 7.718956e-02 1.252950e-02 3.875262e-02
## fou_30 0.0747846132 1.841039e-01 8.273912e-03 6.384666e-02 7.642692e-02
## fou_31 0.1510189354 9.376781e-02 1.811092e-02 1.280109e-02 2.192895e-02
## fou_32 0.1053971601 2.130716e-03 7.467191e-02 4.576740e-02 5.314865e-04
## fou_33 0.2250924316 6.179715e-09 5.894762e-02 9.382639e-02 2.562240e-05
## fou_34 0.0994712462 5.021422e-02 1.256868e-01 1.878082e-02 2.668030e-03
## fou_35 0.0852876603 1.223219e-01 3.214482e-03 1.098333e-01 5.756053e-02
## fou_36 0.1267673914 1.104148e-01 1.002943e-02 3.647231e-03 1.577935e-02
## fou_37 0.1346579224 5.227470e-03 4.397728e-02 4.462700e-02 1.550704e-04
## fou_38 0.1814357506 8.023936e-04 9.743931e-02 8.280275e-02 7.951394e-03
## fou_39 0.1386203505 2.517664e-02 1.054875e-01 4.213896e-02 2.516877e-03
## fou_40 0.0675369067 1.593597e-01 1.900164e-03 8.160885e-02 2.592112e-02
## fou_41 0.1030096121 1.159621e-01 8.355308e-03 1.091038e-02 2.276380e-02
## fou_42 0.1284568585 7.850259e-03 5.880889e-02 3.091300e-02 9.975408e-03
## fou_43 0.1977361154 1.416679e-04 5.715167e-02 9.888788e-02 3.313463e-03
## fou_44 0.1404839202 1.302667e-02 1.341190e-01 5.653326e-02 3.512746e-03
## fou_45 0.1523807501 7.083014e-02 9.925987e-03 1.036111e-01 1.074368e-02
## fou_46 0.1254579371 2.441629e-03 3.463770e-03 5.095469e-02 7.036152e-04
## fou_47 0.1813013422 2.608489e-02 2.939195e-02 4.029104e-04 3.183523e-02
## fou_48 0.1220917383 1.164405e-01 2.954234e-02 4.113714e-03 4.413869e-02
## fou_49 0.1262915524 1.149993e-01 2.309605e-03 7.205446e-02 1.599916e-02
## fou_50 0.1400708403 1.477749e-03 2.008206e-02 2.360035e-03 1.355267e-02
## fou_51 0.1876776959 5.848808e-02 3.557589e-02 8.141144e-03 3.904179e-02
## fou_52 0.1256972711 1.162523e-01 3.680681e-02 6.942311e-03 1.676543e-02
## fou_53 0.1833601382 5.543930e-02 2.040402e-02 1.904940e-02 1.592910e-03
## fou_54 0.1488256530 1.914088e-02 9.400375e-03 3.819863e-02 1.254093e-02
## fou_55 0.1969302032 5.967532e-02 3.772455e-02 2.194320e-02 5.057297e-02
## fou_56 0.0636707833 1.329937e-01 1.768199e-02 8.528605e-04 1.428585e-02
## fou_57 0.2112995242 7.581019e-02 5.829227e-05 5.719706e-02 4.244704e-03
## fou_58 0.1184403832 5.598916e-02 2.738676e-02 7.189743e-02 7.937315e-02
## fou_59 0.1725544027 8.103920e-02 7.776535e-02 8.908094e-03 5.446918e-02
## fou_60 0.1016507992 1.148888e-01 4.048868e-02 1.500986e-02 6.304100e-03
## fou_61 0.2976385179 1.064662e-03 2.886386e-02 3.787942e-02 4.991106e-02
## fou_62 0.0924248524 8.905130e-02 1.695950e-02 4.252914e-03 2.808434e-02
## fou_63 0.2387687922 6.054275e-02 6.585427e-02 3.436956e-03 4.458808e-03
## fou_64 0.0011548793 2.068291e-01 4.013649e-04 2.322139e-02 7.104299e-02
## fou_65 0.2518930858 4.542013e-02 5.969635e-03 9.198619e-04 1.195397e-01
## fou_66 0.0570635114 1.576265e-01 7.567729e-05 6.494010e-02 2.246558e-02
## fou_67 0.1946034427 5.505195e-02 1.134234e-01 9.330693e-03 7.120617e-02
## fou_68 0.0890631952 2.397026e-02 1.001111e-01 1.949816e-02 1.825258e-01
## fou_69 0.4601693953 6.098329e-03 3.037264e-03 1.155320e-01 3.942350e-03
## fou_70 0.0637249013 1.853776e-01 1.185209e-01 5.113916e-03 1.716310e-01
## fou_71 0.1677197060 8.532989e-03 2.650483e-01 1.007479e-01 6.309524e-02
## fou_72 0.1985023116 9.339958e-02 3.142517e-05 8.179871e-02 3.884102e-02
## fou_73 0.3243209119 1.338298e-04 2.203304e-01 7.913133e-02 8.321245e-04
## fou_74 0.2497746015 1.593134e-02 1.350289e-01 9.545016e-02 6.366824e-02
## fou_75 0.0449811183 1.309282e-01 1.120226e-03 2.152926e-01 4.506718e-02
## fou_76 0.3944040634 1.799350e-04 2.712994e-02 6.321901e-02 7.944986e-04
##
## $contrib
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## fou_1 2.246194193 2.141215e-02 4.225580e+00 0.002362154 0.3832699100
## fou_2 1.388417075 2.220351e+00 5.334179e-01 3.018234702 2.1654416433
## fou_3 2.691636701 3.517706e-02 4.281663e+00 0.658690615 0.0015562635
## fou_4 0.190576478 1.171845e+00 1.532318e-05 0.464221200 2.7044050658
## fou_5 0.173552788 8.431893e+00 9.426889e-01 0.044880785 0.6240472112
## fou_6 1.304806540 2.877836e+00 1.877823e-01 1.289856302 0.8721531042
## fou_7 0.277654833 1.073516e+00 7.405929e+00 3.156654247 0.0006042037
## fou_8 2.672193552 5.624164e-03 4.485259e+00 0.438097831 0.0436581503
## fou_9 0.001218679 1.738191e+00 4.755809e+00 0.911566004 0.8863126333
## fou_10 1.150211476 8.001081e-01 2.750314e+00 0.683370458 6.7184907394
## fou_11 3.568536837 3.342570e-03 4.193264e-02 0.368689888 0.7323832174
## fou_12 0.171272491 3.144649e+00 6.331017e-01 0.001883671 7.2426346726
## fou_13 3.096330091 5.654254e-02 1.301019e-01 0.329984208 0.0055067290
## fou_14 0.159595130 1.471410e+00 4.588249e+00 0.298423888 0.3266085178
## fou_15 1.014613733 7.492988e-01 7.397666e-02 0.002033211 0.0012349486
## fou_16 1.381149003 2.936177e+00 5.205878e-03 1.924335917 2.5156360284
## fou_17 0.573494420 4.569778e-01 3.476489e+00 0.739857899 0.3767582155
## fou_18 2.648507664 6.902623e-01 4.770811e-01 0.135117515 0.0158392389
## fou_19 0.456979514 3.625863e-01 1.251824e+00 0.907729997 0.5902144174
## fou_20 0.857058475 2.730349e+00 7.032272e-01 0.652889978 1.1403121560
## fou_21 1.590134464 1.175308e+00 3.320864e-01 0.911311954 0.7830179182
## fou_22 0.524087907 9.855702e-01 1.143616e+00 1.057187765 1.1601763884
## fou_23 2.307938177 3.309758e-03 2.161157e+00 0.263184652 0.8329682380
## fou_24 0.558769760 2.564251e+00 2.610826e+00 0.414087058 0.4758870166
## fou_25 0.504526641 1.163677e+00 2.393189e-03 0.026764935 2.1581556182
## fou_26 1.508756869 2.363103e+00 4.587848e-01 0.130819926 1.8661582684
## fou_27 0.833865846 7.339180e-03 2.107825e+00 0.483226379 0.8962783572
## fou_28 1.698278618 2.575561e-01 1.678710e+00 1.800525249 0.0753089795
## fou_29 0.938668323 1.007270e+00 1.572597e+00 0.447056066 1.5377568400
## fou_30 0.638078620 3.564788e+00 1.685659e-01 2.278067709 3.0327248275
## fou_31 1.288526474 1.815618e+00 3.689771e-01 0.456746510 0.8701706993
## fou_32 0.899271543 4.125687e-02 1.521304e+00 1.632994412 0.0210901098
## fou_33 1.920537688 1.196573e-07 1.200950e+00 3.347753188 0.0010167320
## fou_34 0.848710354 9.722936e-01 2.560639e+00 0.670105253 0.1058710795
## fou_35 0.727692908 2.368508e+00 6.548921e-02 3.918885009 2.2840806942
## fou_36 1.081607014 2.137952e+00 2.043313e-01 0.130134286 0.6261462955
## fou_37 1.148930745 1.012191e-01 8.959569e-01 1.592304404 0.0061534048
## fou_38 1.548049371 1.553668e-02 1.985148e+00 2.954426454 0.3155221677
## fou_39 1.182739045 4.874932e-01 2.149115e+00 1.503530449 0.0998731214
## fou_40 0.576239609 3.085668e+00 3.871239e-02 2.911827760 1.0285853310
## fou_41 0.878900463 2.245363e+00 1.702242e-01 0.389285448 0.9032985761
## fou_42 1.096021915 1.520039e-01 1.198124e+00 1.102984930 0.3958378107
## fou_43 1.687127636 2.743104e-03 1.164361e+00 3.528348640 0.1314827295
## fou_44 1.198639428 2.522343e-01 2.732430e+00 2.017123165 0.1393905575
## fou_45 1.300145773 1.371478e+00 2.022239e-01 3.696873703 0.4263239821
## fou_46 1.070434464 4.727706e-02 7.056800e-02 1.818078406 0.0279204127
## fou_47 1.546902569 5.050795e-01 5.988075e-01 0.014375961 1.2632652772
## fou_48 1.041713323 2.254627e+00 6.018714e-01 0.146778533 1.7514836874
## fou_49 1.077547052 2.226722e+00 4.705399e-02 2.570924415 0.6348686661
## fou_50 1.195114940 2.861352e-02 4.091354e-01 0.084206736 0.5377883206
## fou_51 1.601307009 1.132500e+00 7.247940e-01 0.290478399 1.5492315851
## fou_52 1.072476514 2.250983e+00 7.498717e-01 0.247703700 0.6652752916
## fou_53 1.564468664 1.073466e+00 4.156948e-01 0.679688170 0.0632088604
## fou_54 1.269812910 3.706233e-01 1.915155e-01 1.362938490 0.4976413203
## fou_55 1.680251417 1.155488e+00 7.685689e-01 0.782939746 2.0068044850
## fou_56 0.543253001 2.575145e+00 3.602383e-01 0.030430315 0.5668818942
## fou_57 1.802853596 1.467906e+00 1.187599e-03 2.040807842 0.1684356778
## fou_58 1.010559165 1.084113e+00 5.579555e-01 2.565321491 3.1496349273
## fou_59 1.472271774 1.569155e+00 1.584327e+00 0.317843413 2.1614113921
## fou_60 0.867306775 2.224583e+00 8.248832e-01 0.535556168 0.2501553265
## fou_61 2.539516709 2.061495e-02 5.880485e-01 1.351548795 1.9805392185
## fou_62 0.788588986 1.724293e+00 3.455189e-01 0.151745229 1.1144250397
## fou_63 2.037227378 1.172284e+00 1.341661e+00 0.122631585 0.1769315942
## fou_64 0.009853682 4.004815e+00 8.177079e-03 0.828546082 2.8190828126
## fou_65 2.149206711 8.794661e-01 1.216205e-01 0.032820944 4.7434992672
## fou_66 0.486878317 3.052109e+00 1.541787e-03 2.317081811 0.8914648482
## fou_67 1.660398989 1.065966e+00 2.310796e+00 0.332921865 2.8255581837
## fou_68 0.759906594 4.641340e-01 2.039581e+00 0.695699936 7.2428733020
## fou_69 3.926265581 1.180814e-01 6.187873e-02 4.122217017 0.1564378532
## fou_70 0.543714748 3.589451e+00 2.414648e+00 0.182466032 6.8105509260
## fou_71 1.431021088 1.652235e-01 5.399877e+00 3.594716423 2.5037053778
## fou_72 1.693664989 1.808488e+00 6.402306e-04 2.918602096 1.5412649278
## fou_73 2.767176710 2.591335e-03 4.488831e+00 2.823428974 0.0330198384
## fou_74 2.131131342 3.084772e-01 2.750967e+00 3.405689913 2.5264427676
## fou_75 0.383788705 2.535152e+00 2.282257e-02 7.681703017 1.7883273184
## fou_76 3.365141435 3.484066e-03 5.527231e-01 2.255672722 0.0315267908
We identify the 7 patterns that contribute the most to the construction of the plane. These patterns have extreme values compared with the average individual. They all belong to class 0, except pattern 644 which belongs to class 4. Figure 2 - Decomposition of the pattern contributions
Figure 3 - Graph of the patterns (PCA). The labelled patterns are those with the largest contribution to the construction of the plane. The patterns are coloured according to the levels of the CLASS variable.
We can see our 10 classes.
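A minimal sketch (an assumption, not the original plotting code) of how the graph of individuals in Figure 3 may have been drawn, with the individuals coloured by class on the first plane:
# graph of individuals on plane 1:2, coloured by class (sketch)
fviz_pca_ind(res1.pca, axes=1:2, geom="point", habillage="class", addEllipses=FALSE)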
par(mar = c(4.1, 4.1, 1.1, 2.1))
fviz_pca_var(res1.pca, axes = 1:2,select.var = list(contrib = 10),jitter = list(what = "both",width = 0.1, height = 0.15) ,col.var = "contrib")
## Warning: argument jitter is deprecated; please use repel instead.
Figure 4.1 - Graph of the variables (PCA). The labelled variables are those best represented on plane 1:2. The variables fou_5, (fou_11, fou_73), (fou_69, fou_13), (fou_11, fou_3), fou_76, fou_18 and fou_16 contribute the most, and the variables fou_11, fou_69, fou_3, fou_73 and fou_13 are strongly correlated.
par(mar = c(4.1, 4.1, 1.1, 2.1))
fviz_pca_var(res1.pca, axes = 3:4,select.var = list(contrib =11),jitter = list(what = "both",width = 0.1, height = 0.15) ,col.var = "contrib")
## Warning: argument jitter is deprecated; please use repel instead.
Figure 4.2 - Graph of the variables (PCA). The labelled variables are those best represented on plane 3:4. The variables fou_7, (fou_71, fou_73), (fou_14, fou_9), (fou_8, fou_3), fou_74, fou_75, fou_1 and fou_17 contribute the most, and the variables (fou_71, fou_73), (fou_14, fou_9) and (fou_8, fou_3) are strongly correlated.
The most interesting elements that emerge from the PCA show that:
the first component is essentially related to fou_69, fou_11, fou_76, fou_13, fou_73 (and fou_3, fou_8, fou_18);
the second to fou_5 (and fou_64, fou_70, fou_30);
the third to fou_7, fou_71, fou_9 (and fou_14), fou_8;
the fourth to fou_75, fou_69 (and fou_35).
Finally, on each of these three components, the levels of class have significantly different coordinates, respectively for class 0 (-5.501, 3.136, 1.813), 1 (-0.941, -1.539, -1.109), 2 (-1.320, -1.240, -1.832), 3 (1.905, -2.113, 0.766), 4 (2.534, 0.304, 0.954), 5 (2.021, -0.044, 3.212), 6 (2.278, 0.830, -0.583), 7 (2.100, 0.710, -2.918), 8 (-5.200, -0.685, -0.027) and 9 (2.124, 0.641, -0.276). The cos2 values of the first two components are fairly high, for example for class 0 (0.631 and 0.205), but low for the third.
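A minimal sketch (assumption) of where these class coordinates and cos2 values come from: the supplementary qualitative variable stored in the FactoMineR result.
# coordinates and cos2 of the class levels on the first three components (sketch)
round(res1.pca$quali.sup$coord[,1:3], 3)
round(res1.pca$quali.sup$cos2[,1:3], 3)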
For our prediction objective we can conclude that the quantitative variables fou_69, fou_11, fou_76, fou_13, fou_73, fou_3, fou_8, fou_18, fou_5, fou_64, fou_70, fou_30, fou_7, fou_71, fou_9 and fou_14 are the most likely to contribute to the quality of a prediction model when the correlations are taken into account.
Restricting ourselves to the best-represented variables, the quantitative variables fou_69, fou_11, fou_76, fou_5, fou_7, fou_71, fou_9, fou_8, fou_17, fou_74 and fou_75 are the most likely to contribute to the quality of a prediction model.
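A minimal sketch (assumption; the object names keep_fou and data_fou_red are hypothetical) of how this selection can be turned into a reduced dataset for the prediction models:
# keep only the selected fou variables plus the class (sketch)
keep_fou <- c("fou_69","fou_11","fou_76","fou_5","fou_7","fou_71","fou_9","fou_8","fou_17","fou_74","fou_75")
data_fou_red <- data_fou[, c(keep_fou, "class")]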
We run Pearson and Spearman correlation tests to check whether the variables are correlated.
### Pearson correlation
cor.test(data_fou$fou_5,data_fou$fou_69)
##
## Pearson's product-moment correlation
##
## data: data_fou$fou_5 and data_fou$fou_69
## t = 5.7751, df = 1498, p-value = 9.338e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.0976934 0.1967212
## sample estimates:
## cor
## 0.1475771
The p-value is 9.338e-09, so the correlation between fou_69 and fou_5 is statistically significant, but the estimated coefficient (about 0.15) indicates only a weak linear relationship.
### Spearman correlation
cor.test(data_fou$fou_13,data_fou$fou_69,method="spearman")
##
## Spearman's rank correlation rho
##
## data: data_fou$fou_13 and data_fou$fou_69
## S = 280925097, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5005774
The p-value is < 2.2e-16, so there is a relationship between fou_69 and fou_13 (rho of about 0.50).
Distribution of the inertia
The inertia of the factorial axes indicates, on the one hand, whether the variables are structured and, on the other hand, suggests a sensible number of principal components to study. The first 2 PCA axes account for 37.6% of the total inertia of the dataset. The first 12 dimensions represent 82% of the initial information.
par(mar = c(2.6, 4.1, 1.1, 2.1))
#barplot(res.pca$eig[,2], names.arg = 1:nrow(res.pca$eig))
fviz_eig(res2.pca, addlabels = TRUE, ylim = c(0, 24))
To reduce the volume of data we require a given quality of approximation of the original data: with 12 dimensions we retain 82% of the data. For the choice of the number of PCA axes, the differences between successive eigenvalues become small after the 4th dimension. We will then keep 5 PCA axes, as the following principal axes are no longer significant; this retains 63.114% of the initial inertia.
                 Dim.1  Dim.2  Dim.3  Dim.4  Dim.5  Dim.6  Dim.7
% of variance   22.542 15.064 11.809  7.831  5.868  4.967  3.515
cumulative %    22.542 37.606 49.415 57.246 63.114 68.080 71.596
#sort(res2.pca$var$contrib,decreasing = TRUE)
res2.pca$var$contrib
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## fac_1 1.712524e-01 7.347311e-01 3.029941e-01 4.628172e-01 3.773641e-01
## fac_2 7.465527e-01 9.995390e-01 5.417192e-01 1.395476e-02 1.877141e-02
## fac_3 1.072041e+00 5.426223e-02 1.166232e+00 1.173928e-01 1.209251e-01
## fac_4 4.319097e-03 2.183938e+00 2.659159e-01 9.099135e-01 1.566801e-02
## fac_5 9.291398e-06 1.137871e+00 5.723413e-02 2.902953e-01 1.930614e-01
## fac_6 6.962893e-01 3.338184e-02 6.227512e-01 1.987580e-01 1.306821e-01
## fac_7 1.242005e+00 1.738007e-01 9.520759e-01 6.520129e-02 5.965167e-02
## fac_8 3.730053e-01 4.765511e-01 3.052807e-01 2.901963e-01 7.172970e-01
## fac_9 1.112549e-01 1.937503e+00 2.077508e-01 2.505196e-01 1.895676e-01
## fac_10 1.534766e-02 3.909943e-01 1.024205e+00 2.051011e-01 4.373175e-01
## fac_11 1.977301e-01 4.623712e-01 1.260109e-02 1.520399e-03 1.120252e-02
## fac_12 6.427345e-01 9.413095e-04 6.442078e-01 4.735852e-01 1.787986e-02
## fac_13 8.093122e-01 6.293330e-02 6.033055e-01 8.194107e-02 1.069439e+00
## fac_14 1.667006e-02 1.863745e+00 8.165219e-02 2.968823e-01 6.857070e-02
## fac_15 3.613152e-02 5.179779e-01 6.895176e-01 1.581185e+00 8.674526e-01
## fac_16 2.669010e-02 2.110827e+00 2.053700e-01 6.419911e-01 7.367219e-04
## fac_17 2.795513e-01 3.721930e-01 5.271209e-01 6.661960e-01 7.461100e-02
## fac_18 2.860751e-01 2.779087e-01 1.226392e+00 1.915349e+00 1.081892e-01
## fac_19 1.262724e+00 2.000958e-01 4.223422e-01 1.800263e-01 2.461568e-01
## fac_20 3.657683e-03 5.566606e-02 2.358604e-01 9.926501e-01 1.731740e+00
## fac_21 1.221720e-01 1.237028e+00 4.100226e-03 6.681433e-01 6.633190e-01
## fac_22 9.144457e-01 1.783487e-01 2.725597e-01 1.169939e-01 6.412518e-01
## fac_23 5.820102e-01 1.907700e-01 8.963747e-03 1.026670e-03 2.707430e-03
## fac_24 3.715687e-01 6.989500e-02 5.196216e-01 2.714494e-01 4.061393e-01
## fac_25 3.321358e-02 1.931641e-02 8.434569e-01 3.709609e-01 3.566904e-02
## fac_26 1.258790e+00 3.400240e-01 2.872632e-01 7.455862e-03 2.708996e-01
## fac_27 4.745826e-01 1.015174e-06 9.397067e-01 2.321690e-02 3.525975e-02
## fac_28 6.406304e-02 4.896177e-01 7.005920e-01 1.495430e+00 2.968716e-01
## fac_29 8.878887e-01 4.784257e-01 1.094716e+00 9.578758e-02 7.581494e-02
## fac_30 2.222029e-01 1.325642e-01 1.361761e+00 1.037135e+00 2.569397e-01
## fac_31 4.354993e-01 6.720246e-02 2.131560e+00 1.934663e-02 7.432723e-02
## fac_32 3.827294e-03 3.436990e-02 7.069582e-01 3.240054e-01 3.579253e-01
## fac_33 1.086791e+00 1.947509e-01 3.416016e-01 3.260995e-03 1.563177e+00
## fac_34 1.922421e-02 9.623328e-01 5.524229e-01 8.288943e-01 4.522777e-01
## fac_35 5.250668e-03 3.673613e-02 3.120736e-01 6.434481e-03 3.187604e-02
## fac_36 4.728111e-01 6.017378e-01 2.433513e-01 1.018343e-01 1.175650e+00
## fac_37 2.924109e-01 1.079471e-01 1.478833e+00 2.244447e-01 9.847287e-01
## fac_38 7.474534e-01 1.170224e+00 2.394914e-01 5.997044e-02 1.595281e-01
## fac_39 1.048492e+00 2.523279e-01 5.607439e-01 5.046392e-01 4.935464e-02
## fac_40 7.725825e-02 1.967532e+00 5.045962e-01 6.670737e-01 9.315006e-02
## fac_41 5.231376e-01 7.760500e-02 1.338947e+00 3.133692e-03 8.066875e-01
## fac_42 2.941948e-02 7.038162e-03 6.255816e-01 1.576035e+00 1.115023e+00
## fac_43 1.092276e+00 6.346998e-02 1.166808e-01 7.730085e-03 7.435653e-01
## fac_44 3.548233e-01 3.767879e-01 1.541460e-01 2.906713e-01 4.293421e-03
## fac_45 2.901016e-01 1.618974e+00 1.000474e-02 1.500303e-01 8.530236e-01
## fac_46 2.277891e-01 4.084117e-01 2.832895e-02 1.281021e+00 4.778885e-02
## fac_47 5.776551e-01 9.541083e-01 2.223994e-01 1.611287e-03 2.245191e-02
## fac_48 6.885714e-01 1.136439e-01 8.136802e-01 3.574509e-01 3.638923e-01
## fac_49 6.410136e-02 3.846223e-01 7.702551e-01 7.819286e-01 7.122249e-01
## fac_50 8.047507e-01 1.128557e+00 3.093032e-01 4.914348e-02 1.622981e-01
## fac_51 9.560746e-01 2.686648e-01 6.504688e-01 5.266527e-01 6.714753e-02
## fac_52 3.189069e-01 1.152387e+00 7.728044e-01 5.206790e-01 4.823209e-01
## fac_53 6.656514e-01 2.290298e-01 1.444605e+00 7.675791e-06 4.525097e-01
## fac_54 6.952510e-02 6.731841e-02 1.350710e+00 1.193070e+00 4.708273e-01
## fac_55 1.508756e+00 9.824349e-03 3.481168e-01 4.553094e-02 3.173512e-01
## fac_56 6.932981e-01 2.888977e-01 4.333876e-02 6.380378e-06 2.134014e-01
## fac_57 1.086791e+00 1.947509e-01 3.416016e-01 3.260995e-03 1.563177e+00
## fac_58 3.869870e-02 7.429231e-01 6.226446e-02 1.067719e+00 1.616776e-01
## fac_59 4.842156e-01 9.853075e-01 2.462935e-01 3.820341e-03 2.836287e-02
## fac_60 1.781362e-02 8.603361e-04 1.010452e-01 2.428400e-02 5.569729e-05
## fac_61 5.478397e-01 1.361006e-01 3.718981e-02 2.203723e-01 1.637107e-01
## fac_62 3.246848e-01 6.096741e-01 2.653269e-01 1.157936e-01 7.923013e-01
## fac_63 8.867780e-01 1.660355e-01 9.709336e-01 4.462557e-01 4.962372e-03
## fac_64 7.712867e-01 1.090462e-01 8.004686e-01 1.853297e-01 1.043307e+00
## fac_65 9.039195e-01 5.397215e-01 9.316641e-01 1.754326e-01 4.263452e-02
## fac_66 1.363517e-01 3.262328e-01 1.793290e+00 1.117389e+00 3.247493e-02
## fac_67 1.478787e+00 3.078483e-03 3.331319e-01 1.007499e-01 3.007214e-01
## fac_68 1.503310e-02 7.383479e-02 4.931811e-01 4.984928e-01 3.538601e-01
## fac_69 8.409288e-01 6.006248e-01 6.674787e-01 4.798196e-02 2.541347e-01
## fac_70 9.228200e-02 5.944435e-01 8.199865e-01 7.647101e-01 3.620999e-01
## fac_71 3.163465e-01 3.583946e-01 9.910550e-02 2.695628e-02 2.255128e-02
## fac_72 3.529176e-02 1.120848e+00 2.897349e-01 2.579736e-02 2.239881e+00
## fac_73 4.556527e-01 4.002781e-01 1.232764e-01 2.658556e-01 5.966293e-03
## fac_74 7.836299e-03 1.116598e+00 6.856612e-02 6.684061e-01 9.562234e-01
## fac_75 2.216449e-02 6.709347e-01 1.761193e-01 1.518474e+00 1.490373e+00
## fac_76 5.007331e-02 1.718370e+00 5.131090e-01 9.998310e-01 1.519349e-01
## fac_77 7.321126e-01 1.015789e-03 4.779653e-01 9.962187e-01 7.587060e-02
## fac_78 6.211446e-01 1.266659e-01 7.912167e-01 1.628112e+00 8.467828e-02
## fac_79 6.784997e-01 4.727105e-01 1.662307e-02 1.201947e+00 1.145446e-01
## fac_80 2.285015e-01 3.075954e-01 8.590151e-03 6.168616e-01 1.488778e+00
## fac_81 3.120165e-01 1.529095e+00 8.105252e-03 1.321746e-01 1.002351e+00
## fac_82 1.127176e+00 4.502156e-02 9.943794e-03 5.752300e-01 2.795171e-01
## fac_83 1.245519e-01 6.330172e-01 6.061767e-03 7.430176e-04 2.253547e-03
## fac_84 1.204334e+00 1.376855e-02 3.911868e-02 5.747668e-01 4.731079e-02
## fac_85 1.134750e-01 7.338079e-02 6.118527e-01 1.661053e-01 1.572894e+00
## fac_86 7.660595e-01 1.810204e-01 7.178158e-01 6.735341e-01 3.445939e-01
## fac_87 7.282704e-02 5.991028e-01 4.143946e-01 1.354476e+00 1.157561e+00
## fac_88 2.708520e-02 1.868172e+00 1.972098e-01 1.183405e+00 5.790700e-02
## fac_89 6.630434e-01 5.176252e-02 2.354271e-01 8.016451e-01 3.026555e-01
## fac_90 2.371208e-01 2.304136e-01 1.280425e+00 2.059519e+00 1.066538e-01
## fac_91 1.043959e+00 3.138640e-01 1.425145e-01 5.823089e-01 2.431741e-01
## fac_92 4.186016e-02 3.445110e-01 1.284628e-01 6.562100e-01 1.278836e-01
## fac_93 9.276012e-01 5.501912e-01 1.653033e-01 5.286538e-07 1.382134e+00
## fac_94 1.798803e-01 2.615038e-01 8.055202e-01 4.086131e-03 1.023219e+00
## fac_95 3.597129e-01 1.512721e-02 1.785631e-02 6.590814e-04 7.751359e-02
## fac_96 2.760533e-01 8.717709e-02 3.812721e-01 7.653415e-02 1.871251e+00
## fac_97 7.463352e-01 1.010741e+00 1.739144e-01 3.814923e-01 7.702108e-02
## fac_98 3.851474e-01 1.072183e+00 2.191518e-01 4.357847e-01 1.124568e-01
## fac_99 2.408364e-01 3.998081e-01 1.536261e-03 5.859308e-01 2.483733e+00
## fac_100 3.748555e-02 2.070251e+00 1.780922e-01 9.853838e-01 1.217796e-01
## fac_101 1.004289e-01 2.679083e-01 8.520328e-01 4.446513e-02 1.342106e+00
## fac_102 2.098053e-01 1.800084e-01 1.249526e+00 2.011471e+00 1.962585e-01
## fac_103 4.670174e-01 4.097238e-01 1.201449e-01 1.681017e+00 6.838307e-01
## fac_104 7.173262e-01 2.608918e-01 2.791082e-01 6.263593e-02 3.229049e-02
## fac_105 9.684554e-01 4.759903e-01 2.167604e-01 1.338597e-03 1.321997e+00
## fac_106 5.146291e-01 7.456904e-01 2.568780e-01 8.368376e-02 5.721528e-01
## fac_107 9.641210e-03 1.030893e+00 3.590959e-01 1.213199e-02 1.083102e-01
## fac_108 9.428465e-01 4.047768e-02 2.296638e-01 3.276415e-01 1.261048e+00
## fac_109 1.302677e+00 1.272399e-01 2.078985e-01 4.136832e-01 3.527485e-04
## fac_110 9.011029e-01 4.140742e-03 1.461293e-01 7.503765e-03 5.647715e-01
## fac_111 1.068083e+00 3.208761e-02 1.240587e+00 4.802940e-02 1.903200e-01
## fac_112 6.766615e-01 1.241482e-01 4.049367e-01 1.720323e-02 6.827949e-01
## fac_113 9.211709e-01 2.879631e-01 1.004196e+00 3.198095e-01 5.661426e-02
## fac_114 9.236382e-02 6.738693e-04 2.320761e-01 1.820865e-02 2.406200e-01
## fac_115 9.464169e-01 1.828478e-01 2.886450e-03 1.430019e+00 3.743685e-01
## fac_116 6.506880e-02 6.213872e-02 4.333491e-04 1.013168e+00 4.035464e-01
## fac_117 7.449199e-01 9.957026e-02 6.599475e-01 1.015219e-01 1.167514e+00
## fac_118 3.859760e-03 7.295656e-01 6.437659e-01 2.992178e-01 4.512917e-01
## fac_119 6.997527e-01 4.995234e-02 3.728584e-01 7.998520e-03 2.067567e-01
## fac_120 7.428968e-01 4.047509e-01 4.171350e-03 1.746252e-01 1.577308e+00
## fac_121 8.590690e-02 3.649439e-02 5.414945e-01 6.291152e-02 4.531257e-01
## fac_122 8.245162e-02 3.761669e-01 1.819385e-01 1.959402e-02 6.322443e-01
## fac_123 9.086132e-01 4.011614e-02 1.400239e+00 7.021422e-02 1.390058e-01
## fac_124 6.074398e-01 2.726779e-01 8.391924e-01 3.412252e-01 1.025004e+00
## fac_125 9.444561e-01 3.272757e-01 9.694046e-01 3.245072e-01 2.242397e-02
## fac_126 1.991366e-03 3.009963e-01 1.362078e+00 1.209093e+00 2.817610e-02
## fac_127 8.170781e-01 2.352031e-01 9.359020e-03 1.698998e+00 3.263631e-01
## fac_128 9.092008e-05 2.494298e-01 1.858125e-01 1.125294e+00 2.980269e-01
## fac_129 4.828740e-01 1.306751e+00 4.088218e-01 1.426214e-01 9.634531e-03
## fac_130 9.900556e-02 6.563976e-01 9.470065e-03 2.909585e-02 1.985345e-01
## fac_131 4.988581e-02 5.105538e-02 1.564221e-01 7.088855e-02 3.245819e-01
## fac_132 5.951276e-01 2.223731e-01 7.437097e-01 2.599166e-01 2.466933e-01
## fac_133 1.849902e-01 9.203454e-01 4.003382e-01 4.549111e-02 8.430260e-01
## fac_134 2.912661e-01 9.824692e-01 3.316733e-01 1.527089e-02 2.843389e-01
## fac_135 1.067904e+00 3.208280e-02 1.240807e+00 4.811114e-02 1.902960e-01
## fac_136 9.648168e-02 2.033849e+00 9.693323e-02 8.861576e-01 1.226993e-01
## fac_137 8.362312e-01 2.967777e-02 3.545361e-01 9.648028e-01 1.595700e-01
## fac_138 3.099471e-04 2.096180e-01 1.314787e+00 2.032772e+00 3.521453e-02
## fac_139 3.134950e-01 6.894285e-01 6.394348e-01 1.300208e+00 2.821281e-01
## fac_140 4.268201e-01 5.113964e-01 2.044941e-01 4.387286e-01 1.287307e+00
## fac_141 4.788072e-01 8.920957e-01 7.486908e-01 3.254648e-01 5.313398e-01
## fac_142 8.488180e-02 4.390963e-01 1.889169e-06 4.367089e-03 1.047590e-01
## fac_143 1.402713e-01 4.556065e-01 1.775712e-02 7.577462e-04 4.997716e-03
## fac_144 6.273819e-01 5.095103e-01 4.667796e-02 1.454626e-01 1.369686e+00
## fac_145 7.341904e-01 2.841123e-02 1.244765e-01 7.236398e-01 1.144071e-02
## fac_146 1.017805e+00 1.114733e-01 5.985905e-01 1.226189e-01 1.793974e-01
## fac_147 1.035899e+00 7.984771e-03 1.195115e+00 9.245779e-03 2.540916e-01
## fac_148 3.734766e-02 2.070111e+00 1.787495e-01 9.846754e-01 1.218557e-01
## fac_149 8.438729e-01 1.012565e-01 7.979572e-01 4.187923e-01 3.634660e-03
## fac_150 3.302158e-02 3.343320e-01 1.725768e+00 1.135746e+00 5.474988e-04
## fac_151 8.304360e-01 4.472443e-01 4.443836e-01 8.625395e-01 4.074490e-02
## fac_152 2.627773e-01 4.694408e-01 7.315241e-01 3.258773e-02 2.783192e-02
## fac_153 5.777246e-01 5.792560e-01 7.257186e-01 3.275304e-01 8.848428e-01
## fac_154 6.682258e-01 5.296197e-01 3.352990e-01 7.881239e-03 6.601003e-01
## fac_155 5.040684e-01 3.519586e-01 1.956253e-02 2.962052e-04 2.560987e-03
## fac_156 3.406156e-03 7.264949e-02 5.193087e-01 2.943198e-03 1.363348e+00
## fac_157 9.815269e-01 1.454964e-01 6.090710e-01 1.111394e-01 1.115887e-01
## fac_158 7.355913e-02 2.366951e-01 4.699417e-02 1.056256e+00 8.206183e-01
## fac_159 1.660577e-01 2.565910e-01 3.661303e-01 1.275024e+00 1.173074e+00
## fac_160 1.939776e-01 1.321921e+00 7.222291e-01 7.521855e-01 4.510625e-01
## fac_161 7.799592e-01 3.395181e-03 7.254203e-01 8.588643e-01 8.637798e-03
## fac_162 2.530953e-01 2.680855e-01 1.331865e+00 1.315993e+00 3.479412e-05
## fac_163 7.338843e-02 7.384851e-01 2.076186e-02 2.381743e+00 3.173677e-02
## fac_164 1.742519e-04 1.505188e-01 1.502325e-01 1.026885e+00 1.712033e+00
## fac_165 1.139894e+00 2.773365e-03 6.313580e-01 2.670908e-02 1.382537e+00
## fac_166 1.103170e+00 9.695805e-02 6.671903e-03 3.921815e-01 2.673913e-01
## fac_167 5.250668e-03 3.673613e-02 3.120736e-01 6.434481e-03 3.187604e-02
## fac_168 1.571146e-02 1.414968e-01 2.361812e-01 4.486670e-04 6.136565e-01
## fac_169 3.118055e-01 5.570042e-01 7.295761e-02 7.836204e-01 1.364682e-03
## fac_170 2.286995e-02 1.383043e-01 3.273854e-03 9.566613e-01 9.208963e-01
## fac_171 7.473008e-01 2.844797e-01 6.707873e-01 8.093558e-01 4.592758e-02
## fac_172 5.278361e-01 1.450753e-01 3.906927e-01 8.911630e-02 9.002624e-01
## fac_173 9.013917e-01 3.523281e-01 4.895262e-01 7.293173e-01 1.724772e-02
## fac_174 7.484586e-02 2.994230e-01 1.799001e+00 1.448994e+00 6.035808e-02
## fac_175 2.821860e-02 7.749814e-01 1.161173e-02 2.726164e+00 2.061732e-01
## fac_176 1.914448e-03 6.129244e-03 7.220435e-01 3.229223e-01 6.567624e-01
## fac_177 1.078564e+00 7.739669e-02 7.315847e-01 2.673280e-02 1.106487e+00
## fac_178 1.578144e-01 3.358864e-03 2.056963e-01 4.924258e-02 2.426940e-02
## fac_179 4.922549e-02 1.074944e-01 2.531972e-01 6.705691e-03 1.880685e-02
## fac_180 7.400909e-02 1.077745e+00 9.416673e-02 1.482317e-02 2.540947e+00
## fac_181 1.394167e+00 3.290276e-02 4.743871e-01 2.006280e-01 1.661612e-01
## fac_182 2.346429e-01 8.282460e-01 1.979002e-01 4.867259e-02 1.077496e-01
## fac_183 1.163626e+00 6.875514e-02 1.021433e+00 1.199661e-01 8.539677e-02
## fac_184 8.027457e-01 4.387026e-01 1.669889e-01 2.431782e-04 6.665772e-01
## fac_185 8.375943e-01 5.839287e-01 9.275661e-01 1.612799e-01 5.566171e-02
## fac_186 8.517244e-01 7.360669e-03 4.071172e-01 6.744594e-02 1.603673e-01
## fac_187 8.170781e-01 2.352031e-01 9.359020e-03 1.698998e+00 3.263631e-01
## fac_188 4.393420e-01 5.529072e-01 2.463465e-01 3.183327e-04 2.018494e-01
## fac_189 3.432963e-03 1.941913e+00 2.129295e-01 6.113978e-01 2.827627e-01
## fac_190 7.862123e-02 2.232492e-01 9.705264e-01 1.800608e-01 3.432542e-01
## fac_191 5.667749e-01 2.042834e-02 2.924181e-01 1.373942e-02 2.190599e-01
## fac_192 1.401694e-01 1.022853e+00 1.173332e-01 7.287827e-04 2.200881e+00
## fac_193 5.474262e-01 1.104309e+00 2.275461e-02 7.671562e-01 1.336706e-01
## fac_194 1.076466e+00 1.138650e-01 7.367007e-01 3.943141e-01 8.213262e-02
## fac_195 4.809007e-01 4.451830e-01 1.066374e-01 9.691651e-01 2.042682e+00
## fac_196 5.153115e-04 2.200866e+00 2.777122e-01 8.788218e-01 1.371851e-02
## fac_197 7.244881e-05 1.064588e+00 1.039680e-01 2.690221e-01 3.958022e-01
## fac_198 7.325572e-01 3.698194e-02 5.279944e-01 1.534937e+00 1.455075e-01
## fac_199 1.165888e+00 1.415604e-01 1.146046e+00 5.111405e-02 1.817109e-02
## fac_200 6.441227e-01 3.615470e-01 4.959540e-01 4.574545e-03 1.230768e-01
## fac_201 9.999790e-01 3.872578e-01 2.232277e-01 2.157635e-04 1.511913e+00
## fac_202 1.002001e+00 2.557593e-01 5.201502e-03 8.883526e-02 3.470196e-01
## fac_203 4.394840e-02 6.711361e-01 3.359831e-01 2.936981e-03 8.649993e-04
## fac_204 6.757774e-01 2.084735e-01 7.940011e-02 6.030133e-01 3.949387e-02
## fac_205 3.311905e-01 9.386236e-02 1.146487e-01 1.928359e-02 2.675240e+00
## fac_206 6.225554e-03 2.752190e-01 4.644789e-02 7.035820e-01 5.521049e-01
## fac_207 1.085893e+00 9.948230e-03 1.182638e+00 4.406200e-03 2.555561e-01
## fac_208 1.110617e-01 1.890675e+00 4.179149e-02 7.990640e-01 1.826038e-01
## fac_209 9.696645e-03 6.210503e-01 1.864955e-01 3.880022e-02 1.316092e+00
## fac_210 3.148962e-01 1.121761e-02 1.600290e-02 7.876472e-02 7.808431e-01
## fac_211 7.501302e-03 1.234919e-01 1.026226e+00 4.005123e-01 9.330122e-05
## fac_212 4.113668e-01 6.239378e-01 6.422612e-01 5.736157e-02 3.370548e-01
## fac_213 7.569001e-03 8.572394e-01 1.510279e-02 1.530913e-01 6.767253e-04
## fac_214 7.257363e-02 1.825916e-01 2.377599e-01 3.068228e-01 1.925236e-01
## fac_215 2.116536e-01 1.911758e-01 7.283948e-04 1.335864e-03 9.542461e-02
## fac_216 4.638376e-01 9.435167e-02 3.364458e-01 1.383253e-01 1.887971e+00
head (res2.pca$ind$coord, n = 10) %>% round (digits = 2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## 1 -7.21 -6.21 6.36 4.44 -0.10
## 2 -7.50 -6.36 7.52 3.07 2.42
## 3 -7.45 -5.76 -0.14 -0.59 2.93
## 4 -2.34 -2.26 4.32 2.67 3.72
## 5 -6.24 -7.85 5.32 2.62 1.43
## 6 -7.99 -7.55 4.22 2.12 4.36
## 7 -7.56 -6.26 5.23 2.60 3.30
## 8 -8.08 -7.63 5.22 2.77 2.79
## 9 -6.14 -3.89 -0.48 -5.69 -2.05
## 10 -9.10 -3.64 4.17 2.10 -0.58
Figure 1 - Decomposition of the total inertia
Description of the 1:2 plane
drawn <- integer(0)
par(mar = c(4.1, 4.1, 1.1, 2.1))
#plot.PCA(res.pca, select = drawn, axes = 1:2, choix = 'ind', invisible = 'quali', title = '', cex = cex)
fviz_contrib(res2.pca, choice = "ind", axes = 1:5, top = 10)+
theme(axis.text = element_text(size = 7.5))
data_fac[485,]$class
data_fac[1022,]$class
data_fac[606,]$class
data_fac[1102,]$class
data_fac[905,]$class
data_fac[689,]$class
data_fac[1045,]$class
data_fac[721,]$class
data_fac[1195,]$class
data_fac[909,]$class
We identify the variables that contribute the most to the construction of each axis:
dim 1: fac_55, fac_67, fac_181, fac_109, fac_19 (fac_26, fac_7, fac_84)
dim 2: fac_196, fac_4, fac_100, fac_148, fac_136 (fac_40)
dim 3: fac_31, fac_174, fac_66, fac_150, fac_37 (fac_123, fac_126)
dim 4: fac_175, fac_163, fac_90 (fac_102, fac_18)
dim 5: fac_205, fac_180, fac_99, fac_72, fac_192, fac_195, fac_216
Figure 2 - Decomposition of the pattern contributions
Figure 3 - Pattern map (PCA). The labelled patterns are those contributing most to the construction of the plane; patterns are coloured according to the levels of the CLASS variable.
Our 10 classes all appear grouped together.
par(mar = c(4.1, 4.1, 1.1, 2.1))
fviz_pca_var(res2.pca, axes = 1:2,select.var = list(contrib = 10),jitter = list(what = "both",width = 0.1, height = 0.15) ,col.var = "contrib")
## Warning: argument jitter is deprecated; please use repel instead.
Figure 4 - Variable map (PCA). The labelled variables are those best represented on the plane.
The variables that contribute the most are (fac_55, fac_67), (fac_50, fac_38), fac_26, (fac_196, fac_4), fac_136, fac_16 and fac_97, and the fac variables are highly correlated. The most interesting elements emerging from the PCA are shown on the following factor maps:
par(mar = c(4.1, 4.1, 1.1, 2.1))
fviz_pca_var(res2.pca, axes = 3:4,select.var = list(contrib = 10),jitter = list(what = "both",width = 0.1, height = 0.15) ,col.var = "contrib")
## Warning: argument jitter is deprecated; please use repel instead.
Figure 5 - Variable map (PCA, axes 3-4). The labelled variables are those best represented on the plane.
The variables that contribute the most are fac_174, (fac_138, fac_90, fac_102, fac_18), (fac_66, fac_150), (fac_162, fac_126) and fac_31, and the fac variables are highly correlated.
par(mar = c(4.1, 4.1, 1.1, 2.1))
fviz_pca_var(res2.pca, axes = 4:5,select.var = list(contrib = 18),jitter = list(what = "both",width = 0.1, height = 0.15) ,col.var = "contrib")
## Warning: argument jitter is deprecated; please use repel instead.
For axis 5, the variables fac_103, fac_205 and fac_180 contribute the most, and the fac variables are highly correlated.
For our prediction, we can conclude that the quantitative variables to keep are fac_55, fac_196, fac_136, fac_31, fac_174, fac_90, fac_66, fac_205, fac_180, fac_175 and fac_163.
We run the MCA with the class variable as a supplementary (illustrative) variable. Rare levels (relative frequency < 5%) are handled by ventilation. Using describe(), we see roughly 1400 distinct values per variable. Since MCA is sensitive to rare levels, we discretise the quantitative variables into 4 quantile-based classes.
# number of distinct values per quantitative variable (when it takes more than 10 values)
describe(data_fou)
# discretise quantitative variables into 4 classes (when they take more than 10 values)
don.cat<-data_fou
for(i in which(sapply(data_fou,is.numeric))){
if(length(table(don.cat[[i]]))>10){
breaks<-c(-Inf,quantile(don.cat[[i]],
na.rm=T)[-1])
don.cat[[i]]<-cut(don.cat[[i]],
breaks=breaks,labels=F);
}
don.cat[[i]]<-as.factor(don.cat[[i]])
}
str(don.cat$fou_1)
## Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
str(don.cat$fou_2)
## Factor w/ 4 levels "1","2","3","4": 1 1 1 2 1 1 1 1 2 1 ...
str(don.cat$fou_3)
## Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 2 ...
str(don.cat$fou_4)
## Factor w/ 4 levels "1","2","3","4": 3 4 2 2 3 2 3 3 1 3 ...
Here is an example of discretisation into 4 classes.
Here is the pattern map. We can see that groups 0 and 7 are far from the others.
Here is the variable map including class.
We note that fou_5 is separated from the rest of the group and that the class variable is also set apart.
Here is the map of the categories (modalities).
We keep the 30 variables that contribute the most.
Visual inspection shows that the strongest contributions to this dimension come from fou_5, fou_2, fou_3, fou_69, fou_8, fou_16, fou_73 and fou_76, but dimensions 1 and 2 account for only 8.1% of the inertia.
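As a hedged illustration, the contribution bar plot described above can be produced with factoextra; a minimal sketch, assuming the MCA result on don.cat is stored in res.mca (the object name reused in the clustering step below):
# Minimal sketch: top-30 variable contributions to MCA dimensions 1-2
library(factoextra)
library(ggplot2)
fviz_contrib(res.mca, choice = "var", axes = 1:2, top = 30) +
  theme(axis.text = element_text(size = 7.5))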
The fac variables have between 15 and 512 unique values, so we can also discretise these quantitative variables into 4 classes.
# number of distinct values per quantitative variable (when it takes more than 10 values)
describe(data_fac)
# discretise quantitative variables into 4 classes (when they take more than 10 values)
don.cat2<-data_fac
for(i in which(sapply(data_fac,is.numeric))){
if(length(table(don.cat2[[i]]))>10){
breaks<-c(-Inf,quantile(don.cat2[[i]],
na.rm=T)[-1])
don.cat2[[i]]<-cut(don.cat2[[i]],
breaks=breaks,labels=F);
}
don.cat2[[i]]<-as.factor(don.cat2[[i]])
}
str(don.cat2$fac_1)
## Factor w/ 4 levels "1","2","3","4": 1 1 1 2 1 1 1 1 2 1 ...
str(don.cat2$fac_2)
## Factor w/ 4 levels "1","2","3","4": 1 1 1 2 1 1 1 1 2 1 ...
str(don.cat2$fac_3)
## Factor w/ 4 levels "1","2","3","4": 1 1 1 2 1 1 1 1 2 1 ...
str(don.cat2$fac_4)
## Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 1 2 1 3 3 ...
Here is an example of discretisation into 4 classes.
Here is the pattern map. We can see that groups 0 and 7 are far from the others.
Here is the variable map with class: fac_208, fac_136, fac_181, fac_2, fac_33, fac_185, fac_100, fac_65, fac_109, fac_165, fac_105, fac_117.
We note that everything is concentrated together.
Here is the map of the categories (modalities).
We keep the 30 variables that contribute the most.
Visual inspection does not highlight any strong contribution, and dimensions 1 and 2 account for only 12.4% of the inertia.
We complete this analysis with a hierarchical clustering (CAH) on the MCA components and compute the between-class inertia for a partition into 10 groups (a sketch of this computation follows the code below).
set.seed(0)
# Number of components: first dimension where the cumulative inertia exceeds 80%
ncp<-which(res.mca$eig[,3]>80)[1]
# Re-run the MCA keeping the first ncp dimensions
res.mca<-MCA(don.cat,graph=FALSE,quali.sup=ncol(don.cat),level.ventil = 0.05,ncp=ncp)
# Hierarchical clustering (CAH) into 10 clusters, chosen from the inertia-gain diagram
res.cah<-HCPC(res.mca,nb.clust=10,graph=FALSE, description = FALSE)
# Display the dendrogram
plot(res.cah, choice="tree")
drawn <- integer(0)
par(mar = c(4.1, 4.1, 1.1, 2.1))
plot.HCPC(res.cah, choice = 'map', draw.tree = FALSE, select = drawn)
# Display the classes on the MCA map
set.seed(0)
res.mca.clust<-MCA(res.cah$data.clust,
graph=FALSE,
quali.sup=c(ncol(don.cat),ncol(res.cah$data.clust)),
level.ventil = 0.05)
plot.MCA(res.mca.clust,
habillage = ncol(res.cah$data.clust),
choix="ind",invisible=c("ind","var"))
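The between-class inertia mentioned above can be computed from the MCA coordinates and the 10-cluster partition; a minimal sketch, assuming the res.mca and res.cah objects created in the chunk above:
# Proportion of total inertia explained by the 10-cluster partition,
# computed on the retained MCA coordinates
coords <- res.mca$ind$coord
clust  <- res.cah$data.clust$clust
centre <- colMeans(coords)
total.inertia   <- sum(sweep(coords, 2, centre)^2)
between.inertia <- sum(sapply(levels(clust), function(k) {
  sub <- coords[clust == k, , drop = FALSE]
  nrow(sub) * sum((colMeans(sub) - centre)^2)
}))
between.inertia / total.inertia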
We see that each class is close to a cluster and that classes 3, 7, 5, 4, 9 and 6 are close to each other.
set.seed(0)
# Number of components: first dimension where the cumulative inertia exceeds 80%
ncp<-which(res2.mca$eig[,3]>80)[1]
# Re-run the MCA keeping the first ncp dimensions
res2.mca<-MCA(don.cat2,graph=FALSE,quali.sup=ncol(don.cat2),level.ventil = 0.05,ncp=ncp)
# Hierarchical clustering (CAH) into 10 clusters, chosen from the inertia-gain diagram
res2.cah<-HCPC(res2.mca,nb.clust=10,graph=FALSE, description = FALSE)
# Display the dendrogram
plot(res2.cah, choice="tree")
drawn <- integer(0)
par(mar = c(4.1, 4.1, 1.1, 2.1))
plot.HCPC(res2.cah, choice = 'map', draw.tree = FALSE, select = drawn)
# Display the classes on the MCA map
set.seed(0)
res2.mca.clust<-MCA(res2.cah$data.clust,
graph=FALSE,
quali.sup=c(ncol(don.cat2),ncol(res2.cah$data.clust)),
level.ventil = 0.05)
plot.MCA(res2.mca.clust,
habillage = ncol(res2.cah$data.clust),
choix="ind",invisible=c("ind","var"))
We see that each class is close to a cluster and that classes 3, 7, 5, 4, 9 and 2 are close to each other.
For the prediction step we will not keep this class-based discretisation: with roughly 1400 unique values per variable, binning hides the extreme or outlying values that may characterise a class.
We will not reduce the number of rows, since we only have 1500 patterns.
Regarding variable selection, the PCA and MCA lead us to keep the following mfeat-fou variables for our prediction:
11 components: fou_69, fou_11, fou_76, fou_5, fou_7, fou_71, fou_9, fou_8, fou_17, fou_74, fou_75
8 components: fou_5, fou_2, fou_3, fou_69, fou_8, fou_73, fou_16 and fou_76
or
16 components: fou_69, fou_11, fou_76, fou_13, fou_73, fou_3, fou_8, fou_18, fou_5, fou_64, fou_70, fou_30, fou_7, fou_71, fou_9 and fou_14
data_fou0<- data_fou[,c(69,11,76,5,7,71,9,8,17,74,75,77)]
data_fou1<- data_fou[,c(2,3,5,8,16,69,73,76,77)]
data_fou2<- data_fou[,c(5,3,7,8,9,11,13,14,18,30,64,69,71,73,76,77)]
#str(data_fou0)
#str(data_fou1)
#str(data_fou2)
The variables that contribute the most are (fac_55, fac_67), (fac_50, fac_98), fac_26, (fac_197, fac_4), fac_136, fac_16 and fac_97.
dim 1: fac_55, fac_67, fac_181, fac_109, fac_19 (fac_26, fac_7, fac_84)
dim 2: fac_196, fac_4, fac_100, fac_148, fac_136 (fac_40)
dim 3: fac_31, fac_174, fac_66, fac_150, fac_37 (fac_123, fac_126)
dim 4: fac_175, fac_163, fac_90 (fac_102, fac_18)
dim 5: fac_205, fac_180, fac_99, fac_72, fac_192, fac_195, fac_216
data_fac0<- data_fac[,c(55,196,136,31,174,90,66,205,180,175,163,217)]
data_fac1<- data_fac[,c(55,50,26,197,136,16,97,217)]
data_fac2<- data_fac[,c(55,67,196,4,31,174,175,163,205,180,181,109,100,148,66,150,90,102,99,72,192,195,18,37,123,126,136,40,26,7,84,217)]
#str(data_fac0)
#str(data_fac1)
#str(data_fac2)
11 components: for our prediction, the quantitative variables kept are fac_55, fac_196, fac_136, fac_31, fac_174, fac_90, fac_66, fac_205, fac_180, fac_175 and fac_163
26 components:
fac_55 fac_67 fac_181 fac_109 fac_19 (fac_26 fac_7) fac_196 fac_4 fac_100 fac_148 fac_136 (fac_40)
fac_31 fac_174 fac_66 fac_150 fac_37 (fac_123 fac_126) fac_175 fac_163 fac_90 (fac_102 fac_18)
fac_205 fac_180 fac_99 fac_72 fac_192 fac_195 fac_216
or
60 components: fac_181 fac_29 fac_1 fac_133 fac_65 fac_97 fac_109 fac_185 fac_55 fac_53 fac_113 fac_67 fac_125 fac_19 fac_7 fac_108 fac_157 fac_84 fac_37 fac_207 fac_111 fac_135 fac_2 fac_199 fac_146 fac_94 fac_193 fac_123 fac_13 fac_43 fac_195 fac_183 fac_12 fac_147 fac_3 fac_184 fac_198 fac_41 fac_50 fac_132 fac_22 fac_91 fac_194 fac_177 fac_120 fac_154 fac_186 fac_204 fac_165 fac_38 fac_86 fac_112 fac_26 fac_10 fac_115 fac_190 fac_106 fac_144 fac_33 fac_57
In the end we keep 16 explanatory variables plus the outcome (class), i.e. 17 variables, to start the modelling.
We have created the new datasets data_fou0, data_fou1 and data_fou2, as well as data_fac0, data_fac1 and data_fac2.
To detect overfitting, we split the data into two subsets: a training set and a test/validation set. In the code below the split is 75% for training and 25% for validation.
We first check for columns with missing (NA) values.
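A minimal sketch of this check (column-wise NA counts on data_train):
# Quick check for missing values in data_train
sum(is.na(data_train))                        # total number of NA values
names(which(colSums(is.na(data_train)) > 0))  # columns containing at least one NA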
data_fac = data_train[, c(1: 216,650)]
data_fac0<- data_fac[,c(55,196,136,31,174,90,66,205,180,175,163,217)]
data_fac1<- data_fac[,c(55,50,26,197,136,16,97,217)]
data_fac2<- data_fac[,c(55,67,196,4,31,174,175,163,205,180,181,109,100,148,66,150,90,102,99,72,192,195,18,37,123,126,136,40,26,7,84,217)]
#str(data_fac)
#str(data_fac1)
#str(data_fac2)
Next, we split the dataset into training and test sets.
We transform our Test dataset using the first 12 principal components.
acp_train <- NULL
n_dim=12
data_pix <- data_train[, c(1: 216,650)]
data_pix <- data_fac
columnNumber <- which(colnames(data_pix)=="class")
data_pix <- data_pix[,c(columnNumber,1:ncol(data_pix)-1)]# put this class to column 1
#head(data_pix, n = 10L)
## 75% of the sample size
smp_size.pix <- floor(0.75 * nrow(data_pix))
set.seed(123)
train_ind.pix <- sample(seq_len(nrow(data_pix)), size = smp_size.pix)
data_pix.train <- data_pix[train_ind.pix, ]
data_pix.test <- data_pix[-train_ind.pix, ]
PCA1=prcomp(data_pix.train[,(2:ncol(data_pix.train))],center = T,scale. = F)
projected=scale(data_pix.train[,(2:ncol(data_pix.train))], PCA1$center, PCA1$scale) %*% PCA1$rotation
acp_train <-scale(data_pix.train[,(2:ncol(data_pix.train))], PCA1$center, PCA1$scale) %*% PCA1$rotation
acp_train <-acp_train[,1:n_dim]
acp_train = cbind(acp_train, replicate(1,data_pix.train$class))
colnames(acp_train)[ncol(acp_train)] <- "class"
acp_train_dt <- as.data.frame(acp_train)
acp_train_dt <- acp_train_dt[,c(n_dim+1,1:ncol(acp_train_dt)-1)]## class to first column
acp_train_dt[] <- lapply(acp_train_dt, function(x) as.numeric(as.character(x)))##convert to numeric all the fields
acp_train_dt$class <- as.factor(acp_train_dt$class)## convert to factor class
acp_test <- NULL
acp_test <- predict(PCA1, newdata=data_pix.test[,(2:ncol(data_pix.test))])
acp_test <-acp_test[,1:n_dim]
acp_test = cbind(acp_test, replicate(1,data_pix.test$class))
colnames(acp_test)[ncol(acp_test)] <- "class"
library(reshape)
##
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
##
## rename
## The following objects are masked from 'package:tidyr':
##
## expand, smiths
acp_test_dt <- as.data.frame(acp_test)
acp_test_dt <- acp_test_dt[,c(n_dim+1,1:ncol(acp_test_dt)-1)]## class to first column
acp_test_dt[] <- lapply(acp_test_dt, function(x) as.numeric(as.character(x)))##convert to numeric all the fields
acp_test_dt$class <- as.factor(acp_test_dt$class)## convert to factor class
small_train_data31<-acp_train_dt
validation_data31<-acp_test_dt
We transform our Test dataset using the first 35 principal components.
acp_train <- NULL
n_dim=35
data_pix <- data_train[, c(1: 216,650)]
data_pix <- data_fac
columnNumber <- which(colnames(data_pix)=="class")
data_pix <- data_pix[,c(columnNumber,1:ncol(data_pix)-1)]# put this class to column 1
#head(data_pix, n = 10L)
## 75% of the sample size
smp_size.pix <- floor(0.75 * nrow(data_pix))
set.seed(123)
train_ind.pix <- sample(seq_len(nrow(data_pix)), size = smp_size.pix)
data_pix.train <- data_pix[train_ind.pix, ]
data_pix.test <- data_pix[-train_ind.pix, ]
PCA1=prcomp(data_pix.train[,(2:ncol(data_pix.train))],center = T,scale. = F)
projected=scale(data_pix.train[,(2:ncol(data_pix.train))], PCA1$center, PCA1$scale) %*% PCA1$rotation
acp_train <-scale(data_pix.train[,(2:ncol(data_pix.train))], PCA1$center, PCA1$scale) %*% PCA1$rotation
acp_train <-acp_train[,1:n_dim]
acp_train = cbind(acp_train, replicate(1,data_pix.train$class))
colnames(acp_train)[ncol(acp_train)] <- "class"
acp_train_dt <- as.data.frame(acp_train)
acp_train_dt <- acp_train_dt[,c(n_dim+1,1:ncol(acp_train_dt)-1)]## class to first column
acp_train_dt[] <- lapply(acp_train_dt, function(x) as.numeric(as.character(x)))##convert to numeric all the fields
acp_train_dt$class <- as.factor(acp_train_dt$class)## convert to factor class
acp_test <- NULL
acp_test <- predict(PCA1, newdata=data_pix.test[,(2:ncol(data_pix.test))])
acp_test <-acp_test[,1:n_dim]
acp_test = cbind(acp_test, replicate(1,data_pix.test$class))
colnames(acp_test)[ncol(acp_test)] <- "class"
library(reshape)
acp_test_dt <- as.data.frame(acp_test)
acp_test_dt <- acp_test_dt[,c(n_dim+1,1:ncol(acp_test_dt)-1)]## class to first column
acp_test_dt[] <- lapply(acp_test_dt, function(x) as.numeric(as.character(x)))##convert to numeric all the fields
acp_test_dt$class <- as.factor(acp_test_dt$class)## convert to factor class
small_train_data3<-acp_train_dt
validation_data3<-acp_test_dt
We will use the selected variables with the rpart implementation of the CART algorithm, since it is easy to use and produces an interpretable binary tree. Note that the outcome here is the class variable with 10 levels, so this is a multi-class classification problem.
CART
method = ‘rpart2’
Type: Regression, Classification
Tuning parameters:
- maxdepth (Max Tree Depth)
Required packages: rpart
We use the parameter number=10 with repeats=5 repetitions, since the execution time is short.
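The exact training call is not shown in the report; a minimal caret sketch under these settings, assuming one of the training sets built above (small_train_data31 is used here only as an example):
# Repeated 10-fold cross-validation with CART (rpart2, tuned on maxdepth)
library(caret)
library(rpart)
ctrl.cart <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
set.seed(123)
fit.cart <- train(class ~ ., data = small_train_data31,
                  method = "rpart2",
                  tuneLength = 5,
                  trControl = ctrl.cart)
fit.cart$bestTune   # selected maxdepth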
## Time difference of 8.114695 secs
## [1] 16
## Time difference of 3.723288 secs
## [1] 16
## Time difference of 5.058274 secs
## [1] 18
Here is our tree, drawn with the fancyRpartPlot function.
The variables with the greatest influence are fou_73 and fou_2.
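A minimal sketch of how the tree plot and variable importance can be obtained, reusing the hypothetical fit.cart object from the sketch above:
library(rattle)
fancyRpartPlot(fit.cart$finalModel)  # graphical view of the fitted tree
varImp(fit.cart)                     # variable importance (e.g. fou_73, fou_2)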
## Time difference of 16.08845 secs
## [1] 14
## Time difference of 3.774673 secs
## [1] 18
Here is our tree, drawn with the fancyRpartPlot function.
The variables with the greatest influence are fou_73 and fou_2.
We have a low validation error rate, equivalent to the training error.
The kappa is low, indicating only weak agreement between the predictions and the true classes compared with random guessing.
We will now use Random Forest, which relies on bagging (bootstrap aggregation of decision trees) and therefore takes the majority of the variables into account.
Random Forest
method = ‘rf’
Type: Classification, Regression
Tuning parameters:
- mtry (#Randomly Selected Predictors)
Required packages: randomForest
A model-specific variable importance metric is available
We use the parameter number=2 (2-fold cross-validation).
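A minimal caret sketch of this setup, assuming, as an example, the PCA-based sets small_train_data3 / validation_data3 built above (the mtry grid matches the values reported below):
# 2-fold cross-validated random forest
library(caret)
library(randomForest)
ctrl.rf <- trainControl(method = "cv", number = 2)
set.seed(123)
fit.rf <- train(class ~ ., data = small_train_data3,
                method = "rf",
                tuneGrid = expand.grid(mtry = c(8, 12, 16, 20)),
                trControl = ctrl.rf)
fit.rf                                                  # resampling results per mtry
confusionMatrix(predict(fit.rf, validation_data3),
                validation_data3$class)                 # validation performance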
## Time difference of 13.53228 secs
## Random Forest
##
## 1170 samples
## 76 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold)
## Summary of sample sizes: 584, 586
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 8 0.8136688 0.7929384
## 12 0.8059663 0.7843868
## 16 0.8025475 0.7805944
## 20 0.7999819 0.7777345
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 8.
## Time difference of 7.46223 secs
## Random Forest
##
## 1125 samples
## 35 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold)
## Summary of sample sizes: 562, 563
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 8 0.9235523 0.9150484
## 12 0.9199999 0.9111017
## 16 0.9093254 0.8992409
## 20 0.9057714 0.8952938
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 8.
## Time difference of 4.289018 secs
## Random Forest
##
## 1148 samples
## 11 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold)
## Summary of sample sizes: 573, 575
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 8 0.7639138 0.7375416
## 12 0.7586934 0.7317057
## 16 0.7595660 0.7326961
## 20 0.7621807 0.7356154
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 8.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 31 0 0 0 0 0 0 0 1 0
## 1 0 34 1 3 0 0 0 1 2 3
## 2 0 0 39 0 1 0 0 0 0 0
## 3 0 2 0 17 0 0 1 0 0 1
## 4 0 1 0 0 27 3 1 0 0 0
## 5 0 0 0 0 3 30 0 0 0 0
## 6 0 0 0 0 3 0 17 1 1 14
## 7 0 3 5 1 5 0 0 29 0 0
## 8 0 2 0 0 0 0 1 0 28 0
## 9 0 0 0 2 1 1 20 0 0 16
##
## Overall Statistics
##
## Accuracy : 0.7614
## 95% CI : (0.7133, 0.805)
## No Information Rate : 0.1278
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7342
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.00000 0.80952 0.8667 0.73913 0.67500 0.88235
## Specificity 0.99688 0.96774 0.9967 0.98784 0.98397 0.99057
## Pos Pred Value 0.96875 0.77273 0.9750 0.80952 0.84375 0.90909
## Neg Pred Value 1.00000 0.97403 0.9808 0.98187 0.95938 0.98746
## Prevalence 0.08807 0.11932 0.1278 0.06534 0.11364 0.09659
## Detection Rate 0.08807 0.09659 0.1108 0.04830 0.07670 0.08523
## Detection Prevalence 0.09091 0.12500 0.1136 0.05966 0.09091 0.09375
## Balanced Accuracy 0.99844 0.88863 0.9317 0.86349 0.82949 0.93646
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.4250 0.93548 0.87500 0.47059
## Specificity 0.9391 0.95639 0.99062 0.92453
## Pos Pred Value 0.4722 0.67442 0.90323 0.40000
## Neg Pred Value 0.9272 0.99353 0.98754 0.94231
## Prevalence 0.1136 0.08807 0.09091 0.09659
## Detection Rate 0.0483 0.08239 0.07955 0.04545
## Detection Prevalence 0.1023 0.12216 0.08807 0.11364
## Balanced Accuracy 0.6821 0.94594 0.93281 0.69756
## Accuracy Kappa
## 0.7613636 0.7342298
Here is the training result: the training error is larger for this model than for the first ones.
We obtain a good prediction accuracy, but the execution time is long.
We use the parameter number=2 (2-fold cross-validation).
## Time difference of 28.98668 secs
## Random Forest
##
## 1158 samples
## 216 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold)
## Summary of sample sizes: 581, 577
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 8 0.9421857 0.9357491
## 12 0.9473552 0.9414949
## 16 0.9439009 0.9376569
## 20 0.9439068 0.9376648
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 12.
## Time difference of 4.267051 secs
## Random Forest
##
## 1125 samples
## 12 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold)
## Summary of sample sizes: 562, 563
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 8 0.8791126 0.8656629
## 12 0.8622229 0.8469098
## 16 0.8613348 0.8459192
## 20 0.8640054 0.8488843
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 8.
## Time difference of 4.468869 secs
## Random Forest
##
## 1169 samples
## 11 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold)
## Summary of sample sizes: 583, 586
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 8 0.8391821 0.8212244
## 12 0.8340407 0.8155055
## 16 0.8323298 0.8136010
## 20 0.8306190 0.8116944
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 8.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 25 2 0 0 0 0 0 0 2 0
## 1 0 27 0 1 1 0 0 0 3 2
## 2 0 0 33 0 0 0 0 0 0 0
## 3 0 1 0 19 1 1 0 1 0 3
## 4 0 3 0 0 24 0 0 0 1 1
## 5 0 0 0 0 0 36 4 0 1 0
## 6 1 0 1 0 1 2 25 0 0 0
## 7 0 1 1 0 0 0 0 25 1 3
## 8 4 0 0 0 0 1 2 0 30 0
## 9 0 1 0 0 0 3 0 0 2 35
##
## Overall Statistics
##
## Accuracy : 0.8429
## 95% CI : (0.7991, 0.8804)
## No Information Rate : 0.1329
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8248
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.83333 0.77143 0.9429 0.95000 0.88889 0.8372
## Specificity 0.98671 0.97635 1.0000 0.97749 0.98355 0.9826
## Pos Pred Value 0.86207 0.79412 1.0000 0.73077 0.82759 0.8780
## Neg Pred Value 0.98344 0.97306 0.9933 0.99672 0.99007 0.9759
## Prevalence 0.09063 0.10574 0.1057 0.06042 0.08157 0.1299
## Detection Rate 0.07553 0.08157 0.0997 0.05740 0.07251 0.1088
## Detection Prevalence 0.08761 0.10272 0.0997 0.07855 0.08761 0.1239
## Balanced Accuracy 0.91002 0.87389 0.9714 0.96375 0.93622 0.9099
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.80645 0.96154 0.75000 0.7955
## Specificity 0.98333 0.98033 0.97595 0.9791
## Pos Pred Value 0.83333 0.80645 0.81081 0.8537
## Neg Pred Value 0.98007 0.99667 0.96599 0.9690
## Prevalence 0.09366 0.07855 0.12085 0.1329
## Detection Rate 0.07553 0.07553 0.09063 0.1057
## Detection Prevalence 0.09063 0.09366 0.11178 0.1239
## Balanced Accuracy 0.89489 0.97093 0.86297 0.8873
## Accuracy Kappa
## 0.8429003 0.8248321
Here is the training result: the training error is larger for this model than for the first ones.
We obtain a good prediction accuracy, but the execution time is long.
Neural networks are among the most fascinating machine-learning models, as their structure is inspired by the brain.
Neural Network
method = ‘nnet’
Type: Classification, Regression
Tuning parameters:
- size (#Hidden Units)
- decay (Weight Decay)
Required packages: nnet
A model-specific variable importance metric is available.
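A minimal caret/nnet sketch under these settings, assuming the same PCA-based training set (small_train_data3); the size/decay grid is illustrative, not necessarily the one used in the report:
# Single-hidden-layer network tuned on size and decay
library(caret)
library(nnet)
library(NeuralNetTools)
ctrl.nn <- trainControl(method = "cv", number = 10)
set.seed(123)
fit.nn <- train(class ~ ., data = small_train_data3,
                method = "nnet",
                tuneGrid = expand.grid(size = c(5, 10), decay = c(0, 0.1)),
                trControl = ctrl.nn,
                trace = FALSE, MaxNWts = 2000)
plotnet(fit.nn$finalModel)   # network graph, as in the plotnet figure below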
The plotnet graph of the neural network:
Here is the validation result:
## Accuracy Kappa
## 0.7226667 0.6919602
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 26 0 0 0 0 0 0 0 9 0
## 1 1 23 0 1 1 0 0 1 0 3
## 2 0 0 33 1 2 3 0 5 0 2
## 3 0 1 0 28 0 7 0 0 0 4
## 4 1 2 0 0 30 3 4 1 1 0
## 5 2 0 1 2 0 25 0 0 0 0
## 6 0 0 0 0 0 0 29 0 1 0
## 7 0 4 2 4 0 0 0 25 0 1
## 8 14 3 2 0 0 0 4 0 25 1
## 9 0 4 1 2 0 2 0 1 0 27
##
## Overall Statistics
##
## Accuracy : 0.7227
## 95% CI : (0.6744, 0.7674)
## No Information Rate : 0.1173
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.692
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.59091 0.62162 0.8462 0.73684 0.9091 0.62500
## Specificity 0.97281 0.97929 0.9613 0.96439 0.9649 0.98507
## Pos Pred Value 0.74286 0.76667 0.7174 0.70000 0.7143 0.83333
## Neg Pred Value 0.94706 0.95942 0.9818 0.97015 0.9910 0.95652
## Prevalence 0.11733 0.09867 0.1040 0.10133 0.0880 0.10667
## Detection Rate 0.06933 0.06133 0.0880 0.07467 0.0800 0.06667
## Detection Prevalence 0.09333 0.08000 0.1227 0.10667 0.1120 0.08000
## Balanced Accuracy 0.78186 0.80046 0.9037 0.85062 0.9370 0.80504
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.78378 0.75758 0.69444 0.71053
## Specificity 0.99704 0.96784 0.92920 0.97033
## Pos Pred Value 0.96667 0.69444 0.51020 0.72973
## Neg Pred Value 0.97681 0.97640 0.96626 0.96746
## Prevalence 0.09867 0.08800 0.09600 0.10133
## Detection Rate 0.07733 0.06667 0.06667 0.07200
## Detection Prevalence 0.08000 0.09600 0.13067 0.09867
## Balanced Accuracy 0.89041 0.86271 0.81182 0.84043
We have a low validation error, equivalent to the training error.
The kappa is moderate, indicating moderate agreement of the predictions.
The plotnet graph of the neural network:
We have a low validation error, equivalent to the training error.
The kappa is moderate, indicating moderate agreement of the predictions.
We obtain a very good prediction accuracy, but the execution time is very long.
k-Nearest Neighbors (KNN) is an algorithm that can be used for both classification and regression. It predicts the value of a point from the k observations closest to it in the training data.
k-Nearest Neighbors
method = ‘knn’
Type: Classification, Regression
Tuning parameters:
- k (#Neighbors)
We use the parameter number=10 and no repetitions, because the execution time would otherwise be too long.
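A minimal caret sketch of this setup, assuming small_train_data3 / validation_data3 as above; the k grid (1, 4, 7, 10, 13) matches the values explored in the output below:
# 10-fold cross-validated KNN
library(caret)
ctrl.knn <- trainControl(method = "cv", number = 10)
set.seed(123)
fit.knn <- train(class ~ ., data = small_train_data3,
                 method = "knn",
                 tuneGrid = data.frame(k = c(1, 4, 7, 10, 13)),
                 trControl = ctrl.knn)
confusionMatrix(predict(fit.knn, validation_data3),
                validation_data3$class)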
## Time difference of 1.507066 secs
## k-Nearest Neighbors
##
## 1170 samples
## 76 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1055, 1054, 1053, 1051, 1054, 1051, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.8153792 0.7948209
## 4 0.8290623 0.8100238
## 7 0.8368360 0.8186657
## 10 0.8274091 0.8081827
## 13 0.8283589 0.8092364
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
## Accuracy Kappa
## 0.8636364 0.8483688
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 31 0 0 0 0 0 0 0 0 0
## 1 0 22 0 1 0 0 0 0 0 0
## 2 0 0 35 0 0 0 0 0 0 0
## 3 0 1 0 32 0 0 0 0 0 0
## 4 0 1 0 1 32 2 0 2 0 0
## 5 0 0 0 0 0 33 0 0 0 0
## 6 0 0 1 0 0 0 19 0 0 13
## 7 0 4 2 0 1 0 0 29 0 0
## 8 0 1 0 0 0 0 0 0 29 0
## 9 0 0 0 0 0 1 14 0 0 23
##
## Overall Statistics
##
## Accuracy : 0.8636
## 95% CI : (0.8218, 0.8988)
## No Information Rate : 0.1152
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8484
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.00000 0.75862 0.9211 0.94118 0.96970 0.9167
## Specificity 1.00000 0.99668 1.0000 0.99662 0.97980 1.0000
## Pos Pred Value 1.00000 0.95652 1.0000 0.96970 0.84211 1.0000
## Neg Pred Value 1.00000 0.97720 0.9898 0.99327 0.99658 0.9899
## Prevalence 0.09394 0.08788 0.1152 0.10303 0.10000 0.1091
## Detection Rate 0.09394 0.06667 0.1061 0.09697 0.09697 0.1000
## Detection Prevalence 0.09394 0.06970 0.1061 0.10000 0.11515 0.1000
## Balanced Accuracy 1.00000 0.87765 0.9605 0.96890 0.97475 0.9583
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.57576 0.93548 1.00000 0.6389
## Specificity 0.95286 0.97659 0.99668 0.9490
## Pos Pred Value 0.57576 0.80556 0.96667 0.6053
## Neg Pred Value 0.95286 0.99320 1.00000 0.9555
## Prevalence 0.10000 0.09394 0.08788 0.1091
## Detection Rate 0.05758 0.08788 0.08788 0.0697
## Detection Prevalence 0.10000 0.10909 0.09091 0.1152
## Balanced Accuracy 0.76431 0.95604 0.99834 0.7939
## Time difference of 1.072405 secs
## k-Nearest Neighbors
##
## 1125 samples
## 35 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1012, 1012, 1013, 1012, 1013, 1014, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.9324416 0.9249195
## 4 0.9253063 0.9169840
## 7 0.9243817 0.9159568
## 10 0.9145916 0.9050720
## 13 0.9145761 0.9050605
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
## Accuracy Kappa
## 0.9440000 0.9377283
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 42 0 0 0 0 0 0 0 3 0
## 1 0 34 0 0 0 0 0 0 0 0
## 2 0 0 38 0 0 0 0 0 0 0
## 3 0 0 0 36 0 2 0 0 0 0
## 4 0 0 0 0 33 0 0 0 0 0
## 5 1 0 0 2 0 35 2 0 0 0
## 6 0 1 0 0 0 1 34 0 0 0
## 7 0 0 1 0 0 0 0 32 0 1
## 8 1 0 0 0 0 0 1 0 33 0
## 9 0 2 0 0 0 2 0 1 0 37
##
## Overall Statistics
##
## Accuracy : 0.944
## 95% CI : (0.9157, 0.965)
## No Information Rate : 0.1173
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9377
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.9545 0.91892 0.9744 0.9474 1.000 0.87500
## Specificity 0.9909 1.00000 1.0000 0.9941 1.000 0.98507
## Pos Pred Value 0.9333 1.00000 1.0000 0.9474 1.000 0.87500
## Neg Pred Value 0.9939 0.99120 0.9970 0.9941 1.000 0.98507
## Prevalence 0.1173 0.09867 0.1040 0.1013 0.088 0.10667
## Detection Rate 0.1120 0.09067 0.1013 0.0960 0.088 0.09333
## Detection Prevalence 0.1200 0.09067 0.1013 0.1013 0.088 0.10667
## Balanced Accuracy 0.9727 0.95946 0.9872 0.9707 1.000 0.93004
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.91892 0.96970 0.91667 0.97368
## Specificity 0.99408 0.99415 0.99410 0.98516
## Pos Pred Value 0.94444 0.94118 0.94286 0.88095
## Neg Pred Value 0.99115 0.99707 0.99118 0.99700
## Prevalence 0.09867 0.08800 0.09600 0.10133
## Detection Rate 0.09067 0.08533 0.08800 0.09867
## Detection Prevalence 0.09600 0.09067 0.09333 0.11200
## Balanced Accuracy 0.95650 0.98192 0.95538 0.97942
## Time difference of 0.9444418 secs
## k-Nearest Neighbors
##
## 1148 samples
## 11 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1032, 1035, 1034, 1034, 1031, 1033, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.7926173 0.7694384
## 4 0.8090876 0.7877239
## 7 0.8047907 0.7829244
## 10 0.8169905 0.7964841
## 13 0.8047842 0.7829033
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 10.
## Accuracy Kappa
## 0.7727273 0.7470288
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 31 0 0 0 0 0 0 0 1 0
## 1 0 30 0 5 0 0 0 0 1 0
## 2 0 3 39 0 0 0 0 0 0 0
## 3 0 1 0 18 0 0 0 0 0 1
## 4 0 0 0 0 24 1 0 0 0 0
## 5 0 0 0 0 3 32 0 0 0 0
## 6 0 1 0 0 3 1 20 0 1 14
## 7 0 3 6 0 7 0 0 31 0 0
## 8 0 4 0 0 0 0 0 0 28 0
## 9 0 0 0 0 3 0 20 0 1 19
##
## Overall Statistics
##
## Accuracy : 0.7727
## 95% CI : (0.7253, 0.8155)
## No Information Rate : 0.1278
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.747
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.00000 0.71429 0.8667 0.78261 0.60000 0.94118
## Specificity 0.99688 0.98065 0.9902 0.99392 0.99679 0.99057
## Pos Pred Value 0.96875 0.83333 0.9286 0.90000 0.96000 0.91429
## Neg Pred Value 1.00000 0.96203 0.9806 0.98494 0.95107 0.99369
## Prevalence 0.08807 0.11932 0.1278 0.06534 0.11364 0.09659
## Detection Rate 0.08807 0.08523 0.1108 0.05114 0.06818 0.09091
## Detection Prevalence 0.09091 0.10227 0.1193 0.05682 0.07102 0.09943
## Balanced Accuracy 0.99844 0.84747 0.9284 0.88826 0.79840 0.96587
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.50000 1.00000 0.87500 0.55882
## Specificity 0.93590 0.95016 0.98750 0.92453
## Pos Pred Value 0.50000 0.65957 0.87500 0.44186
## Neg Pred Value 0.93590 1.00000 0.98750 0.95146
## Prevalence 0.11364 0.08807 0.09091 0.09659
## Detection Rate 0.05682 0.08807 0.07955 0.05398
## Detection Prevalence 0.11364 0.13352 0.09091 0.12216
## Balanced Accuracy 0.71795 0.97508 0.93125 0.74168
We have a low training error. The kappa is high, indicating strong agreement of the predictions.
Here are the chosen parameters:
We have found a strategy to obtain an optimal value for the parameter k. The prediction level is good with a reasonable execution time.
## Time difference of 3.051411 secs
## k-Nearest Neighbors
##
## 1158 samples
## 216 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1042, 1042, 1042, 1040, 1041, 1041, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.9379263 0.9310076
## 4 0.9308622 0.9231524
## 7 0.9249021 0.9165242
## 10 0.9136935 0.9040645
## 13 0.9101926 0.9001694
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 1.
## Time difference of 0.9723761 secs
## k-Nearest Neighbors
##
## 1125 samples
## 12 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1012, 1012, 1013, 1012, 1013, 1014, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.8977770 0.8863988
## 4 0.9030237 0.8922116
## 7 0.9074410 0.8971234
## 10 0.8949877 0.8832724
## 13 0.8825663 0.8694727
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
## Time difference of 0.958601 secs
## k-Nearest Neighbors
##
## 1169 samples
## 11 predictor
## 10 classes: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1055, 1052, 1054, 1051, 1050, 1053, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.7178659 0.6863042
## 4 0.7264189 0.6957314
## 7 0.7322840 0.7022191
## 10 0.7134718 0.6812401
## 13 0.7074718 0.6745459
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
## Accuracy Kappa
## 0.9473684 0.9414094
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 28 0 0 0 0 0 0 0 4 0
## 1 0 34 0 0 0 0 0 0 1 0
## 2 0 0 39 0 0 0 0 0 0 0
## 3 0 0 0 33 0 1 0 1 0 0
## 4 0 0 0 0 33 0 0 0 0 0
## 5 0 0 0 2 0 33 1 0 0 0
## 6 0 0 0 0 0 0 43 0 0 0
## 7 0 0 0 0 0 0 0 26 0 0
## 8 2 1 0 1 0 0 0 0 26 0
## 9 0 1 1 0 0 1 0 1 0 29
##
## Overall Statistics
##
## Accuracy : 0.9474
## 95% CI : (0.9181, 0.9685)
## No Information Rate : 0.1287
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9414
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.93333 0.94444 0.9750 0.91667 1.00000 0.94286
## Specificity 0.98718 0.99673 1.0000 0.99346 1.00000 0.99023
## Pos Pred Value 0.87500 0.97143 1.0000 0.94286 1.00000 0.91667
## Neg Pred Value 0.99355 0.99349 0.9967 0.99023 1.00000 0.99346
## Prevalence 0.08772 0.10526 0.1170 0.10526 0.09649 0.10234
## Detection Rate 0.08187 0.09942 0.1140 0.09649 0.09649 0.09649
## Detection Prevalence 0.09357 0.10234 0.1140 0.10234 0.09649 0.10526
## Balanced Accuracy 0.96026 0.97059 0.9875 0.95507 1.00000 0.96654
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.9773 0.92857 0.83871 1.00000
## Specificity 1.0000 1.00000 0.98714 0.98722
## Pos Pred Value 1.0000 1.00000 0.86667 0.87879
## Neg Pred Value 0.9967 0.99367 0.98397 1.00000
## Prevalence 0.1287 0.08187 0.09064 0.08480
## Detection Rate 0.1257 0.07602 0.07602 0.08480
## Detection Prevalence 0.1257 0.07602 0.08772 0.09649
## Balanced Accuracy 0.9886 0.96429 0.91292 0.99361
## Accuracy Kappa
## 0.8853333 0.8725468
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 37 0 0 0 0 0 0 0 4 0
## 1 0 30 0 0 0 1 0 0 2 0
## 2 0 0 37 1 0 3 0 0 0 0
## 3 0 0 0 35 0 6 0 0 0 0
## 4 1 2 0 0 31 0 0 0 0 0
## 5 0 0 1 0 0 28 1 0 0 1
## 6 0 1 0 0 2 0 35 0 0 0
## 7 0 1 0 1 0 0 0 32 0 0
## 8 6 1 1 0 0 1 1 0 30 0
## 9 0 2 0 1 0 1 0 1 0 37
##
## Overall Statistics
##
## Accuracy : 0.8853
## 95% CI : (0.8487, 0.9158)
## No Information Rate : 0.1173
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8725
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.84091 0.81081 0.94872 0.92105 0.93939 0.70000
## Specificity 0.98792 0.99112 0.98810 0.98220 0.99123 0.99104
## Pos Pred Value 0.90244 0.90909 0.90244 0.85366 0.91176 0.90323
## Neg Pred Value 0.97904 0.97953 0.99401 0.99102 0.99413 0.96512
## Prevalence 0.11733 0.09867 0.10400 0.10133 0.08800 0.10667
## Detection Rate 0.09867 0.08000 0.09867 0.09333 0.08267 0.07467
## Detection Prevalence 0.10933 0.08800 0.10933 0.10933 0.09067 0.08267
## Balanced Accuracy 0.91441 0.90097 0.96841 0.95162 0.96531 0.84552
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.94595 0.96970 0.8333 0.97368
## Specificity 0.99112 0.99415 0.9705 0.98516
## Pos Pred Value 0.92105 0.94118 0.7500 0.88095
## Neg Pred Value 0.99407 0.99707 0.9821 0.99700
## Prevalence 0.09867 0.08800 0.0960 0.10133
## Detection Rate 0.09333 0.08533 0.0800 0.09867
## Detection Prevalence 0.10133 0.09067 0.1067 0.11200
## Balanced Accuracy 0.96854 0.98192 0.9019 0.97942
## Accuracy Kappa
## 0.6948640 0.6603266
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 21 5 0 0 0 0 0 0 8 0
## 1 0 18 0 0 1 0 0 1 1 5
## 2 0 1 24 1 0 4 1 1 0 2
## 3 0 0 1 14 0 5 0 2 0 5
## 4 1 3 1 0 26 0 1 0 2 1
## 5 0 0 2 1 0 25 4 0 0 0
## 6 0 0 2 0 0 3 24 0 0 0
## 7 0 2 3 1 0 2 0 22 0 2
## 8 8 1 2 1 0 3 1 0 28 1
## 9 0 5 0 2 0 1 0 0 1 28
##
## Overall Statistics
##
## Accuracy : 0.6949
## 95% CI : (0.6422, 0.744)
## No Information Rate : 0.1329
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6603
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.70000 0.51429 0.68571 0.70000 0.96296 0.58140
## Specificity 0.95681 0.97297 0.96622 0.95820 0.97039 0.97569
## Pos Pred Value 0.61765 0.69231 0.70588 0.51852 0.74286 0.78125
## Neg Pred Value 0.96970 0.94426 0.96296 0.98026 0.99662 0.93980
## Prevalence 0.09063 0.10574 0.10574 0.06042 0.08157 0.12991
## Detection Rate 0.06344 0.05438 0.07251 0.04230 0.07855 0.07553
## Detection Prevalence 0.10272 0.07855 0.10272 0.08157 0.10574 0.09668
## Balanced Accuracy 0.82841 0.74363 0.82597 0.82910 0.96668 0.77854
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.77419 0.84615 0.70000 0.63636
## Specificity 0.98333 0.96721 0.94158 0.96864
## Pos Pred Value 0.82759 0.68750 0.62222 0.75676
## Neg Pred Value 0.97682 0.98662 0.95804 0.94558
## Prevalence 0.09366 0.07855 0.12085 0.13293
## Detection Rate 0.07251 0.06647 0.08459 0.08459
## Detection Prevalence 0.08761 0.09668 0.13595 0.11178
## Balanced Accuracy 0.87876 0.90668 0.82079 0.80250
We have a low training error. The kappa is high, indicating strong agreement of the predictions.
Here are the chosen parameters: