library(doParallel)
library(caret)
library(randomForest)
library(nnet)
data(iris)
registerDoParallel(cores=4)  # Use multiple cores with caret.
set.seed(1234)
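Rather than hard-coding the worker count, the number of available cores can be queried at run time. A minimal sketch, assuming the base parallel package; n_cores is just an illustrative name, and detectCores() can return NA on some platforms, hence the guard:
n_cores <- parallel::detectCores()  # May return NA on some systems.
if (!is.na(n_cores) && n_cores > 1) {
  registerDoParallel(cores = n_cores - 1)  # Leave one core free for the session.
}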
Inspect the data.
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(iris[1:4])
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
pairs(iris[1:4], main="Iris Data", pch=19, col=as.numeric(iris$Species) + 1)
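caret offers a similar exploratory view through featurePlot(), which wraps lattice graphics; a sketch, assuming the lattice package is installed:
# Scatterplot matrix of the four predictors, coloured by species.
featurePlot(x = iris[, 1:4], y = iris$Species, plot = "pairs", auto.key = list(columns = 3))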

Split the data into two groups: 70% of the observations will be used to train the model, and the remainder will be used to test it.
inTrain <- createDataPartition(iris$Species, p=0.7, list=FALSE)
train <- iris[inTrain,]
test <- iris[-inTrain,]
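createDataPartition() samples within each class, so the split should preserve the class balance (roughly 35 training and 15 test observations per species). A quick sanity check:
# Class counts in each split; the test proportions should stay close to 1/3 per species.
table(train$Species)
prop.table(table(test$Species))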
Build a model from the training data using the Random Forest method.
fitRF <- train(Species ~ ., method="rf", data=train)
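To see which predictors the forest relies on most, caret's varImp() extracts the variable importance from the fitted model (output not shown here):
# Variable importance from the random forest fit, scaled to 0-100 by default.
varImp(fitRF)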
Make predictions with the model.
predictionRF <- predict(fitRF, test[,-5])
Check the predictions.
confusionMatrix(predictionRF, test$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 15 2
## virginica 0 0 13
##
## Overall Statistics
##
## Accuracy : 0.9556
## 95% CI : (0.8485, 0.9946)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9333
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 1.0000 0.8667
## Specificity 1.0000 0.9333 1.0000
## Pos Pred Value 1.0000 0.8824 1.0000
## Neg Pred Value 1.0000 1.0000 0.9375
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.3333 0.2889
## Detection Prevalence 0.3333 0.3778 0.2889
## Balanced Accuracy 1.0000 0.9667 0.9333
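The overall accuracy reported above can also be recomputed directly from the predictions, which is a useful sanity check:
# Fraction of correctly classified test observations; should match the Accuracy above (43/45 ~ 0.956).
mean(predictionRF == test$Species)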
Let's try the same experiment with a neural network model.
fitNN <- train(Species ~ ., method="nnet", data=train)
## # weights: 27
## initial value 117.063616
## iter 10 value 54.749519
## iter 20 value 19.587511
## iter 30 value 19.074718
## iter 40 value 18.926413
## iter 50 value 18.912757
## final value 18.912751
## converged
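The iteration log above is printed by nnet itself. Since train() forwards extra arguments to the underlying nnet() call, passing trace = FALSE silences it; a sketch that fits into a separate object (fitNN_quiet is just an illustrative name) so the results below are unchanged:
# Same model specification, but nnet's per-iteration output is suppressed.
fitNN_quiet <- train(Species ~ ., method = "nnet", data = train, trace = FALSE)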
predictionNN <- predict(fitNN, test[,-5])
confusionMatrix(predictionNN, test$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 15 0 0
## versicolor 0 14 1
## virginica 0 1 14
##
## Overall Statistics
##
## Accuracy : 0.9556
## 95% CI : (0.8485, 0.9946)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9333
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 0.9333 0.9333
## Specificity 1.0000 0.9667 0.9667
## Pos Pred Value 1.0000 0.9333 0.9333
## Neg Pred Value 1.0000 0.9667 0.9667
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.3111 0.3111
## Detection Prevalence 0.3333 0.3333 0.3333
## Balanced Accuracy 1.0000 0.9500 0.9500
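Both models reach the same accuracy on the test set here. Their resampling performance during training can also be compared with caret's resamples(); a sketch (results is just an illustrative name, and for a strictly paired comparison both calls to train() would need to share the same trainControl() resampling indices):
# Collect the bootstrap resampling results of both fits and summarise Accuracy and Kappa side by side.
results <- resamples(list(RF = fitRF, NN = fitNN))
summary(results)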