Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results.
library(nnet)   # nnet()
library(caret)  # createDataPartition(), confusionMatrix()
data("iris")
irisdf<-iris
set.seed(759)
index<-createDataPartition(irisdf$Species,p=0.8,list = FALSE)
train<-irisdf[index,]
test<-irisdf[-index,]
nnetModel<-nnet(Species~.,data=train,size=2, decay=1.0e-5, maxit=50)
## # weights: 19
## initial value 141.261847
## iter 10 value 123.880211
## iter 20 value 55.409326
## iter 30 value 52.800760
## iter 40 value 52.682027
## iter 50 value 36.924167
## final value 36.924167
## stopped after 50 iterations
nnetModel
## a 4-2-3 network with 19 weights
## inputs: Sepal.Length Sepal.Width Petal.Length Petal.Width
## output(s): Species
## options were - softmax modelling decay=1e-05
nnetpred<-predict(nnetModel,newdata =test,type="class")
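# keep all three species as factor levels, even if one is never predicted
nnetpred<-factor(nnetpred,levels=levels(test$Species))
# confusionMatrix() expects the predictions first, then the reference labels
result<-confusionMatrix(nnetpred,test$Species)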
result
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 10 0 0
## versicolor 0 9 1
## virginica 0 1 9
##
## Overall Statistics
##
## Accuracy : 0.9333
## 95% CI : (0.7793, 0.9918)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : 8.747e-12
##
## Kappa : 0.9
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 0.9000 0.9000
## Specificity 1.0000 0.9500 0.9500
## Pos Pred Value 1.0000 0.9000 0.9000
## Neg Pred Value 1.0000 0.9500 0.9500
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.3000 0.3000
## Detection Prevalence 0.3333 0.3333 0.3333
## Balanced Accuracy 1.0000 0.9250 0.9250
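When the neural network model is applied to the test data, the accuracy is 93.33% (28 of the 30 test samples). Only two cases were misclassified, both confusions between versicolor and virginica; every setosa sample was classified correctly.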
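As a check on the reported statistics, the sketch below recomputes accuracy, per-class sensitivity and specificity, balanced accuracy, and Cohen's kappa from the confusion matrix printed above, using base R only. The variable names are illustrative and not part of caret.

```r
# Confusion matrix copied from the caret output above
# (rows = Prediction, columns = Reference, as caret labels them).
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,
                0, 9, 1,
                0, 1, 9),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n

# Per-class counts, treating each class in turn as the "positive" class
tp <- diag(cm)
fn <- colSums(cm) - tp   # truly this class, predicted as another
fp <- rowSums(cm) - tp   # predicted as this class, truly another
tn <- n - tp - fn - fp

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the row/column totals give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(accuracy, 4)
round(rbind(sensitivity, specificity, balanced_accuracy), 4)
round(cohen_kappa, 4)
```

The values agree with the caret output: accuracy 0.9333, kappa 0.9, and a balanced accuracy of 0.925 for versicolor and virginica.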
As a mini project, install the keras package and learn how to use it. Then, carry out various tasks that may be useful to your project and studies.
library(keras)
data(iris)
iris2<-iris
iris2[,5] <- as.numeric(iris2[,5]) -1
iris2 <- as.matrix(iris2)
dimnames(iris2) <- NULL
# Split the data into training and test sets (random assignment, about 80/20)
index <- sample(2, nrow(iris), replace=TRUE, prob=c(0.8, 0.2))
train_x <- iris2[index==1, 1:4]
train_y <- iris2[index==1, 5]
test_x <- iris2[index==2, 1:4]  # use the numeric matrix iris2, not the iris data frame
test_y <- iris2[index==2, 5]
# One-hot encode the class labels
train_y <- to_categorical(train_y)
## Loaded Tensorflow version 2.7.0
test_y <- to_categorical(test_y)
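To make the encoding step concrete, here is a minimal base R sketch of what one-hot encoding produces for 0-based integer labels. The helper `one_hot` is hypothetical and only illustrates the idea; it is not the keras implementation.

```r
# Hypothetical helper mirroring the encoding step in base R.
# Assumes labels are integers 0, 1, ..., num_classes - 1.
one_hot <- function(labels, num_classes = max(labels) + 1) {
  out <- matrix(0, nrow = length(labels), ncol = num_classes)
  # one 1 per row, in the column of that row's class (R is 1-indexed)
  out[cbind(seq_along(labels), labels + 1)] <- 1
  out
}

y <- as.numeric(iris$Species) - 1   # setosa = 0, versicolor = 1, virginica = 2
encoded <- one_hot(y)

# One row from each species: rows 1, 51 and 101 of iris
encoded[c(1, 51, 101), ]
```

Each row contains a single 1 in the column of its class, which is the target layout that a three-unit softmax output layer with `categorical_crossentropy` loss expects.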
model <- keras_model_sequential()
# Add layers
model<-model %>%
layer_dense(units = 10, activation = 'relu', input_shape = ncol(train_x)) %>%
layer_dropout(0.2) %>%
layer_dense(units = 10, activation = "relu") %>%
layer_dropout(0.2) %>%
layer_dense(units = 3, activation = 'softmax')
model<-model %>% compile(
loss = 'categorical_crossentropy',
optimizer = 'adam',
metrics = 'accuracy'
)
model
## Model
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_2 (Dense) (None, 10) 50
##
## dropout_1 (Dropout) (None, 10) 0
##
## dense_1 (Dense) (None, 10) 110
##
## dropout (Dropout) (None, 10) 0
##
## dense (Dense) (None, 3) 33
##
## ================================================================================
## Total params: 193
## Trainable params: 193
## Non-trainable params: 0
## ________________________________________________________________________________
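The parameter counts in the summary can be verified by hand: a dense layer with `n_in` inputs and `n_out` units has `(n_in + 1) * n_out` weights, where the `+ 1` is the bias, and dropout layers add none. The helper `dense_params` below is a hypothetical name used only for this check.

```r
# Hypothetical helper: number of weights in a stack of fully connected layers.
# sizes = c(inputs, hidden units..., outputs); each unit has one bias term.
dense_params <- function(sizes) {
  n_in  <- head(sizes, -1)   # inputs to each layer
  n_out <- tail(sizes, -1)   # units in each layer
  (n_in + 1) * n_out
}

keras_layers <- dense_params(c(4, 10, 10, 3))
keras_layers        # 50 110 33
sum(keras_layers)   # 193

# The same rule gives the 19 weights of the 4-2-3 nnet model
sum(dense_params(c(4, 2, 3)))   # 19
```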
history <- model %>% fit(
x = train_x,
y = train_y,
epochs = 200,
batch_size = 5,
validation_split = 0.3
)
# Plot the history
plot(history)
## `geom_smooth()` using formula 'y ~ x'
The keras package provides an R interface for deep learning. The training-history plot shows that accuracy reaches about 90% by around epoch 60 and changes little after that.
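The held-out test split is not scored in the steps above. A possible way to do so is sketched below in base R, under the assumption that `predict(model, test_x)` returns one row of class probabilities per sample. The `probs` and `truth` matrices here are small hypothetical stand-ins, not results from the fitted model.

```r
# Hypothetical softmax outputs for four test samples (each row sums to 1).
probs <- matrix(c(0.90, 0.07, 0.03,
                  0.10, 0.70, 0.20,
                  0.05, 0.40, 0.55,
                  0.02, 0.58, 0.40),
                ncol = 3, byrow = TRUE)

# Hypothetical one-hot true labels, in the same layout as test_y.
truth <- matrix(c(1, 0, 0,
                  0, 1, 0,
                  0, 0, 1,
                  0, 0, 1),
                ncol = 3, byrow = TRUE)

# Column with the largest value in each row, shifted back to 0-based labels
pred_class <- max.col(probs, ties.method = "first") - 1
true_class <- max.col(truth, ties.method = "first") - 1

table(Predicted = pred_class, Actual = true_class)
mean(pred_class == true_class)   # share of samples classified correctly
```

With the real model, `probs` would be replaced by the prediction matrix for `test_x` and `truth` by the one-hot encoded `test_y`, which would give a test accuracy comparable to the 93.33% from the nnet model.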