Ex1

Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results.

data("iris")
irisdf<-iris
set.seed(759)
index<-createDataPartition(irisdf$Species,p=0.8,list = FALSE)
train<-irisdf[index,]
test<-irisdf[-index,]
nnetModel<-nnet(Species~.,data=train,size=2, decay=1.0e-5, maxit=50)
## # weights:  19
## initial  value 141.261847 
## iter  10 value 123.880211
## iter  20 value 55.409326
## iter  30 value 52.800760
## iter  40 value 52.682027
## iter  50 value 36.924167
## final  value 36.924167 
## stopped after 50 iterations
nnetModel
## a 4-2-3 network with 19 weights
## inputs: Sepal.Length Sepal.Width Petal.Length Petal.Width 
## output(s): Species 
## options were - softmax modelling  decay=1e-05
# predict classes for the held-out samples;
# confusionMatrix() expects the predictions first and the reference labels second
nnetpred <- predict(nnetModel, newdata = test, type = "class")
nnetpred <- as.factor(nnetpred)
result <- confusionMatrix(nnetpred, test$Species)
result
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0          9         1
##   virginica       0          1         9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9333          
##                  95% CI : (0.7793, 0.9918)
##     No Information Rate : 0.3333          
##     P-Value [Acc > NIR] : 8.747e-12       
##                                           
##                   Kappa : 0.9             
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            0.9000           0.9000
## Specificity                 1.0000            0.9500           0.9500
## Pos Pred Value              1.0000            0.9000           0.9000
## Neg Pred Value              1.0000            0.9500           0.9500
## Prevalence                  0.3333            0.3333           0.3333
## Detection Rate              0.3333            0.3000           0.3000
## Detection Prevalence        0.3333            0.3333           0.3333
## Balanced Accuracy           1.0000            0.9250           0.9250

When the neural network model is applied to the test data, the accuracy is 93.33%: only two cases are misclassified, one versicolor and one virginica, each predicted as the other.
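One possible extension (not run here) is to let caret tune the hidden-layer size and the weight decay by cross-validation instead of fixing size = 2 and decay = 1e-5. A minimal sketch, using the caret and nnet packages loaded above and the same train and test data frames; the object names nnetGrid and tunedModel and the grid values are chosen only for illustration:

# hypothetical grid over hidden units and weight decay
nnetGrid <- expand.grid(size = c(2, 4, 6), decay = c(1e-5, 1e-3, 1e-1))
set.seed(759)
tunedModel <- train(Species ~ ., data = train,
                    method = "nnet",
                    tuneGrid = nnetGrid,
                    trControl = trainControl(method = "cv", number = 5),
                    trace = FALSE, maxit = 200)
tunedModel$bestTune   # size and decay selected by cross-validation
confusionMatrix(predict(tunedModel, newdata = test), test$Species)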

Ex2

As a mini project, install the keras package and learn how to use it. Then, carry out various tasks that may be useful to your project and studies.
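Before the code below can run, the keras package needs to be installed together with a TensorFlow backend. A minimal sketch of the one-time setup (run interactively, not inside the report):

install.packages("keras")
library(keras)
install_keras()   # installs TensorFlow and the Python environment used by the R package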

library(keras)

data(iris)
iris2 <- iris
# recode Species as integers 0, 1, 2 so it can be one-hot encoded
iris2[, 5] <- as.numeric(iris2[, 5]) - 1
iris2 <- as.matrix(iris2)
dimnames(iris2) <- NULL

# split the data into training and test sets (roughly 80/20)
index <- sample(2, nrow(iris2), replace = TRUE, prob = c(0.8, 0.2))
train_x <- iris2[index == 1, 1:4]
train_y <- iris2[index == 1, 5]
test_x  <- iris2[index == 2, 1:4]
test_y  <- iris2[index == 2, 5]

# One hot encode
train_y <- to_categorical(train_y)
## Loaded Tensorflow version 2.7.0
test_y <- to_categorical(test_y)
model <- keras_model_sequential() 

# Add layers
model<-model %>% 
    layer_dense(units = 10, activation = 'relu', input_shape = ncol(train_x)) %>% 
    layer_dropout(0.2) %>%
    layer_dense(units = 10, activation = "relu") %>%
    layer_dropout(0.2) %>%
    layer_dense(units = 3, activation = 'softmax')



model<-model %>% compile(
     loss = 'categorical_crossentropy',
     optimizer = 'adam',
     metrics = 'accuracy'
 )
model
## Model
## Model: "sequential"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  dense_2 (Dense)                    (None, 10)                      50          
##                                                                                 
##  dropout_1 (Dropout)                (None, 10)                      0           
##                                                                                 
##  dense_1 (Dense)                    (None, 10)                      110         
##                                                                                 
##  dropout (Dropout)                  (None, 10)                      0           
##                                                                                 
##  dense (Dense)                      (None, 3)                       33          
##                                                                                 
## ================================================================================
## Total params: 193
## Trainable params: 193
## Non-trainable params: 0
## ________________________________________________________________________________
history <- model %>% fit(
     x = train_x,
     y = train_y, 
     epochs = 200,
     batch_size = 5, 
     validation_split = 0.3
 )

# Plot the history
plot(history)
## `geom_smooth()` using formula 'y ~ x'

The keras package is an R interface for deep learning, here backed by TensorFlow. From the training history plot, the accuracy reaches roughly 90% once the epoch count is around 60, with little change after that.
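The fitted network can also be checked against the held-out test set. A minimal sketch, assuming the model, test_x, and test_y objects created above; prob and pred are illustrative names, and the output is not shown here:

# overall loss and accuracy on the test set
model %>% evaluate(test_x, test_y)

# predicted class (0/1/2) for each test row, taken as the column
# with the largest softmax probability
prob <- model %>% predict(test_x)
pred <- max.col(prob) - 1
table(predicted = pred, actual = max.col(test_y) - 1)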