library(tree)
## Warning: package 'tree' was built under R version 4.1.3
## Registered S3 method overwritten by 'tree':
##   method     from
##   print.tree cli
RO_Train <- read.csv("/Users/S-Wri/Documents/MSc Data Science/Data Analytics with R/Data Analytics using R/DAR Coursework/Room_Occupancy_Training_set.txt",header = TRUE)
RO_Test <- read.csv("/Users/S-Wri/Documents/MSc Data Science/Data Analytics with R/Data Analytics using R/DAR Coursework/Room_Occupancy_Testing_set.txt",header = TRUE)
RO_Train$Occupancy <- ifelse(RO_Train$Occupancy == 1, 'Yes','No')
RO_Train$Occupancy <- as.factor(RO_Train$Occupancy)
tree.room = tree(Occupancy ~ ., data = RO_Train)
tree.room
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
##  1) root 1000 1107.000 No ( 0.75800 0.24200 )
##    2) Light < 189.708 706 0.000 No ( 1.00000 0.00000 ) *
##    3) Light > 189.708 294 274.400 Yes ( 0.17687 0.82313 )
##      6) Humidity < 26.2065 61 80.840 No ( 0.62295 0.37705 )
##       12) Light < 447.5 19 7.835 Yes ( 0.05263 0.94737 ) *
##       13) Light > 447.5 42 30.660 No ( 0.88095 0.11905 ) *
##      7) Humidity > 26.2065 233 105.900 Yes ( 0.06009 0.93991 )
##       14) Temperature < 22.6417 192 0.000 Yes ( 0.00000 1.00000 ) *
##       15) Temperature > 22.6417 41 52.640 Yes ( 0.34146 0.65854 )
##         30) Humidity < 26.9038 22 28.840 No ( 0.63636 0.36364 ) *
##         31) Humidity > 26.9038 19 0.000 Yes ( 0.00000 1.00000 ) *
tree.room.pred = predict(tree.room, RO_Test, type = 'class')
# Create a confusion matrix of predicted vs. actual occupancy
# (rows are No/Yes, columns are 0/1, so the diagonal holds the correct predictions)
tree.conf_mat <- table(tree.room.pred, RO_Test$Occupancy)
# Accuracy of the decision tree: sum of the diagonal of the confusion matrix divided by the total number of instances
tree.accuracy <- sum(diag(tree.conf_mat)) / sum(tree.conf_mat)
tree.accuracy
## [1] 0.7433333
The accuracy of this decision tree classifier on the test set is 74.3%.
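The accuracy calculation relies on the rows (No/Yes) and columns (0/1) of the confusion matrix lining up in the same order. A minimal sketch of a more explicit version is given below, assuming the test labels are recoded to the same factor levels as the predictions. The vectors here are toy values invented for illustration, not the coursework data.

```r
# Toy labels, invented for illustration only (not the Room Occupancy data)
actual_raw <- c(0, 1, 1, 0, 1, 0)
predicted  <- factor(c("No", "Yes", "No", "No", "Yes", "Yes"),
                     levels = c("No", "Yes"))

# Recode the 0/1 labels to the same factor levels as the predictions
actual <- factor(ifelse(actual_raw == 1, "Yes", "No"),
                 levels = c("No", "Yes"))

# Confusion matrix with named dimensions, so rows and columns are unambiguous
conf_mat <- table(Predicted = predicted, Actual = actual)

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Equivalent check that does not depend on the table layout
accuracy_check <- mean(predicted == actual)
```

With matching levels on both sides, the diagonal is guaranteed to hold the correct predictions even if one class never appears in the predictions.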
plot(tree.room)
text(tree.room,pretty = 0)
summary(tree.room)
##
## Classification tree:
## tree(formula = Occupancy ~ ., data = RO_Train)
## Variables actually used in tree construction:
## [1] "Light" "Humidity" "Temperature"
## Number of terminal nodes: 6
## Residual mean deviance: 0.06774 = 67.34 / 994
## Misclassification error rate: 0.014 = 14 / 1000
This decision tree classifier uses three of the five available predictors: Light, Humidity and Temperature. The tree has 6 terminal nodes and a misclassification error rate of 0.014 (14 of 1,000 observations) on the training data.
The root node splits on Light: rooms with Light below 189.71 are classified as not occupied, and rooms above this value are split further on Humidity. If Humidity is below 26.21, the room is classified as occupied when Light is below 447.5 and as not occupied otherwise. If Humidity is above 26.21, the tree splits again on Temperature.
If Humidity is above 26.21 and Temperature is below 22.64, the room is classified as occupied. Otherwise, if Temperature is above 22.64, the tree splits again on Humidity.
If Temperature is above 22.64 and Humidity is below 26.90, the room is classified as not occupied. If Humidity is above 26.90, it is classified as occupied.
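The splits described above can be written out as a plain function, which makes the decision path for a single reading easy to follow. This is only a sketch: `predict_occupancy_rule` is a hypothetical helper, not part of the `tree` package, and the thresholds are copied from the printed tree.

```r
# Hand-written version of the fitted tree's splits.
# predict_occupancy_rule is a hypothetical helper for illustration;
# thresholds are taken from the printed tree output.
predict_occupancy_rule <- function(light, humidity, temperature) {
  # Node 2: low light means not occupied
  if (light < 189.708) return("No")
  # Node 6: low humidity branch, split again on light
  if (humidity < 26.2065) {
    if (light < 447.5) return("Yes")  # node 12
    return("No")                      # node 13
  }
  # Node 7: higher humidity branch, split on temperature
  if (temperature < 22.6417) return("Yes")  # node 14
  # Node 15: warmer rooms, split again on humidity
  if (humidity < 26.9038) return("No")      # node 30
  "Yes"                                     # node 31
}

# Example readings (made-up values chosen to reach different leaves)
predict_occupancy_rule(light = 100, humidity = 30,   temperature = 21)
predict_occupancy_rule(light = 300, humidity = 26.5, temperature = 23)
```

In practice `predict(tree.room, newdata, type = 'class')` does the same job; the function only spells out the logic.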
library(randomForest)
set.seed(123)
rf.room <- randomForest(Occupancy ~ ., RO_Train)
rf.room.pred <- predict(rf.room, RO_Test, type = 'class')
rf.conf_mat <- table(rf.room.pred,RO_Test$Occupancy)
rf.accuracy <- sum(diag(rf.conf_mat)) / sum(rf.conf_mat)
rf.accuracy
## [1] 0.7666667
The accuracy of the random forest classifier on the test set is 76.7%. This is an improvement of 2.3 percentage points over the decision tree classifier.
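The gap between the two classifiers can be checked directly from the printed accuracies. The short sketch below computes both the absolute difference in percentage points and the relative improvement. The variable names are illustrative, and the values are copied from the output above.

```r
# Test-set accuracies copied from the printed output
tree_acc <- 0.7433333
rf_acc   <- 0.7666667

# Absolute difference, in percentage points
diff_pp <- (rf_acc - tree_acc) * 100

# Relative improvement over the decision tree, in percent
rel_improvement <- (rf_acc - tree_acc) / tree_acc * 100

round(diff_pp, 1)
round(rel_improvement, 1)
```

Reporting the difference in percentage points avoids confusion with the relative improvement, which is a different (larger) number.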
importance(rf.room)
##               MeanDecreaseGini
## Temperature           59.39006
## Humidity              23.29022
## Light                146.65171
## CO2                  107.35119
## HumidityRatio         29.94961
The mean decrease in Gini index from the random forest shows that Light is the most important feature for predicting room occupancy, followed by CO2 and then Temperature.
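The ranking is easier to read when the importances are sorted and expressed as a share of the total. The sketch below does this with the values copied from the output above into a named vector. The names `imp`, `ranked` and `share` are illustrative.

```r
# MeanDecreaseGini values copied from the importance() output
imp <- c(Temperature   = 59.39006,
         Humidity      = 23.29022,
         Light         = 146.65171,
         CO2           = 107.35119,
         HumidityRatio = 29.94961)

# Sort from most to least important
ranked <- sort(imp, decreasing = TRUE)

# Each feature's share of the total decrease in Gini, in percent
share <- round(100 * ranked / sum(ranked), 1)

ranked
share
```

With the fitted model available, `varImpPlot(rf.room)` from the randomForest package draws the same ranking as a dot chart.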