Data and Library Set Up

library(tree)
## Warning: package 'tree' was built under R version 4.1.3
## Registered S3 method overwritten by 'tree':
##   method     from
##   print.tree cli
RO_Train <- read.csv("/Users/S-Wri/Documents/MSc Data Science/Data Analytics with R/Data Analytics using R/DAR Coursework/Room_Occupancy_Training_set.txt",header = TRUE)
RO_Test <- read.csv("/Users/S-Wri/Documents/MSc Data Science/Data Analytics with R/Data Analytics using R/DAR Coursework/Room_Occupancy_Testing_set.txt",header = TRUE)

RO_Train$Occupancy <- ifelse(RO_Train$Occupancy == 1, 'Yes','No')
RO_Train$Occupancy <- as.factor(RO_Train$Occupancy)
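
The two recoding lines above can also be written as one `factor()` call with explicit `levels` and `labels`, which fixes the level order and can be reused unchanged on a test set. This is a minimal sketch on a toy data frame; the `toy` object and its values are illustrative assumptions, not part of the room occupancy data.

```r
# Toy stand-in for the Occupancy column (illustrative values only)
toy <- data.frame(Occupancy = c(0, 1, 1, 0))

# Recode 0/1 to No/Yes in one step, with the level order stated explicitly
toy$Occupancy <- factor(toy$Occupancy, levels = c(0, 1), labels = c("No", "Yes"))

levels(toy$Occupancy)        # level order: No first, then Yes
as.character(toy$Occupancy)  # recoded values as plain strings
```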

Decision Tree Classifier and Testing Predictive Performance

tree.room = tree(Occupancy ~ ., data = RO_Train)

tree.room
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 1000 1107.000 No ( 0.75800 0.24200 )  
##    2) Light < 189.708 706    0.000 No ( 1.00000 0.00000 ) *
##    3) Light > 189.708 294  274.400 Yes ( 0.17687 0.82313 )  
##      6) Humidity < 26.2065 61   80.840 No ( 0.62295 0.37705 )  
##       12) Light < 447.5 19    7.835 Yes ( 0.05263 0.94737 ) *
##       13) Light > 447.5 42   30.660 No ( 0.88095 0.11905 ) *
##      7) Humidity > 26.2065 233  105.900 Yes ( 0.06009 0.93991 )  
##       14) Temperature < 22.6417 192    0.000 Yes ( 0.00000 1.00000 ) *
##       15) Temperature > 22.6417 41   52.640 Yes ( 0.34146 0.65854 )  
##         30) Humidity < 26.9038 22   28.840 No ( 0.63636 0.36364 ) *
##         31) Humidity > 26.9038 19    0.000 Yes ( 0.00000 1.00000 ) *
tree.room.pred = predict(tree.room, RO_Test, type = 'class')

# Create a confusion matrix (rows = predicted class, columns = actual class).
# RO_Test$Occupancy is still coded 0/1, so rows No/Yes line up with columns 0/1.
tree.conf_mat <- table(tree.room.pred, RO_Test$Occupancy)

# Accuracy of the decision tree: sum of the diagonal elements of the
# confusion matrix divided by the total number of test instances.
tree.accuracy <- sum(diag(tree.conf_mat)) / sum(tree.conf_mat)

tree.accuracy
## [1] 0.7433333

The accuracy of this decision tree classifier on the test set is 74.3%.
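
To show how accuracy is read off a confusion matrix, here is a small sketch on a made-up 2x2 table. The counts are invented for illustration and are not the results from this analysis. Sensitivity and specificity are included because accuracy alone can hide which class is being misclassified.

```r
# Toy confusion matrix (made-up counts): rows = predicted, columns = actual
conf_mat <- matrix(c(50, 10,
                      5, 35),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(predicted = c("No", "Yes"),
                                   actual    = c("No", "Yes")))

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Sensitivity: share of actual "Yes" cases predicted as "Yes"
sensitivity <- conf_mat["Yes", "Yes"] / sum(conf_mat[, "Yes"])

# Specificity: share of actual "No" cases predicted as "No"
specificity <- conf_mat["No", "No"] / sum(conf_mat[, "No"])
```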


Decision Tree

plot(tree.room)
text(tree.room,pretty = 0)

summary(tree.room)
## 
## Classification tree:
## tree(formula = Occupancy ~ ., data = RO_Train)
## Variables actually used in tree construction:
## [1] "Light"       "Humidity"    "Temperature"
## Number of terminal nodes:  6 
## Residual mean deviance:  0.06774 = 67.34 / 994 
## Misclassification error rate: 0.014 = 14 / 1000

This decision tree classifier uses three of the available predictor variables: Light, Humidity and Temperature. The tree has 6 terminal nodes and a misclassification error rate of 0.014 (14 of 1000 observations) on the training data.

The first split is on Light: rooms with Light below 189.71 are classed as not occupied, and rooms above this value are split further on Humidity. If Humidity is below 26.21, the room is classed as occupied when Light is below 447.5 and as not occupied otherwise. If Humidity is above 26.21, the tree splits again on Temperature.

If Humidity is above 26.21 and Temperature is below 22.64, the room is classed as occupied. Otherwise, if Temperature is above 22.64, the tree splits again on Humidity.

If Temperature is above 22.64 and Humidity is below 26.90, the room is classed as not occupied. If Humidity is above 26.90, it is classed as occupied.
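
The decision rules described above can be written out as nested conditions. The sketch below hand-codes the split thresholds from the printed tree. The function name `classify_room` and its argument names are hypothetical, and the function is a reading aid rather than a replacement for `predict()`.

```r
# Hand-written version of the fitted tree's rules (thresholds copied from the
# printed tree output); returns "Yes" for occupied and "No" for not occupied
classify_room <- function(light, humidity, temperature) {
  # Root split: low light means not occupied
  if (light < 189.708) return("No")

  # Lower humidity branch: decided by a second split on light
  if (humidity < 26.2065) {
    return(if (light < 447.5) "Yes" else "No")
  }

  # Higher humidity branch: cooler rooms are classed as occupied
  if (temperature < 22.6417) return("Yes")

  # Warmer rooms: a final split on humidity
  if (humidity < 26.9038) "No" else "Yes"
}

classify_room(light = 300, humidity = 27, temperature = 22)
```

Each `return()` corresponds to one of the six terminal nodes, so tracing a single observation through the function mirrors following it down the plotted tree.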


Random Forest Classifier and Predictive Performance

library(randomForest)
set.seed(123)

rf.room <- randomForest(Occupancy ~ ., RO_Train)

rf.room.pred <- predict(rf.room, RO_Test, type = 'class')
rf.conf_mat <- table(rf.room.pred,RO_Test$Occupancy)

rf.accuracy <- sum(diag(rf.conf_mat)) / sum(rf.conf_mat)

rf.accuracy
## [1] 0.7666667

The accuracy of the random forest classifier is 76.7%. This is an improvement of about 2.3 percentage points over the decision tree classifier (74.3%).
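
The gap between the two reported accuracies can be checked directly. This short sketch uses the two accuracy values printed above and separates the absolute gain (in percentage points) from the relative gain (as a percentage of the tree's accuracy). The variable names are illustrative.

```r
# Accuracies as printed in the output above
tree_acc <- 0.7433333
rf_acc   <- 0.7666667

# Absolute gain in percentage points
abs_gain <- (rf_acc - tree_acc) * 100

# Relative gain as a percentage of the decision tree's accuracy
rel_gain <- (rf_acc - tree_acc) / tree_acc * 100

round(abs_gain, 1)
round(rel_gain, 1)
```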


RFC Important Features

importance(rf.room)
##               MeanDecreaseGini
## Temperature           59.39006
## Humidity              23.29022
## Light                146.65171
## CO2                  107.35119
## HumidityRatio         29.94961

The mean decrease in Gini (MeanDecreaseGini) shows that Light is the most important feature for predicting room occupancy in the random forest classifier (RFC), followed by CO2 and then Temperature.
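
To make the ranking explicit, the importance values can be sorted and expressed as shares of the total. This sketch copies the MeanDecreaseGini values from the output above into a named vector so that it runs on its own. The names `gini`, `ranked` and `share` are illustrative.

```r
# MeanDecreaseGini values copied from the importance() output above
gini <- c(Temperature   = 59.39006,
          Humidity      = 23.29022,
          Light         = 146.65171,
          CO2           = 107.35119,
          HumidityRatio = 29.94961)

# Order features from most to least important
ranked <- sort(gini, decreasing = TRUE)

# Each feature's share of the total decrease in Gini, in percent
share <- round(100 * ranked / sum(ranked), 1)

names(ranked)
share
```

With the fitted model available, `varImpPlot(rf.room)` from the randomForest package draws the same ranking as a dot chart.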