#Set Working Directory

setwd("~/CST-425")

#Exploring Data

The data set being read in describes forest fires. It records the month, rain, wind, temperature, and other variables. We look specifically at the month, rain, wind, and temperature.

data <- read.csv("~/CST-425/forestfires.csv", header = TRUE)

data.subset <- data[c('month', 'rain', 'wind', 'temp')]

str(data.subset)
## 'data.frame':    517 obs. of  4 variables:
##  $ month: chr  "mar" "oct" "oct" "mar" ...
##  $ rain : num  0 0 0 0.2 0 0 0 0 0 0 ...
##  $ wind : num  6.7 0.9 1.3 4 1.8 5.4 3.1 2.2 5.4 4 ...
##  $ temp : num  8.2 18 14.6 8.3 11.4 22.2 24.1 8 13.1 22.8 ...

#Normalizing data

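This step applies min-max normalization, which rescales each numeric column to the range 0 to 1 so that no variable dominates simply because of its units or scale. The month column is not numeric and is left out.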

head(data.subset)
##   month rain wind temp
## 1   mar  0.0  6.7  8.2
## 2   oct  0.0  0.9 18.0
## 3   oct  0.0  1.3 14.6
## 4   mar  0.2  4.0  8.3
## 5   mar  0.0  1.8 11.4
## 6   aug  0.0  5.4 22.2
# Min-max scaling: the smallest value maps to 0 and the largest to 1
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

data.subset.n <- as.data.frame(lapply(data.subset[,2:4], normalize))

head(data.subset.n)
##      rain       wind      temp
## 1 0.00000 0.70000000 0.1929260
## 2 0.00000 0.05555556 0.5080386
## 3 0.00000 0.10000000 0.3987138
## 4 0.03125 0.40000000 0.1961415
## 5 0.00000 0.15555556 0.2958199
## 6 0.00000 0.55555556 0.6430868
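
To make the scaling concrete, here is a minimal sketch of the same min-max formula applied to a three-element vector. The vector `wind_example` and its values are hypothetical and chosen only for illustration; they are not read from the file.

```r
# Same min-max formula as above, repeated so the sketch is self-contained
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical wind speeds (illustrative values, not taken from forestfires.csv)
wind_example <- c(0.4, 4.0, 9.4)

scaled <- normalize(wind_example)
print(scaled)  # the minimum becomes 0, the maximum becomes 1

# A constant column has max(x) == min(x), which would divide by zero,
# so it can be worth checking the range before scaling
range_is_zero <- diff(range(wind_example)) == 0
print(range_is_zero)
```

The middle value lands at (4.0 - 0.4) / (9.4 - 0.4) = 0.4, which is how a wind speed of 4.0 would scale if the column minimum and maximum were 0.4 and 9.4.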

#Splitting the data

This splits the data into training and testing sets. We put 70% of the rows into the training set and the remaining 30% into the testing set.

set.seed(123)

dat.d <- sample(1:nrow(data.subset.n),size = nrow(data.subset.n)*0.7,replace = FALSE)

train.data <- data.subset.n[dat.d,]
test.data <- data.subset.n[-dat.d,]

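# Column 1 of data.subset.n is the normalized rain value, which serves as the class label here.
# Note that rain also remains among the predictor columns in train.data and test.data.
train.data_labels <- data.subset.n[dat.d, 1]
test.data_labels <- data.subset.n[-dat.d, 1]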
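
The split sizes can be checked with a small sketch. It assumes only the row count of 517 reported by `str()` earlier; the data frame `toy` is a hypothetical stand-in for `data.subset.n`, and `floor()` is written out explicitly to show how 70% of 517 becomes a whole number of rows.

```r
set.seed(123)  # same seed as above, so the sketch is reproducible

n_rows <- 517                             # row count reported by str() earlier
toy <- data.frame(id = seq_len(n_rows))   # hypothetical stand-in for data.subset.n

# 517 * 0.7 = 361.9, so the training set gets 361 rows
train_size <- floor(n_rows * 0.7)
train_idx <- sample(seq_len(n_rows), size = train_size, replace = FALSE)

# Negative indexing keeps every row that was not sampled
toy_train <- toy[train_idx, , drop = FALSE]
toy_test  <- toy[-train_idx, , drop = FALSE]

print(nrow(toy_train))
print(nrow(toy_test))
```

Because the indices are sampled without replacement, no row appears in both sets, and the two sets together cover all 517 rows.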

#Machine learning

At this point we have the training data set, so we can choose k. NROW of the training labels returns 361. A common rule of thumb sets k to the square root of the number of training rows, and the square root of 361 is 19.

NROW(train.data_labels)
## [1] 361
library(class)  # provides knn()
knn.19 <- knn(train = train.data, test = test.data, cl = train.data_labels, k = 19)
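
The choice of k = 19 follows from the square-root rule of thumb. As a minimal sketch, it can be computed rather than typed in; the variable names `n_train` and `k` are illustrative, and rounding to the nearest integer is an assumption for cases where the square root is not whole.

```r
n_train <- 361              # value reported by NROW(train.data_labels)

# Square-root rule of thumb; round() is an assumption for non-square counts
k <- round(sqrt(n_train))
print(k)
```

With this in place, the call could pass `k = k` instead of a hard-coded number, so the value tracks the size of the training set.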

#Model Evaluation

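Now we evaluate the model created above. Accuracy is the percentage of test rows whose predicted label matches the true label, and the table cross-tabulates the predictions (rows) against the true labels (columns).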

ACC.19 <- 100 * sum(test.data_labels == knn.19)/NROW(test.data_labels)

ACC.19
## [1] 96.15385
table(knn.19, test.data_labels)
##          test.data_labels
## knn.19      0 0.03125 0.0625 0.125 0.21875   1
##   0       150       1      1     2       1   1
##   0.03125   0       0      0     0       0   0
##   0.15625   0       0      0     0       0   0
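
In the table above every test row is predicted as 0, so the accuracy equals 150 / 156, the share of zeros among the test labels. The sketch below illustrates that relationship on made-up labels; `truth` and `pred` are hypothetical vectors, not the model's actual output.

```r
# Made-up labels for illustration only
truth <- c("0", "0", "0", "0.03125", "0", "1")
pred  <- c("0", "0", "0", "0", "0", "0")    # a model that always predicts "0"

# Accuracy as a percentage: share of positions where prediction equals truth
acc <- 100 * mean(truth == pred)

# Share of the most frequent true label, i.e. what always guessing it would score
majority_share <- 100 * max(table(truth)) / length(truth)

print(acc)
print(majority_share)

# Cross-tabulation in the same orientation as above: predictions in rows
print(table(pred, truth))
```

When the two numbers match, as they do here, the accuracy reflects how common the majority label is rather than how well the other labels are told apart, so it may be worth reporting both side by side.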