#Set Working Directory
setwd("~/CST-425")
#Exploring the Data
The data set being read in describes forest fires. It records the month, rain, wind, temperature, and several other variables for each observation. We focus specifically on four of them: month, rain, wind, and temperature.
data <- read.csv("~/CST-425/forestfires.csv", header = TRUE)
data.subset <- data[c('month', 'rain', 'wind', 'temp')]
str(data.subset)
## 'data.frame': 517 obs. of 4 variables:
## $ month: chr "mar" "oct" "oct" "mar" ...
## $ rain : num 0 0 0 0.2 0 0 0 0 0 0 ...
## $ wind : num 6.7 0.9 1.3 4 1.8 5.4 3.1 2.2 5.4 4 ...
## $ temp : num 8.2 18 14.6 8.3 11.4 22.2 24.1 8 13.1 22.8 ...
#Normalizing data
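This step applies min-max normalization to the numeric columns (rain, wind, and temp), rescaling each one to the range 0 to 1. Putting the variables on a common scale keeps any one of them from dominating the distance calculations used by k-nearest neighbors simply because it is measured in larger units.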
head(data.subset)
## month rain wind temp
## 1 mar 0.0 6.7 8.2
## 2 oct 0.0 0.9 18.0
## 3 oct 0.0 1.3 14.6
## 4 mar 0.2 4.0 8.3
## 5 mar 0.0 1.8 11.4
## 6 aug 0.0 5.4 22.2
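# Min-max normalization: rescale a numeric vector to the range [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
# Apply it to the numeric columns 2:4 (rain, wind, temp); month is dropped
data.subset.n <- as.data.frame(lapply(data.subset[, 2:4], normalize))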
head(data.subset.n)
## rain wind temp
## 1 0.00000 0.70000000 0.1929260
## 2 0.00000 0.05555556 0.5080386
## 3 0.00000 0.10000000 0.3987138
## 4 0.03125 0.40000000 0.1961415
## 5 0.00000 0.15555556 0.2958199
## 6 0.00000 0.55555556 0.6430868
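As a small check of the min-max formula (x - min(x)) / (max(x) - min(x)), the sketch below applies `normalize` to the six temperatures printed by `head(data.subset)`. Because only those six values are used, their own minimum and maximum (8.2 and 22.2) set the scale, so the results differ from the full-data values above. The `normalize_safe` function is a hypothetical addition that is not part of the original analysis. It returns zeros for a constant column, where the original formula divides 0 by 0 and yields `NaN`.

```r
# Min-max normalization, as defined in the original code
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical variant: guards against a zero range (constant column),
# where the original computes 0/0 and returns NaN
normalize_safe <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))
  (x - min(x)) / rng
}

# The six temperatures shown by head(data.subset)
temp6 <- c(8.2, 18.0, 14.6, 8.3, 11.4, 22.2)
scaled <- normalize(temp6)
print(round(scaled, 4))

# Every scaled value lies in [0, 1]: the minimum maps to 0, the maximum to 1
stopifnot(all(scaled >= 0 & scaled <= 1))

# A constant column: the original returns NaN, the variant returns zeros
print(normalize(c(5, 5, 5)))
print(normalize_safe(c(5, 5, 5)))
```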
#Splitting the Data
This step splits the data into training and testing sets. We randomly place 70% of the rows in the training set and the remaining 30% in the testing set.
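set.seed(123)  # make the random split reproducible
# Randomly choose 70% of the row indices for training
dat.d <- sample(1:nrow(data.subset.n), size = nrow(data.subset.n) * 0.7, replace = FALSE)
train.data <- data.subset.n[dat.d, ]   # training rows
test.data <- data.subset.n[-dat.d, ]   # remaining test rows
# Class labels: column 1 of the normalized data (rain), which also stays in train.data/test.data
train.data_labels <- data.subset.n[dat.d, 1]
test.data_labels <- data.subset.n[-dat.d, 1]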
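The sketch below repeats the 70/30 split on a hypothetical stand-in data frame `toy` that has the same number of rows as the forest fire data (517). It then checks that the training and test rows do not overlap. The size is wrapped in `floor()` to make the rounding explicit. 517 * 0.7 = 361.9, which becomes 361 training rows and leaves 156 rows for testing. This is consistent with the `NROW` output in the next section.

```r
set.seed(123)

# Hypothetical stand-in with the same row count as forestfires.csv (517)
toy <- data.frame(x = seq_len(517))

# 517 * 0.7 = 361.9; floor() makes the rounding down to 361 explicit
n_train <- floor(nrow(toy) * 0.7)
idx <- sample(seq_len(nrow(toy)), size = n_train, replace = FALSE)

train <- toy[idx, , drop = FALSE]
test  <- toy[-idx, , drop = FALSE]

cat("training rows:", nrow(train), "\n")
cat("test rows:    ", nrow(test), "\n")

# The two sets share no rows and together cover the whole data set
stopifnot(length(intersect(train$x, test$x)) == 0)
stopifnot(nrow(train) + nrow(test) == nrow(toy))
```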
#Machine learning
With the training set in place, we can choose k. A common rule of thumb sets k to roughly the square root of the number of training observations. `NROW()` shows that the training set has 361 rows, and since sqrt(361) = 19, we use k = 19.
NROW(train.data_labels)
## [1] 361
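The square-root rule of thumb can be written as a small helper. `choose_k` below is hypothetical. It rounds sqrt(n) to the nearest integer and bumps even values up to the next odd number to reduce tied votes. Both choices are common conventions, not something the original analysis requires. For the 361 training rows it gives 19, the k used here.

```r
# Hypothetical helper: k close to sqrt(n), forced to be odd
choose_k <- function(n) {
  k <- round(sqrt(n))
  if (k %% 2 == 0) k <- k + 1
  k
}

choose_k(361)  # 19: sqrt(361) is exactly 19, already odd
choose_k(100)  # 11: sqrt(100) is 10, bumped to the next odd number
```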
library(class)  # provides knn()
knn.19 <- knn(train = train.data, test = test.data, cl = train.data_labels, k = 19)
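`knn()` comes from the `class` package. For each test row, it finds the k closest training rows by Euclidean distance and predicts the label that appears most often among them. The base-R sketch below illustrates that idea on a tiny made-up data set. `knn_sketch` is a hypothetical helper, not the `class` package's code. It resolves tied votes deterministically, whereas `class::knn` breaks ties at random.

```r
# Minimal k-nearest-neighbours classifier in base R (illustration only;
# this is not the class::knn source). Ties in the vote go to the first
# tied level returned by which.max(); class::knn breaks ties at random.
knn_sketch <- function(train, test, cl, k) {
  train <- as.matrix(train)
  test  <- as.matrix(test)
  cl    <- as.factor(cl)
  preds <- vapply(seq_len(nrow(test)), function(i) {
    # Euclidean distance from test row i to every training row
    d <- sqrt(rowSums(sweep(train, 2, test[i, ])^2))
    # Indices of the k closest training rows
    nearest <- order(d)[seq_len(k)]
    # Majority vote among their labels
    votes <- table(cl[nearest])
    names(votes)[which.max(votes)]
  }, character(1))
  factor(preds, levels = levels(cl))
}

# Tiny made-up example: two clusters of points labelled "low" and "high"
train  <- data.frame(a = c(0, 0.1, 0.2, 0.9, 1.0), b = c(0, 0.1, 0, 0.9, 1.0))
labels <- c("low", "low", "low", "high", "high")
test   <- data.frame(a = c(0.05, 0.95), b = c(0.05, 0.95))

knn_sketch(train, test, labels, k = 3)  # predicts "low" then "high"
```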
#Model Evaluation
Next, we evaluate the model on the test set by computing its accuracy and a confusion matrix.
ACC.19 <- 100 * sum(test.data_labels == knn.19)/NROW(test.data_labels)
ACC.19
## [1] 96.15385
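Accuracy is the percentage of test rows whose predicted label equals the true label. A useful companion number is the accuracy of a baseline that always predicts the most frequent true label. In the confusion matrix that follows, 150 of the 156 test rows have a normalized rain value of 0. That baseline would therefore also score 100 * 150 / 156 = 96.15%. The hypothetical helpers `accuracy` and `baseline_accuracy` below sketch both calculations on made-up vectors.

```r
# Percentage of predictions that match the true labels
# (compared as character, since predictions are a factor and labels numeric)
accuracy <- function(pred, truth) {
  100 * sum(as.character(pred) == as.character(truth)) / length(truth)
}

# Accuracy of always predicting the most common true label
baseline_accuracy <- function(truth) {
  100 * max(table(truth)) / length(truth)
}

# Made-up example: 8 of the 10 true labels are 0, and the model predicts 0 everywhere
truth <- c(0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1)
pred  <- factor(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

accuracy(pred, truth)     # 80
baseline_accuracy(truth)  # 80
```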
table(knn.19, test.data_labels)
##          test.data_labels
## knn.19      0 0.03125 0.0625 0.125 0.21875 1
##   0       150       1      1     2       1 1
##   0.03125   0       0      0     0       0 0
##   0.15625   0       0      0     0       0 0
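In this output, the rows are the predicted labels and the columns are the true labels. The two factors do not share the same levels. For example, 0.15625 appears as a prediction level, while 0.0625 appears only among the true labels. As a result, the diagonal of the printed table does not line up with correct predictions. One way to get a square confusion matrix is to give both factors a common set of levels before calling `table()`. Accuracy is then the diagonal sum divided by the total. The sketch below shows this with made-up vectors.

```r
# Made-up predicted and true labels whose observed values differ
pred  <- c(0, 0, 0.5, 0, 0)
truth <- c(0, 0, 0.5, 1, 0.25)

# Build both factors on the union of observed labels so the table is square
lv <- sort(unique(c(pred, truth)))
cm <- table(predicted = factor(pred, levels = lv),
            actual    = factor(truth, levels = lv))
print(cm)

# Correct predictions now sit on the diagonal
100 * sum(diag(cm)) / sum(cm)  # 60: three of five predictions are correct
```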