Our client is developing a system to be deployed on large industrial campuses, in shopping malls, etc. to help people navigate a complex, unfamiliar interior space without getting lost. While GPS works fairly reliably outdoors, it generally doesn’t work indoors, so a different technology is necessary. Our client would like us to investigate the feasibility of using “wifi fingerprinting” to determine a person’s location in indoor spaces. Wifi fingerprinting uses the signals from multiple wifi hotspots within the building to determine location, analogous to how GPS uses satellite signals.
The objective of the task is to evaluate the application of machine learning techniques to the problem of indoor locationing via wifi fingerprinting. For this task we predicted LATITUDE and LONGITUDE from the WAP signals (a regression problem). Three machine learning techniques were selected: 1) Linear Regression, 2) k-NN, and 3) Random forest. The results are summarized in the plots at the bottom of this report.
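To make the idea of fingerprinting concrete, here is a minimal sketch in base R. It is not the pipeline used in this report: the reference points, the coordinates `x`/`y`, the three WAP columns, and the `locate()` helper are all made up for illustration. It estimates a position as the mean location of the k reference fingerprints closest to a new scan in signal space.

```r
# Toy fingerprint database: each row is a reference point with known
# coordinates and the signal strength (dBm) seen from three hypothetical WAPs.
fingerprints <- data.frame(
  x    = c(0, 10, 0, 10),
  y    = c(0, 0, 10, 10),
  WAP1 = c(-40, -70, -65, -85),
  WAP2 = c(-70, -40, -85, -65),
  WAP3 = c(-65, -85, -40, -70)
)

# Estimate a position as the mean location of the k reference points whose
# fingerprints are closest (Euclidean distance in signal space) to the scan.
locate <- function(scan, db, k = 1) {
  waps <- grep("^WAP", names(db), value = TRUE)
  diffs <- sweep(as.matrix(db[, waps]), 2, scan[waps])
  dists <- sqrt(rowSums(diffs^2))
  nearest <- order(dists)[seq_len(k)]
  colMeans(db[nearest, c("x", "y"), drop = FALSE])
}

# A new scan that closely resembles the first reference point
new_scan <- c(WAP1 = -42, WAP2 = -68, WAP3 = -66)
locate(new_scan, fingerprints, k = 1)
```

With `k = 1` the scan is matched to the single closest reference point; larger `k` averages several neighbours, which is the behaviour the k-NN regression models below rely on.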
The dataset contains the following attributes:
There were no missing values.
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidyverse)
library(caret)
library(readxl)
library(rmarkdown)
library(writexl)
library(gridExtra)
training <- read_csv("C:/Users/Y.S. Kim/Desktop/Ubiqum/Wifi/Data/trainingData.csv")
valid <- read_csv("C:/Users/Y.S. Kim/Desktop/Ubiqum/Wifi/Data/validationData.csv")
Pre-processing (1)
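The training dataset is very large (19937 observations). Two methods were used to overcome this problem: subsetting the data by building, and taking a random sample of 2500 observations per building.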
# Subset by building
train1_B0 <- training %>%
filter(BUILDINGID == 0)
train1_B1 <- training %>%
filter(BUILDINGID == 1)
train1_B2 <- training %>%
filter(BUILDINGID == 2)
valid1_B0 <- valid %>%
filter(BUILDINGID == 0)
valid1_B1 <- valid %>%
filter(BUILDINGID == 1)
valid1_B2 <- valid %>%
filter(BUILDINGID == 2)
#take a sample (n=2500) per building
# note: no seed is set before sampling, so the draws differ between runs
# samples for the LATITUDE models
#Building 0
train1_B0_lat <- train1_B0[sample(1:nrow(train1_B0), 2500, replace = FALSE),]
#Building 1
train1_B1_lat <- train1_B1[sample(1:nrow(train1_B1), 2500, replace = FALSE),]
#Building 2
train1_B2_lat <- train1_B2[sample(1:nrow(train1_B2), 2500, replace = FALSE),]
# samples for the LONGITUDE models
#Building 0
train1_B0_lon <- train1_B0[sample(1:nrow(train1_B0), 2500, replace = FALSE),]
#Building 1
train1_B1_lon <- train1_B1[sample(1:nrow(train1_B1), 2500, replace = FALSE),]
#Building 2
train1_B2_lon <- train1_B2[sample(1:nrow(train1_B2), 2500, replace = FALSE),]
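Because the samples above are drawn without a seed, each run of the report works on different rows. A minimal sketch of a seeded, per-building sampler in base R follows. The `sample_per_building()` helper, its `seed` argument, and the toy data are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical stand-in for the training data: 3 buildings, 100 rows each.
toy <- data.frame(
  BUILDINGID = rep(0:2, each = 100),
  LATITUDE   = runif(300)
)

# Draw the same n rows per building on every run by fixing the seed first.
# min() guards against buildings with fewer than n rows.
sample_per_building <- function(df, n, seed = 212) {
  set.seed(seed)
  parts <- split(df, df$BUILDINGID)
  sampled <- lapply(parts, function(p) p[sample(nrow(p), min(n, nrow(p))), ])
  do.call(rbind, sampled)
}

s1 <- sample_per_building(toy, n = 25)
s2 <- sample_per_building(toy, n = 25)
identical(s1, s2)      # same seed, same rows
table(s1$BUILDINGID)   # 25 rows from each building
```

Using one sample per building for both targets, rather than separate draws for latitude and longitude, would also let the two models be trained on the same rows. Whether that matters depends on how the results are compared.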
Modeling (1)
# Linear regression, LATITUDE
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset1_B0_lat <- train1_B0_lat[inTraining,]
testset1_B0_lat <- train1_B0_lat[-inTraining,]
#Train model
mod1_lm_B0_lat <- train(LATITUDE~.,
data = trainset1_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_lm_B0_lat
#test results
pred1_lm_B0_lat_test <- predict(mod1_lm_B0_lat, newdata = testset1_B0_lat)
postResample(testset1_B0_lat$LATITUDE, pred1_lm_B0_lat_test)
#validation results
pred1_lm_B0_lat_validation <- predict(object = mod1_lm_B0_lat, newdata = valid1_B0)
postResample(valid1_B0$LATITUDE, pred1_lm_B0_lat_validation)
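The same split, train, test and validate steps are repeated for every building, target and method. One way to cut the repetition is a small wrapper around the caret calls. This is a sketch only: `fit_and_score()` and its arguments are hypothetical names, and it uses plain 5-fold cross-validation (`method = "cv"`), which corresponds to `repeatedcv` with `repeats = 1`. It passes arguments to `postResample()` by name (`pred`, `obs`). The three metrics it returns are symmetric in those arguments, so the positional calls in this report give the same numbers.

```r
library(caret)

# Hypothetical helper: split 80/20, train with 5-fold CV on the WAP columns,
# then score on the held-out test rows and on a separate validation set.
fit_and_score <- function(train_df, valid_df, target, method = "lm", seed = 212) {
  predictors <- grep("^WAP", names(train_df), value = TRUE)
  set.seed(seed)
  in_train <- createDataPartition(train_df[[target]], p = 0.8, list = FALSE)
  trainset <- train_df[in_train, c(predictors, target)]
  testset  <- train_df[-in_train, ]

  model <- train(
    reformulate(".", response = target),   # e.g. LATITUDE ~ .
    data = trainset,
    method = method,
    trControl = trainControl(method = "cv", number = 5)
  )

  list(
    model = model,
    test = postResample(pred = predict(model, newdata = testset),
                        obs = testset[[target]]),
    validation = postResample(pred = predict(model, newdata = valid_df),
                              obs = valid_df[[target]])
  )
}

# Usage on synthetic data (two made-up WAP columns and a linear target)
set.seed(1)
toy <- data.frame(WAP001 = runif(200, 0, 105), WAP002 = runif(200, 0, 105))
toy$LATITUDE <- 2 * toy$WAP001 - toy$WAP002 + rnorm(200)
res <- fit_and_score(toy[1:150, ], toy[151:200, ], target = "LATITUDE")
res$validation
```

Under these assumptions, a call such as `fit_and_score(train1_B0_lat, valid1_B0, "LATITUDE", method = "knn")` would stand in for one of the blocks in this section.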
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset1_B1_lat <- train1_B1_lat[inTraining,]
testset1_B1_lat <- train1_B1_lat[-inTraining,]
#Train model
mod1_lm_B1_lat <- train(LATITUDE~.,
data = trainset1_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_lm_B1_lat
#test results
pred1_lm_B1_lat_test <- predict(mod1_lm_B1_lat, newdata = testset1_B1_lat)
postResample(testset1_B1_lat$LATITUDE, pred1_lm_B1_lat_test)
#validation results
pred1_lm_B1_lat_validation <- predict(object = mod1_lm_B1_lat, newdata = valid1_B1)
postResample(valid1_B1$LATITUDE, pred1_lm_B1_lat_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset1_B2_lat <- train1_B2_lat[inTraining,]
testset1_B2_lat <- train1_B2_lat[-inTraining,]
**Linear Regression**
#Train model
mod1_lm_B2_lat <- train(LATITUDE~.,
data = trainset1_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_lm_B2_lat
#test results
pred1_lm_B2_lat_test <- predict(mod1_lm_B2_lat, newdata = testset1_B2_lat)
postResample(testset1_B2_lat$LATITUDE, pred1_lm_B2_lat_test)
#validation results
pred1_lm_B2_lat_validation <- predict(object = mod1_lm_B2_lat, newdata = valid1_B2)
postResample(valid1_B2$LATITUDE, pred1_lm_B2_lat_validation)
# k-NN, LATITUDE
#Building 0
#Train model
mod1_knn_B0_lat <- train(LATITUDE~.,
data = trainset1_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_knn_B0_lat
#test results
pred1_knn_B0_lat_test <- predict(mod1_knn_B0_lat, newdata = testset1_B0_lat)
postResample(testset1_B0_lat$LATITUDE, pred1_knn_B0_lat_test)
#validation results
pred1_knn_B0_lat_validation <- predict(object = mod1_knn_B0_lat, newdata = valid1_B0)
postResample(valid1_B0$LATITUDE, pred1_knn_B0_lat_validation)
#Building 1
#Train model
mod1_knn_B1_lat <- train(LATITUDE~.,
data = trainset1_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_knn_B1_lat
#test results
pred1_knn_B1_lat_test <- predict(mod1_knn_B1_lat, newdata = testset1_B1_lat)
postResample(testset1_B1_lat$LATITUDE, pred1_knn_B1_lat_test)
#validation results
pred1_knn_B1_lat_validation <- predict(object = mod1_knn_B1_lat, newdata = valid1_B1)
postResample(valid1_B1$LATITUDE, pred1_knn_B1_lat_validation)
#Building 2
#Train model
mod1_knn_B2_lat <- train(LATITUDE~.,
data = trainset1_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_knn_B2_lat
#test results
pred1_knn_B2_lat_test <- predict(mod1_knn_B2_lat, newdata = testset1_B2_lat)
postResample(testset1_B2_lat$LATITUDE, pred1_knn_B2_lat_test)
#validation results
pred1_knn_B2_lat_validation <- predict(object = mod1_knn_B2_lat, newdata = valid1_B2)
postResample(valid1_B2$LATITUDE, pred1_knn_B2_lat_validation)
# Random forest, LATITUDE
#Building 0
#Train model
mod1_rf_B0_lat <- train(LATITUDE~.,
data = trainset1_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_rf_B0_lat
#test results
pred1_rf_B0_lat_test <- predict(mod1_rf_B0_lat, newdata = testset1_B0_lat)
postResample(testset1_B0_lat$LATITUDE, pred1_rf_B0_lat_test)
#validation results
pred1_rf_B0_lat_validation <- predict(object = mod1_rf_B0_lat, newdata = valid1_B0)
postResample(valid1_B0$LATITUDE, pred1_rf_B0_lat_validation)
#Building 1
#Train model
mod1_rf_B1_lat <- train(LATITUDE~.,
data = trainset1_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_rf_B1_lat
#test results
pred1_rf_B1_lat_test <- predict(mod1_rf_B1_lat, newdata = testset1_B1_lat)
postResample(testset1_B1_lat$LATITUDE, pred1_rf_B1_lat_test)
#validation results
pred1_rf_B1_lat_validation <- predict(object = mod1_rf_B1_lat, newdata = valid1_B1)
postResample(valid1_B1$LATITUDE, pred1_rf_B1_lat_validation)
#Building 2
#Train model
mod1_rf_B2_lat <- train(LATITUDE~.,
data = trainset1_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_rf_B2_lat
#test results
pred1_rf_B2_lat_test <- predict(mod1_rf_B2_lat, newdata = testset1_B2_lat)
postResample(testset1_B2_lat$LATITUDE, pred1_rf_B2_lat_test)
#validation results
pred1_rf_B2_lat_validation <- predict(object = mod1_rf_B2_lat, newdata = valid1_B2)
postResample(valid1_B2$LATITUDE, pred1_rf_B2_lat_validation)
# Linear regression, LONGITUDE
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset1_B0_lon <- train1_B0_lon[inTraining,]
testset1_B0_lon <- train1_B0_lon[-inTraining,]
#Train model
mod1_lm_B0_lon <- train(LONGITUDE~.,
data = trainset1_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_lm_B0_lon
#test results
pred1_lm_B0_lon_test <- predict(mod1_lm_B0_lon, newdata = testset1_B0_lon)
postResample(testset1_B0_lon$LONGITUDE, pred1_lm_B0_lon_test)
#validation results
pred1_lm_B0_lon_validation <- predict(object = mod1_lm_B0_lon, newdata = valid1_B0)
postResample(valid1_B0$LONGITUDE, pred1_lm_B0_lon_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset1_B1_lon <- train1_B1_lon[inTraining,]
testset1_B1_lon <- train1_B1_lon[-inTraining,]
#Train model
mod1_lm_B1_lon <- train(LONGITUDE~.,
data = trainset1_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_lm_B1_lon
#test results
pred1_lm_B1_lon_test <- predict(mod1_lm_B1_lon, newdata = testset1_B1_lon)
postResample(testset1_B1_lon$LONGITUDE, pred1_lm_B1_lon_test)
#validation results
pred1_lm_B1_lon_validation <- predict(object = mod1_lm_B1_lon, newdata = valid1_B1)
postResample(valid1_B1$LONGITUDE, pred1_lm_B1_lon_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset1_B2_lon <- train1_B2_lon[inTraining,]
testset1_B2_lon <- train1_B2_lon[-inTraining,]
#Train model
mod1_lm_B2_lon <- train(LONGITUDE~.,
data = trainset1_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_lm_B2_lon
#test results
pred1_lm_B2_lon_test <- predict(mod1_lm_B2_lon, newdata = testset1_B2_lon)
postResample(testset1_B2_lon$LONGITUDE, pred1_lm_B2_lon_test)
#validation results
pred1_lm_B2_lon_validation <- predict(object = mod1_lm_B2_lon, newdata = valid1_B2)
postResample(valid1_B2$LONGITUDE, pred1_lm_B2_lon_validation)
# k-NN, LONGITUDE
#Building 0
#Train model
mod1_knn_B0_lon <- train(LONGITUDE~.,
data = trainset1_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_knn_B0_lon
#test results
pred1_knn_B0_lon_test <- predict(mod1_knn_B0_lon, newdata = testset1_B0_lon)
postResample(testset1_B0_lon$LONGITUDE, pred1_knn_B0_lon_test)
#validation results
pred1_knn_B0_validation <- predict(object = mod1_knn_B0_lon, newdata = valid1_B0)
postResample(valid1_B0$LONGITUDE, pred1_knn_B0_validation)
#Building 1
#Train model
mod1_knn_B1_lon <- train(LONGITUDE~.,
data = trainset1_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_knn_B1_lon
#test results
pred1_knn_B1_lon_test <- predict(mod1_knn_B1_lon, newdata = testset1_B1_lon)
postResample(testset1_B1_lon$LONGITUDE, pred1_knn_B1_lon_test)
#validation results
pred1_knn_B1_lon_validation <- predict(object = mod1_knn_B1_lon, newdata = valid1_B1)
postResample(valid1_B1$LONGITUDE, pred1_knn_B1_lon_validation)
#Building 2
#Train model
mod1_knn_B2_lon <- train(LONGITUDE~.,
data = trainset1_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_knn_B2_lon
#test results
pred1_knn_B2_lon_test <- predict(mod1_knn_B2_lon, newdata = testset1_B2_lon)
postResample(testset1_B2_lon$LONGITUDE, pred1_knn_B2_lon_test)
#validation results
pred1_knn_B2_lon_validation <- predict(object = mod1_knn_B2_lon, newdata = valid1_B2)
postResample(valid1_B2$LONGITUDE, pred1_knn_B2_lon_validation)
# Random forest, LONGITUDE
#Building 0
#Train model
mod1_rf_B0_lon <- train(LONGITUDE~.,
data = trainset1_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_rf_B0_lon
#test results
pred1_rf_B0_lon_test <- predict(mod1_rf_B0_lon, newdata = testset1_B0_lon)
postResample(testset1_B0_lon$LONGITUDE, pred1_rf_B0_lon_test)
#validation results
pred1_rf_B0_lon_validation <- predict(object = mod1_rf_B0_lon, newdata = valid1_B0)
postResample(valid1_B0$LONGITUDE, pred1_rf_B0_lon_validation)
#Building 1
#Train model
mod1_rf_B1_lon <- train(LONGITUDE~.,
data = trainset1_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_rf_B1_lon
#test results
pred1_rf_B1_lon_test <- predict(mod1_rf_B1_lon, newdata = testset1_B1_lon)
postResample(testset1_B1_lon$LONGITUDE, pred1_rf_B1_lon_test)
#validation results
pred1_rf_B1_lon_validation <- predict(object = mod1_rf_B1_lon, newdata = valid1_B1)
postResample(valid1_B1$LONGITUDE, pred1_rf_B1_lon_validation)
#Building 2
#Train model
mod1_rf_B2_lon <- train(LONGITUDE~.,
data = trainset1_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod1_rf_B2_lon
#test results
pred1_rf_B2_lon_test <- predict(mod1_rf_B2_lon, newdata = testset1_B2_lon)
postResample(testset1_B2_lon$LONGITUDE, pred1_rf_B2_lon_test)
#validation results
pred1_rf_B2_lon_validation <- predict(object = mod1_rf_B2_lon, newdata = valid1_B2)
postResample(valid1_B2$LONGITUDE, pred1_rf_B2_lon_validation)
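Latitude and longitude are scored separately above, but for navigation the quantity of interest is how far the predicted point lies from the true one. A minimal sketch of that combined error follows. It assumes both predictions refer to the same rows (as the validation predictions for one building do) and that the coordinates are in a projected, metric system so that Euclidean distance is meaningful. The `position_error()` helper and the numbers are illustrative only.

```r
# Hypothetical predictions and ground truth for five validation points.
obs_lon  <- c(10, 20, 30, 40, 50)
obs_lat  <- c(5, 5, 5, 5, 5)
pred_lon <- c(13, 20, 30, 40, 44)
pred_lat <- c(9, 5, 5, 5, 13)

# Straight-line distance between predicted and observed positions.
position_error <- function(obs_lon, obs_lat, pred_lon, pred_lat) {
  sqrt((pred_lon - obs_lon)^2 + (pred_lat - obs_lat)^2)
}

err <- position_error(obs_lon, obs_lat, pred_lon, pred_lat)
err         # 5 0 0 0 10
mean(err)   # 3
```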
Pre-processing (2)
#Exploration of trainingdata, distribution of WAPs
WAP <- training %>%
select(starts_with("WAP"))
w <- WAP[,1:520]
w <- stack(w)
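# drop the placeholder value 100 (WAP not detected); grep(0, ...) would also drop every reading containing the digit 0
w <- w[w$values != 100,]
hist(w$values, xlab = "WAP strength", main = "Distribution of WAPs signal strength", col = "blue")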
# Outliers (-30 to 0) -----------------------------------------------------
#filter out high values -30 to 0
outlier <- training %>%
rownames_to_column(var = "id") %>%
pivot_longer(
cols = starts_with("WAP"),
names_to = "WAP",
values_to = "values"
) %>%
filter(between(values, -30, 0))
hist(outlier$values, xlab = "WAP strength", main = "Distribution of WAPs signal strength (Outliers)", col = "red")
outlier_data <- training %>%
rownames_to_column(var = "id") %>%
filter(id %in% outlier$id)
training2 <- training
training2$id <- seq.int(nrow(training2))
training3 <- training2[,c(ncol(training2), 1:(ncol(training2)-1))]
training3$ID <- NULL
# removing outliers
training_no_outlier <- training3[!(training3$id %in% outlier_data$id), ]
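The outlier step above reshapes the data to long format and matches row ids. The same rows can be flagged directly with a logical mask over the WAP columns. The sketch below shows this in base R on a made-up three-column data frame; it is an alternative formulation, not the code used for the results in this report.

```r
# Hypothetical miniature of the training data: 100 means "WAP not detected".
toy <- data.frame(
  WAP001 = c(-60, -20, 100, -75),
  WAP002 = c(100, -80, -90, 0),
  FLOOR  = c(0, 1, 2, 3)
)

wap_cols <- grep("^WAP", names(toy))

# TRUE for rows with at least one reading in the suspicious range [-30, 0].
has_outlier <- rowSums(toy[, wap_cols] >= -30 & toy[, wap_cols] <= 0) > 0

toy_outliers   <- toy[has_outlier, ]
toy_no_outlier <- toy[!has_outlier, ]
has_outlier   # FALSE TRUE FALSE TRUE
```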
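# Rescale to 0-105 --------------------------------------------------------
# 100 marks a WAP that was not detected: recode it to -105, then shift all readings by +105 so that 0 means no signal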
#outliers
outlier_data_rs <- outlier_data
outlier_data_rs$id <- NULL
outlier_data_rs[outlier_data_rs == 100] <- -105
outlier_data_rs[,1:520] <- outlier_data_rs[,1:520] + 105
#training without outliers
training_no_outlier_rs <- training_no_outlier
training_no_outlier_rs$id <- NULL
training_no_outlier_rs[training_no_outlier_rs == 100] <- -105
training_no_outlier_rs[,1:520] <- training_no_outlier_rs[,1:520] + 105
#validation
valid_rs <- valid
valid_rs[valid_rs == 100] <- -105
valid_rs[,1:520] <- valid_rs[,1:520] + 105
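The recoding above compares the whole data frame with 100, so a value of 100 in a non-WAP column would be recoded as well. A sketch that limits both steps to the WAP columns follows. `rescale_wap()`, its arguments, and the `SPACEID` column in the toy data are hypothetical and only there to show that other columns stay untouched.

```r
# Recode "not detected" and shift to a 0-105 scale, WAP columns only.
rescale_wap <- function(df, no_signal = 100, floor_dbm = -105) {
  wap_cols <- grep("^WAP", names(df))
  df[wap_cols] <- lapply(df[wap_cols], function(v) {
    ifelse(v == no_signal, floor_dbm, v) - floor_dbm
  })
  df
}

toy <- data.frame(
  WAP001  = c(-60, 100),
  WAP002  = c(100, 0),
  SPACEID = c(100, 101)   # non-WAP column that happens to contain 100
)
out <- rescale_wap(toy)
out
```

In the result, a reading of -60 becomes 45, the placeholder 100 becomes 0, and `SPACEID` keeps its original values.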
# Subset by building for modeling 2 --------------------------------------
train2_B0 <- training_no_outlier_rs %>%
filter(BUILDINGID == 0)
train2_B1 <- training_no_outlier_rs %>%
filter(BUILDINGID == 1)
train2_B2 <- training_no_outlier_rs %>%
filter(BUILDINGID == 2)
valid2_B0 <- valid_rs %>%
filter(BUILDINGID == 0)
valid2_B1 <- valid_rs %>%
filter(BUILDINGID == 1)
valid2_B2 <- valid_rs %>%
filter(BUILDINGID == 2)
#take a sample (n=2500) per building
# samples for the LATITUDE models
#Building 0
train2_B0_lat <- train2_B0[sample(1:nrow(train2_B0), 2500, replace = FALSE),]
#Building 1
train2_B1_lat <- train2_B1[sample(1:nrow(train2_B1), 2500, replace = FALSE),]
#Building 2
train2_B2_lat <- train2_B2[sample(1:nrow(train2_B2), 2500, replace = FALSE),]
# samples for the LONGITUDE models
#Building 0
train2_B0_lon <- train2_B0[sample(1:nrow(train2_B0), 2500, replace = FALSE),]
#Building 1
train2_B1_lon <- train2_B1[sample(1:nrow(train2_B1), 2500, replace = FALSE),]
#Building 2
train2_B2_lon <- train2_B2[sample(1:nrow(train2_B2), 2500, replace = FALSE),]
Modeling (2)
# Linear regression, LATITUDE
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset2_B0_lat <- train2_B0_lat[inTraining,]
testset2_B0_lat <- train2_B0_lat[-inTraining,]
#Train model
mod2_lm_B0_lat <- train(LATITUDE~.,
data = trainset2_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_lm_B0_lat
#test results
pred2_lm_B0_lat_test <- predict(mod2_lm_B0_lat, newdata = testset2_B0_lat)
postResample(testset2_B0_lat$LATITUDE, pred2_lm_B0_lat_test)
#validation results
pred2_lm_B0_lat_validation <- predict(object = mod2_lm_B0_lat, newdata = valid2_B0)
postResample(valid2_B0$LATITUDE, pred2_lm_B0_lat_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset2_B1_lat <- train2_B1_lat[inTraining,]
testset2_B1_lat <- train2_B1_lat[-inTraining,]
#Train model
mod2_lm_B1_lat <- train(LATITUDE~.,
data = trainset2_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_lm_B1_lat
#test results
pred2_lm_B1_lat_test <- predict(mod2_lm_B1_lat, newdata = testset2_B1_lat)
postResample(testset2_B1_lat$LATITUDE, pred2_lm_B1_lat_test)
#validation results
pred2_lm_B1_lat_validation <- predict(object = mod2_lm_B1_lat, newdata = valid2_B1)
postResample(valid2_B1$LATITUDE, pred2_lm_B1_lat_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset2_B2_lat <- train2_B2_lat[inTraining,]
testset2_B2_lat <- train2_B2_lat[-inTraining,]
#Train model
mod2_lm_B2_lat <- train(LATITUDE~.,
data = trainset2_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_lm_B2_lat
#test results
pred2_lm_B2_lat_test <- predict(mod2_lm_B2_lat, newdata = testset2_B2_lat)
postResample(testset2_B2_lat$LATITUDE, pred2_lm_B2_lat_test)
#validation results
pred2_lm_B2_lat_validation <- predict(object = mod2_lm_B2_lat, newdata = valid2_B2)
postResample(valid2_B2$LATITUDE, pred2_lm_B2_lat_validation)
# k-NN, LATITUDE
#Building 0
#Train model
mod2_knn_B0_lat <- train(LATITUDE~.,
data = trainset2_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_knn_B0_lat
#test results
pred2_knn_B0_lat_test <- predict(mod2_knn_B0_lat, newdata = testset2_B0_lat)
postResample(testset2_B0_lat$LATITUDE, pred2_knn_B0_lat_test)
#validation results
pred2_knn_B0_lat_validation <- predict(object = mod2_knn_B0_lat, newdata = valid2_B0)
postResample(valid2_B0$LATITUDE, pred2_knn_B0_lat_validation)
#Building 1
#Train model
mod2_knn_B1_lat <- train(LATITUDE~.,
data = trainset2_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_knn_B1_lat
#test results
pred2_knn_B1_lat_test <- predict(mod2_knn_B1_lat, newdata = testset2_B1_lat)
postResample(testset2_B1_lat$LATITUDE, pred2_knn_B1_lat_test)
#validation results
pred2_knn_B1_lat_validation <- predict(object = mod2_knn_B1_lat, newdata = valid2_B1)
postResample(valid2_B1$LATITUDE, pred2_knn_B1_lat_validation)
#Building 2
#Train model
mod2_knn_B2_lat <- train(LATITUDE~.,
data = trainset2_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_knn_B2_lat
#test results
pred2_knn_B2_lat_test <- predict(mod2_knn_B2_lat, newdata = testset2_B2_lat)
postResample(testset2_B2_lat$LATITUDE, pred2_knn_B2_lat_test)
#validation results
pred2_knn_B2_lat_validation <- predict(object = mod2_knn_B2_lat, newdata = valid2_B2)
postResample(valid2_B2$LATITUDE, pred2_knn_B2_lat_validation)
# Random forest, LATITUDE
#Building 0
#Train model
mod2_rf_B0_lat <- train(LATITUDE~.,
data = trainset2_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_rf_B0_lat
#test results
pred2_rf_B0_lat_test <- predict(mod2_rf_B0_lat, newdata = testset2_B0_lat)
postResample(testset2_B0_lat$LATITUDE, pred2_rf_B0_lat_test)
#validation results
pred2_rf_B0_lat_validation <- predict(object = mod2_rf_B0_lat, newdata = valid2_B0)
postResample(valid2_B0$LATITUDE, pred2_rf_B0_lat_validation)
#Building 1
#Train model
mod2_rf_B1_lat <- train(LATITUDE~.,
data = trainset2_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_rf_B1_lat
#test results
pred2_rf_B1_lat_test <- predict(mod2_rf_B1_lat, newdata = testset2_B1_lat)
postResample(testset2_B1_lat$LATITUDE, pred2_rf_B1_lat_test)
#validation results
pred2_rf_B1_lat_validation <- predict(object = mod2_rf_B1_lat, newdata = valid2_B1)
postResample(valid2_B1$LATITUDE, pred2_rf_B1_lat_validation)
#Building 2
#Train model
mod2_rf_B2_lat <- train(LATITUDE~.,
data = trainset2_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_rf_B2_lat
#test results
pred2_rf_B2_lat_test <- predict(mod2_rf_B2_lat, newdata = testset2_B2_lat)
postResample(testset2_B2_lat$LATITUDE, pred2_rf_B2_lat_test)
#validation results
pred2_rf_B2_lat_validation <- predict(object = mod2_rf_B2_lat, newdata = valid2_B2)
postResample(valid2_B2$LATITUDE, pred2_rf_B2_lat_validation)
# Linear regression, LONGITUDE
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset2_B0_lon <- train2_B0_lon[inTraining,]
testset2_B0_lon <- train2_B0_lon[-inTraining,]
#Train model
mod2_lm_B0_lon <- train(LONGITUDE~.,
data = trainset2_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_lm_B0_lon
#test results
pred2_lm_B0_lon_test <- predict(mod2_lm_B0_lon, newdata = testset2_B0_lon)
postResample(testset2_B0_lon$LONGITUDE, pred2_lm_B0_lon_test)
#validation results
pred2_lm_B0_lon_validation <- predict(object = mod2_lm_B0_lon, newdata = valid2_B0)
postResample(valid2_B0$LONGITUDE, pred2_lm_B0_lon_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset2_B1_lon <- train2_B1_lon[inTraining,]
testset2_B1_lon <- train2_B1_lon[-inTraining,]
#Train model
mod2_lm_B1_lon <- train(LONGITUDE~.,
data = trainset2_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_lm_B1_lon
#test results
pred2_lm_B1_lon_test <- predict(mod2_lm_B1_lon, newdata = testset2_B1_lon)
postResample(testset2_B1_lon$LONGITUDE, pred2_lm_B1_lon_test)
#validation results
pred2_lm_B1_lon_validation <- predict(object = mod2_lm_B1_lon, newdata = valid2_B1)
postResample(valid2_B1$LONGITUDE, pred2_lm_B1_lon_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset2_B2_lon <- train2_B2_lon[inTraining,]
testset2_B2_lon <- train2_B2_lon[-inTraining,]
#Train model
mod2_lm_B2_lon <- train(LONGITUDE~.,
data = trainset2_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_lm_B2_lon
#test results
pred2_lm_B2_lon_test <- predict(mod2_lm_B2_lon, newdata = testset2_B2_lon)
postResample(testset2_B2_lon$LONGITUDE, pred2_lm_B2_lon_test)
#validation results
pred2_lm_B2_lon_validation <- predict(object = mod2_lm_B2_lon, newdata = valid2_B2)
postResample(valid2_B2$LONGITUDE, pred2_lm_B2_lon_validation)
# k-NN, LONGITUDE
#Building 0
#Train model
mod2_knn_B0_lon <- train(LONGITUDE~.,
data = trainset2_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_knn_B0_lon
#test results
pred2_knn_B0_lon_test <- predict(mod2_knn_B0_lon, newdata = testset2_B0_lon)
postResample(testset2_B0_lon$LONGITUDE, pred2_knn_B0_lon_test)
#validation results
pred2_knn_B0_validation <- predict(object = mod2_knn_B0_lon, newdata = valid2_B0)
postResample(valid2_B0$LONGITUDE, pred2_knn_B0_validation)
#Building 1
#Train model
mod2_knn_B1_lon <- train(LONGITUDE~.,
data = trainset2_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_knn_B1_lon
#test results
pred2_knn_B1_lon_test <- predict(mod2_knn_B1_lon, newdata = testset2_B1_lon)
postResample(testset2_B1_lon$LONGITUDE, pred2_knn_B1_lon_test)
#validation results
pred2_knn_B1_lon_validation <- predict(object = mod2_knn_B1_lon, newdata = valid2_B1)
postResample(valid2_B1$LONGITUDE, pred2_knn_B1_lon_validation)
#Building 2
#Train model
mod2_knn_B2_lon <- train(LONGITUDE~.,
data = trainset2_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_knn_B2_lon
#test results
pred2_knn_B2_lon_test <- predict(mod2_knn_B2_lon, newdata = testset2_B2_lon)
postResample(testset2_B2_lon$LONGITUDE, pred2_knn_B2_lon_test)
#validation results
pred2_knn_B2_lon_validation <- predict(object = mod2_knn_B2_lon, newdata = valid2_B2)
postResample(valid2_B2$LONGITUDE, pred2_knn_B2_lon_validation)
# Random forest, LONGITUDE
#Building 0
#Train model
mod2_rf_B0_lon <- train(LONGITUDE~.,
data = trainset2_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_rf_B0_lon
#test results
pred2_rf_B0_lon_test <- predict(mod2_rf_B0_lon, newdata = testset2_B0_lon)
postResample(testset2_B0_lon$LONGITUDE, pred2_rf_B0_lon_test)
#validation results
pred2_rf_B0_lon_validation <- predict(object = mod2_rf_B0_lon, newdata = valid2_B0)
postResample(valid2_B0$LONGITUDE, pred2_rf_B0_lon_validation)
#Building 1
#Train model
mod2_rf_B1_lon <- train(LONGITUDE~.,
data = trainset2_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_rf_B1_lon
#test results
pred2_rf_B1_lon_test <- predict(mod2_rf_B1_lon, newdata = testset2_B1_lon)
postResample(testset2_B1_lon$LONGITUDE, pred2_rf_B1_lon_test)
#validation results
pred2_rf_B1_lon_validation <- predict(object = mod2_rf_B1_lon, newdata = valid2_B1)
postResample(valid2_B1$LONGITUDE, pred2_rf_B1_lon_validation)
#Building 2
#Train model
mod2_rf_B2_lon <- train(LONGITUDE~.,
data = trainset2_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod2_rf_B2_lon
#test results
pred2_rf_B2_lon_test <- predict(mod2_rf_B2_lon, newdata = testset2_B2_lon)
postResample(testset2_B2_lon$LONGITUDE, pred2_rf_B2_lon_test)
#validation results
pred2_rf_B2_lon_validation <- predict(object = mod2_rf_B2_lon, newdata = valid2_B2)
postResample(valid2_B2$LONGITUDE, pred2_rf_B2_lon_validation)
Pre-processing (3)
training_no_outlier_rs$BUILDINGID <- as.factor(training_no_outlier_rs$BUILDINGID)
training_no_outlier_rs$USERID <- as.factor(training_no_outlier_rs$USERID)
training_no_outlier_rs$PHONEID <- as.factor(training_no_outlier_rs$PHONEID)
# remove columns with zero variance --------------------------------------
training_factorcolumns <- training_no_outlier_rs[,521:529]
training_no_outlier_rs_nzv <- training_no_outlier_rs %>%
select(starts_with("WAP")) %>%
select_if(function(x) var(x) != 0)
training_no_outlier_rs_nzv <- cbind(training_no_outlier_rs_nzv, training_factorcolumns)
# define no 0 variance columns on training
relevant_columns <- names(training_no_outlier_rs_nzv)
# select the no 0 variance columns on testing
valid_rs_nzv <- valid_rs %>%
select(all_of(relevant_columns))
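The point of this step is that the set of columns to keep is decided on the training data only and then applied unchanged to the validation data, even if a dropped WAP does vary there. A minimal base R sketch of that pattern, on made-up data frames, could look like this:

```r
# Hypothetical training and validation sets with two WAP columns.
train <- data.frame(WAP001 = c(0, 0, 0), WAP002 = c(10, 0, 25), LATITUDE = c(1, 2, 3))
valid <- data.frame(WAP001 = c(5, 0),    WAP002 = c(0, 12),     LATITUDE = c(1.5, 2.5))

# Flag WAP columns that are constant in the training data.
is_wap   <- grepl("^WAP", names(train))
constant <- is_wap & vapply(train, function(x) var(x) == 0, logical(1))

# Keep the same columns in both sets, based on training data alone.
keep <- names(train)[!constant]
train_nzv <- train[, keep]
valid_nzv <- valid[, keep]
names(valid_nzv)   # "WAP002" "LATITUDE"
```

`WAP001` is removed from the validation set even though it varies there, because a model trained without it cannot use it.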
# remove rows with zero variance ------------------------------------------
# nrow(training_no_outlier_rs_nzv) #19429
# ncol(training_no_outlier_rs_nzv) #470
trainvar <- training_no_outlier_rs_nzv
#training
waps_training <- trainvar[,1:464]
# rows whose WAP readings are all identical
zero_var_rows <- apply(waps_training, 1, var) == 0
which(zero_var_rows)
# remove rows with 0 variance (a logical mask also works when no row matches)
trainvar <- trainvar[!zero_var_rows, ]
which(apply(trainvar[,1:464], 1, var) == 0) #integer(0)
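One thing to watch when dropping rows by index in R: negative indexing with an empty `which()` result removes every row instead of none. The toy example below illustrates this general behaviour and the logical-mask alternative. The data frame is made up.

```r
toy <- data.frame(a = c(1, 2, 3), b = c(4, 6, 8))
row_var <- apply(toy, 1, var)   # every row has non-zero variance

# which() finds nothing, and -integer(0) selects zero rows.
dropped_by_index <- toy[-which(row_var == 0), ]
nrow(dropped_by_index)          # 0

# A logical mask keeps all rows when nothing matches.
dropped_by_mask <- toy[!(row_var == 0), ]
nrow(dropped_by_mask)           # 3
```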
# remove duplicates in training set -------------------------------------------------------
dup_tr <- duplicated(trainvar[,1:464])
duplicates_tr <- trainvar[dup_tr,]
noduplicates_tr <- trainvar[!dup_tr,]
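Note that `duplicated()` is applied to the WAP columns only, so of several rows with the same fingerprint only the first is kept, even when the recorded locations differ. The sketch below shows the difference between de-duplicating on the fingerprint and on the full row. The toy data are illustrative, and which variant is preferable is a modelling choice.

```r
toy <- data.frame(
  WAP001   = c(45, 45, 30),
  WAP002   = c(0, 0, 10),
  LATITUDE = c(100, 104, 200)
)
wap_cols <- grep("^WAP", names(toy))

# Same fingerprint, different location: the second row is flagged.
dup_wap <- duplicated(toy[, wap_cols])
dup_wap          # FALSE TRUE FALSE

# Comparing full rows keeps both, because LATITUDE differs.
dup_all <- duplicated(toy)
dup_all          # FALSE FALSE FALSE

toy[!dup_wap, ]
```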
# Subset for Modeling 3-------------------------------------------------------------------------
train3_B0 <- noduplicates_tr %>%
filter(BUILDINGID == 0)
train3_B1 <- noduplicates_tr %>%
filter(BUILDINGID == 1)
train3_B2 <- noduplicates_tr %>%
filter(BUILDINGID == 2)
valid3_B0 <- valid_rs_nzv %>%
filter(BUILDINGID == 0)
valid3_B1 <- valid_rs_nzv %>%
filter(BUILDINGID == 1)
valid3_B2 <- valid_rs_nzv %>%
filter(BUILDINGID == 2)
#take a sample (n=2500) per building
# samples for the LATITUDE models
#Building 0
train3_B0_lat <- train3_B0[sample(1:nrow(train3_B0), 2500, replace = FALSE),]
#Building 1
train3_B1_lat <- train3_B1[sample(1:nrow(train3_B1), 2500, replace = FALSE),]
#Building 2
train3_B2_lat <- train3_B2[sample(1:nrow(train3_B2), 2500, replace = FALSE),]
# samples for the LONGITUDE models
#Building 0
train3_B0_lon <- train3_B0[sample(1:nrow(train3_B0), 2500, replace = FALSE),]
#Building 1
train3_B1_lon <- train3_B1[sample(1:nrow(train3_B1), 2500, replace = FALSE),]
#Building 2
train3_B2_lon <- train3_B2[sample(1:nrow(train3_B2), 2500, replace = FALSE),]
Modeling (3)
# Linear regression, LATITUDE
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset3_B0_lat <- train3_B0_lat[inTraining,]
testset3_B0_lat <- train3_B0_lat[-inTraining,]
#Train model
mod3_lm_B0_lat <- train(LATITUDE~.,
data = trainset3_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_lm_B0_lat
#test results
pred3_lm_B0_lat_test <- predict(mod3_lm_B0_lat, newdata = testset3_B0_lat)
postResample(testset3_B0_lat$LATITUDE, pred3_lm_B0_lat_test)
#validation results
pred3_lm_B0_lat_validation <- predict(object = mod3_lm_B0_lat, newdata = valid3_B0)
postResample(valid3_B0$LATITUDE, pred3_lm_B0_lat_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset3_B1_lat <- train3_B1_lat[inTraining,]
testset3_B1_lat <- train3_B1_lat[-inTraining,]
#Train model
mod3_lm_B1_lat <- train(LATITUDE~.,
data = trainset3_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_lm_B1_lat
#test results
pred3_lm_B1_lat_test <- predict(mod3_lm_B1_lat, newdata = testset3_B1_lat)
postResample(testset3_B1_lat$LATITUDE, pred3_lm_B1_lat_test)
#validation results
pred3_lm_B1_lat_validation <- predict(object = mod3_lm_B1_lat, newdata = valid3_B1)
postResample(valid3_B1$LATITUDE, pred3_lm_B1_lat_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset3_B2_lat <- train3_B2_lat[inTraining,]
testset3_B2_lat <- train3_B2_lat[-inTraining,]
#Train model
mod3_lm_B2_lat <- train(LATITUDE~.,
data = trainset3_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_lm_B2_lat
#test results
pred3_lm_B2_lat_test <- predict(mod3_lm_B2_lat, newdata = testset3_B2_lat)
postResample(testset3_B2_lat$LATITUDE, pred3_lm_B2_lat_test)
#validation results
pred3_lm_B2_lat_validation <- predict(object = mod3_lm_B2_lat, newdata = valid3_B2)
postResample(valid3_B2$LATITUDE, pred3_lm_B2_lat_validation)
#Building 0
#Train model
mod3_knn_B0_lat <- train(LATITUDE~.,
data = trainset3_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_knn_B0_lat
#test results
pred3_knn_B0_lat_test <- predict(mod3_knn_B0_lat, newdata = testset3_B0_lat)
postResample(testset3_B0_lat$LATITUDE, pred3_knn_B0_lat_test)
#validation results
pred3_knn_B0_lat_validation <- predict(object = mod3_knn_B0_lat, newdata = valid3_B0)
postResample(valid3_B0$LATITUDE, pred3_knn_B0_lat_validation)
#Building 1
#Train model
mod3_knn_B1_lat <- train(LATITUDE~.,
data = trainset3_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_knn_B1_lat
#test results
pred3_knn_B1_lat_test <- predict(mod3_knn_B1_lat, newdata = testset3_B1_lat)
postResample(testset3_B1_lat$LATITUDE, pred3_knn_B1_lat_test)
#validation results
pred3_knn_B1_lat_validation <- predict(object = mod3_knn_B1_lat, newdata = valid3_B1)
postResample(valid3_B1$LATITUDE, pred3_knn_B1_lat_validation)
#Building 2
#Train model
mod3_knn_B2_lat <- train(LATITUDE~.,
data = trainset3_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_knn_B2_lat
#test results
pred3_knn_B2_lat_test <- predict(mod3_knn_B2_lat, newdata = testset3_B2_lat)
postResample(testset3_B2_lat$LATITUDE, pred3_knn_B2_lat_test)
#validation results
pred3_knn_B2_lat_validation <- predict(object = mod3_knn_B2_lat, newdata = valid3_B2)
postResample(valid3_B2$LATITUDE, pred3_knn_B2_lat_validation)
#Building 0
#Train model
mod3_rf_B0_lat <- train(LATITUDE~.,
data = trainset3_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_rf_B0_lat
#test results
pred3_rf_B0_lat_test <- predict(mod3_rf_B0_lat, newdata = testset3_B0_lat)
postResample(testset3_B0_lat$LATITUDE, pred3_rf_B0_lat_test)
#validation results
pred3_rf_B0_lat_validation <- predict(object = mod3_rf_B0_lat, newdata = valid3_B0)
postResample(valid3_B0$LATITUDE, pred3_rf_B0_lat_validation)
#Building 1
#Train model
mod3_rf_B1_lat <- train(LATITUDE~.,
data = trainset3_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_rf_B1_lat
#test results
pred3_rf_B1_lat_test <- predict(mod3_rf_B1_lat, newdata = testset3_B1_lat)
postResample(testset3_B1_lat$LATITUDE, pred3_rf_B1_lat_test)
#validation results
pred3_rf_B1_lat_validation <- predict(object = mod3_rf_B1_lat, newdata = valid3_B1)
postResample(valid3_B1$LATITUDE, pred3_rf_B1_lat_validation)
#Building 2
#Train model
mod3_rf_B2_lat <- train(LATITUDE~.,
data = trainset3_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_rf_B2_lat
#test results
pred3_rf_B2_lat_test <- predict(mod3_rf_B2_lat, newdata = testset3_B2_lat)
postResample(testset3_B2_lat$LATITUDE, pred3_rf_B2_lat_test)
#validation results
pred3_rf_B2_lat_validation <- predict(object = mod3_rf_B2_lat, newdata = valid3_B2)
postResample(valid3_B2$LATITUDE, pred3_rf_B2_lat_validation)
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset3_B0_lon <- train3_B0_lon[inTraining,]
testset3_B0_lon <- train3_B0_lon[-inTraining,]
#Train model
mod3_lm_B0_lon <- train(LONGITUDE~.,
data = trainset3_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_lm_B0_lon
#test results
pred3_lm_B0_lon_test <- predict(mod3_lm_B0_lon, newdata = testset3_B0_lon)
postResample(testset3_B0_lon$LONGITUDE, pred3_lm_B0_lon_test)
#validation results
pred3_lm_B0_lon_validation <- predict(object = mod3_lm_B0_lon, newdata = valid3_B0)
postResample(valid3_B0$LONGITUDE, pred3_lm_B0_lon_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset3_B1_lon <- train3_B1_lon[inTraining,]
testset3_B1_lon <- train3_B1_lon[-inTraining,]
#Train model
mod3_lm_B1_lon <- train(LONGITUDE~.,
data = trainset3_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_lm_B1_lon
#test results
pred3_lm_B1_lon_test <- predict(mod3_lm_B1_lon, newdata = testset3_B1_lon)
postResample(testset3_B1_lon$LONGITUDE, pred3_lm_B1_lon_test)
#validation results
pred3_lm_B1_lon_validation <- predict(object = mod3_lm_B1_lon, newdata = valid3_B1)
postResample(valid3_B1$LONGITUDE, pred3_lm_B1_lon_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset3_B2_lon <- train3_B2_lon[inTraining,]
testset3_B2_lon <- train3_B2_lon[-inTraining,]
#Train model
mod3_lm_B2_lon <- train(LONGITUDE~.,
data = trainset3_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_lm_B2_lon
#test results
pred3_lm_B2_lon_test <- predict(mod3_lm_B2_lon, newdata = testset3_B2_lon)
postResample(testset3_B2_lon$LONGITUDE, pred3_lm_B2_lon_test)
#validation results
pred3_lm_B2_lon_validation <- predict(object = mod3_lm_B2_lon, newdata = valid3_B2)
postResample(valid3_B2$LONGITUDE, pred3_lm_B2_lon_validation)
#Building 0
#Train model
mod3_knn_B0_lon <- train(LONGITUDE~.,
data = trainset3_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_knn_B0_lon
#test results
pred3_knn_B0_lon_test <- predict(mod3_knn_B0_lon, newdata = testset3_B0_lon)
postResample(testset3_B0_lon$LONGITUDE, pred3_knn_B0_lon_test)
#validation results
pred3_knn_B0_validation <- predict(object = mod3_knn_B0_lon, newdata = valid3_B0)
postResample(valid3_B0$LONGITUDE, pred3_knn_B0_validation)
#Building 1
#Train model
mod3_knn_B1_lon <- train(LONGITUDE~.,
data = trainset3_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_knn_B1_lon
#test results
pred3_knn_B1_lon_test <- predict(mod3_knn_B1_lon, newdata = testset3_B1_lon)
postResample(testset3_B1_lon$LONGITUDE, pred3_knn_B1_lon_test)
#validation results
pred3_knn_B1_lon_validation <- predict(object = mod3_knn_B1_lon, newdata = valid3_B1)
postResample(valid3_B1$LONGITUDE, pred3_knn_B1_lon_validation)
#Building 2
#Train model
mod3_knn_B2_lon <- train(LONGITUDE~.,
data = trainset3_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_knn_B2_lon
#test results
pred3_knn_B2_lon_test <- predict(mod3_knn_B2_lon, newdata = testset3_B2_lon)
postResample(testset3_B2_lon$LONGITUDE, pred3_knn_B2_lon_test)
#validation results
pred3_knn_B2_lon_validation <- predict(object = mod3_knn_B2_lon, newdata = valid3_B2)
postResample(valid3_B2$LONGITUDE, pred3_knn_B2_lon_validation)
#Building 0
#Train model
mod3_rf_B0_lon <- train(LONGITUDE~.,
data = trainset3_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_rf_B0_lon
#test results
pred3_rf_B0_lon_test <- predict(mod3_rf_B0_lon, newdata = testset3_B0_lon)
postResample(testset3_B0_lon$LONGITUDE, pred3_rf_B0_lon_test)
#validation results
pred3_rf_B0_lon_validation <- predict(object = mod3_rf_B0_lon, newdata = valid3_B0)
postResample(valid3_B0$LONGITUDE, pred3_rf_B0_lon_validation)
#Building 1
#Train model
mod3_rf_B1_lon <- train(LONGITUDE~.,
data = trainset3_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_rf_B1_lon
#test results
pred3_rf_B1_lon_test <- predict(mod3_rf_B1_lon, newdata = testset3_B1_lon)
postResample(testset3_B1_lon$LONGITUDE, pred3_rf_B1_lon_test)
#validation results
pred3_rf_B1_lon_validation <- predict(object = mod3_rf_B1_lon, newdata = valid3_B1)
postResample(valid3_B1$LONGITUDE, pred3_rf_B1_lon_validation)
#Building 2
#Train model
mod3_rf_B2_lon <- train(LONGITUDE~.,
data = trainset3_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod3_rf_B2_lon
#test results
pred3_rf_B2_lon_test <- predict(mod3_rf_B2_lon, newdata = testset3_B2_lon)
postResample(testset3_B2_lon$LONGITUDE, pred3_rf_B2_lon_test)
#validation results
pred3_rf_B2_lon_validation <- predict(object = mod3_rf_B2_lon, newdata = valid3_B2)
postResample(valid3_B2$LONGITUDE, pred3_rf_B2_lon_validation)
Pre-processing (4)
# Remove PHONEID 17 from the training set ----------------------------------
noduplicates_tr <- noduplicates_tr %>%
filter(PHONEID != 17)
# Check the validation set for PHONEID 17
valid %>%
filter(PHONEID == 17)
# PHONEID 17 does not occur in the validation set
# Normalize rows to 0 and 1 -----------------------------------------------
# train
nonnumeric_tr <- noduplicates_tr %>%
select(LONGITUDE:PHONEID)
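# Min-max normalize one fingerprint (row) to the range 0-1
norm_min_max <- function(row) {
rng <- max(row) - min(row)
# A constant row would give 0/0 = NaN; return zeros (names kept) instead
if (rng == 0) return(row - min(row))
(row - min(row))/rng
}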
train2 <- noduplicates_tr %>%
select(starts_with("WAP")) %>%
apply(1, function(x) norm_min_max(x)) %>%
t() %>%
as_tibble()
train3 <- cbind(train2, nonnumeric_tr)
# validation
nonnumeric_val <- valid_rs_nzv %>%
select(LONGITUDE:PHONEID)
# norm_min_max() is already defined above and is reused here
valid2 <- valid_rs_nzv %>%
select(starts_with("WAP")) %>%
apply(1, function(x) norm_min_max(x)) %>%
t() %>%
as_tibble()
valid3 <- cbind(valid2, nonnumeric_val)
# Subset by building for modalization 4------------------------------------------------------
train4_B0 <- train3 %>%
filter(BUILDINGID == 0)
train4_B1 <- train3 %>%
filter(BUILDINGID == 1)
train4_B2 <- train3 %>%
filter(BUILDINGID == 2)
valid4_B0 <- valid3 %>%
filter(BUILDINGID == 0)
valid4_B1 <- valid3 %>%
filter(BUILDINGID == 1)
valid4_B2 <- valid3 %>%
filter(BUILDINGID == 2)
#take a sample per building for the latitude models (n=1500)
#Building 0
train4_B0_lat <- train4_B0[sample(1:nrow(train4_B0), 1500, replace = FALSE),]
#Building 1
train4_B1_lat <- train4_B1[sample(1:nrow(train4_B1), 1500, replace = FALSE),]
#Building 2
train4_B2_lat <- train4_B2[sample(1:nrow(train4_B2), 1500, replace = FALSE),]
#take a separate sample per building for the longitude models (n=1500)
#Building 0
train4_B0_lon <- train4_B0[sample(1:nrow(train4_B0), 1500, replace = FALSE),]
#Building 1
train4_B1_lon <- train4_B1[sample(1:nrow(train4_B1), 1500, replace = FALSE),]
#Building 2
train4_B2_lon <- train4_B2[sample(1:nrow(train4_B2), 1500, replace = FALSE),]
Modalization (4)
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset4_B0_lat <- train4_B0_lat[inTraining,]
testset4_B0_lat <- train4_B0_lat[-inTraining,]
#Train model
mod4_lm_B0_lat <- train(LATITUDE~.,
data = trainset4_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_lm_B0_lat
#test results
pred4_lm_B0_lat_test <- predict(mod4_lm_B0_lat, newdata = testset4_B0_lat)
postResample(testset4_B0_lat$LATITUDE, pred4_lm_B0_lat_test)
#validation results
pred4_lm_B0_lat_validation <- predict(object = mod4_lm_B0_lat, newdata = valid4_B0)
postResample(valid4_B0$LATITUDE, pred4_lm_B0_lat_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset4_B1_lat <- train4_B1_lat[inTraining,]
testset4_B1_lat <- train4_B1_lat[-inTraining,]
#Train model
mod4_lm_B1_lat <- train(LATITUDE~.,
data = trainset4_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_lm_B1_lat
#test results
pred4_lm_B1_lat_test <- predict(mod4_lm_B1_lat, newdata = testset4_B1_lat)
postResample(testset4_B1_lat$LATITUDE, pred4_lm_B1_lat_test)
#validation results
pred4_lm_B1_lat_validation <- predict(object = mod4_lm_B1_lat, newdata = valid4_B1)
postResample(valid4_B1$LATITUDE, pred4_lm_B1_lat_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset4_B2_lat <- train4_B2_lat[inTraining,]
testset4_B2_lat <- train4_B2_lat[-inTraining,]
#Train model
mod4_lm_B2_lat <- train(LATITUDE~.,
data = trainset4_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_lm_B2_lat
#test results
pred4_lm_B2_lat_test <- predict(mod4_lm_B2_lat, newdata = testset4_B2_lat)
postResample(testset4_B2_lat$LATITUDE, pred4_lm_B2_lat_test)
#validation results
pred4_lm_B2_lat_validation <- predict(object = mod4_lm_B2_lat, newdata = valid4_B2)
postResample(valid4_B2$LATITUDE, pred4_lm_B2_lat_validation)
#Building 0
#Train model
mod4_knn_B0_lat <- train(LATITUDE~.,
data = trainset4_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_knn_B0_lat
#test results
pred4_knn_B0_lat_test <- predict(mod4_knn_B0_lat, newdata = testset4_B0_lat)
postResample(testset4_B0_lat$LATITUDE, pred4_knn_B0_lat_test)
#validation results
pred4_knn_B0_lat_validation <- predict(object = mod4_knn_B0_lat, newdata = valid4_B0)
postResample(valid4_B0$LATITUDE, pred4_knn_B0_lat_validation)
#Building 1
#Train model
mod4_knn_B1_lat <- train(LATITUDE~.,
data = trainset4_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_knn_B1_lat
#test results
pred4_knn_B1_lat_test <- predict(mod4_knn_B1_lat, newdata = testset4_B1_lat)
postResample(testset4_B1_lat$LATITUDE, pred4_knn_B1_lat_test)
#validation results
pred4_knn_B1_lat_validation <- predict(object = mod4_knn_B1_lat, newdata = valid4_B1)
postResample(valid4_B1$LATITUDE, pred4_knn_B1_lat_validation)
#Building 2
#Train model
mod4_knn_B2_lat <- train(LATITUDE~.,
data = trainset4_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_knn_B2_lat
#test results
pred4_knn_B2_lat_test <- predict(mod4_knn_B2_lat, newdata = testset4_B2_lat)
postResample(testset4_B2_lat$LATITUDE, pred4_knn_B2_lat_test)
#validation results
pred4_knn_B2_lat_validation <- predict(object = mod4_knn_B2_lat, newdata = valid4_B2)
postResample(valid4_B2$LATITUDE, pred4_knn_B2_lat_validation)
#Building 0
#Train model
mod4_rf_B0_lat <- train(LATITUDE~.,
data = trainset4_B0_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_rf_B0_lat
#test results
pred4_rf_B0_lat_test <- predict(mod4_rf_B0_lat, newdata = testset4_B0_lat)
postResample(testset4_B0_lat$LATITUDE, pred4_rf_B0_lat_test)
#validation results
pred4_rf_B0_lat_validation <- predict(object = mod4_rf_B0_lat, newdata = valid4_B0)
postResample(valid4_B0$LATITUDE, pred4_rf_B0_lat_validation)
#Building 1
#Train model
mod4_rf_B1_lat <- train(LATITUDE~.,
data = trainset4_B1_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_rf_B1_lat
#test results
pred4_rf_B1_lat_test <- predict(mod4_rf_B1_lat, newdata = testset4_B1_lat)
postResample(testset4_B1_lat$LATITUDE, pred4_rf_B1_lat_test)
#validation results
pred4_rf_B1_lat_validation <- predict(object = mod4_rf_B1_lat, newdata = valid4_B1)
postResample(valid4_B1$LATITUDE, pred4_rf_B1_lat_validation)
#Building 2
#Train model
mod4_rf_B2_lat <- train(LATITUDE~.,
data = trainset4_B2_lat %>%
select(starts_with("WAP"), LATITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_rf_B2_lat
#test results
pred4_rf_B2_lat_test <- predict(mod4_rf_B2_lat, newdata = testset4_B2_lat)
postResample(testset4_B2_lat$LATITUDE, pred4_rf_B2_lat_test)
#validation results
pred4_rf_B2_lat_validation <- predict(object = mod4_rf_B2_lat, newdata = valid4_B2)
postResample(valid4_B2$LATITUDE, pred4_rf_B2_lat_validation)
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset4_B0_lon <- train4_B0_lon[inTraining,]
testset4_B0_lon <- train4_B0_lon[-inTraining,]
#Train model
mod4_lm_B0_lon <- train(LONGITUDE~.,
data = trainset4_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_lm_B0_lon
#test results
pred4_lm_B0_lon_test <- predict(mod4_lm_B0_lon, newdata = testset4_B0_lon)
postResample(testset4_B0_lon$LONGITUDE, pred4_lm_B0_lon_test)
#validation results
pred4_lm_B0_lon_validation <- predict(object = mod4_lm_B0_lon, newdata = valid4_B0)
postResample(valid4_B0$LONGITUDE, pred4_lm_B0_lon_validation)
#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset4_B1_lon <- train4_B1_lon[inTraining,]
testset4_B1_lon <- train4_B1_lon[-inTraining,]
#Train model
mod4_lm_B1_lon <- train(LONGITUDE~.,
data = trainset4_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_lm_B1_lon
#test results
pred4_lm_B1_lon_test <- predict(mod4_lm_B1_lon, newdata = testset4_B1_lon)
postResample(testset4_B1_lon$LONGITUDE, pred4_lm_B1_lon_test)
#validation results
pred4_lm_B1_lon_validation <- predict(object = mod4_lm_B1_lon, newdata = valid4_B1)
postResample(valid4_B1$LONGITUDE, pred4_lm_B1_lon_validation)
#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset4_B2_lon <- train4_B2_lon[inTraining,]
testset4_B2_lon <- train4_B2_lon[-inTraining,]
#Train model
mod4_lm_B2_lon <- train(LONGITUDE~.,
data = trainset4_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "lm",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_lm_B2_lon
#test results
pred4_lm_B2_lon_test <- predict(mod4_lm_B2_lon, newdata = testset4_B2_lon)
postResample(testset4_B2_lon$LONGITUDE, pred4_lm_B2_lon_test)
#validation results
pred4_lm_B2_lon_validation <- predict(object = mod4_lm_B2_lon, newdata = valid4_B2)
postResample(valid4_B2$LONGITUDE, pred4_lm_B2_lon_validation)
#Building 0
#Train model
mod4_knn_B0_lon <- train(LONGITUDE~.,
data = trainset4_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_knn_B0_lon
#test results
pred4_knn_B0_lon_test <- predict(mod4_knn_B0_lon, newdata = testset4_B0_lon)
postResample(testset4_B0_lon$LONGITUDE, pred4_knn_B0_lon_test)
#validation results
pred4_knn_B0_lon_validation <- predict(object = mod4_knn_B0_lon, newdata = valid4_B0)
postResample(valid4_B0$LONGITUDE, pred4_knn_B0_lon_validation)
#Building 1
#Train model
mod4_knn_B1_lon <- train(LONGITUDE~.,
data = trainset4_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_knn_B1_lon
#test results
pred4_knn_B1_lon_test <- predict(mod4_knn_B1_lon, newdata = testset4_B1_lon)
postResample(testset4_B1_lon$LONGITUDE, pred4_knn_B1_lon_test)
#validation results
pred4_knn_B1_lon_validation <- predict(object = mod4_knn_B1_lon, newdata = valid4_B1)
postResample(valid4_B1$LONGITUDE, pred4_knn_B1_lon_validation)
#Building 2
#Train model
mod4_knn_B2_lon <- train(LONGITUDE~.,
data = trainset4_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "knn",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_knn_B2_lon
#test results
pred4_knn_B2_lon_test <- predict(mod4_knn_B2_lon, newdata = testset4_B2_lon)
postResample(testset4_B2_lon$LONGITUDE, pred4_knn_B2_lon_test)
#validation results
pred4_knn_B2_lon_validation <- predict(object = mod4_knn_B2_lon, newdata = valid4_B2)
postResample(valid4_B2$LONGITUDE, pred4_knn_B2_lon_validation)
#Building 0
#Train model
mod4_rf_B0_lon <- train(LONGITUDE~.,
data = trainset4_B0_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_rf_B0_lon
#test results
pred4_rf_B0_lon_test <- predict(mod4_rf_B0_lon, newdata = testset4_B0_lon)
postResample(testset4_B0_lon$LONGITUDE, pred4_rf_B0_lon_test)
#validation results
pred4_rf_B0_lon_validation <- predict(object = mod4_rf_B0_lon, newdata = valid4_B0)
postResample(valid4_B0$LONGITUDE, pred4_rf_B0_lon_validation)
#Building 1
#Train model
mod4_rf_B1_lon <- train(LONGITUDE~.,
data = trainset4_B1_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_rf_B1_lon
#test results
pred4_rf_B1_lon_test <- predict(mod4_rf_B1_lon, newdata = testset4_B1_lon)
postResample(testset4_B1_lon$LONGITUDE, pred4_rf_B1_lon_test)
#validation results
pred4_rf_B1_lon_validation <- predict(object = mod4_rf_B1_lon, newdata = valid4_B1)
postResample(valid4_B1$LONGITUDE, pred4_rf_B1_lon_validation)
#Building 2
#Train model
mod4_rf_B2_lon <- train(LONGITUDE~.,
data = trainset4_B2_lon %>%
select(starts_with("WAP"), LONGITUDE),
method = "rf",
trControl = trainControl(method = "repeatedcv",
number = 5,
repeats = 1))
#train results
mod4_rf_B2_lon
#test results
pred4_rf_B2_lon_test <- predict(mod4_rf_B2_lon, newdata = testset4_B2_lon)
postResample(testset4_B2_lon$LONGITUDE, pred4_rf_B2_lon_test)
#validation results
pred4_rf_B2_lon_validation <- predict(object = mod4_rf_B2_lon, newdata = valid4_B2)
postResample(valid4_B2$LONGITUDE, pred4_rf_B2_lon_validation)
***
We defined a good prediction as a positioning error below 7.5 meters, a moderate prediction as an error between 7.5 and 15 meters, and a bad prediction as an error above 15 meters. The error was calculated as the Euclidean distance between the predicted and the actual position, i.e. the square root of the sum of the squared latitude and longitude errors. Plotting the Manhattan distance gave similar results. k-NN gave the best-quality predictions, with a better good/moderate/bad distribution (55%/32%/13%) than LM and Random forest.
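The classification described above can be sketched in a few lines of base R. This is a minimal illustration, not the code used for the plots: the function name `classify_errors` and the toy coordinates are hypothetical, and errors of exactly 7.5 or 15 meters are assumed to fall into the worse class, since the text does not specify the boundaries.

```r
# Hypothetical helper: classify 2-D positioning errors into the three bands
# used in this report (good < 7.5 m, moderate 7.5-15 m, bad > 15 m).
classify_errors <- function(lat_obs, lat_pred, lon_obs, lon_pred) {
  lat_err <- lat_obs - lat_pred
  lon_err <- lon_obs - lon_pred
  # Euclidean distance: square root of the summed squared errors
  euclid <- sqrt(lat_err^2 + lon_err^2)
  # Manhattan distance: sum of the absolute errors
  manhattan <- abs(lat_err) + abs(lon_err)
  # right = FALSE gives [0, 7.5), [7.5, 15), [15, Inf) (boundary choice assumed)
  quality <- cut(euclid, breaks = c(0, 7.5, 15, Inf),
                 labels = c("good", "moderate", "bad"), right = FALSE)
  data.frame(euclid = euclid, manhattan = manhattan, quality = quality)
}

# Toy values, not taken from the dataset
res <- classify_errors(lat_obs = c(0, 0, 0), lat_pred = c(3, 6, 12),
                       lon_obs = c(0, 0, 0), lon_pred = c(4, 8, 16))
res$euclid                                   # 5 10 20
# Share of good / moderate / bad predictions in percent
round(100 * prop.table(table(res$quality)), 1)
```

With the models in this report, the latitude and longitude predictions on the same validation subset (for example `valid4_B0`) would be passed in together, because both coordinates must refer to the same fingerprints.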
For this regression problem we used three machine learning algorithms. k-NN and Random forest had the best performance metrics, with median errors of 3.5 vs. 4.0 meters (latitude) and 3.6 vs. 5.1 meters (longitude), respectively. However, it should be noted that the standard deviation of the errors is relatively large, as illustrated by the density plots. The outliers are mostly located in building 2. Rescaling the WAP values (to 0-105) during pre-processing (2) resulted in the largest relative improvement in performance metrics compared with the other pre-processing steps. Filtering out values in the range -30 to 0 dBm was necessary, since -30 dBm is the maximum achievable RSSI value.
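As an illustration of the two pre-processing ideas mentioned above, the sketch below flags fingerprints with implausibly strong readings and shifts the remaining values to a 0-105 scale. It is an assumption-laden sketch rather than the exact code of pre-processing (2): the helper names are hypothetical, "not detected" is assumed to be coded as 100 (the UJIIndoorLoc convention), a reading of exactly -30 dBm is assumed to be kept, and whole rows are dropped here although single values could be replaced instead.

```r
# Hypothetical helper: TRUE for rows that contain a reading stronger than -30 dBm
# (100 is assumed to mean "not detected" and is ignored).
has_implausible_rssi <- function(wap) {
  wap <- as.matrix(wap)
  apply(wap, 1, function(row) any(row != 100 & row > -30))
}

# Hypothetical helper: map "not detected" to 0 and real readings to 1-105,
# so that larger values always mean a stronger signal.
rescale_rssi <- function(wap) {
  wap <- as.matrix(wap)
  wap[wap == 100] <- -105   # weaker than any real reading
  wap + 105
}

# Toy fingerprints, not taken from the dataset
toy <- data.frame(WAP001 = c(-60, 100, -20),
                  WAP002 = c(100, -95, -70))
keep <- !has_implausible_rssi(toy)   # third row has a -20 dBm reading
scaled <- rescale_rssi(toy[keep, ])
scaled
```

After rescaling, an undetected WAP sits next to the weakest real signals instead of at +100, which is plausibly why distance-based methods such as k-NN benefit most from this step.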
Predicting latitude and longitude with Wi-Fi fingerprinting gives relatively inaccurate indoor positions. However, the interpretation depends on how we define the threshold for a good prediction. k-NN can be considered a feasible method if we accept an error range of 7.5 meters.
Reduce redundant access points
Some WAPs are located close to each other and provide the same information. The model should improve if the WAPs are evenly distributed across the buildings.
Analysis of internal structure
It could also be useful to analyze how the internal structure of the building relates to the access points. Are there spaces where a wall or closet blocks an access point and interferes with the WAP signal? Reorganizing the access points might minimize this interference.
Reduce time between training and validation, and update/refine the model
Reducing the time between training and validation, while using the same WAPs, will increase generalizability. The validation fingerprints were taken 3 months later, and 55 new WAPs appeared in the validation phase. The implication is that whenever new WAPs are introduced, it is important to update and retrain the models with the new WAPs as predictors to achieve better results.
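A simple check for new access points could look like the sketch below. It assumes the raw coding in which 100 means "not detected" (the UJIIndoorLoc convention); the function name `find_new_waps` and the toy data are hypothetical. A WAP counts as new when it is detected at least once in the newer data but never in the training data.

```r
# Hypothetical helper: WAP columns detected in `new_data` but never in `train_data`.
find_new_waps <- function(train_data, new_data, no_signal = 100) {
  wap_cols <- grep("^WAP", names(new_data), value = TRUE)
  # Detected at least once in the training data?
  seen_in_train <- vapply(wap_cols, function(w) {
    w %in% names(train_data) && any(train_data[[w]] != no_signal)
  }, logical(1))
  # Detected at least once in the newer data?
  seen_in_new <- vapply(wap_cols, function(w) {
    any(new_data[[w]] != no_signal)
  }, logical(1))
  wap_cols[seen_in_new & !seen_in_train]
}

# Toy data, not taken from the dataset
toy_train <- data.frame(WAP001 = c(-60, -70), WAP002 = c(100, 100))
toy_valid <- data.frame(WAP001 = c(-65, 100), WAP002 = c(-80, 100),
                        WAP003 = c(-50, -55))
new_waps <- find_new_waps(toy_train, toy_valid)
new_waps   # "WAP002" "WAP003"
```

Running such a check on each new batch of fingerprints, for example `find_new_waps(training, valid)` on the raw files, would signal when the models need to be retrained.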