Background

Our client is developing a system to be deployed on large industrial campuses, shopping malls, etc. to help people navigate complex, unfamiliar interior spaces without getting lost. While GPS works fairly reliably outdoors, it generally doesn’t work indoors, so a different technology is necessary. Our client would like us to investigate the feasibility of using “wifi fingerprinting” to determine a person’s location in indoor spaces. Wifi fingerprinting uses the signals from multiple wifi hotspots within the building to determine location, analogously to how GPS uses satellite signals.
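
To make the idea concrete, here is a minimal sketch of fingerprint matching on made-up numbers. It assumes a small "radio map" of signal strengths recorded at known positions and returns the position of the closest stored fingerprint. The access point names (ap1 to ap3), the coordinates and the helper locate() are all hypothetical and are not part of the client's dataset or of the models evaluated in this report.

```r
# Toy radio map: each row is a reference fingerprint (signal strength in dBm
# from three hypothetical access points) recorded at a known position.
radio_map <- data.frame(
  ap1 = c(-40, -70, -90),
  ap2 = c(-80, -45, -75),
  ap3 = c(-95, -72, -50),
  x   = c(0, 10, 20),
  y   = c(0, 5, 15)
)

# Return the position of the stored fingerprint that is closest (Euclidean
# distance in signal space) to a newly observed fingerprint.
locate <- function(observed, map, ap_cols = c("ap1", "ap2", "ap3")) {
  signals <- as.matrix(map[, ap_cols])
  dists <- sqrt(rowSums(sweep(signals, 2, observed)^2))
  map[which.min(dists), c("x", "y")]
}

estimate <- locate(c(-68, -47, -70), radio_map)
```

The observed fingerprint is closest to the second reference point, so the estimate is that point's position. The k-NN models later in this report follow the same intuition, but average over several neighbours.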

Objective

The objective of this task is to evaluate the application of machine learning techniques to the problem of indoor locationing via wifi fingerprinting. We predicted LATITUDE and LONGITUDE from the WAP signal strengths (a regression problem). Three machine learning techniques were selected: 1) linear regression, 2) k-NN, and 3) random forest. The results are summarized in the plots at the bottom of this report.

Initial exploration of dataset

The dataset contains the following attributes:

There were no missing values.
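
A check of this kind can be sketched as follows. The data frame toy is a made-up stand-in for the training data, and the remark about the value 100 is an assumption based on the rescaling step later in this report, which replaces 100 before shifting the signal values.

```r
# Small stand-in for the training data; column names follow the WAP/LATITUDE
# naming used in this report, but the values are made up.
toy <- data.frame(
  WAP001   = c(-65, 100, -80),
  WAP002   = c(100, 100, -72),
  LATITUDE = c(4864921, 4864934, 4864950)
)

# TRUE if any cell in the data frame is NA.
has_missing <- any(is.na(toy))

# Number of NA cells per column, useful when has_missing is TRUE.
na_per_column <- colSums(is.na(toy))

# Share of WAP readings equal to 100, the placeholder value that the later
# rescaling step replaces (assumed here to mean "no signal detected").
wap_cols <- grep("^WAP", names(toy))
share_placeholder <- mean(as.matrix(toy[, wap_cols]) == 100)
```

Note that a dataset can be free of NA values and still contain many placeholder readings, so both checks are worth running.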


Loading the required packages
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidyverse)
library(caret)
library(readxl)
library(rmarkdown)
library(writexl)
library(gridExtra)
Loading the datasets “trainingData” and “validationData”
training <- read_csv("C:/Users/Y.S. Kim/Desktop/Ubiqum/Wifi/Data/trainingData.csv")
valid <- read_csv("C:/Users/Y.S. Kim/Desktop/Ubiqum/Wifi/Data/validationData.csv")

Pre-processing (1)

The training dataset is very large (19,937 observations). Two methods were used to reduce it to a manageable size:

  • Subset by building
# Subset by building
train1_B0 <- training %>% 
  filter(BUILDINGID == 0)
train1_B1 <- training %>% 
  filter(BUILDINGID == 1)
train1_B2 <- training %>% 
  filter(BUILDINGID == 2)

valid1_B0 <- valid %>% 
  filter(BUILDINGID == 0)
valid1_B1 <- valid %>% 
  filter(BUILDINGID == 1)
valid1_B2 <- valid %>% 
  filter(BUILDINGID == 2)
  • Take random samples from the dataset
#take a sample (n=2500)
#Building 0
train1_B0_lat <- train1_B0[sample(1:nrow(train1_B0), 2500, replace = FALSE),]
#Building 1
train1_B1_lat <- train1_B1[sample(1:nrow(train1_B1), 2500, replace = FALSE),]
#Building 2
train1_B2_lat <- train1_B2[sample(1:nrow(train1_B2), 2500, replace = FALSE),]

#Building 0
train1_B0_lon <- train1_B0[sample(1:nrow(train1_B0), 2500, replace = FALSE),]
#Building 1
train1_B1_lon <- train1_B1[sample(1:nrow(train1_B1), 2500, replace = FALSE),]
#Building 2
train1_B2_lon <- train1_B2[sample(1:nrow(train1_B2), 2500, replace = FALSE),]
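
Because sample() draws at random, the subsets above change from run to run unless a seed is set first. The sketch below shows one possible way to make the draw repeatable. The helper sample_rows() and the toy data frame building are hypothetical, and the seed value is arbitrary.

```r
# Stand-in for one building's rows (values are made up).
building <- data.frame(id = 1:10, LATITUDE = seq(100, 109))

# Draw n rows without replacement; fixing the seed makes the draw repeatable.
# min() guards against asking for more rows than the subset contains.
sample_rows <- function(df, n, seed) {
  set.seed(seed)
  df[sample(seq_len(nrow(df)), min(n, nrow(df)), replace = FALSE), ]
}

first_draw  <- sample_rows(building, 4, seed = 212)
second_draw <- sample_rows(building, 4, seed = 212)
```

Both draws contain the same rows, so any model trained on them can be reproduced later.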

Modeling (1)

Latitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset1_B0_lat <- train1_B0_lat[inTraining,]
testset1_B0_lat <- train1_B0_lat[-inTraining,]

#Train model
mod1_lm_B0_lat <- train(LATITUDE~., 
                        data = trainset1_B0_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_lm_B0_lat

#test results
pred1_lm_B0_lat_test <- predict(mod1_lm_B0_lat, newdata = testset1_B0_lat)
postResample(testset1_B0_lat$LATITUDE, pred1_lm_B0_lat_test)

#validation results
pred1_lm_B0_lat_validation <- predict(object = mod1_lm_B0_lat, newdata = valid1_B0) 
postResample(valid1_B0$LATITUDE, pred1_lm_B0_lat_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset1_B1_lat <- train1_B1_lat[inTraining,]
testset1_B1_lat <- train1_B1_lat[-inTraining,]

#Train model
mod1_lm_B1_lat <- train(LATITUDE~., 
                        data = trainset1_B1_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_lm_B1_lat

#test results
pred1_lm_B1_lat_test <- predict(mod1_lm_B1_lat, newdata = testset1_B1_lat)
postResample(testset1_B1_lat$LATITUDE, pred1_lm_B1_lat_test)

#validation results
pred1_lm_B1_lat_validation <- predict(object = mod1_lm_B1_lat, newdata = valid1_B1) 
postResample(valid1_B1$LATITUDE, pred1_lm_B1_lat_validation) 

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset1_B2_lat <- train1_B2_lat[inTraining,]
testset1_B2_lat <- train1_B2_lat[-inTraining,]

#Train model
mod1_lm_B2_lat <- train(LATITUDE~., 
                        data = trainset1_B2_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_lm_B2_lat

#test results
pred1_lm_B2_lat_test <- predict(mod1_lm_B2_lat, newdata = testset1_B2_lat)
postResample(testset1_B2_lat$LATITUDE, pred1_lm_B2_lat_test)

#validation results
pred1_lm_B2_lat_validation <- predict(object = mod1_lm_B2_lat, newdata = valid1_B2) 
postResample(valid1_B2$LATITUDE, pred1_lm_B2_lat_validation) 
  • k-NN
#Building 0
#Train model
mod1_knn_B0_lat <- train(LATITUDE~., 
                         data = trainset1_B0_lat %>%
                           select(starts_with("WAP"), LATITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod1_knn_B0_lat

#test results
pred1_knn_B0_lat_test <- predict(mod1_knn_B0_lat, newdata = testset1_B0_lat)
postResample(testset1_B0_lat$LATITUDE, pred1_knn_B0_lat_test)

#validation results
pred1_knn_B0_lat_validation <- predict(object = mod1_knn_B0_lat, newdata = valid1_B0) 
postResample(valid1_B0$LATITUDE, pred1_knn_B0_lat_validation)  

#Building 1
#Train model
mod1_knn_B1_lat <- train(LATITUDE~., 
                         data = trainset1_B1_lat %>%
                           select(starts_with("WAP"), LATITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod1_knn_B1_lat

#test results
pred1_knn_B1_lat_test <- predict(mod1_knn_B1_lat, newdata = testset1_B1_lat)
postResample(testset1_B1_lat$LATITUDE, pred1_knn_B1_lat_test)

#validation results
pred1_knn_B1_lat_validation <- predict(object = mod1_knn_B1_lat, newdata = valid1_B1) 
postResample(valid1_B1$LATITUDE, pred1_knn_B1_lat_validation) 

#Building 2
#Train model
mod1_knn_B2_lat <- train(LATITUDE~., 
                         data = trainset1_B2_lat %>%
                           select(starts_with("WAP"), LATITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod1_knn_B2_lat

#test results
pred1_knn_B2_lat_test <- predict(mod1_knn_B2_lat, newdata = testset1_B2_lat)
postResample(testset1_B2_lat$LATITUDE, pred1_knn_B2_lat_test)

#validation results
pred1_knn_B2_lat_validation <- predict(object = mod1_knn_B2_lat, newdata = valid1_B2) 
postResample(valid1_B2$LATITUDE, pred1_knn_B2_lat_validation) 
  • Random forest
#Building 0
#Train model
mod1_rf_B0_lat <- train(LATITUDE~., 
                        data = trainset1_B0_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_rf_B0_lat


#test results
pred1_rf_B0_lat_test <- predict(mod1_rf_B0_lat, newdata = testset1_B0_lat)
postResample(testset1_B0_lat$LATITUDE, pred1_rf_B0_lat_test)

#validation results
pred1_rf_B0_lat_validation <- predict(object = mod1_rf_B0_lat, newdata = valid1_B0) 
postResample(valid1_B0$LATITUDE, pred1_rf_B0_lat_validation) 

#Building 1
#Train model
mod1_rf_B1_lat <- train(LATITUDE~., 
                        data = trainset1_B1_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_rf_B1_lat

#test results
pred1_rf_B1_lat_test <- predict(mod1_rf_B1_lat, newdata = testset1_B1_lat)
postResample(testset1_B1_lat$LATITUDE, pred1_rf_B1_lat_test)

#validation results
pred1_rf_B1_lat_validation <- predict(object = mod1_rf_B1_lat, newdata = valid1_B1) 
postResample(valid1_B1$LATITUDE, pred1_rf_B1_lat_validation) 

#Building 2
#Train model
mod1_rf_B2_lat <- train(LATITUDE~., 
                        data = trainset1_B2_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_rf_B2_lat

#test results
pred1_rf_B2_lat_test <- predict(mod1_rf_B2_lat, newdata = testset1_B2_lat)
postResample(testset1_B2_lat$LATITUDE, pred1_rf_B2_lat_test)

#validation results
pred1_rf_B2_lat_validation <- predict(object = mod1_rf_B2_lat, newdata = valid1_B2) 
postResample(valid1_B2$LATITUDE, pred1_rf_B2_lat_validation) 
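
All models above are trained with trainControl(method = "repeatedcv", number = 5, repeats = 1), that is, 5-fold cross-validation run once. The sketch below illustrates what that resampling scheme does, using base R and synthetic data. It uses plain random folds and lm(), which may differ in detail from how caret builds its folds, so it is an illustration of the principle rather than a reproduction of caret's internals.

```r
set.seed(212)
# Synthetic data: one predictor with a noisy linear relationship.
n <- 100
toy <- data.frame(signal = runif(n, 0, 105))
toy$LATITUDE <- 2 * toy$signal + rnorm(n, sd = 5)

# Assign each row to one of k folds at random.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other four folds, score on the held-out fold.
fold_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(LATITUDE ~ signal, data = toy[fold != i, ])
  pred <- predict(fit, newdata = toy[fold == i, ])
  sqrt(mean((toy$LATITUDE[fold == i] - pred)^2))
})

# The cross-validated estimate is the average over the folds.
cv_rmse <- mean(fold_rmse)
```

The "train results" printed for each model are resampling estimates of this kind, whereas the "test" and "validation" results are computed on rows the model never saw during training.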

Longitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset1_B0_lon <- train1_B0_lon[inTraining,]
testset1_B0_lon <- train1_B0_lon[-inTraining,]

#Train model
mod1_lm_B0_lon <- train(LONGITUDE~., 
                        data = trainset1_B0_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_lm_B0_lon

#test results
pred1_lm_B0_lon_test <- predict(mod1_lm_B0_lon, newdata = testset1_B0_lon)
postResample(testset1_B0_lon$LONGITUDE, pred1_lm_B0_lon_test)

#validation results
pred1_lm_B0_lon_validation <- predict(object = mod1_lm_B0_lon, newdata = valid1_B0) 
postResample(valid1_B0$LONGITUDE, pred1_lm_B0_lon_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset1_B1_lon <- train1_B1_lon[inTraining,]
testset1_B1_lon <- train1_B1_lon[-inTraining,]

#Train model
mod1_lm_B1_lon <- train(LONGITUDE~., 
                        data = trainset1_B1_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_lm_B1_lon

#test results
pred1_lm_B1_lon_test <- predict(mod1_lm_B1_lon, newdata = testset1_B1_lon)
postResample(testset1_B1_lon$LONGITUDE, pred1_lm_B1_lon_test)

#validation results
pred1_lm_B1_lon_validation <- predict(object = mod1_lm_B1_lon, newdata = valid1_B1) 
postResample(valid1_B1$LONGITUDE, pred1_lm_B1_lon_validation) 

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train1_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset1_B2_lon <- train1_B2_lon[inTraining,]
testset1_B2_lon <- train1_B2_lon[-inTraining,]

#Train model
mod1_lm_B2_lon <- train(LONGITUDE~., 
                        data = trainset1_B2_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_lm_B2_lon

#test results
pred1_lm_B2_lon_test <- predict(mod1_lm_B2_lon, newdata = testset1_B2_lon)
postResample(testset1_B2_lon$LONGITUDE, pred1_lm_B2_lon_test)

#validation results
pred1_lm_B2_lon_validation <- predict(object = mod1_lm_B2_lon, newdata = valid1_B2) 
postResample(valid1_B2$LONGITUDE, pred1_lm_B2_lon_validation) 
  • k-NN
#Building 0
#Train model
mod1_knn_B0_lon <- train(LONGITUDE~., 
                         data = trainset1_B0_lon %>%
                           select(starts_with("WAP"), LONGITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod1_knn_B0_lon

#test results
pred1_knn_B0_lon_test <- predict(mod1_knn_B0_lon, newdata = testset1_B0_lon)
postResample(testset1_B0_lon$LONGITUDE, pred1_knn_B0_lon_test)

#validation results
pred1_knn_B0_validation <- predict(object = mod1_knn_B0_lon, newdata = valid1_B0) 
postResample(valid1_B0$LONGITUDE, pred1_knn_B0_validation)

#Building 1
#Train model
mod1_knn_B1_lon <- train(LONGITUDE~., 
                         data = trainset1_B1_lon %>%
                           select(starts_with("WAP"), LONGITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod1_knn_B1_lon

#test results
pred1_knn_B1_lon_test <- predict(mod1_knn_B1_lon, newdata = testset1_B1_lon)
postResample(testset1_B1_lon$LONGITUDE, pred1_knn_B1_lon_test)

#validation results
pred1_knn_B1_lon_validation <- predict(object = mod1_knn_B1_lon, newdata = valid1_B1) 
postResample(valid1_B1$LONGITUDE, pred1_knn_B1_lon_validation) 

#Building 2
#Train model
mod1_knn_B2_lon <- train(LONGITUDE~., 
                         data = trainset1_B2_lon %>%
                           select(starts_with("WAP"), LONGITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod1_knn_B2_lon

#test results
pred1_knn_B2_lon_test <- predict(mod1_knn_B2_lon, newdata = testset1_B2_lon)
postResample(testset1_B2_lon$LONGITUDE, pred1_knn_B2_lon_test)

#validation results
pred1_knn_B2_lon_validation <- predict(object = mod1_knn_B2_lon, newdata = valid1_B2) 
postResample(valid1_B2$LONGITUDE, pred1_knn_B2_lon_validation) 
  • Random forest
#Building 0
#Train model
mod1_rf_B0_lon <- train(LONGITUDE~., 
                        data = trainset1_B0_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_rf_B0_lon


#test results
pred1_rf_B0_lon_test <- predict(mod1_rf_B0_lon, newdata = testset1_B0_lon)
postResample(testset1_B0_lon$LONGITUDE, pred1_rf_B0_lon_test)

#validation results
pred1_rf_B0_lon_validation <- predict(object = mod1_rf_B0_lon, newdata = valid1_B0) 
postResample(valid1_B0$LONGITUDE, pred1_rf_B0_lon_validation) 

#Building 1
#Train model
mod1_rf_B1_lon <- train(LONGITUDE~., 
                        data = trainset1_B1_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_rf_B1_lon

#test results
pred1_rf_B1_lon_test <- predict(mod1_rf_B1_lon, newdata = testset1_B1_lon)
postResample(testset1_B1_lon$LONGITUDE, pred1_rf_B1_lon_test)

#validation results
pred1_rf_B1_lon_validation <- predict(object = mod1_rf_B1_lon, newdata = valid1_B1) 
postResample(valid1_B1$LONGITUDE, pred1_rf_B1_lon_validation) 

#Building 2
#Train model
mod1_rf_B2_lon <- train(LONGITUDE~., 
                        data = trainset1_B2_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod1_rf_B2_lon

#test results
pred1_rf_B2_lon_test <- predict(mod1_rf_B2_lon, newdata = testset1_B2_lon)
postResample(testset1_B2_lon$LONGITUDE, pred1_rf_B2_lon_test)

#validation results
pred1_rf_B2_lon_validation <- predict(object = mod1_rf_B2_lon, newdata = valid1_B2) 
postResample(valid1_B2$LONGITUDE, pred1_rf_B2_lon_validation) 
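
Latitude and longitude are scored separately above. If a single positioning error is wanted, the two predictions can be combined into a straight-line distance, assuming both coordinates are expressed in the same length unit. The sketch below uses made-up numbers. The helpers rmse() and mae() are written for illustration, taking predictions first and observations second, which is also the argument order caret documents for postResample(). RMSE, MAE and the squared correlation are symmetric in their two arguments, so swapping them does not change the reported values.

```r
# Made-up true and predicted coordinates for four points.
obs_lat  <- c(10, 20, 30, 40)
obs_lon  <- c(5, 15, 25, 35)
pred_lat <- c(13, 20, 27, 40)
pred_lon <- c(9, 15, 21, 35)

# Per-axis error metrics, written as functions of (pred, obs).
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
mae  <- function(pred, obs) mean(abs(pred - obs))

# Straight-line distance between each predicted and true position.
dist_error <- sqrt((pred_lat - obs_lat)^2 + (pred_lon - obs_lon)^2)
mean_dist_error <- mean(dist_error)
```

A distance-based error is often easier to interpret for navigation than two separate per-axis errors, because it answers how far off the estimated position is.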

Pre-processing (2)

  • Identify and remove Outliers
#Exploration of trainingdata, distribution of WAPs
WAP <- training %>% 
  select(starts_with("WAP")) 
w <- WAP[,1:520]
w <- stack(w)
# drop the placeholder value 100; grep(0, ...) also removed every reading containing the digit 0
w <- w[w$values != 100,]
hist(w$values, xlab = "WAP strength", main = "Distribution of WAPs signal strength", col = "blue")

# Outliers (-30 to 0) -----------------------------------------------------
#filter out high values -30 to 0
outlier <- training %>% 
  rownames_to_column(var = "id") %>% 
  pivot_longer(
    cols = starts_with("WAP"), 
    names_to = "WAP", 
    values_to = "values"
  ) %>% 
  filter(between(values, -30, 0))

hist(outlier$values, xlab = "WAP strength", main = "Distribution of WAPs signal strength (Outliers)", col = "red")

outlier_data <- training %>% 
  rownames_to_column(var = "id") %>% 
  filter(id %in% outlier$id)

training2 <- training
training2$id <- seq.int(nrow(training2))

training3 <- training2[,c(ncol(training2), 1:(ncol(training2)-1))]
training3$ID <- NULL

# removing outliers
training_no_outlier <- training3[!(training3$id %in% outlier_data$id), ]
  • Rescale WAP values
# Rescale to 0-105 --------------------------------------------------------
#outliers
outlier_data_rs <- outlier_data
outlier_data_rs$id <- NULL
outlier_data_rs[outlier_data_rs == 100] <- -105
outlier_data_rs[,1:520] <- outlier_data_rs[,1:520] + 105

#training without outliers
training_no_outlier_rs <- training_no_outlier
training_no_outlier_rs$id <- NULL
training_no_outlier_rs[training_no_outlier_rs == 100] <- -105
training_no_outlier_rs[,1:520] <- training_no_outlier_rs[,1:520] + 105

#validation
valid_rs <- valid
valid_rs[valid_rs == 100] <- -105
valid_rs[,1:520] <- valid_rs[,1:520] + 105
  • Subset by building
# Subset by building for modeling 2 ------------------------------------------
train2_B0 <- training_no_outlier_rs %>% 
  filter(BUILDINGID == 0)
train2_B1 <- training_no_outlier_rs %>% 
  filter(BUILDINGID == 1)
train2_B2 <- training_no_outlier_rs %>% 
  filter(BUILDINGID == 2)

valid2_B0 <- valid_rs %>% 
  filter(BUILDINGID == 0)
valid2_B1 <- valid_rs %>% 
  filter(BUILDINGID == 1)
valid2_B2 <- valid_rs %>% 
  filter(BUILDINGID == 2)
  • Take random samples from the dataset
#Building 0
train2_B0_lat <- train2_B0[sample(1:nrow(train2_B0), 2500, replace = FALSE),]
#Building 1
train2_B1_lat <- train2_B1[sample(1:nrow(train2_B1), 2500, replace = FALSE),]
#Building 2
train2_B2_lat <- train2_B2[sample(1:nrow(train2_B2), 2500, replace = FALSE),]

#Building 0
train2_B0_lon <- train2_B0[sample(1:nrow(train2_B0), 2500, replace = FALSE),]
#Building 1
train2_B1_lon <- train2_B1[sample(1:nrow(train2_B1), 2500, replace = FALSE),]
#Building 2
train2_B2_lon <- train2_B2[sample(1:nrow(train2_B2), 2500, replace = FALSE),]
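
The outlier removal and rescaling steps of this section can be sketched on a few made-up rows as follows. This is an illustrative variant rather than the code used above: it selects the WAP columns by name instead of by position, and it replaces the value 100 only in those columns, so that a non-WAP column that happens to contain 100 is left untouched. SPACEID stands in for any such column, and rescale_wap() is a hypothetical helper.

```r
# Toy data: WAP readings in dBm, with 100 as the "no signal" placeholder.
# SPACEID is a stand-in for any non-WAP column; all values are made up.
toy <- data.frame(
  WAP001  = c(-65, 100, -12, -104),
  WAP002  = c(100, -70, -95, -30),
  SPACEID = c(100, 101, 102, 103)
)
wap_cols <- grep("^WAP", names(toy), value = TRUE)

# Step 1: flag rows where any WAP reading lies between -30 and 0 (inclusive).
signals <- as.matrix(toy[, wap_cols])
is_outlier_row <- rowSums(signals >= -30 & signals <= 0) > 0
toy_no_outlier <- toy[!is_outlier_row, ]

# Step 2: map the placeholder 100 to -105, then shift by 105, touching only
# the WAP columns so that other columns containing 100 are left alone.
rescale_wap <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(v) {
    v[v == 100] <- -105
    v + 105
  })
  df
}
toy_rs <- rescale_wap(toy_no_outlier, wap_cols)
```

After rescaling, "no signal" becomes 0 and stronger signals get larger positive values, which gives distance-based methods such as k-NN a more sensible notion of closeness than the original coding, where 100 sits far away from every real reading.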

Modeling (2)

Latitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset2_B0_lat <- train2_B0_lat[inTraining,]
testset2_B0_lat <- train2_B0_lat[-inTraining,]

#Train model
mod2_lm_B0_lat <- train(LATITUDE~., 
                        data = trainset2_B0_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_lm_B0_lat

#test results
pred2_lm_B0_lat_test <- predict(mod2_lm_B0_lat, newdata = testset2_B0_lat)
postResample(testset2_B0_lat$LATITUDE, pred2_lm_B0_lat_test)

#validation results
pred2_lm_B0_lat_validation <- predict(object = mod2_lm_B0_lat, newdata = valid2_B0) 
postResample(valid2_B0$LATITUDE, pred2_lm_B0_lat_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset2_B1_lat <- train2_B1_lat[inTraining,]
testset2_B1_lat <- train2_B1_lat[-inTraining,]

#Train model
mod2_lm_B1_lat <- train(LATITUDE~., 
                        data = trainset2_B1_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_lm_B1_lat

#test results
pred2_lm_B1_lat_test <- predict(mod2_lm_B1_lat, newdata = testset2_B1_lat)
postResample(testset2_B1_lat$LATITUDE, pred2_lm_B1_lat_test)

#validation results
pred2_lm_B1_lat_validation <- predict(object = mod2_lm_B1_lat, newdata = valid2_B1) 
postResample(valid2_B1$LATITUDE, pred2_lm_B1_lat_validation)

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset2_B2_lat <- train2_B2_lat[inTraining,]
testset2_B2_lat <- train2_B2_lat[-inTraining,]

#Train model
mod2_lm_B2_lat <- train(LATITUDE~., 
                        data = trainset2_B2_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_lm_B2_lat

#test results
pred2_lm_B2_lat_test <- predict(mod2_lm_B2_lat, newdata = testset2_B2_lat)
postResample(testset2_B2_lat$LATITUDE, pred2_lm_B2_lat_test)

#validation results
pred2_lm_B2_lat_validation <- predict(object = mod2_lm_B2_lat, newdata = valid2_B2) 
postResample(valid2_B2$LATITUDE, pred2_lm_B2_lat_validation) 
  • k-NN
#Building 0
#Train model
mod2_knn_B0_lat <- train(LATITUDE~., 
                         data = trainset2_B0_lat %>%
                           select(starts_with("WAP"), LATITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod2_knn_B0_lat

#test results
pred2_knn_B0_lat_test <- predict(mod2_knn_B0_lat, newdata = testset2_B0_lat)
postResample(testset2_B0_lat$LATITUDE, pred2_knn_B0_lat_test)

#validation results
pred2_knn_B0_lat_validation <- predict(object = mod2_knn_B0_lat, newdata = valid2_B0) 
postResample(valid2_B0$LATITUDE, pred2_knn_B0_lat_validation) 

#Building 1
#Train model
mod2_knn_B1_lat <- train(LATITUDE~., 
                         data = trainset2_B1_lat %>%
                           select(starts_with("WAP"), LATITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod2_knn_B1_lat

#test results
pred2_knn_B1_lat_test <- predict(mod2_knn_B1_lat, newdata = testset2_B1_lat)
postResample(testset2_B1_lat$LATITUDE, pred2_knn_B1_lat_test)

#validation results
pred2_knn_B1_lat_validation <- predict(object = mod2_knn_B1_lat, newdata = valid2_B1) 
postResample(valid2_B1$LATITUDE, pred2_knn_B1_lat_validation) 

#Building 2
#Train model
mod2_knn_B2_lat <- train(LATITUDE~., 
                         data = trainset2_B2_lat %>%
                           select(starts_with("WAP"), LATITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod2_knn_B2_lat

#test results
pred2_knn_B2_lat_test <- predict(mod2_knn_B2_lat, newdata = testset2_B2_lat)
postResample(testset2_B2_lat$LATITUDE, pred2_knn_B2_lat_test)

#validation results
pred2_knn_B2_lat_validation <- predict(object = mod2_knn_B2_lat, newdata = valid2_B2) 
postResample(valid2_B2$LATITUDE, pred2_knn_B2_lat_validation) 
  • Random forest
#Building 0
#Train model
mod2_rf_B0_lat <- train(LATITUDE~., 
                        data = trainset2_B0_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_rf_B0_lat

#test results
pred2_rf_B0_lat_test <- predict(mod2_rf_B0_lat, newdata = testset2_B0_lat)
postResample(testset2_B0_lat$LATITUDE, pred2_rf_B0_lat_test)

#validation results
pred2_rf_B0_lat_validation <- predict(object = mod2_rf_B0_lat, newdata = valid2_B0) 
postResample(valid2_B0$LATITUDE, pred2_rf_B0_lat_validation)

#Building 1
#Train model
mod2_rf_B1_lat <- train(LATITUDE~., 
                        data = trainset2_B1_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_rf_B1_lat

#test results
pred2_rf_B1_lat_test <- predict(mod2_rf_B1_lat, newdata = testset2_B1_lat)
postResample(testset2_B1_lat$LATITUDE, pred2_rf_B1_lat_test)

#validation results
pred2_rf_B1_lat_validation <- predict(object = mod2_rf_B1_lat, newdata = valid2_B1) 
postResample(valid2_B1$LATITUDE, pred2_rf_B1_lat_validation) 

#Building 2
#Train model
mod2_rf_B2_lat <- train(LATITUDE~., 
                        data = trainset2_B2_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_rf_B2_lat

#test results
pred2_rf_B2_lat_test <- predict(mod2_rf_B2_lat, newdata = testset2_B2_lat)
postResample(testset2_B2_lat$LATITUDE, pred2_rf_B2_lat_test)

#validation results
pred2_rf_B2_lat_validation <- predict(object = mod2_rf_B2_lat, newdata = valid2_B2) 
postResample(valid2_B2$LATITUDE, pred2_rf_B2_lat_validation) 
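
The per-building blocks above repeat the same split, train and score steps with only the building changing. One possible way to reduce that repetition is to wrap the steps in a function and apply it to each building. The sketch below does this on synthetic data. It uses base lm() and a simple random 80/20 split instead of caret's train() and createDataPartition(), so it shows the structure only and is not the procedure that produced the results in this report. The helper fit_building() is hypothetical.

```r
set.seed(212)
# Synthetic stand-in: two WAP columns, a building label and a target.
n <- 150
toy <- data.frame(
  WAP001     = runif(n, 0, 105),
  WAP002     = runif(n, 0, 105),
  BUILDINGID = rep(0:2, each = n / 3)
)
toy$LATITUDE <- 0.5 * toy$WAP001 - 0.2 * toy$WAP002 + rnorm(n)

# Fit one model on an 80/20 split of df and return the test RMSE.
fit_building <- function(df, target) {
  idx   <- sample(seq_len(nrow(df)), size = floor(0.8 * nrow(df)))
  train <- df[idx, ]
  test  <- df[-idx, ]
  predictors <- grep("^WAP", names(df), value = TRUE)
  fit  <- lm(reformulate(predictors, response = target), data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((pred - test[[target]])^2))
}

# One call per building instead of one copied block per building.
rmse_by_building <- sapply(split(toy, toy$BUILDINGID), fit_building,
                           target = "LATITUDE")
```

The result is a named vector with one RMSE per building, which can be passed straight to a plotting or table function.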

Longitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset2_B0_lon <- train2_B0_lon[inTraining,]
testset2_B0_lon <- train2_B0_lon[-inTraining,]

#Train model
mod2_lm_B0_lon <- train(LONGITUDE~., 
                        data = trainset2_B0_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_lm_B0_lon

#test results
pred2_lm_B0_lon_test <- predict(mod2_lm_B0_lon, newdata = testset2_B0_lon)
postResample(testset2_B0_lon$LONGITUDE, pred2_lm_B0_lon_test)

#validation results
pred2_lm_B0_lon_validation <- predict(object = mod2_lm_B0_lon, newdata = valid2_B0) 
postResample(valid2_B0$LONGITUDE, pred2_lm_B0_lon_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset2_B1_lon <- train2_B1_lon[inTraining,]
testset2_B1_lon <- train2_B1_lon[-inTraining,]

#Train model
mod2_lm_B1_lon <- train(LONGITUDE~., 
                        data = trainset2_B1_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_lm_B1_lon

#test results
pred2_lm_B1_lon_test <- predict(mod2_lm_B1_lon, newdata = testset2_B1_lon)
postResample(testset2_B1_lon$LONGITUDE, pred2_lm_B1_lon_test)

#validation results
pred2_lm_B1_lon_validation <- predict(object = mod2_lm_B1_lon, newdata = valid2_B1) 
postResample(valid2_B1$LONGITUDE, pred2_lm_B1_lon_validation) 

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train2_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset2_B2_lon <- train2_B2_lon[inTraining,]
testset2_B2_lon <- train2_B2_lon[-inTraining,]

#Train model
mod2_lm_B2_lon <- train(LONGITUDE~., 
                        data = trainset2_B2_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "lm",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_lm_B2_lon

#test results
pred2_lm_B2_lon_test <- predict(mod2_lm_B2_lon, newdata = testset2_B2_lon)
postResample(testset2_B2_lon$LONGITUDE, pred2_lm_B2_lon_test)

#validation results
pred2_lm_B2_lon_validation <- predict(object = mod2_lm_B2_lon, newdata = valid2_B2) 
postResample(valid2_B2$LONGITUDE, pred2_lm_B2_lon_validation) 
  • k-NN
#Building 0
#Train model
mod2_knn_B0_lon <- train(LONGITUDE~., 
                         data = trainset2_B0_lon %>%
                           select(starts_with("WAP"), LONGITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod2_knn_B0_lon

#test results
pred2_knn_B0_lon_test <- predict(mod2_knn_B0_lon, newdata = testset2_B0_lon)
postResample(testset2_B0_lon$LONGITUDE, pred2_knn_B0_lon_test)

#validation results
pred2_knn_B0_validation <- predict(object = mod2_knn_B0_lon, newdata = valid2_B0) 
postResample(valid2_B0$LONGITUDE, pred2_knn_B0_validation) 

#Building 1
#Train model
mod2_knn_B1_lon <- train(LONGITUDE~., 
                         data = trainset2_B1_lon %>%
                           select(starts_with("WAP"), LONGITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod2_knn_B1_lon

#test results
pred2_knn_B1_lon_test <- predict(mod2_knn_B1_lon, newdata = testset2_B1_lon)
postResample(testset2_B1_lon$LONGITUDE, pred2_knn_B1_lon_test)

#validation results
pred2_knn_B1_lon_validation <- predict(object = mod2_knn_B1_lon, newdata = valid2_B1) 
postResample(valid2_B1$LONGITUDE, pred2_knn_B1_lon_validation) 

#Building 2
#Train model
mod2_knn_B2_lon <- train(LONGITUDE~., 
                         data = trainset2_B2_lon %>%
                           select(starts_with("WAP"), LONGITUDE),
                         method = "knn",
                         trControl = trainControl(method = "repeatedcv", 
                                                  number = 5, 
                                                  repeats = 1))
#train results
mod2_knn_B2_lon

#test results
pred2_knn_B2_lon_test <- predict(mod2_knn_B2_lon, newdata = testset2_B2_lon)
postResample(testset2_B2_lon$LONGITUDE, pred2_knn_B2_lon_test)

#validation results
pred2_knn_B2_lon_validation <- predict(object = mod2_knn_B2_lon, newdata = valid2_B2) 
postResample(valid2_B2$LONGITUDE, pred2_knn_B2_lon_validation) 
  • Random forest
#Building 0
#Train model
mod2_rf_B0_lon <- train(LONGITUDE~., 
                        data = trainset2_B0_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_rf_B0_lon

#test results
pred2_rf_B0_lon_test <- predict(mod2_rf_B0_lon, newdata = testset2_B0_lon)
postResample(testset2_B0_lon$LONGITUDE, pred2_rf_B0_lon_test)

#validation results
pred2_rf_B0_lon_validation <- predict(object = mod2_rf_B0_lon, newdata = valid2_B0) 
postResample(valid2_B0$LONGITUDE, pred2_rf_B0_lon_validation) 

#Building 1
#Train model
mod2_rf_B1_lon <- train(LONGITUDE~., 
                        data = trainset2_B1_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_rf_B1_lon

#test results
pred2_rf_B1_lon_test <- predict(mod2_rf_B1_lon, newdata = testset2_B1_lon)
postResample(testset2_B1_lon$LONGITUDE, pred2_rf_B1_lon_test)

#validation results
pred2_rf_B1_lon_validation <- predict(object = mod2_rf_B1_lon, newdata = valid2_B1) 
postResample(valid2_B1$LONGITUDE, pred2_rf_B1_lon_validation) 

#Building 2
#Train model
mod2_rf_B2_lon <- train(LONGITUDE~., 
                        data = trainset2_B2_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "rf",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod2_rf_B2_lon

#test results
pred2_rf_B2_lon_test <- predict(mod2_rf_B2_lon, newdata = testset2_B2_lon)
postResample(testset2_B2_lon$LONGITUDE, pred2_rf_B2_lon_test)

#validation results
pred2_rf_B2_lon_validation <- predict(object = mod2_rf_B2_lon, newdata = valid2_B2) 
postResample(valid2_B2$LONGITUDE, pred2_rf_B2_lon_validation) 
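
A note on the metric calls above: caret documents the signature as postResample(pred, obs), while the calls in this report pass the observed values first. RMSE, R-squared (the squared correlation) and MAE are all symmetric in their two arguments, so the reported numbers should not change. The base-R sketch below illustrates this with a hypothetical metrics() helper and made-up values; it is not caret's own implementation.

```r
# Hypothetical re-implementation of the three metrics, for illustration only
metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared Pearson correlation
    MAE      = mean(abs(pred - obs)))       # mean absolute error
}

# Made-up observed and predicted values
obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.1, 1.9, 3.2, 3.7, 5.3)

m_documented <- metrics(pred, obs)  # argument order as documented
m_swapped    <- metrics(obs, pred)  # argument order as used in this report

# Both orders give the same three values
isTRUE(all.equal(m_documented, m_swapped))
```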

Pre-processing (3)

  • Remove columns with zero variance
training_no_outlier_rs$BUILDINGID <- as.factor(training_no_outlier_rs$BUILDINGID)
training_no_outlier_rs$USERID <- as.factor(training_no_outlier_rs$USERID)
training_no_outlier_rs$PHONEID <- as.factor(training_no_outlier_rs$PHONEID)

# remove columns with zero variance --------------------------------------
training_factorcolumns <- training_no_outlier_rs[,521:529]

training_no_outlier_rs_nzv <- training_no_outlier_rs %>% 
  select(starts_with("WAP")) %>%
  select_if(function(x) var(x) != 0)
training_no_outlier_rs_nzv <- cbind(training_no_outlier_rs_nzv, training_factorcolumns)


# names of the columns kept in the training set (non-zero-variance WAPs plus the metadata columns)
relevant_columns <- names(training_no_outlier_rs_nzv)

# keep the same columns in the validation set
valid_rs_nzv <- valid_rs %>%
  select(all_of(relevant_columns))

# remove rows with zero variance ------------------------------------------
# nrow(training_no_outlier_rs_nzv) #19429
# ncol(training_no_outlier_rs_nzv) #470
trainvar <- training_no_outlier_rs_nzv

#training
waps_training <- trainvar[,1:464]
which(apply(waps_training, 1, var) == 0)
  • Remove rows with zero variance
# remove rows with 0 variance (a logical mask drops nothing by accident
# when no row has zero variance)
trainvar <- trainvar[apply(waps_training, 1, var) != 0, ]
which(apply(trainvar[,1:464], 1, var) == 0) #integer(0)
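
The row filter can be illustrated on a toy matrix. The sketch below uses made-up signal values and hypothetical object names. It also shows why a logical mask is the safer choice: negative indexing with an empty which() result drops every row instead of none.

```r
# Toy matrix standing in for the WAP columns (values are made up)
waps_toy <- rbind(c(-90, -60, -30),
                  c(-75, -75, -75),   # zero-variance row
                  c(-40, -85, -55))

row_var <- apply(waps_toy, 1, var)              # variance of each row
kept <- waps_toy[row_var != 0, , drop = FALSE]  # keeps rows 1 and 3

# A matrix that contains no zero-variance rows at all
clean <- waps_toy[c(1, 3), ]

# The logical mask keeps every row ...
kept_clean <- clean[apply(clean, 1, var) != 0, , drop = FALSE]

# ... whereas -which() on an empty match selects zero rows
dropped_all <- clean[-which(apply(clean, 1, var) == 0), , drop = FALSE]

nrow(kept_clean)
nrow(dropped_all)
```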
  • Remove duplicates
# remove duplicates in training set -------------------------------------------------------
dup_tr <- duplicated(trainvar[,1:464])
duplicates_tr <- trainvar[dup_tr,]
noduplicates_tr <- trainvar[!dup_tr,]
  • Subset by building
# Subset for Modalization 3-------------------------------------------------------------------------
train3_B0 <- noduplicates_tr %>% 
  filter(BUILDINGID == 0)
train3_B1 <- noduplicates_tr %>% 
  filter(BUILDINGID == 1)
train3_B2 <- noduplicates_tr %>% 
  filter(BUILDINGID == 2)

valid3_B0 <- valid_rs_nzv %>% 
  filter(BUILDINGID == 0)
valid3_B1 <- valid_rs_nzv %>% 
  filter(BUILDINGID == 1)
valid3_B2 <- valid_rs_nzv %>% 
  filter(BUILDINGID == 2)
  • Take random samples from the dataset
#Building 0
train3_B0_lat <- train3_B0[sample(1:nrow(train3_B0), 2500, replace = FALSE),]
#Building 1
train3_B1_lat <- train3_B1[sample(1:nrow(train3_B1), 2500, replace = FALSE),]
#Building 2
train3_B2_lat <- train3_B2[sample(1:nrow(train3_B2), 2500, replace = FALSE),]

#Building 0
train3_B0_lon <- train3_B0[sample(1:nrow(train3_B0), 2500, replace = FALSE),]
#Building 1
train3_B1_lon <- train3_B1[sample(1:nrow(train3_B1), 2500, replace = FALSE),]
#Building 2
train3_B2_lon <- train3_B2[sample(1:nrow(train3_B2), 2500, replace = FALSE),]
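
The random samples above are drawn without a fixed seed, so each run may select different rows, and sample() stops with an error if a building has fewer rows than requested. A minimal sketch of a reproducible variant follows. sample_rows() is a hypothetical helper, the seed value 212 is simply borrowed from the partitioning step, and the toy data frame is made up.

```r
# Hypothetical helper: draw at most `n` rows, reproducibly
sample_rows <- function(df, n, seed = 212) {
  set.seed(seed)           # fix the RNG so the same subset can be recreated
  n <- min(n, nrow(df))    # avoid the sample() error when n > nrow(df)
  df[sample(seq_len(nrow(df)), n, replace = FALSE), , drop = FALSE]
}

# Made-up stand-in for one building's training data
toy <- data.frame(WAP001 = rnorm(10), LATITUDE = runif(10))

s1 <- sample_rows(toy, 4)
s2 <- sample_rows(toy, 4)
identical(s1, s2)            # same rows on every call

nrow(sample_rows(toy, 50))   # capped at the number of available rows
```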

Modalization (3)

Latitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset3_B0_lat <- train3_B0_lat[inTraining,]
testset3_B0_lat <- train3_B0_lat[-inTraining,]

#Train model
mod3_lm_B0_lat <- train(LATITUDE~., 
                       data = trainset3_B0_lat %>%
                         select(starts_with("WAP"), LATITUDE),
                       method = "lm",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_lm_B0_lat

#test results
pred3_lm_B0_lat_test <- predict(mod3_lm_B0_lat, newdata = testset3_B0_lat)
postResample(testset3_B0_lat$LATITUDE, pred3_lm_B0_lat_test)

#validation results
pred3_lm_B0_lat_validation <- predict(object = mod3_lm_B0_lat, newdata = valid3_B0) 
postResample(valid3_B0$LATITUDE, pred3_lm_B0_lat_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset3_B1_lat <- train3_B1_lat[inTraining,]
testset3_B1_lat <- train3_B1_lat[-inTraining,]
#Train model
mod3_lm_B1_lat <- train(LATITUDE~., 
                       data = trainset3_B1_lat %>%
                         select(starts_with("WAP"), LATITUDE),
                       method = "lm",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_lm_B1_lat

#test results
pred3_lm_B1_lat_test <- predict(mod3_lm_B1_lat, newdata = testset3_B1_lat)
postResample(testset3_B1_lat$LATITUDE, pred3_lm_B1_lat_test)

#validation results
pred3_lm_B1_lat_validation <- predict(object = mod3_lm_B1_lat, newdata = valid3_B1) 
postResample(valid3_B1$LATITUDE, pred3_lm_B1_lat_validation) 


#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset3_B2_lat <- train3_B2_lat[inTraining,]
testset3_B2_lat <- train3_B2_lat[-inTraining,]

#Train model
mod3_lm_B2_lat <- train(LATITUDE~., 
                       data = trainset3_B2_lat %>%
                         select(starts_with("WAP"), LATITUDE),
                       method = "lm",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_lm_B2_lat

#test results
pred3_lm_B2_lat_test <- predict(mod3_lm_B2_lat, newdata = testset3_B2_lat)
postResample(testset3_B2_lat$LATITUDE, pred3_lm_B2_lat_test)

#validation results
pred3_lm_B2_lat_validation <- predict(object = mod3_lm_B2_lat, newdata = valid3_B2) 
postResample(valid3_B2$LATITUDE, pred3_lm_B2_lat_validation) 
  • k-NN
#Building 0
#Train model
mod3_knn_B0_lat <- train(LATITUDE~., 
                        data = trainset3_B0_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "knn",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod3_knn_B0_lat

#test results
pred3_knn_B0_lat_test <- predict(mod3_knn_B0_lat, newdata = testset3_B0_lat)
postResample(testset3_B0_lat$LATITUDE, pred3_knn_B0_lat_test)

#validation results
pred3_knn_B0_lat_validation <- predict(object = mod3_knn_B0_lat, newdata = valid3_B0) 
postResample(valid3_B0$LATITUDE, pred3_knn_B0_lat_validation) 

#Building 1
#Train model
mod3_knn_B1_lat <- train(LATITUDE~., 
                        data = trainset3_B1_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "knn",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod3_knn_B1_lat

#test results
pred3_knn_B1_lat_test <- predict(mod3_knn_B1_lat, newdata = testset3_B1_lat)
postResample(testset3_B1_lat$LATITUDE, pred3_knn_B1_lat_test)

#validation results
pred3_knn_B1_lat_validation <- predict(object = mod3_knn_B1_lat, newdata = valid3_B1) 
postResample(valid3_B1$LATITUDE, pred3_knn_B1_lat_validation) 

#Building 2
#Train model
mod3_knn_B2_lat <- train(LATITUDE~., 
                        data = trainset3_B2_lat %>%
                          select(starts_with("WAP"), LATITUDE),
                        method = "knn",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod3_knn_B2_lat

#test results
pred3_knn_B2_lat_test <- predict(mod3_knn_B2_lat, newdata = testset3_B2_lat)
postResample(testset3_B2_lat$LATITUDE, pred3_knn_B2_lat_test)

#validation results
pred3_knn_B2_lat_validation <- predict(object = mod3_knn_B2_lat, newdata = valid3_B2) 
postResample(valid3_B2$LATITUDE, pred3_knn_B2_lat_validation) 
  • Random forest
#Building 0
#Train model
mod3_rf_B0_lat <- train(LATITUDE~., 
                       data = trainset3_B0_lat %>%
                         select(starts_with("WAP"), LATITUDE),
                       method = "rf",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_rf_B0_lat

#test results
pred3_rf_B0_lat_test <- predict(mod3_rf_B0_lat, newdata = testset3_B0_lat)
postResample(testset3_B0_lat$LATITUDE, pred3_rf_B0_lat_test)

#validation results
pred3_rf_B0_lat_validation <- predict(object = mod3_rf_B0_lat, newdata = valid3_B0) 
postResample(valid3_B0$LATITUDE, pred3_rf_B0_lat_validation) 

#Building 1
#Train model
mod3_rf_B1_lat <- train(LATITUDE~., 
                       data = trainset3_B1_lat %>%
                         select(starts_with("WAP"), LATITUDE),
                       method = "rf",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_rf_B1_lat

#test results
pred3_rf_B1_lat_test <- predict(mod3_rf_B1_lat, newdata = testset3_B1_lat)
postResample(testset3_B1_lat$LATITUDE, pred3_rf_B1_lat_test)

#validation results
pred3_rf_B1_lat_validation <- predict(object = mod3_rf_B1_lat, newdata = valid3_B1) 
postResample(valid3_B1$LATITUDE, pred3_rf_B1_lat_validation) 

#Building 2
#Train model
mod3_rf_B2_lat <- train(LATITUDE~., 
                       data = trainset3_B2_lat %>%
                         select(starts_with("WAP"), LATITUDE),
                       method = "rf",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_rf_B2_lat

#test results
pred3_rf_B2_lat_test <- predict(mod3_rf_B2_lat, newdata = testset3_B2_lat)
postResample(testset3_B2_lat$LATITUDE, pred3_rf_B2_lat_test)

#validation results
pred3_rf_B2_lat_validation <- predict(object = mod3_rf_B2_lat, newdata = valid3_B2) 
postResample(valid3_B2$LATITUDE, pred3_rf_B2_lat_validation) 

Longitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset3_B0_lon <- train3_B0_lon[inTraining,]
testset3_B0_lon <- train3_B0_lon[-inTraining,]

#Train model
mod3_lm_B0_lon <- train(LONGITUDE~., 
                       data = trainset3_B0_lon %>%
                         select(starts_with("WAP"), LONGITUDE),
                       method = "lm",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_lm_B0_lon

#test results
pred3_lm_B0_lon_test <- predict(mod3_lm_B0_lon, newdata = testset3_B0_lon)
postResample(testset3_B0_lon$LONGITUDE, pred3_lm_B0_lon_test)

#validation results
pred3_lm_B0_lon_validation <- predict(object = mod3_lm_B0_lon, newdata = valid3_B0) 
postResample(valid3_B0$LONGITUDE, pred3_lm_B0_lon_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset3_B1_lon <- train3_B1_lon[inTraining,]
testset3_B1_lon <- train3_B1_lon[-inTraining,]

#Train model
mod3_lm_B1_lon <- train(LONGITUDE~., 
                       data = trainset3_B1_lon %>%
                         select(starts_with("WAP"), LONGITUDE),
                       method = "lm",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_lm_B1_lon

#test results
pred3_lm_B1_lon_test <- predict(mod3_lm_B1_lon, newdata = testset3_B1_lon)
postResample(testset3_B1_lon$LONGITUDE, pred3_lm_B1_lon_test)

#validation results
pred3_lm_B1_lon_validation <- predict(object = mod3_lm_B1_lon, newdata = valid3_B1) 
postResample(valid3_B1$LONGITUDE, pred3_lm_B1_lon_validation) 

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train3_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset3_B2_lon <- train3_B2_lon[inTraining,]
testset3_B2_lon <- train3_B2_lon[-inTraining,]

#Train model
mod3_lm_B2_lon <- train(LONGITUDE~., 
                       data = trainset3_B2_lon %>%
                         select(starts_with("WAP"), LONGITUDE),
                       method = "lm",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_lm_B2_lon

#test results
pred3_lm_B2_lon_test <- predict(mod3_lm_B2_lon, newdata = testset3_B2_lon)
postResample(testset3_B2_lon$LONGITUDE, pred3_lm_B2_lon_test)

#validation results
pred3_lm_B2_lon_validation <- predict(object = mod3_lm_B2_lon, newdata = valid3_B2) 
postResample(valid3_B2$LONGITUDE, pred3_lm_B2_lon_validation)
  • k-NN
#Building 0
#Train model
mod3_knn_B0_lon <- train(LONGITUDE~., 
                        data = trainset3_B0_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "knn",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod3_knn_B0_lon

#test results
pred3_knn_B0_lon_test <- predict(mod3_knn_B0_lon, newdata = testset3_B0_lon)
postResample(testset3_B0_lon$LONGITUDE, pred3_knn_B0_lon_test)

#validation results
pred3_knn_B0_validation <- predict(object = mod3_knn_B0_lon, newdata = valid3_B0) 
postResample(valid3_B0$LONGITUDE, pred3_knn_B0_validation) 

#Building 1
#Train model
mod3_knn_B1_lon <- train(LONGITUDE~., 
                        data = trainset3_B1_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "knn",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod3_knn_B1_lon

#test results
pred3_knn_B1_lon_test <- predict(mod3_knn_B1_lon, newdata = testset3_B1_lon)
postResample(testset3_B1_lon$LONGITUDE, pred3_knn_B1_lon_test)

#validation results
pred3_knn_B1_lon_validation <- predict(object = mod3_knn_B1_lon, newdata = valid3_B1) 
postResample(valid3_B1$LONGITUDE, pred3_knn_B1_lon_validation) 

#Building 2
#Train model
mod3_knn_B2_lon <- train(LONGITUDE~., 
                        data = trainset3_B2_lon %>%
                          select(starts_with("WAP"), LONGITUDE),
                        method = "knn",
                        trControl = trainControl(method = "repeatedcv", 
                                                 number = 5, 
                                                 repeats = 1))
#train results
mod3_knn_B2_lon

#test results
pred3_knn_B2_lon_test <- predict(mod3_knn_B2_lon, newdata = testset3_B2_lon)
postResample(testset3_B2_lon$LONGITUDE, pred3_knn_B2_lon_test)

#validation results
pred3_knn_B2_lon_validation <- predict(object = mod3_knn_B2_lon, newdata = valid3_B2) 
postResample(valid3_B2$LONGITUDE, pred3_knn_B2_lon_validation) 
  • Random forest
#Building 0
#Train model
mod3_rf_B0_lon <- train(LONGITUDE~., 
                       data = trainset3_B0_lon %>%
                         select(starts_with("WAP"), LONGITUDE),
                       method = "rf",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_rf_B0_lon

#test results
pred3_rf_B0_lon_test <- predict(mod3_rf_B0_lon, newdata = testset3_B0_lon)
postResample(testset3_B0_lon$LONGITUDE, pred3_rf_B0_lon_test)

#validation results
pred3_rf_B0_lon_validation <- predict(object = mod3_rf_B0_lon, newdata = valid3_B0) 
postResample(valid3_B0$LONGITUDE, pred3_rf_B0_lon_validation) 

#Building 1
#Train model
mod3_rf_B1_lon <- train(LONGITUDE~., 
                       data = trainset3_B1_lon %>%
                         select(starts_with("WAP"), LONGITUDE),
                       method = "rf",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_rf_B1_lon

#test results
pred3_rf_B1_lon_test <- predict(mod3_rf_B1_lon, newdata = testset3_B1_lon)
postResample(testset3_B1_lon$LONGITUDE, pred3_rf_B1_lon_test)

#validation results
pred3_rf_B1_lon_validation <- predict(object = mod3_rf_B1_lon, newdata = valid3_B1) 
postResample(valid3_B1$LONGITUDE, pred3_rf_B1_lon_validation) 

#Building 2
#Train model
mod3_rf_B2_lon <- train(LONGITUDE~., 
                       data = trainset3_B2_lon %>%
                         select(starts_with("WAP"), LONGITUDE),
                       method = "rf",
                       trControl = trainControl(method = "repeatedcv", 
                                                number = 5, 
                                                repeats = 1))
#train results
mod3_rf_B2_lon

#test results
pred3_rf_B2_lon_test <- predict(mod3_rf_B2_lon, newdata = testset3_B2_lon)
postResample(testset3_B2_lon$LONGITUDE, pred3_rf_B2_lon_test)

#validation results
pred3_rf_B2_lon_validation <- predict(object = mod3_rf_B2_lon, newdata = valid3_B2) 
postResample(valid3_B2$LONGITUDE, pred3_rf_B2_lon_validation) 
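
The train / test / validation pattern repeated above for every building, target and algorithm could be wrapped in a single helper. The sketch below is one possible way to do that and is not part of the original analysis: fit_and_score() and the toy data are hypothetical, and the toy example reuses the test rows as a stand-in for a validation set.

```r
library(caret)
library(dplyr)

# Hypothetical helper: fit one caret model on the WAP columns and score it
fit_and_score <- function(trainset, testset, validset, target, method) {
  ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 1)
  model <- train(as.formula(paste(target, "~ .")),
                 data = trainset %>%
                   select(starts_with("WAP"), all_of(target)),
                 method = method,
                 trControl = ctrl)
  list(model      = model,
       test       = postResample(predict(model, newdata = testset),
                                 testset[[target]]),
       validation = postResample(predict(model, newdata = validset),
                                 validset[[target]]))
}

# Made-up data with two WAP-like predictors
set.seed(212)
toy <- data.frame(WAP001 = rnorm(120), WAP002 = rnorm(120))
toy$LATITUDE <- 2 * toy$WAP001 - toy$WAP002 + rnorm(120, sd = 0.1)

idx <- createDataPartition(toy$LATITUDE, p = .8, list = FALSE)
res <- fit_and_score(toy[idx, ], toy[-idx, ], toy[-idx, ],
                     target = "LATITUDE", method = "lm")
res$test
```

With such a helper, each block above would reduce to one call per building, target and method (for example method = "knn" or method = "rf").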

Pre-processing (4)

  • Remove PHONEID 17
# remove phoneid 17 --------------------------------------------------------
noduplicates_tr <- noduplicates_tr %>% 
  filter(PHONEID != 17)

# check whether PHONEID 17 occurs in the validation set
valid %>% 
  filter(PHONEID == 17)
# PHONEID 17 does not occur in the validation set
  • Normalization to 0-1
# Normalize rows to 0 and 1 -----------------------------------------------
# train
nonnumeric_tr <- noduplicates_tr %>% 
  select(LONGITUDE:PHONEID)

norm_min_max <- function(row) {
  (row - min(row))/(max(row) - min(row))
}

train2 <- noduplicates_tr %>% 
  select(starts_with("WAP")) %>% 
  apply(1, function(x) norm_min_max(x)) %>% 
  t() %>%  
  as_tibble()
train3 <- cbind(train2, nonnumeric_tr)

# validation
nonnumeric_val <- valid_rs_nzv %>% 
  select(LONGITUDE:PHONEID)

# norm_min_max() is reused as defined for the training set

valid2 <- valid_rs_nzv %>% 
  select(starts_with("WAP")) %>% 
  apply(1, function(x) norm_min_max(x)) %>% 
  t() %>%  
  as_tibble()
valid3 <- cbind(valid2, nonnumeric_val)
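
One caveat of row-wise min-max scaling: if every value in a row is identical, max(row) - min(row) is 0 and the division returns NaN. Zero-variance rows were removed from the training set earlier, but the code shown applies no such filter to the validation set. The sketch below shows a guarded variant on made-up values. norm_min_max_safe() is a hypothetical name, and mapping a constant row to all zeros is an assumption rather than the author's choice.

```r
# Hypothetical guarded variant of the row-wise min-max scaling
norm_min_max_safe <- function(row) {
  rng <- max(row) - min(row)
  if (rng == 0) {
    return(rep(0, length(row)))  # assumption: a constant row maps to all zeros
  }
  (row - min(row)) / rng
}

# Made-up signal values; the second row is constant
toy <- rbind(c(-90, -60, -30),
             c(-75, -75, -75))

# apply() returns one column per input row, so transpose back
scaled <- t(apply(toy, 1, norm_min_max_safe))
scaled

anyNA(scaled)   # no NaN is produced for the constant row
```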
  • Subset by building
# Subset by building for modalization 4------------------------------------------------------
train4_B0 <- train3 %>% 
  filter(BUILDINGID == 0)
train4_B1 <- train3 %>% 
  filter(BUILDINGID == 1)
train4_B2 <- train3 %>% 
  filter(BUILDINGID == 2)

valid4_B0 <- valid3 %>% 
  filter(BUILDINGID == 0)
valid4_B1 <- valid3 %>% 
  filter(BUILDINGID == 1)
valid4_B2 <- valid3 %>% 
  filter(BUILDINGID == 2)
  • Take random samples from the dataset
#Building 0
train4_B0_lat <- train4_B0[sample(1:nrow(train4_B0), 1500, replace = FALSE),]
#Building 1
train4_B1_lat <- train4_B1[sample(1:nrow(train4_B1), 1500, replace = FALSE),]
#Building 2
train4_B2_lat <- train4_B2[sample(1:nrow(train4_B2), 1500, replace = FALSE),]

#Building 0
train4_B0_lon <- train4_B0[sample(1:nrow(train4_B0), 1500, replace = FALSE),]
#Building 1
train4_B1_lon <- train4_B1[sample(1:nrow(train4_B1), 1500, replace = FALSE),]
#Building 2
train4_B2_lon <- train4_B2[sample(1:nrow(train4_B2), 1500, replace = FALSE),]

Modalization (4)

Latitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B0_lat$LATITUDE, p = .8, list = FALSE)
trainset4_B0_lat <- train4_B0_lat[inTraining,]
testset4_B0_lat <- train4_B0_lat[-inTraining,]

#Train model
mod4_lm_B0_lat <- train(LATITUDE~., 
                  data = trainset4_B0_lat %>%
                    select(starts_with("WAP"), LATITUDE),
                  method = "lm",
                  trControl = trainControl(method = "repeatedcv", 
                                           number = 5, 
                                           repeats = 1))
#train results
mod4_lm_B0_lat

#test results
pred4_lm_B0_lat_test <- predict(mod4_lm_B0_lat, newdata = testset4_B0_lat)
postResample(testset4_B0_lat$LATITUDE, pred4_lm_B0_lat_test)

#validation results
pred4_lm_B0_lat_validation <- predict(object = mod4_lm_B0_lat, newdata = valid4_B0) 
postResample(valid4_B0$LATITUDE, pred4_lm_B0_lat_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B1_lat$LATITUDE, p = .8, list = FALSE)
trainset4_B1_lat <- train4_B1_lat[inTraining,]
testset4_B1_lat <- train4_B1_lat[-inTraining,]

#Train model
mod4_lm_B1_lat <- train(LATITUDE~., 
                   data = trainset4_B1_lat %>%
                     select(starts_with("WAP"), LATITUDE),
                   method = "lm",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_lm_B1_lat

#test results
pred4_lm_B1_lat_test <- predict(mod4_lm_B1_lat, newdata = testset4_B1_lat)
postResample(testset4_B1_lat$LATITUDE, pred4_lm_B1_lat_test)

#validation results
pred4_lm_B1_lat_validation <- predict(object = mod4_lm_B1_lat, newdata = valid4_B1) 
postResample(valid4_B1$LATITUDE, pred4_lm_B1_lat_validation) 

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B2_lat$LATITUDE, p = .8, list = FALSE)
trainset4_B2_lat <- train4_B2_lat[inTraining,]
testset4_B2_lat <- train4_B2_lat[-inTraining,]

#Train model
mod4_lm_B2_lat <- train(LATITUDE~., 
                   data = trainset4_B2_lat %>%
                     select(starts_with("WAP"), LATITUDE),
                   method = "lm",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_lm_B2_lat

#test results
pred4_lm_B2_lat_test <- predict(mod4_lm_B2_lat, newdata = testset4_B2_lat)
postResample(testset4_B2_lat$LATITUDE, pred4_lm_B2_lat_test)

#validation results
pred4_lm_B2_lat_validation <- predict(object = mod4_lm_B2_lat, newdata = valid4_B2) 
postResample(valid4_B2$LATITUDE, pred4_lm_B2_lat_validation) 
  • k-NN
#Building 0
#Train model
mod4_knn_B0_lat <- train(LATITUDE~., 
                   data = trainset4_B0_lat %>%
                     select(starts_with("WAP"), LATITUDE),
                   method = "knn",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_knn_B0_lat

#test results
pred4_knn_B0_lat_test <- predict(mod4_knn_B0_lat, newdata = testset4_B0_lat)
postResample(testset4_B0_lat$LATITUDE, pred4_knn_B0_lat_test)

#validation results
pred4_knn_B0_lat_validation <- predict(object = mod4_knn_B0_lat, newdata = valid4_B0) 
postResample(valid4_B0$LATITUDE, pred4_knn_B0_lat_validation) 

#Building 1
#Train model
mod4_knn_B1_lat <- train(LATITUDE~., 
                    data = trainset4_B1_lat %>%
                      select(starts_with("WAP"), LATITUDE),
                    method = "knn",
                    trControl = trainControl(method = "repeatedcv", 
                                             number = 5, 
                                             repeats = 1))
#train results
mod4_knn_B1_lat

#test results
pred4_knn_B1_lat_test <- predict(mod4_knn_B1_lat, newdata = testset4_B1_lat)
postResample(testset4_B1_lat$LATITUDE, pred4_knn_B1_lat_test)

#validation results
pred4_knn_B1_lat_validation <- predict(object = mod4_knn_B1_lat, newdata = valid4_B1) 
postResample(valid4_B1$LATITUDE, pred4_knn_B1_lat_validation) 

#Building 2
#Train model
mod4_knn_B2_lat <- train(LATITUDE~., 
                    data = trainset4_B2_lat %>%
                      select(starts_with("WAP"), LATITUDE),
                    method = "knn",
                    trControl = trainControl(method = "repeatedcv", 
                                             number = 5, 
                                             repeats = 1))
#train results
mod4_knn_B2_lat

#test results
pred4_knn_B2_lat_test <- predict(mod4_knn_B2_lat, newdata = testset4_B2_lat)
postResample(testset4_B2_lat$LATITUDE, pred4_knn_B2_lat_test)

#validation results
pred4_knn_B2_lat_validation <- predict(object = mod4_knn_B2_lat, newdata = valid4_B2) 
postResample(valid4_B2$LATITUDE, pred4_knn_B2_lat_validation)
  • Random forest
#Building 0
#Train model
mod4_rf_B0_lat <- train(LATITUDE~., 
                    data = trainset4_B0_lat %>%
                      select(starts_with("WAP"), LATITUDE),
                    method = "rf",
                    trControl = trainControl(method = "repeatedcv", 
                                             number = 5, 
                                             repeats = 1))
#train results
mod4_rf_B0_lat

#test results
pred4_rf_B0_lat_test <- predict(mod4_rf_B0_lat, newdata = testset4_B0_lat)
postResample(testset4_B0_lat$LATITUDE, pred4_rf_B0_lat_test)

#validation results
pred4_rf_B0_lat_validation <- predict(object = mod4_rf_B0_lat, newdata = valid4_B0) 
postResample(valid4_B0$LATITUDE, pred4_rf_B0_lat_validation) 

#Building 1
#Train model
mod4_rf_B1_lat <- train(LATITUDE~., 
                   data = trainset4_B1_lat %>%
                     select(starts_with("WAP"), LATITUDE),
                   method = "rf",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_rf_B1_lat

#test results
pred4_rf_B1_lat_test <- predict(mod4_rf_B1_lat, newdata = testset4_B1_lat)
postResample(testset4_B1_lat$LATITUDE, pred4_rf_B1_lat_test)

#validation results
pred4_rf_B1_lat_validation <- predict(object = mod4_rf_B1_lat, newdata = valid4_B1) 
postResample(valid4_B1$LATITUDE, pred4_rf_B1_lat_validation) 

#Building 2
#Train model
mod4_rf_B2_lat <- train(LATITUDE~., 
                   data = trainset4_B2_lat %>%
                     select(starts_with("WAP"), LATITUDE),
                   method = "rf",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_rf_B2_lat

#test results
pred4_rf_B2_lat_test <- predict(mod4_rf_B2_lat, newdata = testset4_B2_lat)
postResample(testset4_B2_lat$LATITUDE, pred4_rf_B2_lat_test)

#validation results
pred4_rf_B2_lat_validation <- predict(object = mod4_rf_B2_lat, newdata = valid4_B2) 
postResample(valid4_B2$LATITUDE, pred4_rf_B2_lat_validation) 

Longitude

  • Linear Regression
#Building 0
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B0_lon$LONGITUDE, p = .8, list = FALSE)
trainset4_B0_lon <- train4_B0_lon[inTraining,]
testset4_B0_lon <- train4_B0_lon[-inTraining,]

#Train model
mod4_lm_B0_lon <- train(LONGITUDE~., 
                   data = trainset4_B0_lon %>%
                     select(starts_with("WAP"), LONGITUDE),
                   method = "lm",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_lm_B0_lon

#test results
pred4_lm_B0_lon_test <- predict(mod4_lm_B0_lon, newdata = testset4_B0_lon)
postResample(testset4_B0_lon$LONGITUDE, pred4_lm_B0_lon_test)

#validation results
pred4_lm_B0_lon_validation <- predict(object = mod4_lm_B0_lon, newdata = valid4_B0) 
postResample(valid4_B0$LONGITUDE, pred4_lm_B0_lon_validation) 

#Building 1
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B1_lon$LONGITUDE, p = .8, list = FALSE)
trainset4_B1_lon <- train4_B1_lon[inTraining,]
testset4_B1_lon <- train4_B1_lon[-inTraining,]

#Train model
mod4_lm_B1_lon <- train(LONGITUDE~., 
                   data = trainset4_B1_lon %>%
                     select(starts_with("WAP"), LONGITUDE),
                   method = "lm",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_lm_B1_lon

#test results
pred4_lm_B1_lon_test <- predict(mod4_lm_B1_lon, newdata = testset4_B1_lon)
postResample(testset4_B1_lon$LONGITUDE, pred4_lm_B1_lon_test)

#validation results
pred4_lm_B1_lon_validation <- predict(object = mod4_lm_B1_lon, newdata = valid4_B1) 
postResample(valid4_B1$LONGITUDE, pred4_lm_B1_lon_validation) 

#Building 2
# Split data into trainset and testset (80/20)
set.seed(212)
inTraining <- createDataPartition(train4_B2_lon$LONGITUDE, p = .8, list = FALSE)
trainset4_B2_lon <- train4_B2_lon[inTraining,]
testset4_B2_lon <- train4_B2_lon[-inTraining,]

#Train model
mod4_lm_B2_lon <- train(LONGITUDE~., 
                   data = trainset4_B2_lon %>%
                     select(starts_with("WAP"), LONGITUDE),
                   method = "lm",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_lm_B2_lon

#test results
pred4_lm_B2_lon_test <- predict(mod4_lm_B2_lon, newdata = testset4_B2_lon)
postResample(testset4_B2_lon$LONGITUDE, pred4_lm_B2_lon_test)

#validation results
pred4_lm_B2_lon_validation <- predict(object = mod4_lm_B2_lon, newdata = valid4_B2) 
postResample(valid4_B2$LONGITUDE, pred4_lm_B2_lon_validation) 
  • k-NN
#Building 0
#Train model
mod4_knn_B0_lon <- train(LONGITUDE~., 
                    data = trainset4_B0_lon %>%
                      select(starts_with("WAP"), LONGITUDE),
                    method = "knn",
                    trControl = trainControl(method = "repeatedcv", 
                                             number = 5, 
                                             repeats = 1))
#train results
mod4_knn_B0_lon

#test results
pred4_knn_B0_lon_test <- predict(mod4_knn_B0_lon, newdata = testset4_B0_lon)
postResample(testset4_B0_lon$LONGITUDE, pred4_knn_B0_lon_test)

#validation results
pred4_knn_B0_lon_validation <- predict(object = mod4_knn_B0_lon, newdata = valid4_B0) 
postResample(valid4_B0$LONGITUDE, pred4_knn_B0_lon_validation)

#Building 1
#Train model
mod4_knn_B1_lon <- train(LONGITUDE~., 
                    data = trainset4_B1_lon %>%
                      select(starts_with("WAP"), LONGITUDE),
                    method = "knn",
                    trControl = trainControl(method = "repeatedcv", 
                                             number = 5, 
                                             repeats = 1))
#train results
mod4_knn_B1_lon

#test results
pred4_knn_B1_lon_test <- predict(mod4_knn_B1_lon, newdata = testset4_B1_lon)
postResample(testset4_B1_lon$LONGITUDE, pred4_knn_B1_lon_test)

#validation results
pred4_knn_B1_lon_validation <- predict(object = mod4_knn_B1_lon, newdata = valid4_B1) 
postResample(valid4_B1$LONGITUDE, pred4_knn_B1_lon_validation) 

#Building 2
#Train model
mod4_knn_B2_lon <- train(LONGITUDE~., 
                    data = trainset4_B2_lon %>%
                      select(starts_with("WAP"), LONGITUDE),
                    method = "knn",
                    trControl = trainControl(method = "repeatedcv", 
                                             number = 5, 
                                             repeats = 1))
#train results
mod4_knn_B2_lon

#test results
pred4_knn_B2_lon_test <- predict(mod4_knn_B2_lon, newdata = testset4_B2_lon)
postResample(testset4_B2_lon$LONGITUDE, pred4_knn_B2_lon_test)

#validation results
pred4_knn_B2_lon_validation <- predict(object = mod4_knn_B2_lon, newdata = valid4_B2) 
postResample(valid4_B2$LONGITUDE, pred4_knn_B2_lon_validation) 
  • Random forest
#Building 0
#Train model
mod4_rf_B0_lon <- train(LONGITUDE~., 
                   data = trainset4_B0_lon %>%
                     select(starts_with("WAP"), LONGITUDE),
                   method = "rf",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_rf_B0_lon

#test results
pred4_rf_B0_lon_test <- predict(mod4_rf_B0_lon, newdata = testset4_B0_lon)
postResample(testset4_B0_lon$LONGITUDE, pred4_rf_B0_lon_test)

#validation results
pred4_rf_B0_lon_validation <- predict(object = mod4_rf_B0_lon, newdata = valid4_B0) 
postResample(valid4_B0$LONGITUDE, pred4_rf_B0_lon_validation) 

#Building 1
#Train model
mod4_rf_B1_lon <- train(LONGITUDE~., 
                   data = trainset4_B1_lon %>%
                     select(starts_with("WAP"), LONGITUDE),
                   method = "rf",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_rf_B1_lon

#test results
pred4_rf_B1_lon_test <- predict(mod4_rf_B1_lon, newdata = testset4_B1_lon)
postResample(testset4_B1_lon$LONGITUDE, pred4_rf_B1_lon_test)

#validation results
pred4_rf_B1_lon_validation <- predict(object = mod4_rf_B1_lon, newdata = valid4_B1) 
postResample(valid4_B1$LONGITUDE, pred4_rf_B1_lon_validation) 

#Building 2
#Train model
mod4_rf_B2_lon <- train(LONGITUDE~., 
                   data = trainset4_B2_lon %>%
                     select(starts_with("WAP"), LONGITUDE),
                   method = "rf",
                   trControl = trainControl(method = "repeatedcv", 
                                            number = 5, 
                                            repeats = 1))
#train results
mod4_rf_B2_lon

#test results
pred4_rf_B2_lon_test <- predict(mod4_rf_B2_lon, newdata = testset4_B2_lon)
postResample(testset4_B2_lon$LONGITUDE, pred4_rf_B2_lon_test)

#validation results
pred4_rf_B2_lon_validation <- predict(object = mod4_rf_B2_lon, newdata = valid4_B2) 
postResample(valid4_B2$LONGITUDE, pred4_rf_B2_lon_validation) 

Performance metrics of Latitude

Performance metrics of Longitude


Distribution of absolute errors (density plot)

Distribution of relative errors (density plot)

***

Predicted vs. observed value


Quality of predictions

We defined a good prediction as a positioning error of less than 7.5 meters, a moderate prediction as an error between 7.5 and 15 meters, and a bad prediction as an error of more than 15 meters. The error was calculated as the Euclidean distance between the predicted and the observed position, i.e. the square root of the sum of the squared latitude and longitude errors. Using Manhattan distance gave similar results. k-NN gave the best prediction quality, reflected by a better good/moderate/bad distribution compared with LM and Random forest (55%/32%/13%).
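The distance calculation and the three quality classes described above can be sketched in base R as follows. This is a minimal illustration, not the code used for the plots in this report: the vectors `obs_lat`, `obs_lon`, `pred_lat` and `pred_lon` hold made-up values, and assigning errors of exactly 7.5 or 15 meters to the higher class is an assumption, since the definition above leaves the boundaries open.

```r
# Hypothetical observed and predicted coordinates in meters (made-up values)
obs_lat  <- c(0, 0, 0, 0)
obs_lon  <- c(0, 0, 0, 0)
pred_lat <- c(3, 6, 9, 12)
pred_lon <- c(4, 8, 12, 16)

# Per-axis errors
err_lat <- pred_lat - obs_lat
err_lon <- pred_lon - obs_lon

# Euclidean distance: square root of the sum of squared errors
dist_euclid <- sqrt(err_lat^2 + err_lon^2)

# Manhattan distance: sum of absolute errors (for comparison)
dist_manhattan <- abs(err_lat) + abs(err_lon)

# Classify each prediction; right = FALSE gives [0, 7.5), [7.5, 15), [15, Inf)
# (boundary handling is an assumption)
quality <- cut(dist_euclid,
               breaks = c(0, 7.5, 15, Inf),
               labels = c("good", "moderate", "bad"),
               right = FALSE)

# Share of predictions per class
quality_share <- prop.table(table(quality))
print(quality_share)
```

Applied to real predictions, `quality_share` would give the good/moderate/bad distribution per model.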


Summary

For this regression problem we used three machine learning algorithms. k-NN and Random forest had the best performance metrics, with median errors of 3.5 vs. 4.0 meters (latitude) and 3.6 vs. 5.1 meters (longitude), respectively. However, it should be noted that the standard deviation of the errors is relatively large, as illustrated by the density plots. The outliers are mostly located in building 2. Rescaling the WAP values (to 0-105) during pre-processing (2) resulted in the largest relative improvement in performance metrics compared with the other pre-processing steps. Filtering out values in the range -30 to 0 dBm was necessary, since -30 dBm is the maximum achievable RSSI value.
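As an illustration of the rescaling and filtering mentioned above, the sketch below shows one possible way to do this in base R. It is not necessarily the procedure used in pre-processing (2). It assumes that an undetected WAP is coded as 100 in the raw data, that the weakest level is -105 dBm, and that readings stronger than -30 dBm are treated as "no signal" rather than dropping the whole row. The function `rescale_rssi` and the data frame `wap` are hypothetical.

```r
# Hypothetical raw fingerprints: 100 = WAP not detected (assumption)
wap <- data.frame(WAP001 = c(100, -95, -20),
                  WAP002 = c(-60, 100, -104))

# Map raw RSSI values to a non-negative scale where 0 means "no signal"
rescale_rssi <- function(x, no_signal = 100, floor_dbm = -105, max_dbm = -30) {
  x[x == no_signal] <- floor_dbm  # undetected WAPs get the weakest level
  x[x > max_dbm]    <- floor_dbm  # implausibly strong readings treated as undetected (assumption)
  x - floor_dbm                   # shift so that the weakest level becomes 0
}

wap_scaled <- as.data.frame(lapply(wap, rescale_rssi))
print(wap_scaled)
```

On the rescaled values a stronger signal is a larger number and "no signal" sits at the low end of the scale instead of at +100, which is the property that matters for distance-based methods such as k-NN.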


Conclusion

Indoor positioning with Wi-Fi fingerprinting, by predicting latitude and longitude, is relatively inaccurate. However, the interpretation depends on how we define the threshold for a good prediction. k-NN can be considered a feasible method if we accept an error range of 7.5 meters.


Recommendation

Reduce redundant access points: Some WAPs are located close to each other and provide the same information. The model will improve if the WAPs are distributed evenly across the buildings.

Analysis of internal structure: It could also be useful to analyze how the internal structure of the building relates to the access points. Are there spaces where a wall or closet blocks an access point and interferes with the WAP signal? Reorganizing the access points may minimize this interference.

Reduce time between training and validation, and update/refine the model: Reducing the time between training and validation, using the same WAPs, will increase generalizability. The validation fingerprints were taken 3 months later, and 55 new WAPs appeared in the validation phase. The implication is that whenever new WAPs are introduced, the models should be updated and retrained with the new WAPs as predictors to achieve better results.
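A simple check for newly appearing WAPs could look like the sketch below. It assumes that an undetected WAP is coded as 100 and that "new" means a WAP detected in at least one validation fingerprint but in no training fingerprint. The data frames `train_wap` and `valid_wap` and the helper `detected` are hypothetical and use made-up values.

```r
# Hypothetical WAP columns; 100 = WAP not detected (assumption)
train_wap <- data.frame(WAP001 = c(-60, -70),
                        WAP002 = c(100, 100),
                        WAP003 = c(100, 100))
valid_wap <- data.frame(WAP001 = c(-65, 100),
                        WAP002 = c(-80, 100),
                        WAP003 = c(100, 100))

# Names of the WAPs that were detected at least once in a data set
detected <- function(df, no_signal = 100) {
  names(df)[vapply(df, function(x) any(x != no_signal), logical(1))]
}

# WAPs seen during validation that never appeared during training
new_waps <- setdiff(detected(valid_wap), detected(train_wap))
print(new_waps)
```

A non-empty `new_waps` would be a signal that new fingerprints should be collected and the models retrained.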