This recommender system collects a rating of 10 colors by a sample of 10 people and recommend the highest rated colors. The dataset was created as a matrix of 10 colors columns and 10 people in rows. The values were selected at random ranging from 1-5 , with 1 as the lowest rating and 5 as the highest ratings. The cells with 0 values where replaced with NA. A set.seed(123) was used to retain reproducible values.
set.seed(123)
data.matrix <- matrix(nrow=10, ncol=10)
colnames(data.matrix)<-c(paste("color", 1:10, sep=""))
rownames(data.matrix)<-c(paste("person", 1:10, sep=""))
for (i in 1:10)
{
color.values<-rpois(5, lambda=sample(x=2.5,size=1))
data.matrix[i,] <- color.values
}
data.matrix[data.matrix == 0] <- NA
data.matrix
## color1 color2 color3 color4 color5 color6 color7 color8 color9 color10
## person1 2 1 2 3 NA 2 1 2 3 NA
## person2 4 2 2 5 2 4 2 2 5 2
## person3 2 NA 4 1 NA 2 NA 4 1 NA
## person4 3 2 1 1 4 3 2 1 1 4
## person5 1 1 1 NA NA 1 1 1 NA NA
## person6 2 1 2 NA 1 2 1 2 NA 1
## person7 NA NA NA NA 1 NA NA NA NA 1
## person8 1 1 1 1 2 1 1 1 1 2
## person9 2 NA 1 2 NA 2 NA 1 2 NA
## person10 NA NA 2 2 1 NA NA 2 2 1
The 100 cell matrix was split into a trainset with 70% of the data and a testing dataset with 30% of the data.
#Partition Data
set.seed(123)
trainidx<-sample(nrow(data.matrix),round(0.7*nrow(data.matrix)),replace=F)
traindata<-data.matrix[trainidx,]
testdata<-data.matrix[-trainidx,]
The raw average (mean) value for this dataset is: 1.9259259
We calculate the RMSE by taking the square root of the of mean of datasets minus the mean squared. Na.rm is used to remove na values.
RMSE for the training dataset is: 1.0156525
RMSE for the testing dataset is: 1.0855075
We calculate bias by subtracting dataset mean from the mean of each item. We use rowMeans and colMeans functions to derive the mean of each item.
rdata<-rowMeans(traindata, na.rm=TRUE)-mdata
cdata<-colMeans(testdata, na.rm=TRUE)-mdata
The bias of each person is:
## person3 person10 person2 person8 person6 person9
## 0.40740741 -0.25925926 1.07407407 -0.72592593 -0.42592593 -0.25925926
## person1
## 0.07407407
The bias of each color is:
## color1 color2 color3 color4 color5 color6
## 0.07407407 -0.42592593 -0.92592593 -0.92592593 0.57407407 0.07407407
## color7 color8 color9 color10
## -0.42592593 -0.92592593 -0.92592593 0.57407407
For the baseline predictor, we add the bias of each person plus bias of each color plus the original mean. We rounded each baseline value to 2 decimal places. The results is a matrix of all predicted baseline values. We then caculate the mean of all values in the matrix.
rbdata = NULL
for (j in 1:length(cdata))
{
rbdata<-cbind(rbdata, rdata[1:length(rdata)]%+%cdata[j]+mdata)
}
dfrbdata <- as.matrix(round(rbdata,2))
colnames(dfrbdata)<-colnames(traindata)
rownames(dfrbdata)<-rownames(traindata)
rbmean<-mean(rbdata)
The baseline predictions are:
## color1 color2 color3 color4 color5 color6 color7 color8 color9 color10
## person3 2.41 1.91 1.41 1.41 2.91 2.41 1.91 1.41 1.41 2.91
## person10 1.74 1.24 0.74 0.74 2.24 1.74 1.24 0.74 0.74 2.24
## person2 3.07 2.57 2.07 2.07 3.57 3.07 2.57 2.07 2.07 3.57
## person8 1.27 0.77 0.27 0.27 1.77 1.27 0.77 0.27 0.27 1.77
## person6 1.57 1.07 0.57 0.57 2.07 1.57 1.07 0.57 0.57 2.07
## person9 1.74 1.24 0.74 0.74 2.24 1.74 1.24 0.74 0.74 2.24
## person1 2.07 1.57 1.07 1.07 2.57 2.07 1.57 1.07 1.07 2.57
The mean of all baseline predictions is: 1.5835979
Once again, for RMSE we take the square root of the of mean of baseline predictors minus the mean squared. Na.rm is used to remove na values.
RMSE for the training baseline predictors is: 1.0717922
RMSE for the testing baseline predictors is: 1.0573606
trresults<-round((1-rbtrmse/trrmse)*100,2)
teresults<-round((1-rbtermse/termse)*100,2)
We divide the RMSE of the orginal training and testing data set by the RMSE of the baseline predictor training and test dataset. The end result is a slight improvement and a slight degradation of predictions. The percent improvement on the training dataset is a negative value: -5.53%. However, there is a slight percent improvement for the testing dataset: 2.59%
In conclusion, this recommender system is not a useful system to predict or recommend values. The training and testing results are both low and only the test value has a postive percent value of improvement.
Code used in analysis
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE
)
#knitr::opts_chunk$set(echo = TRUE)
require(knitr)
library(ggplot2)
library(tidyr)
library(MASS)
library(psych)
library(kableExtra)
library(dplyr)
library(faraway)
library(gridExtra)
library(reshape2)
library(reshape)
library(leaps)
library(pROC)
library(caret)
library(naniar)
library(pander)
library(pROC)
library(mlbench)
library(e1071)
library(fpp2)
library(mlr)
set.seed(123)
data.matrix <- matrix(nrow=10, ncol=10)
colnames(data.matrix)<-c(paste("color", 1:10, sep=""))
rownames(data.matrix)<-c(paste("person", 1:10, sep=""))
for (i in 1:10)
{
color.values<-rpois(5, lambda=sample(x=2.5,size=1))
data.matrix[i,] <- color.values
}
data.matrix[data.matrix == 0] <- NA
data.matrix
#Partition Data
set.seed(123)
trainidx<-sample(nrow(data.matrix),round(0.7*nrow(data.matrix)),replace=F)
traindata<-data.matrix[trainidx,]
testdata<-data.matrix[-trainidx,]
mdata<-mean(traindata, na.rm=TRUE)
trrmse<-sqrt(mean((traindata-mdata)^2, na.rm=TRUE))
termse<-sqrt(mean((testdata-mdata)^2, na.rm=TRUE))
rdata<-rowMeans(traindata, na.rm=TRUE)-mdata
cdata<-colMeans(testdata, na.rm=TRUE)-mdata
rdata
cdata
rbdata = NULL
for (j in 1:length(cdata))
{
rbdata<-cbind(rbdata, rdata[1:length(rdata)]%+%cdata[j]+mdata)
}
dfrbdata <- as.matrix(round(rbdata,2))
colnames(dfrbdata)<-colnames(traindata)
rownames(dfrbdata)<-rownames(traindata)
rbmean<-mean(rbdata)
dfrbdata
rbtrmse<-sqrt(mean((traindata-rbmean)^2, na.rm=TRUE))
rbtermse<-sqrt(mean((testdata-rbmean)^2, na.rm=TRUE))
trresults<-round((1-rbtrmse/trrmse)*100,2)
teresults<-round((1-rbtermse/termse)*100,2)