library(dplyr)
library(ggplot2)
library(recommenderlab)
library(kableExtra)
Using review data from Amazon, this project presents a guideline for building and evaluating a recommender system. First a model is designed and evaluated both on ratings and on recommendations. The recommendation evaluation is then used to select the best of several models, each designed following the same steps as the first one; model selection is based on the recommendation evaluation (ROC and precision/recall curves). With that best model in hand, both diversity and novelty are introduced, and their effects on the RMSE performance metric are presented. Finally, the conclusion offers a short discussion of possible online evaluation improvements.
Datasets are available on Julian McAuley’s site: http://jmcauley.ucsd.edu/data/amazon/links.html
Data for Instant Video was selected. Each row contains one user/item rating entry. The data was transformed into a user/item utility matrix and then into the sparse matrix class used by the recommenderlab library.
# Read the first 80,000 ratings; columns are user, item, rating (the timestamp column is dropped)
dataRaw <- read.csv("ratings_Amazon_Instant_Video.csv", header = FALSE,
                    colClasses = c("character", "character", "numeric", "NULL"))[1:80000, ]
colnames(dataRaw) <- c('user', 'item', 'rating')
numberOfUsers <- length(unique(dataRaw[["user"]]))
numberOfItems <- length(unique(dataRaw[["item"]]))
# Build an empty user/item utility matrix and fill it one rating at a time
utilMatrix <- data.frame(matrix(NA, ncol = numberOfItems, nrow = numberOfUsers))
colnames(utilMatrix) <- unique(dataRaw[["item"]])
rownames(utilMatrix) <- unique(dataRaw[["user"]])
for(i in 1:nrow(dataRaw)) {
  utilMatrix[dataRaw[i, 1], dataRaw[i, 2]] <- dataRaw[i, 3]
}
#head(utilMatrix,20) %>% kable() %>% kable_styling() %>% scroll_box(width = "800px", height = "400px")
dataAll <- as(as.matrix(utilMatrix), "realRatingMatrix")
dataAll
## 70125 x 4733 rating matrix of class 'realRatingMatrix' with 80000 ratings.
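As a side note, recommenderlab can also coerce a data.frame of user/item/rating triplets directly into a realRatingMatrix, which avoids building the dense intermediate matrix by hand. A minimal sketch, assuming dataRaw holds numeric ratings (dataAllAlt is an illustrative name):
# Coerce the user/item/rating triplets directly into a sparse realRatingMatrix
dataRaw$rating <- as.numeric(dataRaw$rating)
dataAllAlt <- as(dataRaw, "realRatingMatrix")
dataAllAlt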
The first step in data preparation was to keep only users and items with more than 5 ratings. The objective is to remove users and items with very few entries, which add little information to the system, although this filtering may affect both diversity and novelty, as will become evident in that section of the project.
data<-dataAll[rowCounts(dataAll)>5,]
data<-data[,colCounts(data)>5]
data<-data[rowCounts(data)>5,]
data
## 38 x 67 rating matrix of class 'realRatingMatrix' with 319 ratings.
The heat map of the data shows the reduced set of users and items. It is also somewhat evident that each user and each item has several (more than 5) ratings.
image(data)
We can also observe the average ratings per user, which is important when deciding if normalization is required.
averageRatingPerUser<-rowMeans(data)
hist(averageRatingPerUser)
As seen in the data exploration section, user ratings appear to be skewed towards the high end of the rating scale. For this reason we normalize the data before using it to build our recommender.
dataNorm<-normalize(data)
averageRatingPerUser<-rowMeans(dataNorm)
hist(averageRatingPerUser)
The new heat map shows our normalized dataset, with ratings now on a continuous, centered scale.
image(dataNorm)
We can see that the ratings are now centered around 0.
sum(rowMeans(dataNorm))
## [1] 1.383021e-15
For the recommender evaluation we split the data, selecting 80% for training and the remaining 20% for testing; we use recommenderlab's bootstrap method, which samples with replacement to increase the number of training cases. We also define what counts as a good rating, which will be used in one of the recommender evaluation methods.
percentTrain<-0.8
min(rowCounts(data))
## [1] 6
keep<-5
goodRating<-3
nEval<-1
evalSets<-evaluationScheme(data=data,method="bootstrap",train=percentTrain,given=keep,goodRating=goodRating,k=nEval)
evalSets
## Evaluation scheme with 5 items given
## Method: 'bootstrap' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 38 x 67 rating matrix of class 'realRatingMatrix' with 319 ratings.
We verify the amount of data we have in our training dataset.
getData(evalSets,"train")
## 30 x 67 rating matrix of class 'realRatingMatrix' with 236 ratings.
As we want to build several models and select the best performing one before experimenting with diversity and novelty, we start by going through all the steps to design a recommender with an initial model. We begin by using our training data to train an item-based collaborative filtering (IBCF) recommender.
model<-"IBCF"
param<-NULL
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
A histogram shows the distribution of the number of predicted ratings per user from our first model.
hist(rowCounts(evalPrediction))
To select a model we need a way to measure performance. We look at two ways of measuring how good a recommender's predictions are: the first based on RMSE (Root Mean Squared Error), the second on a confusion matrix together with ROC and precision/recall curves.
The first evaluation method compares the estimated ratings with the real ones and summarizes the difference as a mean error.
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 1.1570069 1.3386649 0.8556967
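As a sanity check, the same RMSE can be computed directly from the predicted and held-out rating matrices; a minimal sketch reusing the objects created above:
# Manual RMSE: compare predicted ratings with the held-out ("unknown") ratings
predictedRatings <- as(evalPrediction, "matrix")
actualRatings <- as(getData(evalSets, "unknown"), "matrix")
squaredErrors <- (predictedRatings - actualRatings)^2   # NA wherever either value is missing
sqrt(mean(squaredErrors, na.rm = TRUE))                 # should be close to evalAccuracy["RMSE"]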
Another way to evaluate a recommender is to compare the model's recommendations with actual positive ratings. Here we use our definition of a good rating (set to 3 in the data split section) as the threshold between good and bad recommendations.
results<-evaluate(x=evalSets,method=model,n=seq(10,100,10))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.032sec/0.007sec]
getConfusionMatrix(results)[[1]]
## TP FP FN TN precision recall TPR
## 10 0.6470588 7.588235 3.0000000 50.76471 0.07857143 0.1420513 0.1420513
## 20 1.1764706 15.294118 2.4705882 43.05882 0.07142857 0.3834188 0.3834188
## 30 1.8823529 22.764706 1.7647059 35.58824 0.07627258 0.4958974 0.4958974
## 40 2.5882353 28.058824 1.0588235 30.29412 0.08049752 0.6032479 0.6032479
## 50 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 60 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 70 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 80 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 90 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 100 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## FPR
## 10 0.1308902
## 20 0.2644881
## 30 0.3929107
## 40 0.4854912
## 50 0.5147154
## 60 0.5147154
## 70 0.5147154
## 80 0.5147154
## 90 0.5147154
## 100 0.5147154
The confusion matrix gives us counts of positive and negative predictions, but we can also look at the ROC curve. We will use this curve to compare the different models and select the best performing one.
plot(results,annotate=TRUE)
Precision (the percentage of recommended items that were actually purchased) and recall (the percentage of purchased items that were recommended) are also good metrics. We can plot these results as shown below.
plot(results,"prec/rec",annotate=TRUE)
In general a good recommender should have balanced precision and recall. A metric also often used is the F-score (such as the F1 or F2 variant), which provides a way to combine the two into a single value.
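As an illustration, such an F-score can be computed from the averaged confusion matrix of the evaluation above; a minimal sketch, where beta controls how much recall is weighted relative to precision:
# F-beta score from the averaged confusion matrix (beta = 1 gives F1, beta = 2 favours recall)
cm <- avg(results)
beta <- 1
fScore <- (1 + beta^2) * cm[, "precision"] * cm[, "recall"] /
  (beta^2 * cm[, "precision"] + cm[, "recall"])
fScore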
Now that we have a way to train a model and measure its performance, we build several models and compare their results, selecting one based on the recommendation evaluation. We build four models in total, all collaborative filtering: two item-based and two user-based, each pair using either cosine or Pearson distance.
models<-list(
IBCFcos=list(name="IBCF",param=list(method="cosine")),
IBCFpearson=list(name="IBCF",param=list(method="pearson")),
UBCFcos=list(name="UBCF",param=list(method="cosine")),
UBCFpearson=list(name="UBCF",param=list(method="pearson"))
)
nRecommendations<-c(1,5,seq(10,100,10))
results<-evaluate(x=evalSets,method = models,n=nRecommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [0.007sec/0.029sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.006sec/0.005sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.001sec/0.009sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.008sec]
We can calculate their performance; here we show the results for one of the four models.
averageMatrices<-lapply(results,avg)
averageMatrices[1]
## $IBCFcos
## TP FP FN TN precision recall TPR
## 1 0.05882353 0.7647059 3.5882353 57.58824 0.07142857 0.005128205 0.005128205
## 5 0.29411765 3.8235294 3.3529412 54.52941 0.07142857 0.094700855 0.094700855
## 10 0.64705882 7.5882353 3.0000000 50.76471 0.07857143 0.142051282 0.142051282
## 20 1.17647059 15.2941176 2.4705882 43.05882 0.07142857 0.383418803 0.383418803
## 30 1.88235294 22.7647059 1.7647059 35.58824 0.07627258 0.495897436 0.495897436
## 40 2.58823529 28.0588235 1.0588235 30.29412 0.08049752 0.603247863 0.603247863
## 50 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 60 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 70 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 80 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 90 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 100 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## FPR
## 1 0.01314330
## 5 0.06607009
## 10 0.13089020
## 20 0.26448813
## 30 0.39291070
## 40 0.48549124
## 50 0.51471539
## 60 0.51471539
## 70 0.51471539
## 80 0.51471539
## 90 0.51471539
## 100 0.51471539
From the performance results we can plot the ROC curves for all models and select the best performing one.
plot(results,annotate=2)
It is still somewhat hard to select a model from the ROC curves alone, so we also look at the precision/recall plot.
plot(results,"prec/rec",annotate=1)
From this plot we select UBCF with cosine distance as the best performing algorithm.
Now that we have selected a model using recommendations, we look at its performance on ratings by calculating the RMSE. We will use this metric to observe the effect of diversity and novelty on the recommender's performance.
model<-"UBCF"
param<-list(method="cosine")
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 0.9240670 0.8538998 0.7599519
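To see how the error varies across users, the same accuracy calculation can also be run per user; a short sketch reusing the prediction above:
# Per-user accuracy for the selected UBCF model
evalAccuracyByUser <- calcPredictionAccuracy(x = evalPrediction, data = getData(evalSets, "unknown"), byUser = TRUE)
head(evalAccuracyByUser)
summary(evalAccuracyByUser[, "RMSE"])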
To include diversity in our results, we take items with poor reviews and increase their ratings so that they show up in the recommender's results. We change 20% of the ratings at or below 3 to 5.
matrix <- as(data, "matrix")
# Row/column indices of all ratings at or below 3
lowRatingsInd <- which(matrix <= 3, arr.ind = TRUE)
# Randomly pick 20% of those low ratings and set them to 5
lowRatingsIndchangeSize <- round(nrow(lowRatingsInd) * 0.2, 0)
lowRatingsIndToChange <- sample(1:nrow(lowRatingsInd), lowRatingsIndchangeSize)
for(i in lowRatingsIndToChange) {
  matrix[lowRatingsInd[i, 1], lowRatingsInd[i, 2]] <- 5
}
dataDiversity <- as(as.matrix(matrix), "realRatingMatrix")
As before, we calculate the model's performance on ratings by computing the RMSE.
percentTrain<-0.8
min(rowCounts(dataDiversity))
## [1] 6
keep<-5
goodRating<-3
nEval<-1
evalSets<-evaluationScheme(data=dataDiversity,method="bootstrap",train=percentTrain,given=keep,goodRating=goodRating,k=nEval)
evalSets
## Evaluation scheme with 5 items given
## Method: 'bootstrap' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 38 x 67 rating matrix of class 'realRatingMatrix' with 319 ratings.
model<-"UBCF"
param<-list(method="cosine")
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 1.0027289 1.0054653 0.8720934
As expected, the RMSE isn't as "good" as before, but the results are more diverse, since they now include items that would otherwise not appear.
To increase novelty in the results, we increase the ratings of unpopular items, that is, items with very few ratings (equivalently, items with many NA entries in the utility matrix).
matrix<-as(data,"matrix")
head(as.data.frame(matrix)) %>% kable() %>% kable_styling() %>% scroll_box(width = "800px", height = "400px")
|  | B000H4YNM0 | B000HAB4NK | B000HZEHL6 | B000IBUIS0 | B000IJRK7O | B000IK882Y | B000IXUOP0 | B000J0Q1KO | B000JO9JHW | B000MPGI68 | B000MVIAUY | B000MVN8GK | B000OC2T4Y | B000OGTRC2 | B000OJFV3S | B000U6I0B0 | B000UERE3M | B000ULTV4U | B000UU4IX0 | B000UVUMDO | B000VU2SW2 | B000VU4GW2 | B000VU6K0I | B000VU8DCQ | B000VVEDWO | B000VZUWZW | B000W4BJ7C | B000W4WD40 | B000WFFAIA | B000WT7R6O | B000ZU1KMW | B0011NHCYM | B0012H0DBG | B0012QRPU4 | B0013Z0PLA | B00150OTCE | B0019CY40U | B0019D3I4W | B001CB8NQE | B001CEB4YE | B001CH44YI | B001ENSOVO | B001EUE6X2 | B001EUKHRG | B001EXRQQ8 | B001F6ZIXC | B001N4PPWK | B001PN63PC | B001QDKK3W | B001SE07JG | B001TNVEKC | B001W1Y5RK | B0021HAG5E | B00241ASBO | B002962RKE | B002BLCNHY | B002IVNLKA | B002JAU1X0 | B002KEZ91E | B002KGB2EU | B002KGEORM | B002N4QCTE | B002QS5OQ4 | B002TNS012 | B002TWWOHE | B002VVCMTS | B002Y0TGOU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AJKWF4W7QD4NS | 5 | NA | NA | NA | 5 | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 |
| A153NZD2WZN5S3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | 3 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| A16XRPF40679KG | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | 5 | NA | NA | 3 | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| A2FRKEXDXDN1KI | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | 5 | NA | 4 | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| ALZWY9L4E5GXO | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA |
| A2QWF2BJ2FM4R2 | 5 | 4 | NA | NA | NA | 5 | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | 3 | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | 3 | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
As we can see in the head of the matrix, there are several items that appear to be unpopular, with many NA entries.
# Count the NA entries per item column; items with the most NAs are the least popular
popularity <- as.data.frame(which(is.na(matrix), arr.ind = TRUE)) %>% count(col) %>% as.data.frame()
popularity <- popularity[with(popularity, order(n)), ]
# Take the 20% least popular items (most NAs) and set all of their ratings to 5
numberOfItemsToChange <- round(nrow(popularity) * 0.2, 0)
indToChange <- tail(popularity$col, numberOfItemsToChange)
matrix[, indToChange] <- 5
head(as.data.frame(matrix)) %>% kable() %>% kable_styling() %>% scroll_box(width = "800px", height = "400px")
|  | B000H4YNM0 | B000HAB4NK | B000HZEHL6 | B000IBUIS0 | B000IJRK7O | B000IK882Y | B000IXUOP0 | B000J0Q1KO | B000JO9JHW | B000MPGI68 | B000MVIAUY | B000MVN8GK | B000OC2T4Y | B000OGTRC2 | B000OJFV3S | B000U6I0B0 | B000UERE3M | B000ULTV4U | B000UU4IX0 | B000UVUMDO | B000VU2SW2 | B000VU4GW2 | B000VU6K0I | B000VU8DCQ | B000VVEDWO | B000VZUWZW | B000W4BJ7C | B000W4WD40 | B000WFFAIA | B000WT7R6O | B000ZU1KMW | B0011NHCYM | B0012H0DBG | B0012QRPU4 | B0013Z0PLA | B00150OTCE | B0019CY40U | B0019D3I4W | B001CB8NQE | B001CEB4YE | B001CH44YI | B001ENSOVO | B001EUE6X2 | B001EUKHRG | B001EXRQQ8 | B001F6ZIXC | B001N4PPWK | B001PN63PC | B001QDKK3W | B001SE07JG | B001TNVEKC | B001W1Y5RK | B0021HAG5E | B00241ASBO | B002962RKE | B002BLCNHY | B002IVNLKA | B002JAU1X0 | B002KEZ91E | B002KGB2EU | B002KGEORM | B002N4QCTE | B002QS5OQ4 | B002TNS012 | B002TWWOHE | B002VVCMTS | B002Y0TGOU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AJKWF4W7QD4NS | 5 | NA | NA | NA | 5 | NA | 5 | 5 | 5 | NA | NA | NA | NA | 5 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | 4 | NA | NA | NA | NA | NA | 5 | NA | 4 | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | 3 | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | 5 |
| A153NZD2WZN5S3 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | 3 | 3 | 3 | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | 3 | NA | 5 | NA | NA | NA | 5 | NA | 5 | 3 | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
| A16XRPF40679KG | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | 5 | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 4 | 5 | NA | NA | 5 | 5 | NA | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | 3 | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
| A2FRKEXDXDN1KI | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | 5 | 5 | 5 | NA | 5 | NA | 5 | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
| ALZWY9L4E5GXO | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | 5 | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | NA | 5 | NA | 5 | 5 | NA | 5 | NA | NA | NA | 5 | NA | NA | 5 | NA |
| A2QWF2BJ2FM4R2 | 5 | 4 | NA | NA | NA | 5 | 5 | NA | NA | 3 | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | 3 | NA | 5 | NA | NA | NA | 4 | NA | NA | NA | 5 | NA | NA | NA | 5 | 5 | NA | 5 | NA | 4 | 3 | NA | NA | NA | NA | 5 | NA | 4 | NA | 5 | NA | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
dataNovelty <- as(as.matrix(matrix), "realRatingMatrix")
We have set all ratings for the selected unpopular items to 5, making them very popular.
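A quick way to check the effect is to compare the number of ratings per item before and after the change; a small sketch using the original and modified matrices:
# Ratings per item before and after the novelty change; the boosted items now have a rating from every user
summary(colCounts(data))
summary(colCounts(dataNovelty))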
Again we evaluate the rating prediction performance by calculating the RMSE metric.
percentTrain<-0.8
keep<-5
goodRating<-3
nEval<-1
evalSets<-evaluationScheme(data=dataNovelty,method="split",train=percentTrain,given=keep,goodRating=goodRating,k=nEval)
evalSets
## Evaluation scheme with 5 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 38 x 67 rating matrix of class 'realRatingMatrix' with 797 ratings.
model<-"UBCF"
param<-list(method="cosine")
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 0.7114389 0.5061453 0.3761004
The results now include novel items, that is, items that are not popular but that can now make the top results. We can see that by including more of these unpopular items, and thus increasing novelty, the RMSE actually decreases. In the extreme, if we made every item popular by setting all ratings to 5, the RMSE would drop to zero, since every rating would become trivially predictable.
In this project we designed a recommender using Amazon rating data, described how to run evaluations on it, and used those metrics to select the best performing model from a selection of four. After selecting a model we introduced both diversity and novelty and described their effects on RMSE, the metric used to measure rating prediction performance. There are several definitions of diversity and novelty; for this project we used the loose definitions given by Denis Kotkov et al. in the paper "How does serendipity affect diversity in recommender systems? A serendipity-oriented greedy algorithm" (https://link.springer.com/content/pdf/10.1007/s00607-018-0687-5.pdf). These and other authors also present ways to measure the amount of diversity and novelty in different models; here we introduced both features directly and used RMSE only as a performance metric.
The data used in this project contained only rating entries. If other item features are available, they can also be used to introduce diversity and novelty. When selecting unpopular items, we could for example use a genre or item-type feature to select items from classifications the user has not purchased or rated, which would increase the novelty of the recommendations. Such techniques can also be used to introduce serendipity and surface items the user was not looking for. If we were to run the recommender evaluation online, in a configuration such as the one shown in the figure below, we could increase the effectiveness of diversity and novelty by incorporating user feedback.
Figure source: http://soc-research.org/wp-content/uploads/2014/11/OfflineTest4RS.pdf
For example, if we present the user with unpopular items from a category different from those already rated or purchased, and the user actually selects one of them, we could then present more unpopular (or even popular) items from that new category. By presenting new items from categories outside the user's current preferences, online evaluation lets us learn something new about the user and feed that back into new recommendations, showing items that would otherwise never have been presented.