library(dplyr)
library(ggplot2)
library(recommenderlab)
library(kableExtra)
Using review data from Amazon, this project presents a guideline for building and evaluating a recommender system. First a model is designed and evaluated both on ratings and on recommendations. The recommendation evaluation is then used to select the best of several models, each designed following the same steps as the first one; model selection is based on the recommendation evaluation (ROC and precision/recall curves). With that best model in hand, both diversity and novelty are introduced, and their effects on the RMSE performance metric are presented. Finally, the conclusion offers a short discussion of possible online evaluation improvements.
Datasets are available on Julian McAuley’s site: http://jmcauley.ucsd.edu/data/amazon/links.html
Data for Instant Video was selected. Each row contains one user/item rating entry. The data was transformed into a user/item utility matrix and then into the sparse matrix class used by the recommenderlab library.
# Read the first 80,000 ratings; columns are user, item, rating (the timestamp column is dropped)
dataRaw <- read.csv("ratings_Amazon_Instant_Video.csv", header = FALSE,
                    colClasses = c("character", "character", "numeric", "NULL"))[1:80000, ]
colnames(dataRaw) <- c('user', 'item', 'rating')
numberOfUsers <- length(unique(dataRaw[["user"]]))
numberOfItems <- length(unique(dataRaw[["item"]]))
# Build an empty user/item utility matrix and fill it one rating at a time
utilMatrix <- data.frame(matrix(NA, ncol = numberOfItems, nrow = numberOfUsers))
colnames(utilMatrix) <- unique(dataRaw[["item"]])
rownames(utilMatrix) <- unique(dataRaw[["user"]])
for(i in 1:nrow(dataRaw)) {
  utilMatrix[dataRaw[i, 1], dataRaw[i, 2]] <- dataRaw[i, 3]
}
#head(utilMatrix,20) %>% kable() %>% kable_styling() %>% scroll_box(width = "800px", height = "400px")
dataAll <- as(as.matrix(utilMatrix), "realRatingMatrix")
dataAll
## 70125 x 4733 rating matrix of class 'realRatingMatrix' with 80000 ratings.
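As a side note, recommenderlab can also coerce a data.frame of user/item/rating triplets directly into a realRatingMatrix, which avoids building the dense intermediate matrix by hand. A minimal sketch, assuming dataRaw holds numeric ratings (dataAllAlt is an illustrative name):
# Coerce the user/item/rating triplets directly into a sparse realRatingMatrix
dataRaw$rating <- as.numeric(dataRaw$rating)
dataAllAlt <- as(dataRaw, "realRatingMatrix")
dataAllAlt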
The first step in data preparation was to keep only users and items with more than 5 ratings. The objective is to remove users and items with very few entries, which add little information to the system, although this filtering may affect both diversity and novelty, as will become evident in that section of the project.
data<-dataAll[rowCounts(dataAll)>5,]
data<-data[,colCounts(data)>5]
data<-data[rowCounts(data)>5,]
data
## 38 x 67 rating matrix of class 'realRatingMatrix' with 319 ratings.
The heat map of the data shows the reduced set of users and items. It is also somewhat evident that each user and each item has several (more than 5) ratings.
image(data)
We can also observe the average ratings per user, which is important when deciding if normalization is required.
averageRatingPerUser<-rowMeans(data)
hist(averageRatingPerUser)
As seen in the data exploration section, user ratings appear to be skewed towards the high end of the rating scale. For this reason we normalize the data before using it to build our recommender.
dataNorm<-normalize(data)
averageRatingPerUser<-rowMeans(dataNorm)
hist(averageRatingPerUser)
The new heat map shows our normalized dataset, with ratings now on a continuous, centered scale.
image(dataNorm)
We can see that the ratings are now centered around 0.
sum(rowMeans(dataNorm))
## [1] 1.383021e-15
For the recommender evaluation we split the data, selecting 80% for training and the remaining 20% for testing; we use recommenderlab's bootstrap method, which samples with replacement to increase the number of training cases. We also define what counts as a good rating, which will be used in one of the recommender evaluation methods.
percentTrain<-0.8
min(rowCounts(data))
## [1] 6
keep<-5
goodRating<-3
nEval<-1
evalSets<-evaluationScheme(data=data,method="bootstrap",train=percentTrain,given=keep,goodRating=goodRating,k=nEval)
evalSets
## Evaluation scheme with 5 items given
## Method: 'bootstrap' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 38 x 67 rating matrix of class 'realRatingMatrix' with 319 ratings.
We verify the amount of data we have in our training dataset.
getData(evalSets,"train")
## 30 x 67 rating matrix of class 'realRatingMatrix' with 236 ratings.
As we want to build several models and select the best performing one before experimenting with diversity and novelty, we start by going through all the steps to design a recommender with an initial model. We begin by using our training data to train an item-based collaborative filtering (IBCF) recommender.
model<-"IBCF"
param<-NULL
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
A histogram shows the distribution of the number of predicted ratings per user from our first model.
hist(rowCounts(evalPrediction))
To select a model we need a way to measure performance. We look at two ways of measuring how good a recommender's predictions are: the first based on RMSE (Root Mean Squared Error), the second on a confusion matrix together with ROC and precision/recall curves.
The first evaluation method compares the estimated ratings with the real ones and summarizes the difference as a mean error.
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 1.1570069 1.3386649 0.8556967
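As a sanity check, the same RMSE can be computed directly from the predicted and held-out rating matrices; a minimal sketch reusing the objects created above:
# Manual RMSE: compare predicted ratings with the held-out ("unknown") ratings
predictedRatings <- as(evalPrediction, "matrix")
actualRatings <- as(getData(evalSets, "unknown"), "matrix")
squaredErrors <- (predictedRatings - actualRatings)^2   # NA wherever either value is missing
sqrt(mean(squaredErrors, na.rm = TRUE))                 # should be close to evalAccuracy["RMSE"]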
Another way to evaluate a recommender is to compare the model's recommendations with actual positive ratings. Here we use our definition of a good rating (set to 3 in the data split section) as the threshold between good and bad recommendations.
results<-evaluate(x=evalSets,method=model,n=seq(10,100,10))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.032sec/0.007sec]
getConfusionMatrix(results)[[1]]
## TP FP FN TN precision recall TPR
## 10 0.6470588 7.588235 3.0000000 50.76471 0.07857143 0.1420513 0.1420513
## 20 1.1764706 15.294118 2.4705882 43.05882 0.07142857 0.3834188 0.3834188
## 30 1.8823529 22.764706 1.7647059 35.58824 0.07627258 0.4958974 0.4958974
## 40 2.5882353 28.058824 1.0588235 30.29412 0.08049752 0.6032479 0.6032479
## 50 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 60 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 70 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 80 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 90 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## 100 2.7647059 29.588235 0.8823529 28.76471 0.07672146 0.6232479 0.6232479
## FPR
## 10 0.1308902
## 20 0.2644881
## 30 0.3929107
## 40 0.4854912
## 50 0.5147154
## 60 0.5147154
## 70 0.5147154
## 80 0.5147154
## 90 0.5147154
## 100 0.5147154
The confusion matrix gives us counts of positive and negative predictions, but we can also look at the ROC curve. We will use this curve to compare the different models and select the best performing one.
plot(results,annotate=TRUE)
Precision (the percentage of recommended items that were actually purchased) and recall (the percentage of purchased items that were recommended) are also good metrics. We can plot these results as shown below.
plot(results,"prec/rec",annotate=TRUE)
In general a good recommender should have balanced precision and recall. A metric also often used is the F-score (such as the F1 or F2 variant), which provides a way to combine the two into a single value.
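As an illustration, such an F-score can be computed from the averaged confusion matrix of the evaluation above; a minimal sketch, where beta controls how much recall is weighted relative to precision:
# F-beta score from the averaged confusion matrix (beta = 1 gives F1, beta = 2 favours recall)
cm <- avg(results)
beta <- 1
fScore <- (1 + beta^2) * cm[, "precision"] * cm[, "recall"] /
  (beta^2 * cm[, "precision"] + cm[, "recall"])
fScore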
Now that we have a way to train a model and measure its performance, we build several models and compare their results, selecting one based on the recommendation evaluation. We build four models in total, all collaborative filtering: two item-based and two user-based, each pair using either cosine or Pearson distance.
models<-list(
IBCFcos=list(name="IBCF",param=list(method="cosine")),
IBCFpearson=list(name="IBCF",param=list(method="pearson")),
UBCFcos=list(name="UBCF",param=list(method="cosine")),
UBCFpearson=list(name="UBCF",param=list(method="pearson"))
)
nRecommendations<-c(1,5,seq(10,100,10))
results<-evaluate(x=evalSets,method = models,n=nRecommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [0.007sec/0.029sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.006sec/0.005sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.001sec/0.009sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.008sec]
We can calculate their performance; here we show the results for one of the four models.
averageMatrices<-lapply(results,avg)
averageMatrices[1]
## $IBCFcos
## TP FP FN TN precision recall TPR
## 1 0.05882353 0.7647059 3.5882353 57.58824 0.07142857 0.005128205 0.005128205
## 5 0.29411765 3.8235294 3.3529412 54.52941 0.07142857 0.094700855 0.094700855
## 10 0.64705882 7.5882353 3.0000000 50.76471 0.07857143 0.142051282 0.142051282
## 20 1.17647059 15.2941176 2.4705882 43.05882 0.07142857 0.383418803 0.383418803
## 30 1.88235294 22.7647059 1.7647059 35.58824 0.07627258 0.495897436 0.495897436
## 40 2.58823529 28.0588235 1.0588235 30.29412 0.08049752 0.603247863 0.603247863
## 50 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 60 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 70 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 80 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 90 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## 100 2.76470588 29.5882353 0.8823529 28.76471 0.07672146 0.623247863 0.623247863
## FPR
## 1 0.01314330
## 5 0.06607009
## 10 0.13089020
## 20 0.26448813
## 30 0.39291070
## 40 0.48549124
## 50 0.51471539
## 60 0.51471539
## 70 0.51471539
## 80 0.51471539
## 90 0.51471539
## 100 0.51471539
From the performance results we can plot the ROC curves for all models and select the best performing one.
plot(results,annotate=2)
It is still somewhat hard to select a model from the ROC curves alone, so we also look at the precision/recall plot.
plot(results,"prec/rec",annotate=1)
From this plot we select UBCF with cosine distance as the best performing algorithm.
Now that we have selected a model using recommendations, we look at its performance on ratings by calculating the RMSE. We will use this metric to observe the effect of diversity and novelty on the recommender's performance.
model<-"UBCF"
param<-list(method="cosine")
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 0.9240670 0.8538998 0.7599519
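To see how the error varies across users, the same accuracy calculation can also be run per user; a short sketch reusing the prediction above:
# Per-user accuracy for the selected UBCF model
evalAccuracyByUser <- calcPredictionAccuracy(x = evalPrediction, data = getData(evalSets, "unknown"), byUser = TRUE)
head(evalAccuracyByUser)
summary(evalAccuracyByUser[, "RMSE"])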
To include diversity in our results, we take items with poor reviews and increase their ratings so that they show up in the recommender's results. We change 20% of the ratings at or below 3 to 5.
matrix <- as(data, "matrix")
# Row/column indices of all ratings at or below 3
lowRatingsInd <- which(matrix <= 3, arr.ind = TRUE)
# Randomly pick 20% of those low ratings and set them to 5
lowRatingsIndchangeSize <- round(nrow(lowRatingsInd) * 0.2, 0)
lowRatingsIndToChange <- sample(1:nrow(lowRatingsInd), lowRatingsIndchangeSize)
for(i in lowRatingsIndToChange) {
  matrix[lowRatingsInd[i, 1], lowRatingsInd[i, 2]] <- 5
}
dataDiversity <- as(as.matrix(matrix), "realRatingMatrix")
As before, we calculate the model's performance on ratings by computing the RMSE.
percentTrain<-0.8
min(rowCounts(dataDiversity))
## [1] 6
keep<-5
goodRating<-3
nEval<-1
evalSets<-evaluationScheme(data=dataDiversity,method="bootstrap",train=percentTrain,given=keep,goodRating=goodRating,k=nEval)
evalSets
## Evaluation scheme with 5 items given
## Method: 'bootstrap' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 38 x 67 rating matrix of class 'realRatingMatrix' with 319 ratings.
model<-"UBCF"
param<-list(method="cosine")
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 1.0027289 1.0054653 0.8720934
As expected, the RMSE isn't as "good" as before, but the results are more diverse, since they now include items that would otherwise not appear.
To increase novelty in the results, we increase the ratings of unpopular items, that is, items with very few ratings (equivalently, items with many NA entries in the utility matrix).
matrix<-as(data,"matrix")
head(as.data.frame(matrix)) %>% kable() %>% kable_styling() %>% scroll_box(width = "800px", height = "400px")
|  | B000H4YNM0 | B000HAB4NK | B000HZEHL6 | B000IBUIS0 | B000IJRK7O | B000IK882Y | B000IXUOP0 | B000J0Q1KO | B000JO9JHW | B000MPGI68 | B000MVIAUY | B000MVN8GK | B000OC2T4Y | B000OGTRC2 | B000OJFV3S | B000U6I0B0 | B000UERE3M | B000ULTV4U | B000UU4IX0 | B000UVUMDO | B000VU2SW2 | B000VU4GW2 | B000VU6K0I | B000VU8DCQ | B000VVEDWO | B000VZUWZW | B000W4BJ7C | B000W4WD40 | B000WFFAIA | B000WT7R6O | B000ZU1KMW | B0011NHCYM | B0012H0DBG | B0012QRPU4 | B0013Z0PLA | B00150OTCE | B0019CY40U | B0019D3I4W | B001CB8NQE | B001CEB4YE | B001CH44YI | B001ENSOVO | B001EUE6X2 | B001EUKHRG | B001EXRQQ8 | B001F6ZIXC | B001N4PPWK | B001PN63PC | B001QDKK3W | B001SE07JG | B001TNVEKC | B001W1Y5RK | B0021HAG5E | B00241ASBO | B002962RKE | B002BLCNHY | B002IVNLKA | B002JAU1X0 | B002KEZ91E | B002KGB2EU | B002KGEORM | B002N4QCTE | B002QS5OQ4 | B002TNS012 | B002TWWOHE | B002VVCMTS | B002Y0TGOU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AJKWF4W7QD4NS | 5 | NA | NA | NA | 5 | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 |
| A153NZD2WZN5S3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | 3 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| A16XRPF40679KG | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | 5 | NA | NA | 3 | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| A2FRKEXDXDN1KI | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | 5 | NA | 4 | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| ALZWY9L4E5GXO | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA |
| A2QWF2BJ2FM4R2 | 5 | 4 | NA | NA | NA | 5 | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | 3 | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 | 3 | NA | NA | NA | NA | NA | NA | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
As we can see in the head of the matrix, there are several items that appear to be unpopular, with many NA entries.
# Count the NA entries per item column; items with the most NAs are the least popular
popularity <- as.data.frame(which(is.na(matrix), arr.ind = TRUE)) %>% count(col) %>% as.data.frame()
popularity <- popularity[with(popularity, order(n)), ]
# Take the 20% least popular items (most NAs) and set all of their ratings to 5
numberOfItemsToChange <- round(nrow(popularity) * 0.2, 0)
indToChange <- tail(popularity$col, numberOfItemsToChange)
matrix[, indToChange] <- 5
head(as.data.frame(matrix)) %>% kable() %>% kable_styling() %>% scroll_box(width = "800px", height = "400px")
|  | B000H4YNM0 | B000HAB4NK | B000HZEHL6 | B000IBUIS0 | B000IJRK7O | B000IK882Y | B000IXUOP0 | B000J0Q1KO | B000JO9JHW | B000MPGI68 | B000MVIAUY | B000MVN8GK | B000OC2T4Y | B000OGTRC2 | B000OJFV3S | B000U6I0B0 | B000UERE3M | B000ULTV4U | B000UU4IX0 | B000UVUMDO | B000VU2SW2 | B000VU4GW2 | B000VU6K0I | B000VU8DCQ | B000VVEDWO | B000VZUWZW | B000W4BJ7C | B000W4WD40 | B000WFFAIA | B000WT7R6O | B000ZU1KMW | B0011NHCYM | B0012H0DBG | B0012QRPU4 | B0013Z0PLA | B00150OTCE | B0019CY40U | B0019D3I4W | B001CB8NQE | B001CEB4YE | B001CH44YI | B001ENSOVO | B001EUE6X2 | B001EUKHRG | B001EXRQQ8 | B001F6ZIXC | B001N4PPWK | B001PN63PC | B001QDKK3W | B001SE07JG | B001TNVEKC | B001W1Y5RK | B0021HAG5E | B00241ASBO | B002962RKE | B002BLCNHY | B002IVNLKA | B002JAU1X0 | B002KEZ91E | B002KGB2EU | B002KGEORM | B002N4QCTE | B002QS5OQ4 | B002TNS012 | B002TWWOHE | B002VVCMTS | B002Y0TGOU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AJKWF4W7QD4NS | 5 | NA | NA | NA | 5 | NA | 5 | 5 | 5 | NA | NA | NA | NA | 5 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | 4 | NA | NA | NA | NA | NA | 5 | NA | 4 | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | 3 | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | 5 |
| A153NZD2WZN5S3 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | 3 | 3 | 3 | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | 3 | NA | 5 | NA | NA | NA | 5 | NA | 5 | 3 | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
| A16XRPF40679KG | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | 5 | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 4 | 5 | NA | NA | 5 | 5 | NA | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | 3 | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
| A2FRKEXDXDN1KI | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | 5 | 5 | 5 | NA | 5 | NA | 5 | NA | 5 | 5 | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
| ALZWY9L4E5GXO | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | 5 | NA | 5 | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | NA | NA | 5 | NA | 5 | NA | 5 | 5 | NA | 5 | NA | NA | NA | 5 | NA | NA | 5 | NA |
| A2QWF2BJ2FM4R2 | 5 | 4 | NA | NA | NA | 5 | 5 | NA | NA | 3 | NA | NA | NA | 5 | NA | NA | NA | NA | NA | NA | 5 | NA | 3 | NA | 5 | NA | NA | NA | 4 | NA | NA | NA | 5 | NA | NA | NA | 5 | 5 | NA | 5 | NA | 4 | 3 | NA | NA | NA | NA | 5 | NA | 4 | NA | 5 | NA | 5 | NA | NA | 5 | NA | NA | NA | NA | NA | 5 | NA | NA | 5 | NA |
dataNovelty <- as(as.matrix(matrix), "realRatingMatrix")
We have set all ratings for the selected unpopular items to 5, making them very popular.
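A quick way to check the effect is to compare the number of ratings per item before and after the change; a small sketch using the original and modified matrices:
# Ratings per item before and after the novelty change; the boosted items now have a rating from every user
summary(colCounts(data))
summary(colCounts(dataNovelty))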
Again we evaluate the rating prediction performance by calculating the RMSE metric.
percentTrain<-0.8
keep<-5
goodRating<-3
nEval<-1
evalSets<-evaluationScheme(data=dataNovelty,method="split",train=percentTrain,given=keep,goodRating=goodRating,k=nEval)
evalSets
## Evaluation scheme with 5 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: >=3.000000
## Data set: 38 x 67 rating matrix of class 'realRatingMatrix' with 797 ratings.
model<-"UBCF"
param<-list(method="cosine")
evalRecommender<-Recommender(data = getData(evalSets,"train"),method=model,parameter=param)
itemsToRecommend<-5
evalPrediction<-predict(object = evalRecommender,newdata=getData(evalSets,"known"),n=itemsToRecommend,type="ratings")
evalAccuracy<-calcPredictionAccuracy(x=evalPrediction,data=getData(evalSets,"unknown"),byUser=FALSE)
head(evalAccuracy)
## RMSE MSE MAE
## 0.7114389 0.5061453 0.3761004
The results now include novel items, that is, items that are not popular but that can now make the top results. We can see that by including more of these unpopular items, and thus increasing novelty, the RMSE actually decreases. In the extreme, if we made every item popular by setting all ratings to 5, the RMSE would drop to zero, since every rating would become trivially predictable.
In this project we designed a recommender using Amazon rating data, described how to run evaluations on it, and used those metrics to select the best performing model from a selection of four. After selecting a model we introduced both diversity and novelty and described their effects on RMSE, the metric used to measure rating prediction performance. There are several definitions of diversity and novelty; for this project we used the loose definitions given by Denis Kotkov et al. in the paper "How does serendipity affect diversity in recommender systems? A serendipity-oriented greedy algorithm" (https://link.springer.com/content/pdf/10.1007/s00607-018-0687-5.pdf). These and other authors also present ways to measure the amount of diversity and novelty in different models; here we introduced both features directly and used RMSE only as a performance metric.
The data used in this project contained only rating entries. If other item features are available, they can also be used to introduce diversity and novelty. When selecting unpopular items, we could for example use a genre or item-type feature to select items from classifications the user has not purchased or rated, which would increase the novelty of the recommendations. Such techniques can also be used to introduce serendipity and surface items the user was not looking for. If we were to run the recommender evaluation online, in a configuration such as the one shown in the figure below, we could increase the effectiveness of diversity and novelty by incorporating user feedback.
Figure source: http://soc-research.org/wp-content/uploads/2014/11/OfflineTest4RS.pdf
For example, if we present the user with unpopular items from a category different from those already rated or purchased, and the user actually selects one of them, we could then present more unpopular (or even popular) items from that new category. By presenting new items from categories outside the user's current preferences, online evaluation lets us learn something new about the user and feed that back into new recommendations, showing items that would otherwise never have been presented.