1 Accuracy Comparisons (Deliverable 1)

In our previous assignment, we studied Matrix Factorization methods. In this assignment, we practice accuracy comparison methods and implement support for a user experience goal such as increased serendipity, novelty, or diversity.

For this assignment, we chose to use the Serendipity 2018 dataset, available here: https://grouplens.org/datasets/serendipity-2018/. We follow the procedures laid out in Kotkov et al. (see the references section).

The full dataset includes 10,000,000 movie ratings. To make the size more manageable, the ratings were reduced to include only those from users who took part in the serendipity study (the “answers.csv” file). The resulting ratings_raw dataset contains 1,446,109 ratings. The data were further subset to include only movies that had been rated at least 200 times.

Another change we made to the original dataset was to include only six of the eight variations of serendipity outlined in the paper, because the two remaining variations are likely to reduce user satisfaction.
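
A minimal sketch of this preparation step using dplyr is shown below; the file name training.csv, the column names, and the renaming at the end are assumptions chosen so the result matches the t_redux object used in the code later in this report.

library(dplyr)
#full rating file and survey responses from the Serendipity 2018 download (file/column names assumed)
ratings_all <- read.csv("training.csv")
answers <- read.csv("answers.csv")
#keep only ratings from users who took part in the serendipity study
ratings_raw <- ratings_all %>% filter(userId %in% unique(answers$userId))
#keep only movies that have been rated at least 200 times
popular <- ratings_raw %>% count(movieId) %>% filter(n >= 200)
t_redux <- ratings_raw %>% semi_join(popular, by = "movieId") %>%
  select(user = userId, item = movieId, rating)
#the serendipity rating set s is built from answers.csv, keeping six of the
#eight serendipity variations (variation column names omitted here)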

1.2 Compare the UBCF and SVD Recommender Models

Now that we have built the two models, we will compare the errors and other metrics for each model.
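
The confusion-matrix table below was produced with recommenderlab's evaluate() on a 4-fold cross-validation scheme; a minimal sketch of that step is shown here, assuming t_redux is the reduced rating data frame from the data preparation above (the Deliverable 2 code later in this report follows the same pattern):

library(recommenderlab)
#coerce the reduced rating data frame to a realRatingMatrix
ratings <- as(t_redux, "realRatingMatrix")
#4-fold cross-validation: 5 given items per test user, ratings >= 3 count as "good"
eval_sets <- evaluationScheme(data = ratings, method = "cross-validation", k = 4, given = 5, goodRating = 3)
#the two algorithms under comparison
models_to_evaluate <- list(
    UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")), 
    SVD = list(name = "SVD", param = list(normalize = "center", k = 10))
  )
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations, progress = FALSE)
#average the fold-level confusion matrices; one row per list length n
avg_matrices <- lapply(list_results, avg)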

Model n TP FP FN TN precision recall TPR FPR
1 UBCF 1 0.582645 0.417355 83.801653 103.198347 0.582645 0.007284 0.007284 0.0038
5 UBCF 5 2.704545 2.295455 81.679752 101.320248 0.540909 0.033467 0.033467 0.021074
10 UBCF 10 5.204545 4.795455 79.179752 98.820248 0.520455 0.063962 0.063962 0.044963
20 UBCF 20 10.090909 9.909091 74.293388 93.706612 0.504545 0.122341 0.122341 0.092932
30 UBCF 30 14.774793 15.225207 69.609504 88.390496 0.492493 0.178652 0.178652 0.144006
40 UBCF 40 19.42562 20.57438 64.958678 83.041322 0.48564 0.233862 0.233862 0.195065
50 UBCF 50 23.940083 26.059917 60.444215 77.555785 0.478802 0.287173 0.287173 0.248081
60 UBCF 60 28.394628 31.605372 55.989669 72.010331 0.473244 0.339919 0.339919 0.301327
70 UBCF 70 32.77686 37.22314 51.607438 66.392562 0.468241 0.391357 0.391357 0.355273
80 UBCF 80 37.033058 42.966942 47.35124 60.64876 0.462913 0.441143 0.441143 0.4107
90 UBCF 90 41.475207 48.524793 42.909091 55.090909 0.460836 0.493568 0.493568 0.463697
100 UBCF 100 45.873967 54.126033 38.510331 49.489669 0.45874 0.544786 0.544786 0.517538
1 SVD 1 0.555785 0.444215 83.828512 103.171488 0.555785 0.007033 0.007033 0.004082
5 SVD 5 2.642562 2.357438 81.741736 101.258264 0.528512 0.032719 0.032719 0.021693
10 SVD 10 5.202479 4.797521 79.181818 98.818182 0.520248 0.063456 0.063456 0.044894
20 SVD 20 10.022727 9.977273 74.36157 93.63843 0.501136 0.120824 0.120824 0.094298
30 SVD 30 14.774793 15.225207 69.609504 88.390496 0.492493 0.178756 0.178756 0.143861
40 SVD 40 19.464876 20.535124 64.919421 83.080579 0.486622 0.234016 0.234016 0.194335
50 SVD 50 24.028926 25.971074 60.355372 77.644628 0.480579 0.287551 0.287551 0.245875
60 SVD 60 28.561983 31.438017 55.822314 72.177686 0.476033 0.341256 0.341256 0.299004
70 SVD 70 33.061983 36.938017 51.322314 66.677686 0.472314 0.393982 0.393982 0.351225
80 SVD 80 37.404959 42.595041 46.979339 61.020661 0.467562 0.444631 0.444631 0.405569
90 SVD 90 41.714876 48.285124 42.669421 55.330579 0.463499 0.495099 0.495099 0.460876
100 SVD 100 46.095041 53.904959 38.289256 49.710744 0.46095 0.545345 0.545345 0.51389

From the table, we can see that the UBCF model achieves slightly higher precision than SVD at every list length, making it marginally more accurate on the top-N metrics. Next, we look at the RMSE, MSE and MAE.
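
These rating-prediction errors come from recommenderlab's calcPredictionAccuracy() applied to predictions on the held-out ratings; a minimal sketch, reusing the evaluation scheme above (the object names ubcf_er and svd_er are chosen to match their later use in the Deliverable 2 code):

#train each recommender on the training portion of the evaluation scheme
ubcf_rec <- Recommender(getData(eval_sets, "train"), "UBCF", param = list(normalize = "center", method = "cosine"))
svd_rec <- Recommender(getData(eval_sets, "train"), "SVD", param = list(normalize = "center", k = 10))
#predict ratings from the "known" items of the test users
ubcf_pred <- predict(ubcf_rec, getData(eval_sets, "known"), type = "ratings")
svd_pred <- predict(svd_rec, getData(eval_sets, "known"), type = "ratings")
#compare predictions against the withheld ("unknown") ratings
ubcf_er <- calcPredictionAccuracy(ubcf_pred, getData(eval_sets, "unknown"))
svd_er <- calcPredictionAccuracy(svd_pred, getData(eval_sets, "unknown"))
rbind("UBCF-Cosine" = ubcf_er, SVD = svd_er)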

Model RMSE MSE MAE
UBCF-Cosine 0.9314149 0.8675337 0.6767562
SVD 0.9341935 0.8727175 0.6749346

Using RMSE and MSE, the UBCF model is slightly more accurate, while SVD has a marginally lower MAE; overall, the rating-prediction errors of the two models are very close.

In addition, we created ROC curve and precision-recall plots for the two models.

Given that the ROC curves are very similar, we calculate the area under the curve (AUC) to determine which model is better (visually, the curves seem to slightly favor the UBCF model). The AUC is the area under the curve formed by the (FPR, TPR) coordinates. To calculate it, we:
1) Normalize the data so that the X and Y axes span the unit interval.
2) Apply the trapezoidal rule to compute the area.
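
A minimal sketch of this calculation for the UBCF curve is shown below, using pracma::trapz and assuming avg_matrices holds the averaged confusion matrices produced by evaluate() above (column 7 is TPR, column 8 is FPR); the SVD curve is handled the same way, and the Deliverable 3 code repeats this pattern for the serendipity runs.

library(pracma)
#FPR (x) and TPR (y) for the UBCF curve, taken from the averaged confusion matrices
x <- as.vector(avg_matrices$UBCF_cos[,8])
y <- as.vector(avg_matrices$UBCF_cos[,7])
#1) rescale both axes to the unit interval
norm_x <- (x - min(x)) / (max(x) - min(x))
norm_y <- (y - min(y)) / (max(y) - min(y))
#2) trapezoidal-rule integration of TPR over FPR
auc_ubcf <- round(trapz(norm_x, norm_y), 4)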

Model AUC
UBCF-Cosine 0.5304783
SVD 0.5318852

As seen above, SVD AUC is slightly higher than UBCF’s.


2 Implement Support for Serendipity (Deliverable 2)

We use the Serendipity 2018 dataset, which appears to be the only publicly available dataset containing user feedback on the serendipity of movie recommendations.

Our methodology is to add randomly sampled serendipity-related ratings to the training dataset and then measure the impact of the inclusion. To avoid increasing the size of the training dataset, we reduce its original portion proportionally to the number of serendipity ratings added.

We measure the impact by varying the share of serendipity ratings included in the training set, from 10% to 100% in steps of 10% (implemented with lapply over the inclusion levels in the code below).

set.seed(137)
#loop
vec_s <- seq(10,100,10)
vec_size <- seq(1,10,1)
t_size <- length(t_redux$user)
#using lapply functions to generate all results
# sampling serendipity file
s_sample <- lapply(vec_s, function(n){sample_frac(s,n/100)})
names(s_sample) <- paste0("s_sample", vec_s)
# sample size
s_sample_size <- lapply(vec_size, function(n){length(s_sample[[n]][,1])})
# reducing size of original set through sampling
sample_red <- lapply(vec_size, function(n){1-s_sample_size[[n]] / t_size})
t_sample <- lapply(vec_size, function(n){sample_frac(t_redux, sample_red[[n]])})
#t_sample<-t_redux
#merging data frames
t_s <- lapply(vec_size, function(n){rbind.data.frame(s_sample[[n]], t_sample[[n]])}) 
# coercing into realRatingMatrix
ratings_s <- lapply(vec_size, function(n){as(t_s[[n]], "realRatingMatrix")}) 
# create evaluation scheme
eval_sets_s <- lapply(vec_size, function(n){evaluationScheme(data = ratings_s[[n]], method = "cross-validation", k = 4, given = 5, goodRating =3)})
# build UBCF model and SVD model
ubcf_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "UBCF", param = list(normalize = "center", method = "cosine"))})
svd_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "SVD", param = list(normalize = "center", k=10))})
# Make predictions with each model
ubcf_pred_s <- lapply(vec_size, function(n){predict(ubcf_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
svd_pred_s <- lapply(vec_size, function(n){predict(svd_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
# Table showing error calcs for UBCF vs SVD
ubcf_er_s <- lapply(vec_size, function(n){calcPredictionAccuracy(ubcf_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
svd_er_s <-lapply(vec_size, function(n){calcPredictionAccuracy(svd_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
# Model evaluation
models_to_evaluate <- list(
    UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")), 
    SVD = list(name = "SVD", param = list(normalize = "center", k=10))
  )
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results_s <- lapply(vec_size, function(n){evaluate(x = eval_sets_s[[n]], method = models_to_evaluate, n = n_recommendations, progress=FALSE)})
avg_matrices_s <- lapply(vec_size, function(n){lapply(list_results_s[[n]], avg)})
# error tables TP/FP/etc
error_tables_s <- lapply(vec_size, function(n){rbind(cbind(Model = rep("UBCF",12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$UBCF_cos), cbind(Model = rep("SVD",12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$SVD))})
# RMSE plot
#dataframe processing
rmse_e1 <- do.call(rbind, ubcf_er_s)
rmse_e2 <- do.call(rbind, svd_er_s)
rmse_tbl <- data.frame(cbind(rmse_e1, rmse_e2))
rmse_tbl <- rmse_tbl[,c(1,4)]
rmse_tbl[,3] <- vec_s
rmse_tbl <- rmse_tbl[,c(3,1,2)]
colnames(rmse_tbl) <- c("Perc", "UBCF", "SVD")
rmse_long <- gather(rmse_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
rmse_1 <- data.frame("Perc"=vec_s, "UBCF"= ubcf_er[[1]], "SVD"=svd_er[[1]])
rmse_long1 <- gather(rmse_1, variable, value, -Perc)
#plot
ggplot(data = rmse_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) + ggtitle("RMSE", subtitle = "Dots represent no serendipity ratings") + xlab("Serendipity inclusion in %") + ylab("RMSE") + geom_point(data = rmse_long1, aes(x = Perc, y = value, fill = variable))

3 Changes in Accuracy After Incorporating Serendipity Data (Deliverable 3)

As can be seen in the chart above, including the serendipity ratings in the training dataset reduces the RMSE in the majority of the runs. Overall, the SVD model produces the lower errors in these runs, and the optimal inclusion level appears to be 100%.

Next, we calculate the AUC using the same methodology as above and compare it against the baseline with no serendipity ratings.

#AUC calculation
#UBCF
x_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,8])})
y_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,7])})
#normalization
norm_x_s <- lapply(vec_size, function(n){(x_s[[n]]-min(x_s[[n]]))/(max(x_s[[n]])-min(x_s[[n]]))})
norm_y_s <- lapply(vec_size, function(n){(y_s[[n]]-min(y_s[[n]]))/(max(y_s[[n]])-min(y_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf_s <- lapply(vec_size, function(n){round(trapz(norm_x_s[[n]],norm_y_s[[n]]),4)})
#SVD
z_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,8])})
w_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,7])})
#normalization
norm_z_s <- lapply(vec_size, function(n){(z_s[[n]]-min(z_s[[n]]))/(max(z_s[[n]])-min(z_s[[n]]))})
norm_w_s <- lapply(vec_size, function(n){(w_s[[n]]-min(w_s[[n]]))/(max(w_s[[n]])-min(w_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd_s <- lapply(vec_size, function(n){round(trapz(norm_z_s[[n]],norm_w_s[[n]]),4)})
#AUC plot
#dataframe processing
auc_tbl1 <- do.call(rbind, auc_ubcf_s)
auc_tbl2 <- do.call(rbind, auc_svd_s)
auc_tbl <- data.frame(cbind(auc_tbl1,auc_tbl2))
auc_tbl[,3] <- vec_s
auc_tbl <- auc_tbl[,c(3,1,2)]
colnames(auc_tbl) <- c("Perc","UBCF","SVD")
auc_long <- gather(auc_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
auc_1 <- data.frame("Perc"=vec_s,"UBCF"= auc_ubcf, "SVD"=auc_svd)
auc_long1 <- gather(auc_1, variable,value, -Perc)
#plot
ggplot(data = auc_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) + ggtitle("AUC", subtitle = "Dots represent no serendipity ratings ") + xlab("Serendipity inclusion in %") + ylab("AUC") + geom_point(data=auc_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, including the serendipity ratings in the training dataset increases the AUC in all runs. Overall, the SVD model produces the higher AUC, and the optimal inclusion level appears to be 80%.

4 Additional Experiments and Metrics - Online Evaluation (Deliverable 4)

Online evaluation refers to creating mechanisms that respond to ongoing activity on a web site and then measuring the accuracy of recommendations produced by these mechanisms. For example, a site could experiment with different changes to the recommender algorithm and assess each change by its Click-Through Rate (CTR). The determination of how accurate the recommendations are is then based on user interaction with recommended items, how often recommended items are viewed, and so on.

To create a reasonable online evaluation environment, an engine must be built that splits user traffic randomly into different experimental tracks and then follows the activity of each group over the course of the experiment (a small sketch of such a split follows the list below). Some potential experiments include:

  • Altering the presentation of recommendations and seeing if it changes CTR or patterns
  • Altering the recommendation algorithm in a variety of ways:
    • Incorporate an entropy measure
    • Give more weight to more recent ratings to address the user-recommender lifecycle
    • Try recommending baskets of items instead of individual items
    • Penalize or remove recommendations that could be very wrong, which reduces trust
  • Try arranging recommendations by their cost to see if that changes response
  • Try multi-dimensional ratings (recommend movies with same actor, same genre, etc.)
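
As a rough sketch of what such an engine might look like (hypothetical helper names; not part of this assignment's code), the snippet below deterministically assigns users to experimental tracks by hashing their IDs and then compares click-through rates between the tracks:

library(digest)
#assign a user to one of n_tracks experimental groups; hashing the id keeps the
#split random-like but stable across sessions
assign_track <- function(user_id, n_tracks = 2) {
  h <- strtoi(substr(digest(as.character(user_id), algo = "crc32"), 1, 6), base = 16L)
  (h %% n_tracks) + 1
}
#hypothetical impression log: one row per recommendation shown, with a click flag
evaluate_ctr <- function(impressions) {
  impressions$track <- sapply(impressions$userId, assign_track)
  aggregate(clicked ~ track, data = impressions, FUN = mean)   #CTR per experimental track
}
#example with simulated impressions
impressions <- data.frame(userId = sample(1:500, 2000, replace = TRUE),
                          clicked = runif(2000) < 0.1)
evaluate_ctr(impressions)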

5 Conclusion

Across the four deliverables completed for this assignment, we demonstrated that prediction errors can be decreased and recommendation performance improved by incorporating a serendipity dataset into the recommender system.

6 References
