1 Accuracy Comparisons (Deliverable 1)

In our previous assignment, we studied matrix factorization methods. In this assignment, we practice accuracy comparison methods and implement support for a user-experience goal such as increased serendipity, novelty, or diversity.

For this assignment, we chose the Serendipity 2018 dataset: https://grouplens.org/datasets/serendipity-2018/. We follow the procedures laid out in Kotkov et al. (see the References section).

The full dataset includes 10,000,000 movie ratings. To make its size more manageable, we kept only ratings from users who took part in the serendipity study (the “answers.csv” file); the resulting ratings_raw dataset contains 1,446,109 ratings. The data were further subset to include only movies rated at least 200 times.

Another change we made to the original dataset is to include only six out of eight variations of serendipity, as outlined in the paper, because the two remaining variations are likely to reduce user satisfaction.
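
The subsetting described above can be reproduced with a few dplyr operations. The sketch below is illustrative rather than the exact assignment code: it assumes the full ratings have been read into a data frame named ratings_full and the study responses into answers, each with userId and movieId columns (these object and column names are assumptions).

library(dplyr)
# keep only ratings from users who took part in the serendipity study
ratings_raw <- ratings_full %>% semi_join(answers, by = "userId")
# keep only movies rated at least 200 times
popular_movies <- ratings_raw %>% count(movieId) %>% filter(n >= 200)
ratings_redux <- ratings_raw %>% semi_join(popular_movies, by = "movieId")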

1.2 Compare the UBCF and SVD Recommender Models

Now that we have built the two models, we compare the errors and other classification metrics for each.
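
The comparison table is produced with recommenderlab's evaluate() over a range of top-N list sizes and then averaged across the cross-validation folds with avg(). A condensed sketch follows; it assumes eval_sets is the evaluationScheme built from the reduced ratings matrix (the object names mirror those used in the serendipity experiment later in this report).

library(recommenderlab)
models_to_evaluate <- list(
    UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")),
    SVD = list(name = "SVD", param = list(normalize = "center", k = 10))
  )
n_recommendations <- c(1, 5, seq(10, 100, 10))
# confusion-matrix counts (TP/FP/FN/TN) and derived metrics for each list size n
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations, progress = FALSE)
avg_matrices <- lapply(list_results, avg)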

Model n TP FP FN TN precision recall TPR FPR
UBCF 1 0.481405 0.518595 83.938017 103.061983 0.481405 0.005899 0.005899 0.004894
UBCF 5 2.233471 2.766529 82.18595 100.81405 0.446694 0.026814 0.026814 0.026978
UBCF 10 4.64876 5.35124 79.770661 98.229339 0.464876 0.055694 0.055694 0.051999
UBCF 20 9.363636 10.636364 75.055785 92.944215 0.468182 0.112047 0.112047 0.103521
UBCF 30 14.06405 15.93595 70.355372 87.644628 0.468802 0.168983 0.168983 0.153533
UBCF 40 18.745868 21.254132 65.673554 82.326446 0.468647 0.224892 0.224892 0.203866
UBCF 50 23.320248 26.677686 61.099174 76.902893 0.466423 0.279497 0.279497 0.255505
UBCF 60 27.88843 32.088843 56.530992 71.491736 0.46497 0.333404 0.333404 0.307086
UBCF 70 32.396694 37.559917 52.022727 66.020661 0.463076 0.387077 0.387077 0.360026
UBCF 80 36.950413 42.985537 47.469008 60.595041 0.462223 0.441373 0.441373 0.411432
UBCF 90 41.52686 48.38843 42.892562 55.192149 0.461813 0.496738 0.496738 0.464104
UBCF 100 46.202479 53.67562 38.216942 49.904959 0.462563 0.552625 0.552625 0.514903
SVD 1 0.57438 0.42562 83.845041 103.154959 0.57438 0.007105 0.007105 0.003881
SVD 5 2.752066 2.247934 81.667355 101.332645 0.550413 0.032786 0.032786 0.020003
SVD 10 5.27686 4.72314 79.142562 98.857438 0.527686 0.063001 0.063001 0.043305
SVD 20 10.014463 9.985537 74.404959 93.595041 0.500723 0.119513 0.119513 0.093821
SVD 30 14.737603 15.262397 69.681818 88.318182 0.491253 0.175855 0.175855 0.144
SVD 40 19.371901 20.628099 65.047521 82.952479 0.484298 0.230383 0.230383 0.195642
SVD 50 23.892562 26.107438 60.52686 77.47314 0.477851 0.282765 0.282765 0.247476
SVD 60 28.423554 31.576446 55.995868 72.004132 0.473726 0.33666 0.33666 0.300123
SVD 70 32.88843 37.11157 51.530992 66.469008 0.469835 0.388759 0.388759 0.352457
SVD 80 37.10124 42.89876 47.318182 60.681818 0.463765 0.437586 0.437586 0.408199
SVD 90 41.46281 48.53719 42.956612 55.043388 0.460698 0.489223 0.489223 0.462759
SVD 100 45.768595 54.231405 38.650826 49.349174 0.457686 0.539184 0.539184 0.516696

From the table, UBCF edges out SVD on precision and recall at the longest recommendation lists (n = 90 and 100), while SVD is the more precise model for shorter lists. Next, we look at the RMSE, MSE and MAE.

Model RMSE MSE MAE
SVD 0.9355845 0.8753183 0.6634779
UBCF-Cosine 0.9581187 0.9179914 0.6927214

Judging by RMSE, MSE and MAE, SVD produces the lower errors and is therefore the more accurate model at predicting ratings.
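
These error metrics come from recommenderlab's calcPredictionAccuracy() applied to rating predictions on the held-out portion of the evaluation scheme. A condensed sketch, again assuming eval_sets is the evaluation scheme used for these models (the object names follow the conventions of the serendipity code later in this report):

# train both models on the training portion of the evaluation scheme
ubcf_rec <- Recommender(getData(eval_sets, "train"), "UBCF", param = list(normalize = "center", method = "cosine"))
svd_rec <- Recommender(getData(eval_sets, "train"), "SVD", param = list(normalize = "center", k = 10))
# predict ratings for the "known" items and score them against the withheld "unknown" ratings
ubcf_pred <- predict(ubcf_rec, getData(eval_sets, "known"), type = "ratings")
svd_pred <- predict(svd_rec, getData(eval_sets, "known"), type = "ratings")
rbind(`UBCF-Cosine` = calcPredictionAccuracy(ubcf_pred, getData(eval_sets, "unknown")),
      SVD = calcPredictionAccuracy(svd_pred, getData(eval_sets, "unknown")))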

In addition, we created ROC curve and precision-recall plots.

Given that the ROC curves are very similar, we calculate the area under the curve (AUC) to determine which model is better (visually, the curve seems to favor the UBCF model). The AUC is the area under the curve traced by the (FPR, TPR) coordinates. To calculate it, we follow this procedure (a minimal sketch appears after the list):
1) Normalize the data so that the X and Y axes both lie in the unit interval [0, 1].
2) Apply the trapezoidal rule to compute the AUC.
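
The sketch below uses trapz() from the pracma package, the same function used in the serendipity experiment later in this report. Here fpr and tpr stand in for the averaged FPR and TPR columns of an evaluation result; the helper names normalize and auc_from_roc are illustrative.

library(pracma)
# rescale a vector to the unit interval [0, 1]
normalize <- function(v) (v - min(v)) / (max(v) - min(v))
# area under the (FPR, TPR) curve via the trapezoidal rule
auc_from_roc <- function(fpr, tpr) round(trapz(normalize(fpr), normalize(tpr)), 4)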

Model AUC
UBCF-Cosine 0.5028067
SVD 0.5311904

As seen above, the AUC for SVD is slightly higher than that for UBCF.


2 Implement Support for Serendipity (Deliverable 2)

We use the Serendipity dataset, which appears to be the only publicly available dataset containing user feedback on the serendipity of movie recommendations.

Our methodology is to include randomly sampled movies with serendipity-related ratings in the training dataset and then measure the impact of the inclusion. To avoid increasing the size of the training dataset, we reduce it proportionally to the number of serendipity ratings added.

We measure the impact by varying the share of serendipity ratings included in the training set from 10% to 100%, in steps of 10%, iterating over the inclusion levels with lapply:

set.seed(137)
#loop
vec_s <- seq(10,100,10)
vec_size <- seq(1,10,1)
t_size <- length(t_redux$user)
#using lapply functions to generate all results
# sampling serendipity file
s_sample <- lapply(vec_s, function(n){sample_frac(s,n/100)})
names(s_sample) <- paste0("s_sample", vec_s)
# sample size
s_sample_size <- lapply(vec_size, function(n){length(s_sample[[n]][,1])})
# reducing size of original set through sampling
sample_red <- lapply(vec_size, function(n){1-s_sample_size[[n]] / t_size})
t_sample <- lapply(vec_size, function(n){sample_frac(t_redux, sample_red[[n]])})
#t_sample<-t_redux
#merging data frames
t_s <- lapply(vec_size, function(n){rbind.data.frame(s_sample[[n]], t_sample[[n]])}) 
# coercing into realRatingMatrix
ratings_s <- lapply(vec_size, function(n){as(t_s[[n]], "realRatingMatrix")}) 
# create evaluation scheme
eval_sets_s <- lapply(vec_size, function(n){evaluationScheme(data = ratings_s[[n]], method = "cross-validation", k = 4, given = 5, goodRating =3)})
# build UBCF model and SVD model
ubcf_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "UBCF", param = list(normalize = "center", method = "cosine"))})
svd_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "SVD", param = list(normalize = "center", k=10))})
# Make predictions with each model
ubcf_pred_s <- lapply(vec_size, function(n){predict(ubcf_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
svd_pred_s <- lapply(vec_size, function(n){predict(svd_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
# Table showing error calcs for UBCF vs SVD
ubcf_er_s <- lapply(vec_size, function(n){calcPredictionAccuracy(ubcf_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
svd_er_s <-lapply(vec_size, function(n){calcPredictionAccuracy(svd_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
# Model evaluation
models_to_evaluate <- list(
    UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")), 
    SVD = list(name = "SVD", param = list(normalize = "center", k=10))
  )
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results_s <- lapply(vec_size, function(n){evaluate(x = eval_sets_s[[n]], method = models_to_evaluate, n = n_recommendations, progress=FALSE)})
avg_matrices_s <- lapply(vec_size, function(n){lapply(list_results_s[[n]], avg)})
# error tables TP/FP/etc
error_tables_s <- lapply(vec_size, function(n){rbind(cbind(Model = rep("UBCF",12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$UBCF_cos), cbind(Model = rep("SVD",12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$SVD))})
# RMSE plot
#dataframe processing
rmse_e1 <- do.call(rbind, ubcf_er_s)
rmse_e2 <- do.call(rbind, svd_er_s)
rmse_tbl <- data.frame(cbind(rmse_e1, rmse_e2))
rmse_tbl <- rmse_tbl[,c(1,4)]
rmse_tbl[,3] <- vec_s
rmse_tbl <- rmse_tbl[,c(3,1,2)]
colnames(rmse_tbl) <- c("Perc", "UBCF", "SVD")
rmse_long <- gather(rmse_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
rmse_1 <- data.frame("Perc"=vec_s, "UBCF"= ubcf_er[[1]], "SVD"=svd_er[[1]])
rmse_long1 <- gather(rmse_1, variable, value, -Perc)
#plot
ggplot(data = rmse_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) + ggtitle("RMSE", subtitle = "Dots represent no serendipity ratings") + xlab("Serendipity inclusion in %") + ylab("RMSE") + geom_point(data = rmse_long1, aes(x = Perc, y = value, fill = variable))

3 Changes in Accuracy After Incorporating Serendipity Data (Deliverable 3)

As can be seen in the chart above, including the serendipity ratings in the training dataset reduces the RMSE in the majority of the runs. Overall, the SVD method still produces the lower errors, and the optimal inclusion percentage appears to be 100%.

Next, we calculate the AUC using the same methodology as above and compare it with the results for the dataset with no serendipity ratings:

#AUC calculation
#UBCF
x_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,8])}) # FPR (column 8)
y_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,7])}) # TPR (column 7)
#normalization
norm_x_s <- lapply(vec_size, function(n){(x_s[[n]]-min(x_s[[n]]))/(max(x_s[[n]])-min(x_s[[n]]))})
norm_y_s <- lapply(vec_size, function(n){(y_s[[n]]-min(y_s[[n]]))/(max(y_s[[n]])-min(y_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf_s <- lapply(vec_size, function(n){round(trapz(norm_x_s[[n]],norm_y_s[[n]]),4)})
#SVD
z_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,8])}) # FPR (column 8)
w_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,7])}) # TPR (column 7)
#normalization
norm_z_s <- lapply(vec_size, function(n){(z_s[[n]]-min(z_s[[n]]))/(max(z_s[[n]])-min(z_s[[n]]))})
norm_w_s <- lapply(vec_size, function(n){(w_s[[n]]-min(w_s[[n]]))/(max(w_s[[n]])-min(w_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd_s <- lapply(vec_size, function(n){round(trapz(norm_z_s[[n]],norm_w_s[[n]]),4)})
#AUC plot
#dataframe processing
auc_tbl1 <- do.call(rbind, auc_ubcf_s)
auc_tbl2 <- do.call(rbind, auc_svd_s)
auc_tbl <- data.frame(cbind(auc_tbl1,auc_tbl2))
auc_tbl[,3] <- vec_s
auc_tbl <- auc_tbl[,c(3,1,2)]
colnames(auc_tbl) <- c("Perc","UBCF","SVD")
auc_long <- gather(auc_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
auc_1 <- data.frame("Perc"=vec_s,"UBCF"= auc_ubcf, "SVD"=auc_svd)
auc_long1 <- gather(auc_1, variable,value, -Perc)
#plot
ggplot(data = auc_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) + ggtitle("AUC", subtitle = "Dots represent no serendipity ratings ") + xlab("Serendipity inclusion in %") + ylab("AUC") + geom_point(data=auc_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, including the serendipity ratings in the training dataset increases the AUC in all runs. Overall, the SVD method produces the higher AUC, and the optimal inclusion percentage appears to be 80%.

4 Additional Experiments and Metrics - Online Evaluation (Deliverable 4)

Online evaluation refers to creating mechanisms that respond to ongoing activity on a website and then measuring the accuracy of the recommendations those mechanisms produce. For example, a site could experiment with different changes to the recommender algorithm and assess each change by its click-through rate (CTR). The accuracy of the recommendations is then judged by how users interact with recommended items, how often recommended items are viewed, and so on.

To create a reasonable online evaluation environment, an engine must be created to split user traffic randomly into different experimental tracks and then follow the activity of each group over the course of the experiment (a minimal sketch of such a split appears after the list below). Some potential experiments include:

  • Altering the presentation of recommendations and seeing if it changes CTR or patterns
  • Altering the recommendation algorithm in a variety of ways:
    • Incorporate an entropy measure
    • Give more weight to more recent ratings to address the user-recommender lifecycle
    • Try recommending baskets of items instead of individual items
    • Penalize or remove recommendations that could be badly wrong, since such misses reduce user trust
  • Try arranging recommendations by their cost to see if that changes response
  • Try multi-dimensional ratings (recommend movies with same actor, same genre, etc.)
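
To make the traffic-splitting step concrete, the sketch below shows one way users could be assigned at random to two experimental tracks and the click-through rate compared between them. The impressions data frame (one row per recommendation shown, with a 0/1 clicked column) and its column names are purely hypothetical.

set.seed(137)
# assign each user at random to a control or treatment track
assignment <- data.frame(userId = unique(impressions$userId))
assignment$variant <- sample(c("control", "treatment"), nrow(assignment), replace = TRUE)
# click-through rate per experimental track
library(dplyr)
impressions %>%
  inner_join(assignment, by = "userId") %>%
  group_by(variant) %>%
  summarise(CTR = mean(clicked))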

5 Conclusion

Across the four deliverables completed for this assignment, we demonstrated that prediction errors can be decreased and recommendation performance improved by incorporating a serendipity dataset into the recommender system.

6 References
