In our previous assignment, we studied matrix factorization methods. In this assignment, we practice accuracy comparison methods and implement a user-experience goal such as increased serendipity, novelty, or diversity.
For this assignment, we chose to use the Serendipity 2018 dataset available here: https://grouplens.org/datasets/serendipity-2018/. We follow the procedures laid out in Kotkov et al. (see the references section).
These datasets include 10,000,000 movie ratings. To make the dataset size more manageable, the ratings were reduced to include only ratings from users who took part in the serendipity study (the “answers.csv” file). The ratings_raw dataset below contains 1,446,109 ratings. The data was further subset to include only movies that had been rated at least 200 times.
Another change we made to the original dataset was to include only six of the eight variations of serendipity outlined in the paper, because the two remaining variations are likely to reduce user satisfaction.
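As a reference, a minimal sketch of this reduction is shown below. It assumes dplyr and recommenderlab are loaded (as elsewhere in this report) and uses the standard GroupLens column names (userId, movieId, rating); answers is assumed to be answers.csv read into a data frame, and the intermediate object names other than ratings_raw and ratings1 are hypothetical.
# Sketch only: filter ratings_raw down to study users and frequently rated movies
study_users   <- unique(answers$userId)                                 # users from answers.csv
ratings_study <- ratings_raw %>% filter(userId %in% study_users)
popular       <- ratings_study %>% count(movieId) %>% filter(n >= 200)  # movies rated >= 200 times
ratings1_df   <- ratings_study %>%
  filter(movieId %in% popular$movieId) %>%
  select(userId, movieId, rating)
ratings1 <- as(as.data.frame(ratings1_df), "realRatingMatrix")          # rating matrix used in the models below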
Per the assignment, Deliverable 1 asks us to build two recommender systems with our data. We chose to build a user-based collaborative filtering (UBCF) model and a singular value decomposition (SVD) model.
set.seed(137)
# create evaluation scheme: 4-fold cross-validation, 5 ratings per test user given to the
# models as known data, and ratings of 3 or higher treated as "good" for top-N evaluation
eval_sets <- evaluationScheme(data = ratings1, method = "cross-validation", k = 4, given = 5, goodRating = 3)
# build UBCF model and SVD model
ubcf_rec <- Recommender(getData(eval_sets, "train"), "UBCF", param = list(normalize = "center", method = "cosine"))
svd_rec <- Recommender(getData(eval_sets, "train"), "SVD", param = list(normalize = "center", k = 10))
# Make predictions with each model
ubcf_pred <- predict(ubcf_rec, getData(eval_sets, "known"), type = "ratings")
svd_pred <- predict(svd_rec, getData(eval_sets, "known"), type = "ratings")

Now that we have built the two models, we will compare the errors and other metrics for each model:
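As a quick sanity check before the formal comparison (a sketch only; the user subset and list length are illustrative), the top-5 recommendation lists produced by a model can be inspected directly:
# Illustrative only: top-5 lists for the first three test users (default type = "topNList")
top5_ubcf <- predict(ubcf_rec, getData(eval_sets, "known")[1:3, ], n = 5)
as(top5_ubcf, "list")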
set.seed(137)
# Table showing error calcs for UBCF vs SVD
ubcf_er <- calcPredictionAccuracy(ubcf_pred, getData(eval_sets, "unknown"))
svd_er <- calcPredictionAccuracy(svd_pred, getData(eval_sets, "unknown"))
models_to_evaluate <- list(
UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")),
SVD = list(name = "SVD", param = list(normalize = "center", k=10))
)
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations, progress = FALSE)
avg_matrices <- lapply(list_results, avg)
error_tables <- rbind(
  cbind(Model = rep("UBCF", 12), n = rownames(avg_matrices$UBCF_cos), avg_matrices$UBCF_cos),
  cbind(Model = rep("SVD", 12), n = rownames(avg_matrices$SVD), avg_matrices$SVD)
)
error_tables[,3:10] <- round(as.numeric(error_tables[,3:10]), 6)
kable(error_tables) %>% kable_styling()

|  | Model | n | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | UBCF | 1 | 0.481405 | 0.518595 | 83.938017 | 103.061983 | 0.481405 | 0.005899 | 0.005899 | 0.004894 |
| 5 | UBCF | 5 | 2.233471 | 2.766529 | 82.18595 | 100.81405 | 0.446694 | 0.026814 | 0.026814 | 0.026978 |
| 10 | UBCF | 10 | 4.64876 | 5.35124 | 79.770661 | 98.229339 | 0.464876 | 0.055694 | 0.055694 | 0.051999 |
| 20 | UBCF | 20 | 9.363636 | 10.636364 | 75.055785 | 92.944215 | 0.468182 | 0.112047 | 0.112047 | 0.103521 |
| 30 | UBCF | 30 | 14.06405 | 15.93595 | 70.355372 | 87.644628 | 0.468802 | 0.168983 | 0.168983 | 0.153533 |
| 40 | UBCF | 40 | 18.745868 | 21.254132 | 65.673554 | 82.326446 | 0.468647 | 0.224892 | 0.224892 | 0.203866 |
| 50 | UBCF | 50 | 23.320248 | 26.677686 | 61.099174 | 76.902893 | 0.466423 | 0.279497 | 0.279497 | 0.255505 |
| 60 | UBCF | 60 | 27.88843 | 32.088843 | 56.530992 | 71.491736 | 0.46497 | 0.333404 | 0.333404 | 0.307086 |
| 70 | UBCF | 70 | 32.396694 | 37.559917 | 52.022727 | 66.020661 | 0.463076 | 0.387077 | 0.387077 | 0.360026 |
| 80 | UBCF | 80 | 36.950413 | 42.985537 | 47.469008 | 60.595041 | 0.462223 | 0.441373 | 0.441373 | 0.411432 |
| 90 | UBCF | 90 | 41.52686 | 48.38843 | 42.892562 | 55.192149 | 0.461813 | 0.496738 | 0.496738 | 0.464104 |
| 100 | UBCF | 100 | 46.202479 | 53.67562 | 38.216942 | 49.904959 | 0.462563 | 0.552625 | 0.552625 | 0.514903 |
| 1 | SVD | 1 | 0.57438 | 0.42562 | 83.845041 | 103.154959 | 0.57438 | 0.007105 | 0.007105 | 0.003881 |
| 5 | SVD | 5 | 2.752066 | 2.247934 | 81.667355 | 101.332645 | 0.550413 | 0.032786 | 0.032786 | 0.020003 |
| 10 | SVD | 10 | 5.27686 | 4.72314 | 79.142562 | 98.857438 | 0.527686 | 0.063001 | 0.063001 | 0.043305 |
| 20 | SVD | 20 | 10.014463 | 9.985537 | 74.404959 | 93.595041 | 0.500723 | 0.119513 | 0.119513 | 0.093821 |
| 30 | SVD | 30 | 14.737603 | 15.262397 | 69.681818 | 88.318182 | 0.491253 | 0.175855 | 0.175855 | 0.144 |
| 40 | SVD | 40 | 19.371901 | 20.628099 | 65.047521 | 82.952479 | 0.484298 | 0.230383 | 0.230383 | 0.195642 |
| 50 | SVD | 50 | 23.892562 | 26.107438 | 60.52686 | 77.47314 | 0.477851 | 0.282765 | 0.282765 | 0.247476 |
| 60 | SVD | 60 | 28.423554 | 31.576446 | 55.995868 | 72.004132 | 0.473726 | 0.33666 | 0.33666 | 0.300123 |
| 70 | SVD | 70 | 32.88843 | 37.11157 | 51.530992 | 66.469008 | 0.469835 | 0.388759 | 0.388759 | 0.352457 |
| 80 | SVD | 80 | 37.10124 | 42.89876 | 47.318182 | 60.681818 | 0.463765 | 0.437586 | 0.437586 | 0.408199 |
| 90 | SVD | 90 | 41.46281 | 48.53719 | 42.956612 | 55.043388 | 0.460698 | 0.489223 | 0.489223 | 0.462759 |
| 100 | SVD | 100 | 45.768595 | 54.231405 | 38.650826 | 49.349174 | 0.457686 | 0.539184 | 0.539184 | 0.516696 |
The two models are close on these top-N metrics: SVD has slightly higher precision and recall at the smaller list sizes, while UBCF edges ahead at the largest values of n. Next, we look at the RMSE, MSE and MAE.
# RMSE, MSE and MAE
k_Method <- c("UBCF-Cosine", "SVD")
k_table_p <- data.frame(rbind(ubcf_er, svd_er))
rownames(k_table_p) <- k_Method
k_table_p <- k_table_p[order(k_table_p$RMSE ),]
kable(k_table_p) %>% kable_styling()

|  | RMSE | MSE | MAE |
|---|---|---|---|
| SVD | 0.9355845 | 0.8753183 | 0.6634779 |
| UBCF-Cosine | 0.9581187 | 0.9179914 | 0.6927214 |
Using RMSE, MSE and MAE, we can see that the SVD model is the more accurate of the two.
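For reference, these metrics follow the standard definitions (a sketch, not the recommenderlab internals; the two vectors below are toy values for illustration):
pred   <- c(3.8, 2.5, 4.1)   # hypothetical predicted ratings
actual <- c(4.0, 3.0, 5.0)   # hypothetical observed ratings
err  <- pred - actual
MAE  <- mean(abs(err))       # mean absolute error
MSE  <- mean(err^2)          # mean squared error
RMSE <- sqrt(MSE)            # root mean squared error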
In addition, we created ROC curve and precision-recall plots:
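The ROC curve (TPR against FPR for each list size) can be drawn with recommenderlab's default plot method for the evaluation results; a sketch of that call, since the original plotting code is not reproduced in this section:
# ROC curve for both models; annotate = 1 labels the list sizes n on the first curve
plot(list_results, annotate = 1, legend = "topleft")
title("ROC Curve")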
# Precision-Recall plot
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-Recall")Given that the ROC curves are very similar, we will calculate the area under the curve (AUC) to see which is better (visually, the curve seems to favor the UBCF model). AUC is calculated as the area formed by the TPR and FPR coordinates. To calculate the area, we will do the following procedure:
1) Normalize the data so that both the x-axis (FPR) and the y-axis (TPR) lie on the unit interval.
2) Use the trapezoidal method to calculate the AUC.
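The trapz() function used below is assumed to come from the pracma package (caTools provides an identical function). For reference, a hand-rolled equivalent of the trapezoidal rule it applies:
# Reference implementation of the trapezoidal rule (illustrative only):
# area is approximated by the sum over segments of (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2
trapz_manual <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}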
#AUC calculation
#UBCF: x = FPR (column 8 of the averaged results), y = TPR (column 7)
x <- as.vector(avg_matrices$UBCF_cos[,8])
y <- as.vector(avg_matrices$UBCF_cos[,7])
#normalization
norm_x <- (x-min(x)) / (max(x)-min(x))
norm_y <- (y-min(y)) / (max(y)-min(y))
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf <- trapz(norm_x, norm_y)
#SVD: z = FPR (column 8), w = TPR (column 7)
z <- as.vector(avg_matrices$SVD[,8])
w <- as.vector(avg_matrices$SVD[,7])
#normalization
norm_z <- (z-min(z)) / (max(z)-min(z))
norm_w <- (w-min(w)) / (max(w)-min(w))
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd <- trapz(norm_z, norm_w)
# Table comparing AUCs
k_Method <- c("UBCF-Cosine", "SVD")
k_table_a <- data.frame(rbind(cbind(AUC=c(auc_ubcf, auc_svd))))
rownames(k_table_a) <- k_Method
kable(k_table_a) %>% kable_styling()

|  | AUC |
|---|---|
| UBCF-Cosine | 0.5028067 |
| SVD | 0.5311904 |
As seen above, the SVD model's AUC is slightly higher than the UBCF model's.
We use the Serendipity 2018 dataset, which appears to be the only publicly available dataset containing user feedback on the serendipity of movies.
Our methodology is to randomly sample movies that carry serendipity-related ratings, include them in the training dataset, and then measure the impact of that inclusion. To avoid inflating the size of the training dataset, we reduce the original training data proportionally to the number of ratings added.
We measure the impact by varying the proportion of serendipity ratings included in the training set from 10% to 100%, in steps of 10%; this is implemented below with lapply calls over the ten inclusion levels.
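As an illustration with made-up numbers: if the 40% serendipity sample contributes 20,000 ratings and the base set t_redux contains 400,000 ratings, we keep a random 1 - 20,000/400,000 = 95% of t_redux before appending the sample, so the combined training set stays close to its original size.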
set.seed(137)
#loop
vec_s <- seq(10, 100, 10)       # serendipity inclusion percentages
vec_size <- seq(1, 10, 1)       # indices for the ten runs
t_size <- length(t_redux$user)  # number of ratings in the base set t_redux (built earlier; s holds the serendipity ratings)
#using lapply functions to generate all results
# sampling serendipity file
s_sample <- lapply(vec_s, function(n){sample_frac(s,n/100)})
names(s_sample) <- paste0("s_sample", vec_s)
# sample size
s_sample_size <- lapply(vec_size, function(n){length(s_sample[[n]][,1])})
# reducing size of original set through sampling
sample_red <- lapply(vec_size, function(n){1-s_sample_size[[n]] / t_size})
t_sample <- lapply(vec_size, function(n){sample_frac(t_redux, sample_red[[n]])})
#merging data frames
t_s <- lapply(vec_size, function(n){rbind.data.frame(s_sample[[n]], t_sample[[n]])})
# coercing into realRatingMatrix
ratings_s <- lapply(vec_size, function(n){as(t_s[[n]], "realRatingMatrix")})
# create evaluation scheme
eval_sets_s <- lapply(vec_size, function(n){evaluationScheme(data = ratings_s[[n]], method = "cross-validation", k = 4, given = 5, goodRating =3)})
# build UBCF model and SVD model
ubcf_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "UBCF", param = list(normalize = "center", method = "cosine"))})
svd_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "SVD", param = list(normalize = "center", k=10))})
# Make predictions with each model
ubcf_pred_s <- lapply(vec_size, function(n){predict(ubcf_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
svd_pred_s <- lapply(vec_size, function(n){predict(svd_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
# Table showing error calcs for UBCF vs SVD
ubcf_er_s <- lapply(vec_size, function(n){calcPredictionAccuracy(ubcf_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
svd_er_s <-lapply(vec_size, function(n){calcPredictionAccuracy(svd_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
# Model evaluation
models_to_evaluate <- list(
UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")),
SVD = list(name = "SVD", param = list(normalize = "center", k=10))
)
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results_s <- lapply(vec_size, function(n){evaluate(x = eval_sets_s[[n]], method = models_to_evaluate, n = n_recommendations, progress=FALSE)})
avg_matrices_s <- lapply(vec_size, function(n){lapply(list_results_s[[n]], avg)})
# error tables TP/FP/etc
error_tables_s <- lapply(vec_size, function(n){
  rbind(
    cbind(Model = rep("UBCF", 12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$UBCF_cos),
    cbind(Model = rep("SVD", 12), n = rownames(avg_matrices_s[[n]]$SVD), avg_matrices_s[[n]]$SVD)
  )
})
# RMSE plot
#dataframe processing
rmse_e1 <- do.call(rbind, ubcf_er_s)
rmse_e2 <- do.call(rbind, svd_er_s)
rmse_tbl <- data.frame(cbind(rmse_e1, rmse_e2))
rmse_tbl <- rmse_tbl[,c(1,4)]
rmse_tbl[,3] <- vec_s
rmse_tbl <- rmse_tbl[,c(3,1,2)]
colnames(rmse_tbl) <- c("Perc", "UBCF", "SVD")
rmse_long <- gather(rmse_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
rmse_1 <- data.frame("Perc"=vec_s, "UBCF"= ubcf_er[[1]], "SVD"=svd_er[[1]])
rmse_long1 <- gather(rmse_1, variable, value, -Perc)
#plot
ggplot(data = rmse_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  ggtitle("RMSE", subtitle = "Dots represent no serendipity ratings") +
  xlab("Serendipity inclusion in %") + ylab("RMSE") +
  geom_point(data = rmse_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, including the serendipity ratings in the training dataset reduces the RMSE in the majority of the runs. Overall, the SVD method still produces the lower errors. The optimal inclusion percentage appears to be 100%.
Next, we calculate the AUC using the same methodology as above and compare it with the values obtained from the dataset with no serendipity ratings.
#AUC calculation
#UBCF: x_s = FPR (column 8), y_s = TPR (column 7) for each inclusion level
x_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,8])})
y_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,7])})
#normalization
norm_x_s <- lapply(vec_size, function(n){(x_s[[n]]-min(x_s[[n]]))/(max(x_s[[n]])-min(x_s[[n]]))})
norm_y_s <- lapply(vec_size, function(n){(y_s[[n]]-min(y_s[[n]]))/(max(y_s[[n]])-min(y_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf_s <- lapply(vec_size, function(n){round(trapz(norm_x_s[[n]],norm_y_s[[n]]),4)})
#SVD: z_s = FPR (column 8), w_s = TPR (column 7) for each inclusion level
z_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,8])})
w_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,7])})
#normalization
norm_z_s <- lapply(vec_size, function(n){(z_s[[n]]-min(z_s[[n]]))/(max(z_s[[n]])-min(z_s[[n]]))})
norm_w_s <- lapply(vec_size, function(n){(w_s[[n]]-min(w_s[[n]]))/(max(w_s[[n]])-min(w_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd_s <- lapply(vec_size, function(n){round(trapz(norm_z_s[[n]],norm_w_s[[n]]),4)})
#AUC plot
#dataframe processing
auc_tbl1 <- do.call(rbind, auc_ubcf_s)
auc_tbl2 <- do.call(rbind, auc_svd_s)
auc_tbl <- data.frame(cbind(auc_tbl1,auc_tbl2))
auc_tbl[,3] <- vec_s
auc_tbl <- auc_tbl[,c(3,1,2)]
colnames(auc_tbl) <- c("Perc","UBCF","SVD")
auc_long <- gather(auc_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
auc_1 <- data.frame("Perc"=vec_s,"UBCF"= auc_ubcf, "SVD"=auc_svd)
auc_long1 <- gather(auc_1, variable,value, -Perc)
#plot
ggplot(data = auc_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  ggtitle("AUC", subtitle = "Dots represent no serendipity ratings") +
  xlab("Serendipity inclusion in %") + ylab("AUC") +
  geom_point(data = auc_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, including the serendipity ratings in the training dataset increases the AUC in all runs. Overall, the SVD method produces the higher AUC. The optimal inclusion percentage appears to be 80%.
Online evaluation refers to creating mechanisms that respond to ongoing activity on a website and then measuring the accuracy of the recommendations those mechanisms produce. For example, a site could experiment with different changes to the recommender algorithm and assess each change by its click-through rate (CTR), i.e., the fraction of displayed recommendations that users actually click. The accuracy of the recommendations would thus be judged by how users interact with recommended items, how often recommended items are viewed, and so on.
To create a reasonable online evaluation environment, an engine must be created to split user traffic randomly into different experimental tracks and then follow the activity of each group over the course of the experiment. Some potential experiments could include: serving recommendations from a model trained with the serendipity ratings against one trained without them, varying the percentage of serendipity ratings included, or comparing the UBCF and SVD models head to head, in each case using CTR as the success metric. A minimal traffic-split sketch follows below.
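The following is a hypothetical sketch of such a split and a per-arm CTR comparison; the impression and click counts are simulated, not data from this project (dplyr is assumed to be loaded, as elsewhere in the report).
# Simulate a 50/50 traffic split and compare CTR between the two arms (illustrative only)
set.seed(137)
n_users <- 1000
arm <- sample(c("control", "serendipity"), n_users, replace = TRUE)   # random assignment to tracks
impressions <- rpois(n_users, 20)                                     # simulated recommendation displays per user
clicks <- rbinom(n_users, size = impressions,
                 prob = ifelse(arm == "serendipity", 0.12, 0.10))     # simulated clicks, with a small lift in the test arm
log_df <- data.frame(arm, impressions, clicks)
log_df %>%
  group_by(arm) %>%
  summarise(CTR = sum(clicks) / sum(impressions))                     # click-through rate per experimental arm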
Through the four deliverables completed for this assignment, we demonstrated that prediction errors can be decreased and recommendation performance improved by incorporating a serendipity dataset into the recommender system.