1 Accuracy Comparisons (Deliverable 1)

In our previous assignment, we studied Matrix Factorization Methods. In this assignment, we will practice accuracy comparison methods and implement a user experience goal such as increased serendipity, novelty or diversity.

For this assignment, we chose to use the Serendipity datasets here: https://grouplens.org/datasets/serendipity-2018/. We follow the procedures laid out in Kotkov, Denis et. al. (see references section).

These datasets include 10,000,000 movie ratings. However, to make the dataset size more manageable, the ratings were reduced to include only ratings from the users who were part of the serendipity study (the “answers.csv” file). The ratings_raw dataset below contains 1,446,109 ratings. The dataset was further subsetted to include only movies which had been rated at least 200 times.

Another change we made to the original dataset is to include only six out of eight variations of serendipity, as outlined in the paper, because the two remaining variations are likely to reduce user satisfaction.

1.1 Create UBCF and SVD Recommender Models

Per the assignment, Deliverable 1 asks us to build two recommender systems with our data. We chose to build a UBCF model and an SVD model.

set.seed(137)
# create evaluation scheme
eval_sets <- evaluationScheme(data = ratings1, method = "cross-validation", k = 4, given = 5, goodRating = 3)
# build UBCF model and SVD model
ubcf_rec <- Recommender(getData(eval_sets, "train"), "UBCF", param = list(normalize = "center", method = "cosine"))
svd_rec <- Recommender(getData(eval_sets, "train"), "SVD", param = list(normalize = "center", k = 10))
# Make predictions with each model
ubcf_pred <- predict(ubcf_rec, getData(eval_sets, "known"), type = "ratings")
svd_pred <- predict(svd_rec, getData(eval_sets, "known"), type = "ratings")

1.2 Compare the UBCF and SVD Recommender Models

Now that we have built the two models, we will compare the errors and other metrics for each model:

set.seed(137)
# Table showing error calcs for UBCF vs SVD
ubcf_er <- calcPredictionAccuracy(ubcf_pred, getData(eval_sets, "unknown"))
svd_er <- calcPredictionAccuracy(svd_pred, getData(eval_sets, "unknown"))
models_to_evaluate <- list(
  UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")), 
  SVD = list(name = "SVD", param = list(normalize = "center", k=10))
)
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations, progress = FALSE)
avg_matrices <- lapply(list_results, avg)
error_tables <- rbind(cbind(Model = rep("UBCF",12), n = rownames(avg_matrices$UBCF_cos), avg_matrices$UBCF_cos), cbind(Model = rep("SVD",12), n = rownames(avg_matrices$UBCF_cos), avg_matrices$SVD))
error_tables[,3:10] <- round(as.numeric(error_tables[,3:10]), 6)
kable(error_tables) %>% kable_styling()

	Model	n	TP	FP	FN	TN	precision	recall	TPR	FPR
1	UBCF	1	0.582645	0.417355	83.801653	103.198347	0.582645	0.007284	0.007284	0.0038
5	UBCF	5	2.704545	2.295455	81.679752	101.320248	0.540909	0.033467	0.033467	0.021074
10	UBCF	10	5.204545	4.795455	79.179752	98.820248	0.520455	0.063962	0.063962	0.044963
20	UBCF	20	10.090909	9.909091	74.293388	93.706612	0.504545	0.122341	0.122341	0.092932
30	UBCF	30	14.774793	15.225207	69.609504	88.390496	0.492493	0.178652	0.178652	0.144006
40	UBCF	40	19.42562	20.57438	64.958678	83.041322	0.48564	0.233862	0.233862	0.195065
50	UBCF	50	23.940083	26.059917	60.444215	77.555785	0.478802	0.287173	0.287173	0.248081
60	UBCF	60	28.394628	31.605372	55.989669	72.010331	0.473244	0.339919	0.339919	0.301327
70	UBCF	70	32.77686	37.22314	51.607438	66.392562	0.468241	0.391357	0.391357	0.355273
80	UBCF	80	37.033058	42.966942	47.35124	60.64876	0.462913	0.441143	0.441143	0.4107
90	UBCF	90	41.475207	48.524793	42.909091	55.090909	0.460836	0.493568	0.493568	0.463697
100	UBCF	100	45.873967	54.126033	38.510331	49.489669	0.45874	0.544786	0.544786	0.517538
1	SVD	1	0.555785	0.444215	83.828512	103.171488	0.555785	0.007033	0.007033	0.004082
5	SVD	5	2.642562	2.357438	81.741736	101.258264	0.528512	0.032719	0.032719	0.021693
10	SVD	10	5.202479	4.797521	79.181818	98.818182	0.520248	0.063456	0.063456	0.044894
20	SVD	20	10.022727	9.977273	74.36157	93.63843	0.501136	0.120824	0.120824	0.094298
30	SVD	30	14.774793	15.225207	69.609504	88.390496	0.492493	0.178756	0.178756	0.143861
40	SVD	40	19.464876	20.535124	64.919421	83.080579	0.486622	0.234016	0.234016	0.194335
50	SVD	50	24.028926	25.971074	60.355372	77.644628	0.480579	0.287551	0.287551	0.245875
60	SVD	60	28.561983	31.438017	55.822314	72.177686	0.476033	0.341256	0.341256	0.299004
70	SVD	70	33.061983	36.938017	51.322314	66.677686	0.472314	0.393982	0.393982	0.351225
80	SVD	80	37.404959	42.595041	46.979339	61.020661	0.467562	0.444631	0.444631	0.405569
90	SVD	90	41.714876	48.285124	42.669421	55.330579	0.463499	0.495099	0.495099	0.460876
100	SVD	100	46.095041	53.904959	38.289256	49.710744	0.46095	0.545345	0.545345	0.51389

We can see that the UBCF model is the more accurate of the two. Next, we look at the RMSE, MSE and MAE.

# RMSE, MSE and MAE
k_Method <- c("UBCF-Cosine", "SVD")
k_table_p <- data.frame(rbind(ubcf_er, svd_er))
rownames(k_table_p) <- k_Method
k_table_p <- k_table_p[order(k_table_p$RMSE ),]
kable(k_table_p) %>% kable_styling()

	RMSE	MSE	MAE
UBCF-Cosine	0.9314149	0.8675337	0.6767562
SVD	0.9341935	0.8727175	0.6749346

Using RMSE, MSE and MAE, we can see that SVD is more accurate.

In addition, we created ROC Curve and Precision-Recall Plots:

# ROC Curve plot
plot(list_results, annotate = 1, legend = "topleft")
title("ROC Curve")

# Precision-Recall plot
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-Recall")

Given that the ROC curves are very similar, we will calculate the area under the curve (AUC) to see which is better (visually, the curve seems to favor the UBCF model). AUC is calculated as the area formed by the TPR and FPR coordinates. To calculate the area, we will do the following procedure:
1) Normalize data, so that X and Y axis should be in unity.
2) Use Trapezoidal method to calculate AUC.

#AUC calculation
#UBCF
x <- as.vector(avg_matrices$UBCF_cos[,8])
y <- as.vector(avg_matrices$UBCF_cos[,7])
#normalization
norm_x <- (x-min(x)) / (max(x)-min(x))
norm_y <- (y-min(y)) / (max(y)-min(y))
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf <- trapz(norm_x, norm_y)
#SVD
z <- as.vector(avg_matrices$SVD[,8])
w <- as.vector(avg_matrices$SVD[,7])
#normalization
norm_z <- (z-min(z)) / (max(z)-min(z))
norm_w <- (w-min(w)) / (max(w)-min(w))
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd <- trapz(norm_z, norm_w)
# Table comparing AUCs
k_Method <- c("UBCF-Cosine", "SVD")
k_table_a <- data.frame(rbind(cbind(AUC=c(auc_ubcf, auc_svd))))
rownames(k_table_a) <- k_Method
kable(k_table_a) %>% kable_styling()

	AUC
UBCF-Cosine	0.5304783
SVD	0.5318852

As seen above, SVD AUC is slightly higher than UBCF’s.

2 Implement Support for Serendipity (Deliverable 2)

We use the Serendipity dataset, which seems to be the only publicly available dataset, which contains user feedback regarding serendipity on movies.

Our methodology is to include ramdonly sampled movies that contain seredenpity-related ratings into the training dataset, and then measure the impact of the inclusion. To avoid increasing the size of the training dataset, we will reduce its size proportionally to the amount of movies included.

We will measure the impact of the inclusion by varying the number of seredenpity movies into the training set. This will be done via a for loop from 10 to 100%.

set.seed(137)
#loop
vec_s <- seq(10,100,10)
vec_size <- seq(1,10,1)
t_size <- length(t_redux$user)
#using lapply functions to generate all results
# sampling serendipity file
s_sample <- lapply(vec_s, function(n){sample_frac(s,n/100)})
names(s_sample) <- paste0("s_sample", vec_s)
# sample size
s_sample_size <- lapply(vec_size, function(n){length(s_sample[[n]][,1])})
# reducing size of original set through sampling
sample_red <- lapply(vec_size, function(n){1-s_sample_size[[n]] / t_size})
t_sample <- lapply(vec_size, function(n){sample_frac(t_redux, sample_red[[n]])})
#t_sample<-t_redux
#merging data frames
t_s <- lapply(vec_size, function(n){rbind.data.frame(s_sample[[n]], t_sample[[n]])}) 
# coercing into realRatingMatrix
ratings_s <- lapply(vec_size, function(n){as(t_s[[n]], "realRatingMatrix")}) 
# create evaluation scheme
eval_sets_s <- lapply(vec_size, function(n){evaluationScheme(data = ratings_s[[n]], method = "cross-validation", k = 4, given = 5, goodRating =3)})
# build UBCF model and SVD model
ubcf_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "UBCF", param = list(normalize = "center", method = "cosine"))})
svd_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "SVD", param = list(normalize = "center", k=10))})
# Make predictions with each model
ubcf_pred_s <- lapply(vec_size, function(n){predict(ubcf_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
svd_pred_s <- lapply(vec_size, function(n){predict(svd_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
# Table showing error calcs for UBCF vs SVD
ubcf_er_s <- lapply(vec_size, function(n){calcPredictionAccuracy(ubcf_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
svd_er_s <-lapply(vec_size, function(n){calcPredictionAccuracy(svd_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
# Model evaluation
models_to_evaluate <- list(
    UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")), 
    SVD = list(name = "SVD", param = list(normalize = "center", k=10))
  )
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results_s <- lapply(vec_size, function(n){evaluate(x = eval_sets_s[[n]], method = models_to_evaluate, n = n_recommendations, progress=FALSE)})
avg_matrices_s <- lapply(vec_size, function(n){lapply(list_results_s[[n]], avg)})
# error tables TP/FP/etc
error_tables_s <- lapply(vec_size, function(n){rbind(cbind(Model = rep("UBCF",12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$UBCF_cos), cbind(Model = rep("SVD",12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$SVD))})
# RMSE plot
#dataframe processing
rmse_e1 <- do.call(rbind, ubcf_er_s)
rmse_e2 <- do.call(rbind, svd_er_s)
rmse_tbl <- data.frame(cbind(rmse_e1, rmse_e2))
rmse_tbl <- rmse_tbl[,c(1,4)]
rmse_tbl[,3] <- vec_s
rmse_tbl <- rmse_tbl[,c(3,1,2)]
colnames(rmse_tbl) <- c("Perc", "UBCF", "SVD")
rmse_long <- gather(rmse_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
rmse_1 <- data.frame("Perc"=vec_s, "UBCF"= ubcf_er[[1]], "SVD"=svd_er[[1]])
rmse_long1 <- gather(rmse_1, variable, value, -Perc)
#plot
ggplot(data = rmse_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) + ggtitle("RMSE", subtitle = "Dots represent no serendipity ratings") + xlab("Serendipity inclusion in %") + ylab("RMSE") + geom_point(data = rmse_long1, aes(x = Perc, y = value, fill = variable))

3 Changes in Accuracy After Incorporating Serendipity Data (Deliverable 3)

As can be seen in the chart above, inclusion of the serendipity dataset into the training dataset does reduce the error as measured by RMSE in the majority of the runs. Overall, SVD method is still the one producing the lower errors. Optimal % of inclusion seems to be at 100%.

Next we will calculate the AUC using the same methodology as explained above and compare it with the dataset with no serendipity ratings.

#AUC calculation
#UBCF
x_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,8])})
y_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,7])})
#normalization
norm_x_s <- lapply(vec_size, function(n){(x_s[[n]]-min(x_s[[n]]))/(max(x_s[[n]])-min(x_s[[n]]))})
norm_y_s <- lapply(vec_size, function(n){(y_s[[n]]-min(y_s[[n]]))/(max(y_s[[n]])-min(y_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf_s <- lapply(vec_size, function(n){round(trapz(norm_x_s[[n]],norm_y_s[[n]]),4)})
#SVD
z_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,8])})
w_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,7])})
#normalization
norm_z_s <- lapply(vec_size, function(n){(z_s[[n]]-min(z_s[[n]]))/(max(z_s[[n]])-min(z_s[[n]]))})
norm_w_s <- lapply(vec_size, function(n){(w_s[[n]]-min(w_s[[n]]))/(max(w_s[[n]])-min(w_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd_s <- lapply(vec_size, function(n){round(trapz(norm_z_s[[n]],norm_w_s[[n]]),4)})
#AUC plot
#dataframe processing
auc_tbl1 <- do.call(rbind, auc_ubcf_s)
auc_tbl2 <- do.call(rbind, auc_svd_s)
auc_tbl <- data.frame(cbind(auc_tbl1,auc_tbl2))
auc_tbl[,3] <- vec_s
auc_tbl <- auc_tbl[,c(3,1,2)]
colnames(auc_tbl) <- c("Perc","UBCF","SVD")
auc_long <- gather(auc_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
auc_1 <- data.frame("Perc"=vec_s,"UBCF"= auc_ubcf, "SVD"=auc_svd)
auc_long1 <- gather(auc_1, variable,value, -Perc)
#plot
ggplot(data = auc_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) + ggtitle("AUC", subtitle = "Dots represent no serendipity ratings ") + xlab("Serendipity inclusion in %") + ylab("AUC") + geom_point(data=auc_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, inclusion of the serendipity dataset into the training dataset does increase the AUC in all runs. Overall, SVD method produces the higher AUC. Optimal % of inclusion seems to be at 80%.

4 Additional Experiments and Metrics - Online Evaluation (Deliverable 4)

Online evaluation refers to creating mechanisms that respond to ongoing activity on a web site and then measuring the accuracy of recommendations based off of these mechanisms. For example, a site could experiment with different changes in the recommender system algorithm and assess the accuracy of the change based on Click-Through Rate (CTR). Thus, the determination of how accurate the recommendations are would be based on user interaction with recommended items, how often recommended items were viewed, etc.

To create a reasonable online evaluation environment, an engine must be created to split user traffic randomly into different experimental tracks, and then follow user activity from those groups following the experiment. Some potential experiments could include:

Altering the presentation of recommendations and seeing if it changes CTR or patterns
Altering the recommendation algorithm in a variety of ways:
- Incorporate an entropy measure
- Give more weight to more recent ratings to address the user-recommender lifecycle
- Try recommending baskets of items instead of individual items
- Penalize or remove recommendations that could be very wrong, which reduces trust
Try arranging recommendations by their cost to see if that changes response
Try multi-dimensional ratings (recommend movies with same actor, same genre, etc.)

5 Conclusion

Via the four deliverables we completed per the assignment, we demonstrated that prediction errors can be decreased and recommendation performance improved by utilizing incorporating a serendipity dataset into the recommender system.

6 References

Building a Recommendation System with R by Suresh K. Gorakala, Michele Usuelli
Kotkov D, Konstan J.A, Zhao Q, Veijalainen J (2018) Investigating serendipity in recommender systems based on real user feedback. In: Proceedings of SAC 2018: symposium on applied computing, ACM
Kotkov, Denis & Konstan, Joseph & Zhao, Qian & Veijalainen, Jari. (2018). Investigating serendipity in recommender systems based on real user feedback. 1341-1350. 10.1145/3167132.3167276.
Konstan and Riedl. (2012). Recommender systems: from algorithms to user experience. User Model User-Adap Inter 22:101-123.
Gunawardana and Shani. (2009). A Survey of Accuracy Evaluation Metrics of Recommendation Tasks. Journal of Machine Learning Research 10 2935-2962.
Moreira, Souza and Cunha. (2015). Comparing offline and online recommender system evaluations on long-tail distributions. ACM RecSysy 2015 Poster Proceedings, September 16-20, 2015, Austria, Vienna.

DATA 612 - Summer 2020 - Project 4 | Accuracy and Beyond

Bruno de Melo and Leland Randles

July 2, 2020