In our previous assignment, we studied matrix factorization methods. In this assignment, we practice accuracy comparison methods and implement a user-experience goal such as increased serendipity, novelty, or diversity.
For this assignment, we chose to use the Serendipity 2018 dataset available here: https://grouplens.org/datasets/serendipity-2018/. We follow the procedures laid out in Kotkov et al. (see the references section).
These datasets include 10,000,000 movie ratings. To make the dataset size more manageable, the ratings were reduced to include only ratings from users who took part in the serendipity study (the “answers.csv” file). The ratings_raw dataset below contains 1,446,109 ratings. The data was further subset to include only movies that had been rated at least 200 times.
Another change we made to the original dataset was to include only six of the eight variations of serendipity outlined in the paper, because the two remaining variations are likely to reduce user satisfaction.
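As a reference, a minimal sketch of this reduction is shown below. It assumes dplyr and recommenderlab are loaded (as elsewhere in this report) and uses the standard GroupLens column names (userId, movieId, rating); answers is assumed to be answers.csv read into a data frame, and the intermediate object names other than ratings_raw and ratings1 are hypothetical.
# Sketch only: filter ratings_raw down to study users and frequently rated movies
study_users   <- unique(answers$userId)                                 # users from answers.csv
ratings_study <- ratings_raw %>% filter(userId %in% study_users)
popular       <- ratings_study %>% count(movieId) %>% filter(n >= 200)  # movies rated >= 200 times
ratings1_df   <- ratings_study %>%
  filter(movieId %in% popular$movieId) %>%
  select(userId, movieId, rating)
ratings1 <- as(as.data.frame(ratings1_df), "realRatingMatrix")          # rating matrix used in the models below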
Per the assignment, Deliverable 1 asks us to build two recommender systems with our data. We chose to build a user-based collaborative filtering (UBCF) model and a singular value decomposition (SVD) model.
set.seed(137)
# create evaluation scheme: 4-fold cross-validation, 5 ratings per test user given to the
# models as known data, and ratings of 3 or higher treated as "good" for top-N evaluation
eval_sets <- evaluationScheme(data = ratings1, method = "cross-validation", k = 4, given = 5, goodRating = 3)
# build UBCF model and SVD model
ubcf_rec <- Recommender(getData(eval_sets, "train"), "UBCF", param = list(normalize = "center", method = "cosine"))
svd_rec <- Recommender(getData(eval_sets, "train"), "SVD", param = list(normalize = "center", k = 10))
# Make predictions with each model
ubcf_pred <- predict(ubcf_rec, getData(eval_sets, "known"), type = "ratings")
svd_pred <- predict(svd_rec, getData(eval_sets, "known"), type = "ratings")

Now that we have built the two models, we will compare the errors and other metrics for each model:
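As a quick sanity check before the formal comparison (a sketch only; the user subset and list length are illustrative), the top-5 recommendation lists produced by a model can be inspected directly:
# Illustrative only: top-5 lists for the first three test users (default type = "topNList")
top5_ubcf <- predict(ubcf_rec, getData(eval_sets, "known")[1:3, ], n = 5)
as(top5_ubcf, "list")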
set.seed(137)
# Table showing error calcs for UBCF vs SVD
ubcf_er <- calcPredictionAccuracy(ubcf_pred, getData(eval_sets, "unknown"))
svd_er <- calcPredictionAccuracy(svd_pred, getData(eval_sets, "unknown"))
models_to_evaluate <- list(
UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")),
SVD = list(name = "SVD", param = list(normalize = "center", k=10))
)
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations, progress = FALSE)
avg_matrices <- lapply(list_results, avg)
error_tables <- rbind(
  cbind(Model = rep("UBCF", 12), n = rownames(avg_matrices$UBCF_cos), avg_matrices$UBCF_cos),
  cbind(Model = rep("SVD", 12), n = rownames(avg_matrices$SVD), avg_matrices$SVD)
)
error_tables[,3:10] <- round(as.numeric(error_tables[,3:10]), 6)
kable(error_tables) %>% kable_styling()

|  | Model | n | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | UBCF | 1 | 0.481405 | 0.518595 | 83.938017 | 103.061983 | 0.481405 | 0.005899 | 0.005899 | 0.004894 |
| 5 | UBCF | 5 | 2.233471 | 2.766529 | 82.18595 | 100.81405 | 0.446694 | 0.026814 | 0.026814 | 0.026978 |
| 10 | UBCF | 10 | 4.64876 | 5.35124 | 79.770661 | 98.229339 | 0.464876 | 0.055694 | 0.055694 | 0.051999 |
| 20 | UBCF | 20 | 9.363636 | 10.636364 | 75.055785 | 92.944215 | 0.468182 | 0.112047 | 0.112047 | 0.103521 |
| 30 | UBCF | 30 | 14.06405 | 15.93595 | 70.355372 | 87.644628 | 0.468802 | 0.168983 | 0.168983 | 0.153533 |
| 40 | UBCF | 40 | 18.745868 | 21.254132 | 65.673554 | 82.326446 | 0.468647 | 0.224892 | 0.224892 | 0.203866 |
| 50 | UBCF | 50 | 23.320248 | 26.677686 | 61.099174 | 76.902893 | 0.466423 | 0.279497 | 0.279497 | 0.255505 |
| 60 | UBCF | 60 | 27.88843 | 32.088843 | 56.530992 | 71.491736 | 0.46497 | 0.333404 | 0.333404 | 0.307086 |
| 70 | UBCF | 70 | 32.396694 | 37.559917 | 52.022727 | 66.020661 | 0.463076 | 0.387077 | 0.387077 | 0.360026 |
| 80 | UBCF | 80 | 36.950413 | 42.985537 | 47.469008 | 60.595041 | 0.462223 | 0.441373 | 0.441373 | 0.411432 |
| 90 | UBCF | 90 | 41.52686 | 48.38843 | 42.892562 | 55.192149 | 0.461813 | 0.496738 | 0.496738 | 0.464104 |
| 100 | UBCF | 100 | 46.202479 | 53.67562 | 38.216942 | 49.904959 | 0.462563 | 0.552625 | 0.552625 | 0.514903 |
| 1 | SVD | 1 | 0.57438 | 0.42562 | 83.845041 | 103.154959 | 0.57438 | 0.007105 | 0.007105 | 0.003881 |
| 5 | SVD | 5 | 2.752066 | 2.247934 | 81.667355 | 101.332645 | 0.550413 | 0.032786 | 0.032786 | 0.020003 |
| 10 | SVD | 10 | 5.27686 | 4.72314 | 79.142562 | 98.857438 | 0.527686 | 0.063001 | 0.063001 | 0.043305 |
| 20 | SVD | 20 | 10.014463 | 9.985537 | 74.404959 | 93.595041 | 0.500723 | 0.119513 | 0.119513 | 0.093821 |
| 30 | SVD | 30 | 14.737603 | 15.262397 | 69.681818 | 88.318182 | 0.491253 | 0.175855 | 0.175855 | 0.144 |
| 40 | SVD | 40 | 19.371901 | 20.628099 | 65.047521 | 82.952479 | 0.484298 | 0.230383 | 0.230383 | 0.195642 |
| 50 | SVD | 50 | 23.892562 | 26.107438 | 60.52686 | 77.47314 | 0.477851 | 0.282765 | 0.282765 | 0.247476 |
| 60 | SVD | 60 | 28.423554 | 31.576446 | 55.995868 | 72.004132 | 0.473726 | 0.33666 | 0.33666 | 0.300123 |
| 70 | SVD | 70 | 32.88843 | 37.11157 | 51.530992 | 66.469008 | 0.469835 | 0.388759 | 0.388759 | 0.352457 |
| 80 | SVD | 80 | 37.10124 | 42.89876 | 47.318182 | 60.681818 | 0.463765 | 0.437586 | 0.437586 | 0.408199 |
| 90 | SVD | 90 | 41.46281 | 48.53719 | 42.956612 | 55.043388 | 0.460698 | 0.489223 | 0.489223 | 0.462759 |
| 100 | SVD | 100 | 45.768595 | 54.231405 | 38.650826 | 49.349174 | 0.457686 | 0.539184 | 0.539184 | 0.516696 |
The two models are close on these top-N metrics: SVD has slightly higher precision and recall at the smaller list sizes, while UBCF edges ahead at the largest values of n. Next, we look at the RMSE, MSE and MAE.
# RMSE, MSE and MAE
k_Method <- c("UBCF-Cosine", "SVD")
k_table_p <- data.frame(rbind(ubcf_er, svd_er))
rownames(k_table_p) <- k_Method
k_table_p <- k_table_p[order(k_table_p$RMSE ),]
kable(k_table_p) %>% kable_styling()

|  | RMSE | MSE | MAE |
|---|---|---|---|
| SVD | 0.9355845 | 0.8753183 | 0.6634779 |
| UBCF-Cosine | 0.9581187 | 0.9179914 | 0.6927214 |
Using RMSE, MSE and MAE, we can see that the SVD model is the more accurate of the two.
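For reference, these metrics follow the standard definitions (a sketch, not the recommenderlab internals; the two vectors below are toy values for illustration):
pred   <- c(3.8, 2.5, 4.1)   # hypothetical predicted ratings
actual <- c(4.0, 3.0, 5.0)   # hypothetical observed ratings
err  <- pred - actual
MAE  <- mean(abs(err))       # mean absolute error
MSE  <- mean(err^2)          # mean squared error
RMSE <- sqrt(MSE)            # root mean squared error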
In addition, we created ROC curve and precision-recall plots:
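The ROC curve (TPR against FPR for each list size) can be drawn with recommenderlab's default plot method for the evaluation results; a sketch of that call, since the original plotting code is not reproduced in this section:
# ROC curve for both models; annotate = 1 labels the list sizes n on the first curve
plot(list_results, annotate = 1, legend = "topleft")
title("ROC Curve")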
# Precision-Recall plot
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-Recall")Given that the ROC curves are very similar, we will calculate the area under the curve (AUC) to see which is better (visually, the curve seems to favor the UBCF model). AUC is calculated as the area formed by the TPR and FPR coordinates. To calculate the area, we will do the following procedure:
1) Normalize the data so that both the x-axis (FPR) and the y-axis (TPR) lie on the unit interval.
2) Use the trapezoidal method to calculate the AUC.
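The trapz() function used below is assumed to come from the pracma package (caTools provides an identical function). For reference, a hand-rolled equivalent of the trapezoidal rule it applies:
# Reference implementation of the trapezoidal rule (illustrative only):
# area is approximated by the sum over segments of (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2
trapz_manual <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}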
#AUC calculation
#UBCF: x = FPR (column 8 of the averaged results), y = TPR (column 7)
x <- as.vector(avg_matrices$UBCF_cos[,8])
y <- as.vector(avg_matrices$UBCF_cos[,7])
#normalization
norm_x <- (x-min(x)) / (max(x)-min(x))
norm_y <- (y-min(y)) / (max(y)-min(y))
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf <- trapz(norm_x, norm_y)
#SVD: z = FPR (column 8), w = TPR (column 7)
z <- as.vector(avg_matrices$SVD[,8])
w <- as.vector(avg_matrices$SVD[,7])
#normalization
norm_z <- (z-min(z)) / (max(z)-min(z))
norm_w <- (w-min(w)) / (max(w)-min(w))
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd <- trapz(norm_z, norm_w)
# Table comparing AUCs
k_Method <- c("UBCF-Cosine", "SVD")
k_table_a <- data.frame(rbind(cbind(AUC=c(auc_ubcf, auc_svd))))
rownames(k_table_a) <- k_Method
kable(k_table_a) %>% kable_styling()

|  | AUC |
|---|---|
| UBCF-Cosine | 0.5028067 |
| SVD | 0.5311904 |
As seen above, the SVD model's AUC is slightly higher than the UBCF model's.
We use the Serendipity 2018 dataset, which appears to be the only publicly available dataset containing user feedback on the serendipity of movies.
Our methodology is to randomly sample movies that carry serendipity-related ratings, include them in the training dataset, and then measure the impact of that inclusion. To avoid inflating the size of the training dataset, we reduce the original training data proportionally to the number of ratings added.
We measure the impact by varying the proportion of serendipity ratings included in the training set from 10% to 100%, in steps of 10%; this is implemented below with lapply calls over the ten inclusion levels.
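As an illustration with made-up numbers: if the 40% serendipity sample contributes 20,000 ratings and the base set t_redux contains 400,000 ratings, we keep a random 1 - 20,000/400,000 = 95% of t_redux before appending the sample, so the combined training set stays close to its original size.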
set.seed(137)
#loop
vec_s <- seq(10, 100, 10)       # serendipity inclusion percentages
vec_size <- seq(1, 10, 1)       # indices for the ten runs
t_size <- length(t_redux$user)  # number of ratings in the base set t_redux (built earlier; s holds the serendipity ratings)
#using lapply functions to generate all results
# sampling serendipity file
s_sample <- lapply(vec_s, function(n){sample_frac(s,n/100)})
names(s_sample) <- paste0("s_sample", vec_s)
# sample size
s_sample_size <- lapply(vec_size, function(n){length(s_sample[[n]][,1])})
# reducing size of original set through sampling
sample_red <- lapply(vec_size, function(n){1-s_sample_size[[n]] / t_size})
t_sample <- lapply(vec_size, function(n){sample_frac(t_redux, sample_red[[n]])})
#merging data frames
t_s <- lapply(vec_size, function(n){rbind.data.frame(s_sample[[n]], t_sample[[n]])})
# coercing into realRatingMatrix
ratings_s <- lapply(vec_size, function(n){as(t_s[[n]], "realRatingMatrix")})
# create evaluation scheme
eval_sets_s <- lapply(vec_size, function(n){evaluationScheme(data = ratings_s[[n]], method = "cross-validation", k = 4, given = 5, goodRating =3)})
# build UBCF model and SVD model
ubcf_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "UBCF", param = list(normalize = "center", method = "cosine"))})
svd_rec_s <- lapply(vec_size, function(n){Recommender(getData(eval_sets_s[[n]], "train"), "SVD", param = list(normalize = "center", k=10))})
# Make predictions with each model
ubcf_pred_s <- lapply(vec_size, function(n){predict(ubcf_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
svd_pred_s <- lapply(vec_size, function(n){predict(svd_rec_s[[n]], getData(eval_sets_s[[n]], "known"), type = "ratings")})
# Table showing error calcs for UBCF vs SVD
ubcf_er_s <- lapply(vec_size, function(n){calcPredictionAccuracy(ubcf_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
svd_er_s <-lapply(vec_size, function(n){calcPredictionAccuracy(svd_pred_s[[n]], getData(eval_sets_s[[n]], "unknown"))})
# Model evaluation
models_to_evaluate <- list(
UBCF_cos = list(name = "UBCF", param = list(normalize = "center", method = "cosine")),
SVD = list(name = "SVD", param = list(normalize = "center", k=10))
)
n_recommendations <- c(1, 5, seq(10, 100, 10))
list_results_s <- lapply(vec_size, function(n){evaluate(x = eval_sets_s[[n]], method = models_to_evaluate, n = n_recommendations, progress=FALSE)})
avg_matrices_s <- lapply(vec_size, function(n){lapply(list_results_s[[n]], avg)})
# error tables TP/FP/etc
error_tables_s <- lapply(vec_size, function(n){
  rbind(
    cbind(Model = rep("UBCF", 12), n = rownames(avg_matrices_s[[n]]$UBCF_cos), avg_matrices_s[[n]]$UBCF_cos),
    cbind(Model = rep("SVD", 12), n = rownames(avg_matrices_s[[n]]$SVD), avg_matrices_s[[n]]$SVD)
  )
})
# RMSE plot
#dataframe processing
rmse_e1 <- do.call(rbind, ubcf_er_s)
rmse_e2 <- do.call(rbind, svd_er_s)
rmse_tbl <- data.frame(cbind(rmse_e1, rmse_e2))
rmse_tbl <- rmse_tbl[,c(1,4)]
rmse_tbl[,3] <- vec_s
rmse_tbl <- rmse_tbl[,c(3,1,2)]
colnames(rmse_tbl) <- c("Perc", "UBCF", "SVD")
rmse_long <- gather(rmse_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
rmse_1 <- data.frame("Perc"=vec_s, "UBCF"= ubcf_er[[1]], "SVD"=svd_er[[1]])
rmse_long1 <- gather(rmse_1, variable, value, -Perc)
#plot
ggplot(data = rmse_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  ggtitle("RMSE", subtitle = "Dots represent no serendipity ratings") +
  xlab("Serendipity inclusion in %") + ylab("RMSE") +
  geom_point(data = rmse_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, including the serendipity ratings in the training dataset reduces the RMSE in the majority of the runs. Overall, the SVD method still produces the lower errors. The optimal inclusion percentage appears to be 100%.
Next, we calculate the AUC using the same methodology as above and compare it with the values obtained from the dataset with no serendipity ratings.
#AUC calculation
#UBCF: x_s = FPR (column 8), y_s = TPR (column 7) for each inclusion level
x_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,8])})
y_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$UBCF_cos[,7])})
#normalization
norm_x_s <- lapply(vec_size, function(n){(x_s[[n]]-min(x_s[[n]]))/(max(x_s[[n]])-min(x_s[[n]]))})
norm_y_s <- lapply(vec_size, function(n){(y_s[[n]]-min(y_s[[n]]))/(max(y_s[[n]])-min(y_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_ubcf_s <- lapply(vec_size, function(n){round(trapz(norm_x_s[[n]],norm_y_s[[n]]),4)})
#SVD: z_s = FPR (column 8), w_s = TPR (column 7) for each inclusion level
z_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,8])})
w_s <- lapply(vec_size, function(n){as.vector(avg_matrices_s[[n]]$SVD[,7])})
#normalization
norm_z_s <- lapply(vec_size, function(n){(z_s[[n]]-min(z_s[[n]]))/(max(z_s[[n]])-min(z_s[[n]]))})
norm_w_s <- lapply(vec_size, function(n){(w_s[[n]]-min(w_s[[n]]))/(max(w_s[[n]])-min(w_s[[n]]))})
#AUC calculation using Trapezoid Rule Numerical Integration
auc_svd_s <- lapply(vec_size, function(n){round(trapz(norm_z_s[[n]],norm_w_s[[n]]),4)})
#AUC plot
#dataframe processing
auc_tbl1 <- do.call(rbind, auc_ubcf_s)
auc_tbl2 <- do.call(rbind, auc_svd_s)
auc_tbl <- data.frame(cbind(auc_tbl1,auc_tbl2))
auc_tbl[,3] <- vec_s
auc_tbl <- auc_tbl[,c(3,1,2)]
colnames(auc_tbl) <- c("Perc","UBCF","SVD")
auc_long <- gather(auc_tbl, variable, value, -Perc)
#inclusion of values calculated previously with no serendipity ratings
auc_1 <- data.frame("Perc"=vec_s,"UBCF"= auc_ubcf, "SVD"=auc_svd)
auc_long1 <- gather(auc_1, variable,value, -Perc)
#plot
ggplot(data = auc_long, aes(x = Perc, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  ggtitle("AUC", subtitle = "Dots represent no serendipity ratings") +
  xlab("Serendipity inclusion in %") + ylab("AUC") +
  geom_point(data = auc_long1, aes(x = Perc, y = value, fill = variable))

As can be seen in the chart above, including the serendipity ratings in the training dataset increases the AUC in all runs. Overall, the SVD method produces the higher AUC. The optimal inclusion percentage appears to be 80%.
Online evaluation refers to creating mechanisms that respond to ongoing activity on a website and then measuring the accuracy of the recommendations those mechanisms produce. For example, a site could experiment with different changes to the recommender algorithm and assess each change by its click-through rate (CTR), i.e., the fraction of displayed recommendations that users actually click. The accuracy of the recommendations would thus be judged by how users interact with recommended items, how often recommended items are viewed, and so on.
To create a reasonable online evaluation environment, an engine must be created to split user traffic randomly into different experimental tracks and then follow the activity of each group over the course of the experiment. Some potential experiments could include: serving recommendations from a model trained with the serendipity ratings against one trained without them, varying the percentage of serendipity ratings included, or comparing the UBCF and SVD models head to head, in each case using CTR as the success metric. A minimal traffic-split sketch follows below.
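The following is a hypothetical sketch of such a split and a per-arm CTR comparison; the impression and click counts are simulated, not data from this project (dplyr is assumed to be loaded, as elsewhere in the report).
# Simulate a 50/50 traffic split and compare CTR between the two arms (illustrative only)
set.seed(137)
n_users <- 1000
arm <- sample(c("control", "serendipity"), n_users, replace = TRUE)   # random assignment to tracks
impressions <- rpois(n_users, 20)                                     # simulated recommendation displays per user
clicks <- rbinom(n_users, size = impressions,
                 prob = ifelse(arm == "serendipity", 0.12, 0.10))     # simulated clicks, with a small lift in the test arm
log_df <- data.frame(arm, impressions, clicks)
log_df %>%
  group_by(arm) %>%
  summarise(CTR = sum(clicks) / sum(impressions))                     # click-through rate per experimental arm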
Through the four deliverables completed for this assignment, we demonstrated that prediction errors can be decreased and recommendation performance improved by incorporating a serendipity dataset into the recommender system.