The goal of this assignment is for you to try out different ways of implementing and configuring a recommender, and to evaluate your different approaches. For assignment 2, start with an existing dataset of user-item ratings, such as MovieLens, and implement at least two recommendation algorithms.
You should evaluate and compare different approaches, using different algorithms, normalization techniques, similarity methods, neighborhood sizes, etc. You don’t need to be exhaustive—these are just some suggested possibilities.
You may use the course text’s recommenderlab or any other library that you want. Please provide at least one graph, and a textual summary of your findings and recommendations.
In this project we will use the built-in MovieLense dataset from the recommenderlab library. We will implement and compare results from several recommendation algorithms. Our recommenders will predict movie ratings given user and movie profiles as well as users’ past ratings.
Our data is stored as a realRatingMatrix, where each row represents a user and each column represents a movie. We can see that there are a total of 99,392 ratings, so not every user has watched every movie. This means that our matrix is sparse – most cells have no information.
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
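As a quick check (a sketch, not part of the original write-up), the sparsity can be confirmed directly from the counts: 99,392 ratings spread over 943 × 1,664 cells means only about 6% of the matrix is filled.
# rough sparsity check: share of user-movie cells that actually contain a rating
sum(rowCounts(MovieLense)) / (nrow(MovieLense) * ncol(MovieLense))
# ≈ 0.063, i.e. roughly 94% of the cells are empty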
With reference to code by Hahsler (2019), we explored the dataset to get a sense of how ratings are distributed across users and items. This will inform how we subset the data for modeling.
First, we can take a look at the number of ratings per user.
The data is severely right skewed, which indicates that most users have only rated a few movies.
Next, we can take a look at the number of ratings per movie.
This distribution is also severely right skewed, which means that most movies only have a few ratings.
This aligns with our original intuition – given the skewness of both distributions, it is clear that the dataset contains a lot of sparsity.
Next, we can look at the distribution of ratings.
# Visualise the distribution of observed ratings
# (`movielense` is assumed to hold the rating matrix in plain-matrix form)
movielense %>%
  as.vector() %>%
  as_tibble() %>%
  filter_all(any_vars(. != 0)) %>%   # drop the empty (unrated) cells
  ggplot(aes(value)) +
  geom_bar(fill = 'red') +
  labs(title = "Overall Distribution of Ratings", y = "", x = "Ratings") +
  theme_minimal()
The overall distribution of ratings is left skewed, indicating that people tend to rate movies positively. The mean rating is confirmed below:
## [1] 3.587565
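The chunk that produced this value is not shown above; one way to compute it (a minimal sketch) is to pull the vector of observed ratings directly:
# mean of the observed (non-missing) ratings only
mean(getRatings(MovieLense))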
The format of MovieLenseMeta is a data frame with movie title, year, IMDb URL and indicator variables for 19 genres.
## 'data.frame': 1664 obs. of 22 variables:
## $ title : chr "Toy Story (1995)" "GoldenEye (1995)" "Four Rooms (1995)" "Get Shorty (1995)" ...
## $ year : num 1995 1995 1995 1995 1995 ...
## $ url : chr "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)" "http://us.imdb.com/M/title-exact?GoldenEye%20(1995)" "http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)" "http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)" ...
## $ unknown : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Action : int 0 1 0 1 0 0 0 0 0 0 ...
## $ Adventure : int 0 1 0 0 0 0 0 0 0 0 ...
## $ Animation : int 1 0 0 0 0 0 0 0 0 0 ...
## $ Children's : int 1 0 0 0 0 0 0 1 0 0 ...
## $ Comedy : int 1 0 0 1 0 0 0 1 0 0 ...
## $ Crime : int 0 0 0 0 1 0 0 0 0 0 ...
## $ Documentary: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Drama : int 0 0 0 1 1 1 1 1 1 1 ...
## $ Fantasy : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Film-Noir : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Horror : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Musical : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Mystery : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Romance : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Sci-Fi : int 0 0 0 0 0 0 1 0 0 0 ...
## $ Thriller : int 0 1 1 0 1 0 0 0 0 0 ...
## $ War : int 0 0 0 0 0 0 0 0 0 1 ...
## $ Western : int 0 0 0 0 0 0 0 0 0 0 ...
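Since the genre columns are 0/1 indicators, a quick tally of how many movies carry each genre flag is straightforward (a small sketch, not part of the original exploration):
# count movies per genre; the first three columns are title, year, and url
sort(colSums(MovieLenseMeta[, -(1:3)]), decreasing = TRUE)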
We can also build a heatmap of the most relevant users and movies, consisting of the users that rated the most movies and the movies that have the most ratings. Dark columns represent highly rated movies and dark rows represent users that give high ratings. This indicates the need for normalization of the data – we don’t want the results skewed by users that predominantly rate high or low.
min_n_movies <- quantile(rowCounts(MovieLense), 0.99)
min_n_users  <- quantile(colCounts(MovieLense), 0.99)
image(MovieLense[rowCounts(MovieLense) > min_n_movies,
                 colCounts(MovieLense) > min_n_users],
      main = "Heatmap of the top users and movies")
Our data prep will involve selecting relevant data (movies that have many ratings and users that have rated many movies) and splitting the result into training and test sets. Once the data is cleaned up, we can start to build our recommender models.
It is obvious that the dataset is very sparse. It includes many users who rated very few movies and many movies that received very few ratings. To have a healthy baseline on which to build our recommendation models, we keep only those users who have rated at least 50 movies and those movies that have been rated by at least 100 users. A look at the distribution of average rating per user shows large variation across this subset.
# base population
most_rated <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]
average_ratings_per_user <- rowMeans(most_rated)
qplot(average_ratings_per_user) +
  stat_bin(binwidth = 0.1) +
  ggtitle("Distribution of the average rating per user")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Next, we can split the data into train and test sets. We are left with 462 records in the training set and 98 records in the test set.
# split the dataset into train and test sets (80/20)
# (no seed is set here, so the exact split will vary between runs)
which_train <- sample(x = c(TRUE, FALSE), size = nrow(most_rated), replace = TRUE, prob = c(0.8, 0.2))
train_set <- most_rated[which_train, ]
test_set  <- most_rated[!which_train, ]
paste0('Training Set: ', nrow(train_set), ' records | Test Set: ', nrow(test_set), ' records')
## [1] "Training Set: 462 records | Test Set: 98 records"
Now that we’ve cleaned up our data, we can configure and implement our recommender systems. There are 3 main filtering approaches: content-based, user-user collaborative, and item-item collaborative.
In content-based filtering, recommendations are driven by item attributes – a profile of the user’s preferences is built from the content of previously rated items, and the algorithm recommends new items that are similar to those items.
In collaborative filtering, recommendations are based on information about similar users (user-user) or similar items (item-item). The recommenderlab library supports three similarity measures that we will explore: cosine, Pearson, and Jaccard.
For this project, we will focus on user-user and item-item based collaborative filtering.
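To make the three similarity measures concrete, recommenderlab's similarity() function can be applied to a handful of users (an illustrative sketch, not part of the modeling pipeline):
# pairwise user similarities under each measure, for the first four users
similarity(MovieLense[1:4, ], method = "cosine",  which = "users")
similarity(MovieLense[1:4, ], method = "pearson", which = "users")
similarity(MovieLense[1:4, ], method = "jaccard", which = "users")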
User-based collaborative filtering: This recommends to a user the items that are most preferred by similar users.
The parameters for this approach are similar to those of item-item filtering: the similarity measure (cosine, Pearson, or Jaccard), the number of nearest neighbors, and the normalization technique.
We’ll start off with a simple, non-optimized user-user based model to get a feel for the functions. We’ll use the training set to fit the model and the test set to generate recommendations. Here, we’ve defined the number of recommendations as 5, the similarity method as cosine, and the number of neighbors as 5. We’ll plot the distribution of recommendations.
# recommend 5 movies to each user
n_recommended <- 5
# create model on training set
recc_model <- Recommender(data = train_set, method = "UBCF", parameter = list(method = 'cosine', nn = 5, normalize = 'center' ))
# predict on test set
recc_predict <- predict(object = recc_model, newdata = test_set, n = n_recommended)
# create a matrix of predictions
recc_matrix <- sapply(recc_predict@items, function(x){
  colnames(most_rated)[x]
})
# plot distribution of movie recommendations
numRec <- factor(table(recc_matrix))
qplot(numRec,
      main = 'Distribution of Movie Recommendations',
      ylab = 'Number of Movies',
      xlab = 'Times Recommended')
We can see that most movies are recommended only a few times.
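For a quick sanity check (a sketch; this inspection is not in the original output), the actual titles recommended to the first few test users can be pulled straight from the prediction object:
# recommended movie titles for the first three test users
head(as(recc_predict, "list"), 3)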
Now, we can optimize our model by tweaking the similarity method, number of neighbors, and normalization type. We’ll create a function to calculate and record the RMSE, MSE, and MAE for each set of parameters.
set.seed(200)
# function to calculate error for a given similarity method, neighborhood size,
# and normalization technique
errorCalc <- function(e, sm, nb, nrm, evl){
  # create a user-based CF recommender using training data
  rec <- Recommender(getData(e, "train"),
                     evl,
                     parameter = list(method = sm,
                                      nn = nb,
                                      normalize = nrm))
  # create predictions for the test data using known ratings
  pred <- predict(rec, getData(e, "known"), type = "ratings")
  # avg error metrics per user, averaged over all recommendations
  error.ubcf <- calcPredictionAccuracy(pred, getData(e, "unknown"))
  # top-N list metrics (n = 5 here; computed for reference but not returned)
  pred.top10 <- predict(rec, getData(e, "known"), type = "topNList", n = 5)
  error.ubcf.top10 <- calcPredictionAccuracy(pred.top10,
                                             getData(e, "unknown"),
                                             given = 5,
                                             goodRating = 3)
  error.ubcf <- c(error.ubcf, sm, nb, nrm)
  names(error.ubcf) <- c('RMSE', 'MSE', 'MAE', 'similarity',
                         'neighbors', 'norm')
  return(error.ubcf)
}
In this case, we’ll define an evaluation scheme that consists of 10-fold cross validation with a “good rating” defined as 3. We will then cycle through all parameter combinations to identify the optimal one.
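Before running the full grid search, a single call to errorCalc() can be sanity-checked on an ad-hoc 80/20 split scheme (a sketch; `e_check` below is hypothetical and separate from the cross-validation scheme defined next):
# hypothetical one-off check of errorCalc() with one parameter combination
e_check <- evaluationScheme(most_rated, method = "split", train = 0.8,
                            given = 10, goodRating = 3)
errorCalc(e_check, sm = 'cosine', nb = 25, nrm = 'center', evl = 'UBCF')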
# cycle through all parameter combinations
simMethod <- c('cosine', 'pearson', 'jaccard')
neighbors <- seq(5, 100, 5)
norm <- c('center', 'Z-score')
evalType <- "UBCF"
# 10-fold cross validation, good ratings = 3, withholding 10 items for eval
e <- evaluationScheme(most_rated, method = "cross-validation", k = 10, given = 10, goodRating = 3)
# create list for errors
errors.names <- c("RMSE", "MSE", "MAE", 'similarity', 'neighbors', 'norm')
ubcfErrors.ls <- vector("list", length(errors.names))
names(ubcfErrors.ls) <- errors.names
for (i in simMethod){
  for (j in neighbors){
    for (k in norm){
      ubcfErrors.ls <- rbind(ubcfErrors.ls, errorCalc(e, i, j, k, evalType))
    }
  }
}
We can now visualize the errors:
finalErrors.ubcf <- as.data.frame(ubcfErrors.ls) %>%
  filter(RMSE != 'NULL') %>%
  mutate(RMSE = as.double(RMSE),
         MSE = as.double(MSE),
         MAE = as.double(MAE))
finalErrors.ubcf$ERROR_MEAN <- rowMeans(subset(finalErrors.ubcf, select = c(RMSE, MSE, MAE)), na.rm = TRUE)
ggplot(finalErrors.ubcf, aes(1:nrow(finalErrors.ubcf))) +
  geom_line(aes(y = RMSE), colour = "red") +
  geom_line(aes(y = MSE), colour = "green") +
  geom_line(aes(y = MAE), colour = "blue") +
  geom_line(aes(y = ERROR_MEAN), colour = "purple") +
  xlab('Trial Run') +
  ylab('Error') +
  ggtitle('Errors across all runs')
To identify which set of parameters to use, we find the combination with the lowest mean error.
paramVals.ubcf <- finalErrors.ubcf %>%
  filter(ERROR_MEAN == min(finalErrors.ubcf$ERROR_MEAN))
paramVals.ubcf
##        RMSE       MSE      MAE similarity neighbors   norm ERROR_MEAN
## 1 0.9588247 0.9193448 0.748171    pearson       100 center  0.8754469
This is the set of parameters we will use for the final comparison.
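As a sketch (not part of the original code), fitting a single UBCF recommender with that winning combination would look like this:
# UBCF recommender with the best-performing parameters from the grid search
final_ubcf <- Recommender(train_set, method = "UBCF",
                          parameter = list(method = 'pearson', nn = 100,
                                           normalize = 'center'))
final_ubcf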
Item-based collaborative filtering: This recommends to a user the items that are most similar to the items the user has previously rated.
Here, we mimic our prior approach to user-based collaborative filtering.
# recommend 5 movies to each user
n_recommended_i <- 5
# create model on training set
recc_model_i <- Recommender(data = train_set, method = "IBCF", parameter = list(method = 'cosine', k = 5, normalize = 'center'))
# predict on test set
recc_predict_i <- predict(object = recc_model_i, newdata = test_set, n = n_recommended_i)
# create a matrix of predictions
recc_matrix_i <- sapply(recc_predict_i@items, function(x){
  colnames(most_rated)[x]
})
# plot distribution of movie recommendations
numRec_i <- factor(table(recc_matrix_i))
qplot(numRec_i,
      main = 'Distribution of Movie Recommendations',
      ylab = 'Number of Movies',
      xlab = 'Times Recommended')
The results are very similar to UBCF – a large number of movies are recommended only once.
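One useful by-product of IBCF is the learned item-item similarity matrix; a hedged sketch of how to peek at it via getModel():
# inspect the fitted IBCF model: it stores, for each movie, the similarities
# to its k most similar movies
ibcf_details <- getModel(recc_model_i)
dim(ibcf_details$sim)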
Next, we’ll define an IBCF-specific function that calculates the error associated with differing parameters.
# function to calculate error
errorCalc_i <- function(e, sm, kn, nrm, evl){
  set.seed(200)
  # create an item-based CF recommender using training data
  rec <- Recommender(getData(e, "train"), evl, parameter = list(method = sm, k = kn, normalize = nrm))
  # create predictions for the test data using known ratings
  pred <- predict(rec, getData(e, "known"), type = "ratings")
  # avg error metrics per user, averaged over all recommendations
  error.ibcf <- calcPredictionAccuracy(pred, getData(e, "unknown"))
  error.ibcf <- c(error.ibcf, sm, kn, nrm)
  names(error.ibcf) <- c('RMSE', 'MSE', 'MAE', 'similarity', 'neighbor items', 'norm')
  return(error.ibcf)
}
Finally, we’ll cycle through all combinations of our parameters and calculate the error associated with each.
# Model parameters
simMethod <- c('cosine', 'pearson', 'jaccard')
vector_k <- seq(5, 100, 5)
norm <- c('center', 'Z-score')
evalType <- "IBCF"
# 10-fold cross validation, good ratings = 3, withholding 10 items for eval
e <- evaluationScheme(most_rated, method = "cross-validation", k = 10, given = 10, goodRating = 3)
# create list for errors
errors.names <- c("RMSE", "MSE", "MAE", 'similarity', 'k-closest items', 'norm')
ibcfErrors.ls <- vector("list", length(errors.names))
names(ibcfErrors.ls) <- errors.names
for (i in simMethod) {
  for (j in vector_k) {
    for (k in norm){
      ibcfErrors.ls <- rbind(ibcfErrors.ls, errorCalc_i(e, i, j, k, evalType))
    }
  }
}
We can now identify which set of parameters yields the lowest average error. We will use this as the basis of comparison against the UBCF methodology.
# Pulling the lists of errors into a dataframe
finalErrors.ibcf <- as.data.frame(ibcfErrors.ls) %>%
  filter(RMSE != 'NULL') %>%
  mutate(RMSE = as.double(RMSE),
         MSE = as.double(MSE),
         MAE = as.double(MAE))
# Adding a column containing the mean error
finalErrors.ibcf$ERROR_MEAN <- rowMeans(subset(finalErrors.ibcf, select = c(RMSE, MSE, MAE)), na.rm = TRUE)
# Plotting the errors
ggplot(finalErrors.ibcf, aes(1:nrow(finalErrors.ibcf))) +
  geom_line(aes(y = RMSE), colour = "red") +
  geom_line(aes(y = MSE), colour = "green") +
  geom_line(aes(y = MAE), colour = "blue") +
  geom_line(aes(y = ERROR_MEAN), colour = "purple") +
  xlab('Trial Run') +
  ylab('Error') +
  ggtitle('Errors across all runs')
The combination of parameters that minimizes the mean error uses Pearson’s correlation, k = 100 closest items, and Z-score normalization. Already, we can see that the error from the IBCF methodology is much higher than from the UBCF methodology.
# Finding the parameter values that minimize the mean error
paramVals.ibcf <- finalErrors.ibcf %>%
  filter(ERROR_MEAN == min(finalErrors.ibcf$ERROR_MEAN))
paramVals.ibcf
##       RMSE      MSE       MAE similarity k-closest items    norm ERROR_MEAN
## 1 1.150618 1.323922 0.8954775    pearson             100 Z-score   1.123339
For our final evaluation, we compare the user-user and item-item based models, each optimized with those combinations of parameters that minimize mean error. We plot a ROC curve and Precision-recall for each model, and then examine the differences.
User-based collaborative filtering
# Evaluating the UBCF model (cosine similarity, nn = 100, centering)
results.ubcf <- evaluate(e, 'UBCF', parameter = list(method = 'cosine',
                                                     nn = 100,
                                                     normalize = 'center'))
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.17sec]
## 2 [0.02sec/0.17sec]
## 3 [0.02sec/0.15sec]
## 4 [0sec/0.22sec]
## 5 [0.02sec/0.22sec]
## 6 [0sec/0.19sec]
## 7 [0.02sec/0.17sec]
## 8 [0sec/0.15sec]
## 9 [0sec/0.17sec]
## 10 [0.02sec/0.37sec]
# Plotting True Positive Rate (TPR) against False Positive Rate (FPR)
plot(results.ubcf, annotate = TRUE, main = "ROC curve")
# Plotting Precision-recall
plot(results.ubcf, "prec/rec", annotate = TRUE, main = "Precision-recall - UBCF")
Item-based collaborative filtering
# Evaluating using the parameter values - Pearson's, k = 100, Z-score normalization
results.ibcf <- evaluate(e, 'IBCF', parameter = list(method = paramVals.ibcf[[4]],
                                                     k = as.integer(paramVals.ibcf[[5]]),
                                                     normalize = paramVals.ibcf[[6]]))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.88sec/0.03sec]
## 2 [0.68sec/0.05sec]
## 3 [0.65sec/0.04sec]
## 4 [0.66sec/0.03sec]
## 5 [0.68sec/0.03sec]
## 6 [0.68sec/0.03sec]
## 7 [0.91sec/0.03sec]
## 8 [0.66sec/0.05sec]
## 9 [0.58sec/0.01sec]
## 10 [0.51sec/0.02sec]
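Before the side-by-side comparison, the fold-level confusion-matrix entries behind each model's curves can be inspected directly (a sketch, not in the original output):
# confusion matrix entries for the first cross-validation fold of the UBCF run
getConfusionMatrix(results.ubcf)[[1]]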
Comparison
Finally, we can compare the UBCF and IBCF models to each other.
# UBCF and IBCF models with optimal parameters
models_to_evaluate <- list(
  results.ubcf = list(name = "UBCF", param = list(method = 'cosine', nn = 100, normalize = 'center')),
  results.ibcf = list(name = "IBCF", param = list(method = paramVals.ibcf[[4]],
                                                  k = as.integer(paramVals.ibcf[[5]]),
                                                  normalize = paramVals.ibcf[[6]]))
)
# Evaluating at various numbers of recommendations
n_recommendations <- seq(5, 100, 5)
list_results <- evaluate(e, method = models_to_evaluate, n = n_recommendations)
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.12sec]
## 2 [0sec/0.11sec]
## 3 [0sec/0.12sec]
## 4 [0.02sec/0.29sec]
## 5 [0sec/0.12sec]
## 6 [0sec/0.14sec]
## 7 [0sec/0.16sec]
## 8 [0sec/0.38sec]
## 9 [0.01sec/0.18sec]
## 10 [0.02sec/0.17sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.76sec/0.07sec]
## 2 [0.84sec/0.03sec]
## 3 [0.65sec/0.05sec]
## 4 [0.72sec/0.03sec]
## 5 [0.63sec/0.05sec]
## 6 [1.05sec/0.03sec]
## 7 [0.75sec/0.03sec]
## 8 [0.75sec/0.05sec]
## 9 [0.72sec/0.05sec]
## 10 [0.68sec/0.03sec]
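The averaged metrics behind the curves can also be tabulated per number of recommendations (a sketch following recommenderlab's standard avg() usage; not part of the original output):
# averaged evaluation metrics for each model, per number of recommendations
avg_matrices <- lapply(list_results, avg)
head(avg_matrices$results.ubcf)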
# Plotting and comparing the ROC curves for each model
plot(list_results, legend = "topleft", main = "ROC curve")
This analysis suggests that, for our subset of the MovieLense dataset, a user-based collaborative filtering model with optimized parameters outperforms its item-based collaborative filtering counterpart.
Hahsler, M. (2019, August 27). MovieLense Dataset (100k). Rdrr.Io. https://rdrr.io/cran/recommenderlab/man/MovieLense.html
R-bloggers. (2014, December 24). Recommender systems 101 – a step by step practical example in R. https://www.r-bloggers.com/recommender-systems-101-a-step-by-step-practical-example-in-r/