The goal of this project is to implement and configure a recommender system using two types of recommendation algorithms, item-based and user-based collaborative filtering, and then to evaluate and compare the different approaches, algorithms, and similarity methods.
# Load required packages
library(tidyverse)
library(recommenderlab)
library(psych)
library(reshape2)
library(ggpubr)
library(purrr)
Both the movies and ratings datasets are taken from https://grouplens.org/datasets/movielens/latest/. Two versions are available; the small datasets were chosen due to the limited computing power of my laptop.
# Load movies and ratings datasets
movies <- read.csv("https://raw.githubusercontent.com/SieSiongWong/DATA-612/master/movies.csv")
ratings <- read.csv("https://raw.githubusercontent.com/SieSiongWong/DATA-612/master/ratings.csv")
head(movies)
## movieId title
## 1 1 Toy Story (1995)
## 2 2 Jumanji (1995)
## 3 3 Grumpier Old Men (1995)
## 4 4 Waiting to Exhale (1995)
## 5 5 Father of the Bride Part II (1995)
## 6 6 Heat (1995)
## genres
## 1 Adventure|Animation|Children|Comedy|Fantasy
## 2 Adventure|Children|Fantasy
## 3 Comedy|Romance
## 4 Comedy|Drama|Romance
## 5 Comedy
## 6 Action|Crime|Thriller
head(ratings)
## userId movieId rating timestamp
## 1 1 1 4 964982703
## 2 1 3 4 964981247
## 3 1 6 4 964982224
## 4 1 47 5 964983815
## 5 1 50 5 964982931
## 6 1 70 3 964982400
The movies dataset contains 3 columns and 9,742 observations. The ratings dataset contains 4 columns and 100,836 observations.
From the statistical summary below, we can see that the mean of the rating variable is 3.5, the standard deviation is 1.04, and the distribution is slightly left-skewed (skew = -0.64).
# Summary of movies and ratings datasets
str(movies)
## 'data.frame': 9742 obs. of 3 variables:
## $ movieId: int 1 2 3 4 5 6 7 8 9 10 ...
## $ title : Factor w/ 9737 levels "'71 (2014)","'burbs, The (1989)",..: 8895 4662 3676 9250 2979 3859 7348 8834 8159 3544 ...
## $ genres : Factor w/ 951 levels "(no genres listed)",..: 352 418 733 688 635 261 733 400 2 134 ...
str(ratings)
## 'data.frame': 100836 obs. of 4 variables:
## $ userId : int 1 1 1 1 1 1 1 1 1 1 ...
## $ movieId : int 1 3 6 47 50 70 101 110 151 157 ...
## $ rating : num 4 4 4 5 5 3 5 4 5 5 ...
## $ timestamp: int 964982703 964981247 964982224 964983815 964982931 964982400 964980868 964982176 964984041 964984100 ...
# Statistical summary of rating variable
describe(ratings$rating)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 100836 3.5 1.04 3.5 3.57 0.74 0.5 5 4.5 -0.64 0.12 0
# Plot a histogram to show the distribution of ratings
hist(ratings$rating, main = "Ratings Distribution", xlab = "Ratings", ylab = "Frequency", col = "hotpink", ylim = c(0,30000), breaks = 15)
First, we have to convert the raw ratings data into a matrix format that the recommenderlab package can use for building recommendation systems.
# Convert to rating matrix
ratings_matrix <- dcast(ratings, userId~movieId, value.var = "rating", na.rm = FALSE)
# Remove the userId column
ratings_matrix <- as.matrix(ratings_matrix[,-1])
# Convert rating matrix into a recommenderlab sparse matrix
ratings_matrix <- as(ratings_matrix, "realRatingMatrix")
ratings_matrix
## 610 x 9724 rating matrix of class 'realRatingMatrix' with 100836 ratings.
Each row of ratings_matrix corresponds to a user, and each column corresponds to a movie id. There are 610 x 9724 = 5,931,640 user-movie combinations, so the full matrix requires 5,931,640 cells. Since not every user has watched every movie, there are only 100,836 actual ratings, and the matrix is very sparse.
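As a quick check, the sparsity can be computed directly from the matrix; a small sketch using recommenderlab's rowCounts():
# Sparsity check: share of the user-movie grid with no rating
n_ratings <- sum(rowCounts(ratings_matrix)) # 100,836 actual ratings
n_cells <- nrow(ratings_matrix) * ncol(ratings_matrix) # 5,931,640 possible cells
1 - n_ratings / n_cells # roughly 0.98, i.e. about 98% of cells are empty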
# Convert the ratings matrix into a vector
vec_ratings <- as.vector(ratings_matrix@data)
# Unique ratings
unique(vec_ratings)
## [1] 4.0 0.0 4.5 2.5 3.5 3.0 5.0 0.5 2.0 1.5 1.0
# Count the occurrences for each rating
table_ratings <- table(vec_ratings)
table_ratings
## vec_ratings
## 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
## 5830804 1370 2811 1791 7551 5550 20047 13136 26818 8551
## 5
## 13211
A rating equal to 0 represents a missing value in the matrix, so we remove these before building a frequency plot of the actual ratings to visualize their distribution.
# Remove zero rating and convert the vector to factor
vec_ratings <- vec_ratings[vec_ratings != 0] %>% factor()
# Visualize through qplot
qplot(vec_ratings, fill = I("steelblue")) +
ggtitle("Distribution of the Ratings") +
labs(x = "Ratings")
# Search for the top 5 most viewed movies
most_views <- colCounts(ratings_matrix) %>% melt()
# The row names of the melted data frame are the movie ids taken from the matrix
# column names; sequential row ids would not match the real (non-contiguous) movie ids
most_views <- tibble::rownames_to_column(most_views, "movieId") %>%
  mutate(movieId = as.integer(movieId)) %>%
  rename(count = value) %>%
  top_n(count, n = 5) %>%
  merge(movies, by = "movieId")
# Visualize the top 5 most viewed movies
ggplot(most_views, aes(x = reorder(title, count), y = count)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title.x = element_blank()) +
  ggtitle("Top 5 Most Viewed Movies")
# Average rating for each movie
avg_ratings_mv <- colMeans(ratings_matrix)
# Average rating for each user
avg_ratings_us <- rowMeans(ratings_matrix)
# Visualize the distribution of the average movie rating
avg1 <- qplot(avg_ratings_mv, binwidth = 0.1) +
  ggtitle("Average Movie Rating Distribution") +
  labs(x = 'Average Rating', y = 'Frequency')
# Visualize the distribution of the average rating per user
avg2 <- qplot(avg_ratings_us, binwidth = 0.1) +
  ggtitle("Average Rating Per User Distribution") +
  labs(x = 'Average Rating', y = 'Frequency')
figure <- ggarrange(avg1, avg2, ncol = 1, nrow = 2)
figure
From both plots above, we can see that some movies have only a few ratings and some users have rated only a few movies. For building the recommendation system, we don’t want to take these movies and users into account, as their ratings might be biased. To remove the least-watched movies and least-active users, we can set a minimum threshold, for example 50.
# Keep only users with more than 50 ratings and movies rated more than 50 times
ratings_matrix <- ratings_matrix[rowCounts(ratings_matrix) > 50, colCounts(ratings_matrix) > 50]
# Average rating for each movie
avg_ratings_mv2 <- colMeans(ratings_matrix)
# Average rating for each user
avg_ratings_us2 <- rowMeans(ratings_matrix)
# Visualize the distribution of the average movie rating
avg3 <- qplot(avg_ratings_mv2, binwidth = 0.1) +
  ggtitle("Average Movie Rating Distribution") +
  labs(x = 'Average Rating', y = 'Frequency')
# Visualize the distribution of the average rating per user
avg4 <- qplot(avg_ratings_us2, binwidth = 0.1) +
  ggtitle("Average Rating Per User Distribution") +
  labs(x = 'Average Rating', y = 'Frequency')
figure2 <- ggarrange(avg1, avg2, avg3, avg4,
labels = c("A", "B", "C", "D"),
ncol = 2, nrow = 2)
figure2
The effect of removing those potentially biased ratings on the distribution is obvious. From the figure above, we can see that the curves are much narrower and have less variance than before.
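A quick look at the matrix dimensions confirms the reduction; a small sanity check (the evaluation scheme output further below shows the filtered size of 378 x 436):
# Dimensions after filtering (the original matrix was 610 x 9724)
dim(ratings_matrix)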
Let’s see which recommender options the recommenderlab package provides for realRatingMatrix objects when building recommendation systems.
# Display the list of options for real rating matrix
rec <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
names(rec)
## [1] "ALS_realRatingMatrix" "ALS_implicit_realRatingMatrix"
## [3] "IBCF_realRatingMatrix" "LIBMF_realRatingMatrix"
## [5] "POPULAR_realRatingMatrix" "RANDOM_realRatingMatrix"
## [7] "RERECOMMEND_realRatingMatrix" "SVD_realRatingMatrix"
## [9] "SVDF_realRatingMatrix" "UBCF_realRatingMatrix"
# Description for the IBCF method
lapply(rec, `[[`, 'description') %>% `[[`('IBCF_realRatingMatrix')
## [1] "Recommender based on item-based collaborative filtering."
# Description for the UBCF method
lapply(rec, `[[`, 'description') %>% `[[`('UBCF_realRatingMatrix')
## [1] "Recommender based on user-based collaborative filtering."
# Default parameter values for the IBCF method
rec$IBCF_realRatingMatrix$parameters
## $k
## [1] 30
##
## $method
## [1] "Cosine"
##
## $normalize
## [1] "center"
##
## $normalize_sim_matrix
## [1] FALSE
##
## $alpha
## [1] 0.5
##
## $na_as_zero
## [1] FALSE
# Default parameter values for the UBCF method
rec$UBCF_realRatingMatrix$parameters
## $method
## [1] "cosine"
##
## $nn
## [1] 25
##
## $sample
## [1] FALSE
##
## $normalize
## [1] "center"
“IBCF_realRatingMatrix” and “UBCF_realRatingMatrix” are the two models demonstrated in this project: one is item-based and the other user-based collaborative filtering. Different parameters will be used to optimize the performance of these two recommendation models.
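Both models depend on a similarity measure between users or items. As a small illustration, recommenderlab's similarity() computes these directly; a sketch on an arbitrary slice of the matrix:
# Pairwise cosine similarity between the first four users (rows)
similarity(ratings_matrix[1:4, ], method = "cosine", which = "users")
# Pairwise cosine similarity between the first four movies (columns)
similarity(ratings_matrix[, 1:4], method = "cosine", which = "items")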
Since both the user-based and item-based CF algorithms automatically normalize the data, we can directly use the ratings matrix from the last step without having to normalize it manually.
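For reference, the same normalization can be applied explicitly with recommenderlab's normalize(); a minimal sketch:
# "center" subtracts each user's mean rating (the default used by IBCF/UBCF);
# "Z-score" additionally divides by each user's standard deviation
ratings_centered <- normalize(ratings_matrix, method = "center")
ratings_zscore <- normalize(ratings_matrix, method = "Z-score")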
We will build this filtering system by splitting the dataset into an 80% training set and a 20% test set. 10 ratings per user will be given to the recommender to make predictions, and the remaining ratings are held out for computing prediction accuracy.
evaluation <- evaluationScheme(ratings_matrix, method = "split", train = 0.8, given = 10)
evaluation
## Evaluation scheme with 10 items given
## Method: 'split' with 1 run(s).
## Training set proportion: 0.800
## Good ratings: NA
## Data set: 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.
train <- getData(evaluation, "train")
train
## 302 x 436 rating matrix of class 'realRatingMatrix' with 28961 ratings.
test_known <- getData(evaluation, "known")
test_known
## 76 x 436 rating matrix of class 'realRatingMatrix' with 760 ratings.
test_unknown <- getData(evaluation, "unknown")
test_unknown
## 76 x 436 rating matrix of class 'realRatingMatrix' with 6493 ratings.
Create an IBCF recommender using the “Pearson” similarity measure and the 50 most similar items (k = 50).
# Create an item-based CF recommender using training data
rec_ib <- Recommender(data = train, method = "IBCF",
parameter = list(method = "pearson", k = 50))
# Create predictions for the test items using known ratings with type as ratings
pred_ib_acr <- predict(object = rec_ib, newdata = test_known, type = "ratings")
# Create predictions for the test items using known ratings with type as top n recommendation list
pred_ib_n <- predict(object = rec_ib, newdata = test_known, n = 5)
Top 5 recommendations for the first 5 users.
# Recommendations for the first 5 users.
# @items holds column indices into the rating matrix, so map them back to
# the real movie ids stored in the matrix column names before merging
first_5_users <- pred_ib_n@items[1:5] %>%
  lapply(function(x) as.integer(colnames(ratings_matrix)[x])) %>%
  data.frame()
colnames(first_5_users) <- c("user1", "user2", "user3", "user4", "user5")
first_5_users <- first_5_users %>% melt() %>%
  rename(movieId = value) %>%
  merge(movies, by = "movieId") %>%
  rename(users = variable) %>%
  select(users:title)
first_5_users <- first_5_users[order(first_5_users$users),]
first_5_users
## users title
## 1 user1 Toy Story (1995)
## 2 user1 Jumanji (1995)
## 4 user1 Nixon (1995)
## 7 user1 Powder (1995)
## 10 user1 Babe (1995)
## 5 user2 Nixon (1995)
## 12 user2 Usual Suspects, The (1995)
## 15 user2 Lamerica (1994)
## 18 user2 Misérables, Les (1995)
## 13 user3 Usual Suspects, The (1995)
## 14 user3 Mighty Aphrodite (1995)
## 17 user3 Fair Game (1995)
## 21 user3 Up Close and Personal (1996)
## 23 user3 Amazing Panda Adventure, The (1995)
## 3 user4 Heat (1995)
## 6 user4 Ace Ventura: When Nature Calls (1995)
## 9 user4 Dangerous Minds (1995)
## 11 user4 Seven (a.k.a. Se7en) (1995)
## 16 user4 Bio-Dome (1996)
## 8 user5 Powder (1995)
## 19 user5 Black Sheep (1996)
## 20 user5 Pie in the Sky (1996)
## 22 user5 Bad Boys (1995)
## 24 user5 Amazing Panda Adventure, The (1995)
Number of times each movie was recommended
# Define a matrix with the recommendations to the test set users
rec_matrix <- sapply(pred_ib_n@items, function(x){
colnames(ratings_matrix)[x]
})
# Count how many times each movie was recommended
num_of_items <- factor(table(rec_matrix))
# Visualize the distribution of the number of items
qplot(num_of_items) + ggtitle("Distribution of the Number of Items")
Top 5 most recommended movies
# Top 5 most recommended movies
top5_rec_mv <- num_of_items %>% data.frame()
top5_rec_mv <- cbind(movieId = rownames(top5_rec_mv), top5_rec_mv)
rownames(top5_rec_mv) <- 1:nrow(top5_rec_mv)
colnames(top5_rec_mv)[2] <- "count"
top5_rec_mv <- top5_rec_mv %>%
mutate_if( is.factor, ~ as.integer(levels(.x))[.x]) %>%
top_n(count, n = 5) %>%
merge(movies, by = "movieId")
top5_rec_mv <- top5_rec_mv[order(top5_rec_mv$count, decreasing = TRUE),] %>%
select(title)
top5_rec_mv
## title
## 1 Toy Story (1995)
## 5 Braveheart (1995)
## 2 Jumanji (1995)
## 3 Grumpier Old Men (1995)
## 4 Sense and Sensibility (1995)
Create a UBCF recommender using the “Pearson” similarity measure and 50 nearest neighbors (nn = 50).
# Create a user-based CF recommender using training data
rec_ub <- Recommender(data = train, method = "UBCF",
parameter = list(method = "pearson", nn = 50))
# Create predictions for the test users using known ratings with type as ratings
pred_ub_acr <- predict(rec_ub, test_known, type = "ratings")
# Create predictions for the test users using known ratings with type as top n recommendation list
pred_ub_n <- predict(object = rec_ub, newdata = test_known, n = 5)
Top 5 recommendations for the first 5 users.
# Recommendations for the first 5 users
# As before, convert the column indices in @items to the real movie ids
first_5_users <- pred_ub_n@items[1:5] %>%
  lapply(function(x) as.integer(colnames(ratings_matrix)[x])) %>%
  data.frame()
colnames(first_5_users) <- c("user1", "user2", "user3", "user4", "user5")
first_5_users <- first_5_users %>% melt() %>%
  rename(movieId = value) %>%
  merge(movies, by = "movieId") %>%
  rename(users = variable) %>%
  select(users:title)
first_5_users <- first_5_users[order(first_5_users$users),]
first_5_users
## users title
## 3 user1 To Die For (1995)
## 4 user1 Usual Suspects, The (1995)
## 8 user1 Big Green, The (1995)
## 18 user1 Birdcage, The (1996)
## 20 user1 Immortal Beloved (1994)
## 5 user2 Usual Suspects, The (1995)
## 9 user2 Big Green, The (1995)
## 14 user2 Dunston Checks In (1996)
## 21 user2 Immortal Beloved (1994)
## 24 user2 Shawshank Redemption, The (1994)
## 2 user3 To Die For (1995)
## 10 user3 Big Green, The (1995)
## 13 user3 Mr. Holland's Opus (1995)
## 19 user3 Birdcage, The (1996)
## 25 user3 Little Buddha (1993)
## 1 user4 Sense and Sensibility (1995)
## 7 user4 Usual Suspects, The (1995)
## 11 user4 Big Green, The (1995)
## 16 user4 Nick of Time (1995)
## 17 user4 If Lucy Fell (1996)
## 6 user5 Usual Suspects, The (1995)
## 12 user5 Big Green, The (1995)
## 15 user5 Dunston Checks In (1996)
## 22 user5 Love Affair (1994)
## 23 user5 Man of the House (1995)
Visualize the distribution of the number of items
# Define a matrix with the recommendations to the test set users
rec_matrix <- sapply(pred_ub_n@items, function(x){
colnames(ratings_matrix)[x]
})
# Count how many times each movie was recommended
num_of_items <- factor(table(rec_matrix))
# Visualize the distribution of the number of items
qplot(num_of_items) + ggtitle("Distribution of the Number of Items")
Top 5 most recommended movies
# Top 5 most recommended movies
top5_rec_mv <- num_of_items %>% data.frame(stringsAsFactors = FALSE)
top5_rec_mv <- cbind(movieId = rownames(top5_rec_mv), top5_rec_mv)
rownames(top5_rec_mv) <- 1:nrow(top5_rec_mv)
colnames(top5_rec_mv)[2] <- "count"
top5_rec_mv <- top5_rec_mv %>%
mutate_if( is.factor, ~ as.integer(levels(.x))[.x]) %>%
top_n(count, n = 5) %>%
merge(movies, by = "movieId")
top5_rec_mv <- top5_rec_mv[order(top5_rec_mv$count, decreasing = TRUE),] %>%
select(title)
top5_rec_mv
## title
## 3 Shawshank Redemption, The (1994)
## 2 Pulp Fiction (1994)
## 1 Star Wars: Episode IV - A New Hope (1977)
## 4 Forrest Gump (1994)
## 5 Schindler's List (1993)
Compare predictions with true “unknown” ratings
# Compare predictions with true "unknown" ratings
as(test_unknown, "matrix")[1:8,1:5]
## 1 2 3 6 7
## [1,] 4.5 NA NA NA NA
## [2,] NA NA NA NA NA
## [3,] NA 3 NA NA NA
## [4,] NA NA NA NA NA
## [5,] 3.0 NA NA NA 1
## [6,] 5.0 NA NA NA NA
## [7,] 4.0 NA 3.5 4.5 NA
## [8,] NA NA NA NA NA
as(pred_ib_acr, "matrix")[1:8,1:5]
## 1 2 3 6 7
## [1,] 5.000000 5.000000 4.000000 4.000000 4.000000
## [2,] 3.339197 3.500000 3.000000 NA 3.502433
## [3,] 4.500000 NA NA 1.500000 4.500000
## [4,] NA 3.500000 NA 3.746954 2.473600
## [5,] NA NA NA NA 2.960024
## [6,] 4.678354 2.544514 4.672803 NA NA
## [7,] 3.206836 NA NA NA NA
## [8,] 4.500000 NA 4.000000 4.751435 4.000000
as(pred_ub_acr, "matrix")[1:8,1:5]
## 1 2 3 6 7
## [1,] 4.574934 4.487671 4.450626 4.501525 4.512176
## [2,] 3.288340 3.065994 3.096236 3.199075 3.135870
## [3,] 3.942737 3.720959 3.755558 3.822132 3.808681
## [4,] 2.984634 2.853417 2.857997 2.920167 2.877304
## [5,] 3.947475 3.923321 3.890224 4.015133 3.880585
## [6,] 4.268158 4.005265 4.047457 4.200309 4.023832
## [7,] 3.726804 3.635771 3.665291 3.865003 3.679749
## [8,] 4.271804 4.128269 4.189861 4.324460 4.202695
Evaluate the accuracy of the user-based CF and item-based CF recommenders on the unknown ratings.
# Evaluate Item-Based recommendations on unknown ratings
acr_ib <- calcPredictionAccuracy(pred_ib_acr, test_unknown)
# Evaluate User-Based recommendations on unknown ratings
acr_ub <- calcPredictionAccuracy(pred_ub_acr, test_unknown)
acr <- rbind(IBCF = acr_ib, UBCF = acr_ub)
acr
## RMSE MSE MAE
## IBCF 1.0853645 1.1780161 0.7898596
## UBCF 0.9026854 0.8148408 0.6828060
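As a sanity check, the RMSE can be reproduced by hand from the prediction and hold-out matrices; a small sketch whose result should agree with acr_ub above:
# RMSE by hand: root mean squared error over hold-out cells that received a prediction
pred_mat <- as(pred_ub_acr, "matrix")
true_mat <- as(test_unknown, "matrix")
sqrt(mean((pred_mat - true_mat)^2, na.rm = TRUE)) # should match acr_ub["RMSE"]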
Let’s try another evaluation scheme with 5-fold cross-validation and the “Cosine” similarity measure.
# Setup the evaluation scheme using 5-fold cross-validation
# (each fold trains on 4/5 of the users, so no separate train proportion is needed)
evaluation_2 <- evaluationScheme(ratings_matrix,
                                 method = "cross",
                                 k = 5,
                                 given = 10,
                                 goodRating = 5
)
evaluation_2
## Evaluation scheme with 10 items given
## Method: 'cross-validation' with 5 run(s).
## Good ratings: >=5.000000
## Data set: 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.
# Set up list of algorithms
algorithms <- list(
"item-based CF" = list(name = "IBCF", parameter = list(method = "Cosine", k = 50)),
"user-based CF" = list(name = "UBCF", parameter = list(method = "Cosine", nn = 50))
)
# Estimate the models
results <- evaluate(evaluation_2,
algorithms,
type = "topNList",
n = c(1, 3, 5, 10, 15, 20)
)
## IBCF run fold/sample [model time/prediction time]
## 1 [1.41sec/0.11sec]
## 2 [1.39sec/0.08sec]
## 3 [2.16sec/0.06sec]
## 4 [1.51sec/0.11sec]
## 5 [1.44sec/0.09sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.35sec]
## 2 [0.02sec/0.29sec]
## 3 [0.01sec/0.35sec]
## 4 [0.02sec/0.31sec]
## 5 [0.02sec/0.33sec]
results
## List of evaluation results for 2 recommenders:
## Evaluation results for 5 folds/samples using method 'IBCF'.
## Evaluation results for 5 folds/samples using method 'UBCF'.
# Create a function to get average of precision, recall, TPR, FPR
avg_cf_matrix <- function(results) {
avg <- results %>%
getConfusionMatrix() %>%
as.list()
as.data.frame( Reduce("+", avg) / length(avg)) %>%
mutate(n = c(1, 3, 5, 10, 15, 20)) %>%
select('n', 'precision', 'recall', 'TPR', 'FPR')
}
# Using map() to iterate the avg function across both models
results_tbl <- results %>% map(avg_cf_matrix) %>% enframe() %>% unnest(cols = c(value))
results_tbl
## # A tibble: 12 x 6
## name n precision recall TPR FPR
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 item-based CF 1 0.0256 0.00179 0.00179 0.00238
## 2 item-based CF 3 0.0282 0.00571 0.00571 0.00711
## 3 item-based CF 5 0.0364 0.0154 0.0154 0.0117
## 4 item-based CF 10 0.0382 0.0280 0.0280 0.0234
## 5 item-based CF 15 0.0395 0.0445 0.0445 0.0351
## 6 item-based CF 20 0.0387 0.0571 0.0571 0.0468
## 7 user-based CF 1 0.236 0.0235 0.0235 0.00185
## 8 user-based CF 3 0.178 0.0531 0.0531 0.00598
## 9 user-based CF 5 0.152 0.0690 0.0690 0.0103
## 10 user-based CF 10 0.127 0.103 0.103 0.0212
## 11 user-based CF 15 0.113 0.131 0.131 0.0323
## 12 user-based CF 20 0.103 0.148 0.148 0.0436
# Plot ROC curves for each model
results_tbl %>%
ggplot(aes(FPR, TPR, color = fct_reorder2(as.factor(name), FPR, TPR))) +
geom_line() +
geom_label(aes(label = n)) +
labs(title = "ROC Curves", color = "Model") +
theme_grey(base_size = 14)
# Plot Precision-Recall curves for each model
results_tbl %>%
ggplot(aes(recall, precision, color = fct_reorder2(as.factor(name), recall, precision))) +
geom_line() +
geom_label(aes(label = n)) +
labs(title = "Precision-Recall Curves", colour = "Model") +
theme_grey(base_size = 14)
From the evaluation results, the user-based CF model is the clear winner under both evaluation methods. Its RMSE is lower than that of the item-based CF model. The ROC curves also show clearly that the user-based CF model achieves a higher TPR for any given level of FPR, meaning it produces more relevant recommendations (true positives) for the same level of non-relevant recommendations (false positives). The same holds in the Precision-Recall curves, where the user-based CF model achieves higher recall for any given level of precision, meaning it misses fewer relevant items (false negatives) at the same rate of false positives. Furthermore, each method has a number of tuning parameters, such as the type of similarity, the number of neighbors, the number of latent factors, and regularization parameters. We can compare further by experimenting with these parameters, as sketched below.
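For example, a hypothetical sweep over the UBCF neighborhood size could reuse the same evaluation scheme (the nn values here are illustrative, not tuned):
# Hypothetical sweep over the number of nearest neighbors for UBCF
algorithms_nn <- list(
  "UBCF nn=25" = list(name = "UBCF", parameter = list(method = "Cosine", nn = 25)),
  "UBCF nn=50" = list(name = "UBCF", parameter = list(method = "Cosine", nn = 50)),
  "UBCF nn=100" = list(name = "UBCF", parameter = list(method = "Cosine", nn = 100))
)
results_nn <- evaluate(evaluation_2, algorithms_nn, type = "topNList", n = c(1, 3, 5, 10, 15, 20))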
Although collaborative filtering is the most popular branch of recommendation, it does have limitations when dealing with new users or items. If a new user hasn’t rated any movie yet, neither of the two models is able to recommend any item; likewise, if a new item hasn’t been rated by anyone, it will never be recommended. To handle this cold-start problem, as recommended in the book “Building a Recommendation System with R”, we should incorporate other information, such as user profiles and item descriptions, into our recommendation systems. This leads to a hybrid recommender system, a combination of item-based and/or user-based with content-based filtering models, which usually gives better results; a minimal sketch follows.
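recommenderlab also ships a simple weighted HybridRecommender that can serve as a starting point; a minimal sketch (the weights are illustrative, and a true content-based component would additionally need the item metadata from the movies dataset):
# Hypothetical weighted hybrid: a popularity baseline blended with UBCF
rec_hybrid <- HybridRecommender(
  Recommender(train, method = "POPULAR"),
  Recommender(train, method = "UBCF"),
  weights = c(0.3, 0.7) # illustrative weights
)
pred_hybrid <- predict(rec_hybrid, test_known, n = 5)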
Gorakala, S.K. & Usuelli, M. (2015, Sept). Building a Recommendation System with R (pp. 50-92). Packt Publishing Ltd.
Hahsler, M. & Vereet, B. (2019, Aug 27). Package ‘recommenderlab’. CRAN. Retrieved from https://cran.r-project.org/web/packages/recommenderlab/recommenderlab.pdf.