Figure
How can I recommend the top 10 movies to particular users, based on their own preference history, on the experience of people similar to them, or on some other mechanism?
Several R packages are available for building recommender systems, and the most commonly used is recommenderlab. I use recommenderlab throughout and apply three methods with it: item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF), and singular value decomposition (SVD). In this project I apply all three to the MovieLense dataset and compare their performance.
The main task of a recommender system is to predict users' responses to different options. GroupLens Research has collected and made available rating data sets from the MovieLens web site. The data set used here contains 99,392 ratings by 943 users on 1,664 movies; it is split by 10-fold cross-validation into training and ‘unseen’ test users for evaluation.
The three models are evaluated on the following criteria: root-mean-squared error (RMSE) and the related error measures MSE and MAE; model run time; the receiver operating characteristic (ROC) curve, which summarizes the trade-off between the true positive rate and the false positive rate derived from the confusion matrix (true positives, true negatives, false positives, and false negatives); and the balance of precision versus recall.
Overall, singular value decomposition comes out ahead: its rating error is on par with UBCF, and it has the best ROC and precision-recall curves.
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(hexbin)
UserBased (UBCF) vs ItemBased (IBCF)
This gif illustrates the most commonly used recommendation system model: collaborative filtering. Collaborative filtering can answer the question “What items do users with interests similar to yours like?”
Figure
Figure
Figure
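As a rough illustration of the user-based idea, the sketch below computes cosine similarity between users on a made-up ratings matrix and predicts a missing rating as a similarity-weighted average. The user names, movie labels, and ratings are hypothetical, and recommenderlab's own UBCF implementation may differ in details such as normalization and neighbor selection.

```r
# Hypothetical ratings: rows are users, columns are movies, NA = not rated
ratings <- rbind(
  alice = c(5, 4, NA, 1),
  bob   = c(4, 5, 2, NA),
  carol = c(1, NA, 5, 4)
)
colnames(ratings) <- c("m1", "m2", "m3", "m4")

# Cosine similarity computed over the movies both users have rated
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

sim_ab <- cosine_sim(ratings["alice", ], ratings["bob", ])
sim_ac <- cosine_sim(ratings["alice", ], ratings["carol", ])

# Predict alice's rating for m3 from her neighbors, weighted by similarity
neighbor_ratings <- ratings[c("bob", "carol"), "m3"]
weights <- c(sim_ab, sim_ac)
pred_alice_m3 <- sum(weights * neighbor_ratings) / sum(weights)
```

In this toy case alice is much closer to bob than to carol, so the prediction for m3 lands nearer to bob's rating of 2 than to carol's rating of 5.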
SVD w Parameters
UBCF
User-Based Collaborative Filtering (Cosine)
Matrix Factorization
library(hexbin)
library (knitr)
Singular Value Decomposition begins by breaking an \(M\) by \(N\) matrix \(A\) (in this case \(M\) users and \(N\) movies) into the product of three matrices: \(U\), which is \(M\) by \(M\), \(\Sigma\), which is \(M\) by \(N\), and \(V^T\), which is \(N\) by \(N\):
\[A = U \ \Sigma \ V^T\]
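The factorization can be tried out with base R's `svd()` on a small made-up matrix. This is only a sketch: the values are hypothetical, and `svd()` returns the thin form by default (here \(U\) is 4 by 3 rather than 4 by 4), which still reproduces \(A\) exactly. Keeping only the largest `k` singular values gives a low-rank approximation, which is the idea behind SVD-based recommenders.

```r
# Toy 4 users x 3 movies rating matrix (made-up values, for illustration only)
A <- matrix(c(5, 3, 4,
              4, 1, 2,
              1, 5, 4,
              2, 4, 5), nrow = 4, byrow = TRUE)

s <- svd(A)   # thin SVD: s$u is 4 x 3, s$d holds 3 singular values, s$v is 3 x 3

# Multiplying the three factors back together recovers A
A_full <- s$u %*% diag(s$d, nrow = length(s$d)) %*% t(s$v)
max_abs_err <- max(abs(A - A_full))

# Rank-2 approximation: keep only the two largest singular values
k <- 2
A_k <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])

# Frobenius norm of the approximation error; equals the discarded singular value
rank2_err <- sqrt(sum((A - A_k)^2))
```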
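The root mean squared error (RMSE) is the error function that measures accuracy and quantifies the typical error we make when predicting a movie rating. With \(\hat{y}_{u,i}\) the predicted and \(y_{u,i}\) the actual rating of user \(u\) for movie \(i\), and \(N\) here the number of user-movie pairs being scored, RMSE is defined as: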
\[ RMSE = \sqrt{\frac{1}{N}\displaystyle\sum_{u,i} (\hat{y}_{u,i}-y_{u,i})^{2}} \]
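A minimal base R sketch of this formula is shown below. The function name `rmse_manual` and the numbers are hypothetical; the name is chosen so that it does not clash with the `RMSE` functions in caret and recommenderlab. Pairs without an actual rating are skipped, since only rated user-movie pairs can be scored.

```r
# RMSE over the user-movie pairs where both a prediction and a rating exist
rmse_manual <- function(predicted, actual) {
  observed <- !is.na(predicted) & !is.na(actual)
  sqrt(mean((predicted[observed] - actual[observed])^2))
}

actual    <- c(5, 3, NA, 4)   # NA = the user never rated this movie
predicted <- c(4, 3, 2, 2)

# Errors on the three scored pairs are 1, 0 and 2, so RMSE = sqrt(5 / 3)
toy_rmse <- rmse_manual(predicted, actual)
```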
The dataset I chose for this project is MovieLense. It ships with the recommenderlab package, so we will use that copy and explore it first before applying SVD (singular value decomposition).
library(recommenderlab)
## Loading required package: Matrix
## Loading required package: arules
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
## Loading required package: proxy
##
## Attaching package: 'proxy'
## The following object is masked from 'package:Matrix':
##
## as.matrix
## The following objects are masked from 'package:stats':
##
## as.dist, dist
## The following object is masked from 'package:base':
##
## as.matrix
## Loading required package: registry
## Registered S3 methods overwritten by 'registry':
## method from
## print.registry_field proxy
## print.registry_entry proxy
##
## Attaching package: 'recommenderlab'
## The following objects are masked from 'package:caret':
##
## MAE, RMSE
library(ggplot2)
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v tibble 3.0.0 v dplyr 0.8.5
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## v purrr 0.3.4
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x purrr::lift() masks caret::lift()
## x tidyr::pack() masks Matrix::pack()
## x dplyr::recode() masks arules::recode()
## x tidyr::unpack() masks Matrix::unpack()
library(pander)
Let's load the dataset first.
data(MovieLense)
movielense <- MovieLense # Copy the MovieLense dataset
movielense
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
class (movielense)
## [1] "realRatingMatrix"
## attr(,"package")
## [1] "recommenderlab"
slotNames(movielense)
## [1] "data" "normalize"
print(paste0("The dimensions of dataset : (Users x Movies)", nrow(movielense), " x ",ncol(movielense)))
## [1] "The dimensions of dataset : (Users x Movies)943 x 1664"
print('Largest zero-based row (user) index stored in the sparse matrix:')
## [1] "Largest zero-based row (user) index stored in the sparse matrix:"
max(movielense@data@i)
## [1] 942
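Note that the `@i` slot of a column-compressed sparse matrix holds zero-based row indices, so its maximum reflects the position of the last user, not how often a movie was rated. A small sketch with the Matrix package on made-up values shows the difference; on the real data, `max(colCounts(movielense))` would give the largest number of ratings received by a single movie.

```r
library(Matrix)

# Hypothetical 5 users x 3 movies matrix; 0 means "not rated"
dense <- matrix(c(5, 0, 4,
                  0, 0, 3,
                  2, 0, 0,
                  0, 4, 0,
                  1, 0, 0), nrow = 5, byrow = TRUE)
m <- Matrix(dense, sparse = TRUE)   # zeros are dropped from storage

# @i holds zero-based row indices of the stored entries
largest_row_index <- max(m@i)       # nrow - 1, unrelated to rating counts

# @p holds column pointers, so successive differences count entries per column
ratings_per_movie <- diff(m@p)
most_rated <- max(ratings_per_movie)
```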
print('The first 6 movies in this dataset are:')
## [1] "The first 6 movies in this dataset are:"
head(names(colCounts(movielense)))
## [1] "Toy Story (1995)"
## [2] "GoldenEye (1995)"
## [3] "Four Rooms (1995)"
## [4] "Get Shorty (1995)"
## [5] "Copycat (1995)"
## [6] "Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)"
movMat<-as(movielense,'matrix')
class(movMat)
## [1] "matrix"
print ('Number of observed (non-missing) ratings is:')
## [1] "Number of observed (non-missing) ratings is:"
prod(dim(movMat)) -sum(is.na(movMat))
## [1] 99392
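To put that count in context, the arithmetic below works out how densely the user-by-movie matrix is filled, using only the dimensions and rating count printed above. The variable names are illustrative.

```r
n_users   <- 943
n_movies  <- 1664
n_ratings <- 99392

n_cells <- n_users * n_movies    # 1,569,152 possible user-movie pairs
density <- n_ratings / n_cells   # share of pairs that carry a rating
percent_rated <- 100 * density   # roughly 6.3 percent
```

So about 94% of the cells are empty, which is why the ratings are stored as a sparse matrix.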
Let's take a peek at the data. Below are the ratings of the first 10 users for the first 10 movies, the first few ratings given by the first user, and the last few movies in the matrix for user 168.
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
## View the first 10 users x 10 movies as an example
y<-as.matrix(movielense@data[1:10,1:10])
y %>% kable (caption ="DataExample") %>% kable_styling ("striped", full_width=TRUE)
Toy Story (1995) | GoldenEye (1995) | Four Rooms (1995) | Get Shorty (1995) | Copycat (1995) | Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | Twelve Monkeys (1995) | Babe (1995) | Dead Man Walking (1995) | Richard III (1995) |
---|---|---|---|---|---|---|---|---|---|
5 | 3 | 4 | 3 | 3 | 5 | 4 | 1 | 5 | 3 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 4 | 0 |
0 | 0 | 0 | 5 | 0 | 0 | 5 | 5 | 5 | 4 |
0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 5 | 4 | 0 | 0 | 0 |
4 | 0 | 0 | 4 | 0 | 0 | 4 | 0 | 4 | 0 |
# # look at the first 3 ratings of the first user
head(as(movielense[1,], "list")[[1]], 3)
## Toy Story (1995) GoldenEye (1995) Four Rooms (1995)
## 5 3 4
## look at the last 4 ratings of the no 16th user
# tail(as(movielense[1,], "list")[[16]], 4)
## let's look at user number 168
mov_rated168 <- as.data.frame(movielense@data[c("168"),])
# print(mov_rated168)
dim(mov_rated168)
## [1] 1664 1
tail(mov_rated168) ## last 6 movies in the matrix for user 168 (0 = not rated)
## movielense@data[c("168"), ]
## War at Home, The (1996) 0
## Sweet Nothing (1995) 0
## Mat' i syn (1997) 0
## B. Monkey (1998) 0
## You So Crazy (1994) 0
## Scream of Stone (Schrei aus Stein) (1991) 0
# # Loading the metadata that gets loaded with main dataset
moviemeta <- MovieLenseMeta
class(moviemeta)
## [1] "data.frame"
colnames(moviemeta)
## [1] "title" "year" "url" "unknown" "Action"
## [6] "Adventure" "Animation" "Children's" "Comedy" "Crime"
## [11] "Documentary" "Drama" "Fantasy" "Film-Noir" "Horror"
## [16] "Musical" "Mystery" "Romance" "Sci-Fi" "Thriller"
## [21] "War" "Western"
# rownames(moviemeta)
dim(moviemeta)
## [1] 1664 22
pander(head(moviemeta,2),caption = "First few Rows within Movie Meta Data ")
title | year |
---|---|
Toy Story (1995) | 1995 |
GoldenEye (1995) | 1995 |
url | unknown | Action |
---|---|---|
http://us.imdb.com/M/title-exact?Toy%20Story%20(1995) | 0 | 0 |
http://us.imdb.com/M/title-exact?GoldenEye%20(1995) | 0 | 1 |
Adventure | Animation | Children’s | Comedy | Crime | Documentary | Drama |
---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 |
Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
War | Western |
---|---|
0 | 0 |
0 | 0 |
# pander(tail(moviemeta), caption = "Last few Rows within Movie Meta Data")
movie_watched <- data.frame(
movie_name = names(colCounts(movielense)),
watched_times = colCounts(movielense)
)
top_ten_movies <- movie_watched[order(movie_watched$watched_times, decreasing = TRUE), ][1:10, ]
ggplot(top_ten_movies) + aes(x=movie_name, y=watched_times) +
geom_bar(stat = "identity",fill = "firebrick4", color = "dodgerblue2") + xlab("Movie Title") + ylab("Count") +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) ## rotate only the x-axis (title) labels
Let's look at the different ratings given by users.
A value of 0 in the underlying sparse matrix marks a movie that the user has not rated rather than an actual score, so we exclude zeros from the analysis to keep them from skewing the distribution.
qplot(colMeans(movielense)) + stat_bin(bins=20, fill=I("blue"), col=I("red")) +
xlim(0,5)+
xlab("AVERAGE RATING") + ylab("COUNTS") +
ggtitle("AVERAGE RATINGS COUNT")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).
## Warning: Removed 2 rows containing missing values (geom_bar).
mvector_raw<-as.vector(movielense@data)
mvector_raw<-factor(mvector_raw)
qplot(mvector_raw,fill=I("blue"), col=I("red") ) + ggtitle("RATINGS RAW COUNT") +
xlab("RATINGS") + ylab("COUNT")
mvector <- as.vector(movielense@data)
mvector <- mvector[mvector != 0]
unique(mvector)
## [1] 5 4 3 1 2
mvector <- factor(mvector)
qplot(mvector,fill=I("blue"), col=I("red") ) + ggtitle("RATINGS RAW COUNT EXCLUDING 0") +
xlab("RATINGS") + ylab("COUNT")
The dataset is fairly large, so we trim it: we keep only users who have rated more than 30 movies and movies that have been rated by more than 60 users.
movielense <- movielense [rowCounts(movielense) > 30, colCounts(movielense) > 60]
print(paste0("Number of Rows, users, after filtering : ", nrow(movielense)))
## [1] "Number of Rows, users, after filtering : 726"
print(paste0("Number of Columns, items, after filtering : ", ncol(movielense)))
## [1] "Number of Columns, items, after filtering : 529"
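The same kind of filtering can be sketched in base R on a tiny made-up matrix, with `NA` standing for "not rated". The thresholds here (more than 2 ratings per user, more than 1 per movie) are scaled-down stand-ins for the 30 and 60 used above. One point the sketch makes visible: both counts are taken on the original matrix, so after columns are dropped a kept user can end up with fewer ratings than the row threshold.

```r
# Hypothetical ratings, NA = not rated
ratings <- matrix(c( 5,  3, NA,  4,
                    NA,  2, NA, NA,
                     4, NA,  1,  5,
                     3,  4, NA,  2),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))

row_counts <- rowSums(!is.na(ratings))   # ratings per user
col_counts <- colSums(!is.na(ratings))   # ratings per movie

# Keep users with more than 2 ratings and movies with more than 1 rating
filtered <- ratings[row_counts > 2, col_counts > 1, drop = FALSE]

# Ratings left per user after the movie filter has also been applied
remaining_per_user <- rowSums(!is.na(filtered))
```

Here user `u3` passes the row filter with 3 ratings but keeps only 2 of them once movie `m3` is removed.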
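We evaluate with 10-fold cross-validation. Although the call passes `train = 0.8`, the output below shows 648 training users and 78 test users (roughly 90% and 10%), so that argument does not appear to take effect with this method. Ratings of 3 and above count as good ratings, and 15 items per test user are given to the recommender, with the rest withheld for scoring.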
set.seed(2020)#seed as year
n_folds <- 10 ## 10 cross-validation folds (runs)
to_keep <- 15 ## 15 items given per test user
threshold <- 3 ## ratings of 3 or above count as good
e <- evaluationScheme(movielense, method="cross-validation",k = n_folds, train=0.8, given=to_keep, goodRating=threshold)
print(e)
## Evaluation scheme with 15 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.000000
## Data set: 726 x 529 rating matrix of class 'realRatingMatrix' with 74956 ratings.
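The "items given" setting means that, for each test user, a fixed number of their ratings is shown to the recommender and the remainder is held back to score the predictions. The base R sketch below mimics that split for one made-up user with 4 given items instead of 15; it illustrates the protocol only and is not recommenderlab's internal code.

```r
set.seed(2020)

# Hypothetical ratings of one test user
user_ratings <- c(A = 5, B = 3, C = 4, D = 2, E = 5, F = 1)
given <- 4   # stand-in for the 15 items used in the evaluation scheme

known_idx    <- sample(seq_along(user_ratings), given)
known_part   <- user_ratings[known_idx]    # shown to the recommender
unknown_part <- user_ratings[-known_idx]   # withheld, used to score predictions
```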
training <- getData(e, "train")
known <- getData(e, "known")
unknown <- getData(e, "unknown")
print(paste0("Training data has ", nrow(training)," rows, users"))
## [1] "Training data has 648 rows, users"
print(paste0("Known Testing data has ", nrow(known)," rows, users"))
## [1] "Known Testing data has 78 rows, users"
# print(paste0("Unknown Testing data has ", nrow(unknown)," rows, users"))
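The 648 and 78 users reported above are consistent with 10 folds of 72 users each, with nine folds used for training and everyone else used for testing. The arithmetic below reproduces those numbers; that recommenderlab allocates users exactly this way is an assumption inferred from the printed output, not something confirmed here.

```r
n_users <- 726
k       <- 10

fold_size <- n_users %/% k         # 72 users per fold (integer division)
n_train   <- (k - 1) * fold_size   # 9 folds for training: 648 users
n_test    <- n_users - n_train     # remaining users for testing: 78
train_share <- n_train / n_users   # about 0.89, not 0.8
```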
Our first model is singular value decomposition, which we train so that we can use it to recommend movies.
training_time <- system.time({
model_svd <- Recommender(data = training, method = "SVD") })
print("Model training time : ")
## [1] "Model training time : "
print(training_time)
## user system elapsed
## 0.04 0.01 0.06
print(model_svd)
## Recommender of type 'SVD' for 'realRatingMatrix'
## learned using 648 users.
predicted_top_ten_movies_svd <- predict(object = model_svd, newdata = known, n = 10) ## top10 movie recommendations
predicted_top_ten_movies_df_svd <- data.frame(users = sort(rep(1:length(predicted_top_ten_movies_svd@items),
predicted_top_ten_movies_svd@n)),
ratings = unlist(predicted_top_ten_movies_svd@ratings),
index = unlist(predicted_top_ten_movies_svd@items))
predicted_top_ten_movies_df_svd$title <- predicted_top_ten_movies_svd@itemLabels[predicted_top_ten_movies_df_svd$index]
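## NOTE: `index` counts columns of the filtered 529-movie matrix, while MovieLenseMeta still lists all 1664 movies,
## so this positional lookup is misaligned (e.g. Fargo (1996) is shown with year 1993 below); matching on title would avoid this
predicted_top_ten_movies_df_svd$year <- MovieLenseMeta$year[predicted_top_ten_movies_df_svd$index]
predicted_top_ten_movies_df_svd <- predicted_top_ten_movies_df_svd %>% group_by(users) %>% top_n(4,ratings) ## keep each user's 4 highest predicted ratings out of the 10 recommendations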
predicted_top_ten_movies_df_svd[predicted_top_ten_movies_df_svd$users %in% (1:2), ] ## first 2 users
## # A tibble: 8 x 5
## # Groups: users [2]
## users ratings index title year
## <int> <dbl> <int> <chr> <dbl>
## 1 1 3.51 43 Pulp Fiction (1994) 1994
## 2 1 3.44 79 Fargo (1996) 1993
## 3 1 3.42 104 2001: A Space Odyssey (1968) 1996
## 4 1 3.40 37 Star Wars (1977) 1994
## 5 2 4.25 43 Pulp Fiction (1994) 1994
## 6 2 4.23 166 Back to the Future (1985) 1986
## 7 2 4.22 176 Field of Dreams (1989) 1986
## 8 2 4.22 19 Braveheart (1995) 1995
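The `year` column in the table above does not agree with the years in the titles. A likely cause is that `index` refers to a column of the filtered rating matrix, while the metadata table still has one row per original movie, so looking up the year by position picks the wrong row. The base R sketch below reproduces the effect on made-up data and shows a lookup by title with `match()`; all names and values are hypothetical.

```r
# Hypothetical metadata for all movies, in original order
meta <- data.frame(
  title = c("A (1990)", "B (1991)", "C (1992)", "D (1993)"),
  year  = c(1990, 1991, 1992, 1993),
  stringsAsFactors = FALSE
)

# Item labels that survive filtering, and recommended indices into them
kept_titles <- c("B (1991)", "D (1993)")
rec_index   <- c(2, 1)
rec_title   <- kept_titles[rec_index]   # "D (1993)" "B (1991)"

# Positional lookup uses the filtered index on the unfiltered table: wrong rows
year_by_position <- meta$year[rec_index]

# Lookup by title finds the right metadata row regardless of filtering
year_by_title <- meta$year[match(rec_title, meta$title)]
```

Applied to the analysis, the equivalent change would be to index `MovieLenseMeta$year` with `match(title, MovieLenseMeta$title)` instead of with `index`.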
svd_prediction <- predict(object = model_svd, newdata = known, n = 10, type = "ratings")
print("Accuracy metrics SVD :")
## [1] "Accuracy metrics SVD :"
print(calcPredictionAccuracy(x = svd_prediction, data = unknown, byUser = FALSE))
## RMSE MSE MAE
## 0.9986768 0.9973553 0.7918828
training_time <- system.time({
model_ibcf_cosine <- Recommender(data = training, method = "IBCF", parameter = list(method = "Cosine"))
})
print("Model training time : ")
## [1] "Model training time : "
print(training_time)
## user system elapsed
## 0.84 0.08 0.97
print(model_ibcf_cosine)
## Recommender of type 'IBCF' for 'realRatingMatrix'
## learned using 648 users.
predicted_top_ten_movies_ibcf_cosine <- predict(object = model_ibcf_cosine, newdata = known, n = 10) ## top 10 movies
predicted_top_ten_movies_df_ibcf_cosine <- data.frame(users = sort(rep(1:length(predicted_top_ten_movies_ibcf_cosine@items),
predicted_top_ten_movies_ibcf_cosine@n)),
ratings = unlist(predicted_top_ten_movies_ibcf_cosine@ratings),
index = unlist(predicted_top_ten_movies_ibcf_cosine@items))
predicted_top_ten_movies_df_ibcf_cosine$title <- predicted_top_ten_movies_ibcf_cosine@itemLabels[predicted_top_ten_movies_df_ibcf_cosine$index]
## NOTE: this positional year lookup is misaligned, because `index` refers to the filtered matrix while MovieLenseMeta lists all 1664 movies
predicted_top_ten_movies_df_ibcf_cosine$year <- MovieLenseMeta$year[predicted_top_ten_movies_df_ibcf_cosine$index]
predicted_top_ten_movies_df_ibcf_cosine <- predicted_top_ten_movies_df_ibcf_cosine %>% group_by(users) %>% top_n(10,ratings) ## top_n(10, ...) keeps all 10 recommendations per user, so 20 rows appear below for the first 2 users
predicted_top_ten_movies_df_ibcf_cosine[predicted_top_ten_movies_df_ibcf_cosine$users %in% (1:2), ] ## first 2 users
## # A tibble: 20 x 5
## # Groups: users [2]
## users ratings index title year
## <int> <dbl> <int> <chr> <dbl>
## 1 1 5 3 Four Rooms (1995) 1995
## 2 1 5 14 Mr. Holland's Opus (1995) 1994
## 3 1 5 29 Net, The (1995) 1995
## 4 1 5 38 Legends of the Fall (1994) 1995
## 5 1 5 51 While You Were Sleeping (1995) 1994
## 6 1 5 59 Firm, The (1993) 1994
## 7 1 5 67 Sleepless in Seattle (1993) 1994
## 8 1 5 80 Heavy Metal (1981) 1993
## 9 1 5 85 Truth About Cats & Dogs, The (1996) 1994
## 10 1 5 88 Rock, The (1996) 1993
## 11 2 5 1 Toy Story (1995) 1995
## 12 2 5 24 Apollo 13 (1995) 1996
## 13 2 5 37 Star Wars (1977) 1994
## 14 2 5 134 Empire Strikes Back, The (1980) 1941
## 15 2 5 171 Indiana Jones and the Last Crusade (1989) 1991
## 16 2 5 176 Field of Dreams (1989) 1986
## 17 2 5 202 Jungle2Jungle (1997) 1993
## 18 2 4.50 166 Back to the Future (1985) 1986
## 19 2 4.49 490 Space Jam (1996) 1934
## 20 2 4.02 19 Braveheart (1995) 1995
ibcf_prediction <- predict(object = model_ibcf_cosine, newdata = known, n = 10, type = "ratings")
print("Accuracy metrics IBCF :")
## [1] "Accuracy metrics IBCF :"
print(calcPredictionAccuracy(x = ibcf_prediction, data = unknown, byUser = FALSE))
## RMSE MSE MAE
## 1.444221 2.085774 1.092323
training_time <- system.time({
model_ubcf_cosine <- Recommender(data = training, method = "UBCF", parameter = list(method = "Cosine"))
})
print("Model training time : ")
## [1] "Model training time : "
print(training_time)
## user system elapsed
## 0.01 0.00 0.01
print(model_ubcf_cosine)
## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 648 users.
predicted_top_ten_movies_ubcf_cosine <- predict(object = model_ubcf_cosine, newdata = known, n = 10) #top10
predicted_top_ten_movies_df_ubcf_cosine <- data.frame(users = sort(rep(1:length(predicted_top_ten_movies_ubcf_cosine@items),
predicted_top_ten_movies_ubcf_cosine@n)),
ratings = unlist(predicted_top_ten_movies_ubcf_cosine@ratings),
index = unlist(predicted_top_ten_movies_ubcf_cosine@items))
predicted_top_ten_movies_df_ubcf_cosine$title <- predicted_top_ten_movies_ubcf_cosine@itemLabels[predicted_top_ten_movies_df_ubcf_cosine$index]
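## NOTE: this positional year lookup is misaligned, because `index` refers to the filtered matrix while MovieLenseMeta lists all 1664 movies
predicted_top_ten_movies_df_ubcf_cosine$year <- MovieLenseMeta$year[predicted_top_ten_movies_df_ubcf_cosine$index]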
predicted_top_ten_movies_df_ubcf_cosine <- predicted_top_ten_movies_df_ubcf_cosine %>% group_by(users) %>% top_n(4,ratings) ## display 4 of the top10 ratings to save space
predicted_top_ten_movies_df_ubcf_cosine[predicted_top_ten_movies_df_ubcf_cosine$users %in% (1:2), ] ## first 2 users
## # A tibble: 8 x 5
## # Groups: users [2]
## users ratings index title year
## <int> <dbl> <int> <chr> <dbl>
## 1 1 3.78 79 Fargo (1996) 1993
## 2 1 3.76 37 Star Wars (1977) 1994
## 3 1 3.63 43 Pulp Fiction (1994) 1994
## 4 1 3.62 143 Return of the Jedi (1983) 1965
## 5 2 4.57 37 Star Wars (1977) 1994
## 6 2 4.54 79 Fargo (1996) 1993
## 7 2 4.48 143 Return of the Jedi (1983) 1965
## 8 2 4.46 77 Silence of the Lambs, The (1991) 1993
ubcf_prediction <- predict(object = model_ubcf_cosine, newdata = known, n = 10, type = "ratings")
print("Accuracy metrics UBCF :")
## [1] "Accuracy metrics UBCF :"
print(calcPredictionAccuracy(x = ubcf_prediction, data = unknown, byUser = FALSE))
## RMSE MSE MAE
## 0.9970938 0.9941960 0.7911661
models_evaluation <- list(
SVD = list(name = "SVD"),
IBCF = list(name = "IBCF", param = list(method = "cosine")),
UBCF = list(name = "UBCF", param = list(method = "cosine"))
)
lerror <- evaluate(x = e, method = models_evaluation, type = "ratings")
## SVD run fold/sample [model time/prediction time]
## 1 [0.03sec/0.02sec]
## 2 [0.05sec/0.01sec]
## 3 [0.05sec/0.01sec]
## 4 [0.05sec/0.03sec]
## 5 [0.07sec/0sec]
## 6 [0.06sec/0.02sec]
## 7 [0.06sec/0.02sec]
## 8 [0.06sec/0sec]
## 9 [0.27sec/0.01sec]
## 10 [0.06sec/0sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [1.05sec/0.01sec]
## 2 [0.87sec/0.03sec]
## 3 [0.91sec/0sec]
## 4 [0.83sec/0.02sec]
## 5 [0.85sec/0.01sec]
## 6 [0.92sec/0.03sec]
## 7 [1.36sec/0.01sec]
## 8 [0.87sec/0.02sec]
## 9 [0.83sec/0.01sec]
## 10 [1sec/0.02sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0.02sec/0.14sec]
## 2 [0sec/0.17sec]
## 3 [0sec/0.16sec]
## 4 [0.01sec/0.13sec]
## 5 [0sec/0.16sec]
## 6 [0.01sec/0.13sec]
## 7 [0sec/0.36sec]
## 8 [0sec/0.18sec]
## 9 [0sec/0.25sec]
## 10 [0sec/0.26sec]
mdlcmp <- as.data.frame(sapply(avg(lerror), rbind))
Although run times differ somewhat, all three models run within a reasonable time frame, so run time should not be a concern in this particular situation and should not factor into our judgement. The final model selection should be based on accuracy and the other evaluation measures.
cmpMdl <- as.data.frame(t(as.matrix(mdlcmp)))
colnames(cmpMdl) <- c("RMSE", "MSE", "MAE")
pander(cmpMdl, caption = "Model Comparison")
 | RMSE | MSE | MAE |
---|---|---|---|
SVD | 1.037 | 1.077 | 0.8232 |
IBCF | 1.469 | 2.162 | 1.116 |
UBCF | 1.034 | 1.07 | 0.8197 |
rmse_ubcf<- calcPredictionAccuracy(x = ubcf_prediction, data = unknown, byUser = FALSE)
rmse_ibcf <- calcPredictionAccuracy(x = ibcf_prediction, data = unknown, byUser = FALSE)
rmse_svd <- calcPredictionAccuracy(x = svd_prediction, data = unknown, byUser = FALSE)
comparison = rbind(rmse_ibcf, rmse_ubcf, rmse_svd)
comparison = data.frame(comparison, row.names = NULL)
comparison = cbind(model =c('IBCF','UBCF','SVD'), comparison)
comparison %>% gather ('measure', 'value',-1) %>%
ggplot (aes (x=measure, y=value, fill=model)) +
geom_bar (stat='identity', position=position_dodge())
Item-based collaborative filtering performs the worst, with the largest RMSE (root mean squared error). Singular value decomposition and user-based collaborative filtering perform similarly.
n_recommendations = c(1,3,5,8,10,15,20, 25)
results = evaluate (x=e, method = models_evaluation, n = n_recommendations)
## SVD run fold/sample [model time/prediction time]
## 1 [0.06sec/0.03sec]
## 2 [0.07sec/0.01sec]
## 3 [0.06sec/0.01sec]
## 4 [0.05sec/0.01sec]
## 5 [0.27sec/0.01sec]
## 6 [0.05sec/0.03sec]
## 7 [0.05sec/0.03sec]
## 8 [0.05sec/0.03sec]
## 9 [0.05sec/0.03sec]
## 10 [0.05sec/0.05sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.87sec/0.02sec]
## 2 [0.84sec/0.02sec]
## 3 [0.79sec/0.02sec]
## 4 [0.9sec/0.03sec]
## 5 [0.9sec/0.02sec]
## 6 [0.88sec/0.01sec]
## 7 [0.89sec/0.02sec]
## 8 [0.93sec/0.02sec]
## 9 [0.95sec/0.05sec]
## 10 [1.08sec/0.03sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.15sec]
## 2 [0.02sec/0.16sec]
## 3 [0sec/0.15sec]
## 4 [0.02sec/0.15sec]
## 5 [0sec/0.16sec]
## 6 [0sec/0.16sec]
## 7 [0sec/0.14sec]
## 8 [0sec/0.16sec]
## 9 [0sec/0.14sec]
## 10 [0sec/0.16sec]
plot(results, y="ROC", annotate = 1, legend ="topleft")
title ("ROC Curve")
plot (results, y ='prec/rec', annotate=1)
title ("Precision-Recall")
The ROC (receiver operating characteristic) curve reveals that singular value decomposition has the largest area under the curve, followed by user-based collaborative filtering, while item-based collaborative filtering has the smallest. The same holds for the precision-recall plot: SVD ranks best and IBCF ranks worst.
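For reference, each point on these curves comes from the confusion matrix of one top-N list length. The base R sketch below derives precision, recall (the true positive rate), and the false positive rate from made-up counts; the function name `confusion_metrics` and the numbers are hypothetical and are not taken from the evaluation above.

```r
# Metrics behind the ROC and precision-recall plots, from confusion-matrix counts
confusion_metrics <- function(tp, fp, fn, tn) {
  c(precision = tp / (tp + fp),   # share of recommended items that are good
    recall    = tp / (tp + fn),   # share of good items that were recommended (TPR)
    fpr       = fp / (fp + tn))   # share of not-good items that were recommended
}

# Hypothetical counts for one user and one list length
m <- confusion_metrics(tp = 3, fp = 2, fn = 1, tn = 14)
```

The ROC curve plots recall (TPR) against `fpr` as the list length grows, while the precision-recall plot shows `precision` against `recall`.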
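In this movie setting, singular value decomposition performs better overall than the neighborhood-based collaborative filtering methods (UBCF and IBCF): it is on par with UBCF on rating error and ahead on the ROC and precision-recall curves. It is therefore not surprising that the well-known technology companies shown below reportedly relied on singular value decomposition for their recommendation systems until very recently.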
Figure