Research Approach: Several R packages are available for building recommendation systems, and the most commonly used is recommenderlab. I have used recommenderlab throughout this course and have applied three methods with it: item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF), and singular value decomposition (SVD). In this project, I apply all three methods to the MovieLense dataset and compare their performance.

Background

Collaborative Filtering

User-Based (UBCF) vs Item-Based (IBCF)

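This GIF illustrates the most commonly used recommendation system model: collaborative filtering. Collaborative filtering answers the question “What items do users with interests similar to yours like?” User-based collaborative filtering (UBCF) finds the users whose ratings are most similar to yours and recommends what they rated highly, while item-based collaborative filtering (IBCF) recommends items that are similar to the ones you have already rated highly.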

Figure

Figure

Figure

Figure
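To make the user-based idea concrete, here is a minimal base-R sketch on a hypothetical 4 x 4 rating matrix. It uses cosine similarity over co-rated movies and predicts a rating as the similarity-weighted average of the k most similar users who rated that movie. The helper names (`cosine_sim`, `predict_ubcf`) and the toy data are illustrative only; recommenderlab's UBCF also normalizes ratings by default and uses its own neighbourhood settings, so this is a sketch of the technique rather than the package's implementation.

```r
# Toy ratings: rows = users, columns = movies, NA = not rated (hypothetical data)
ratings <- matrix(c(5, 4, NA, 1,
                    4, 5, 1, NA,
                    1, NA, 5, 4,
                    5, 4, 2, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))

# Cosine similarity between two users, computed over co-rated movies only
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict user u's rating of movie i as the similarity-weighted average
# of the k most similar users who have rated movie i
predict_ubcf <- function(R, u, i, k = 2) {
  others <- setdiff(which(!is.na(R[, i])), u)
  sims <- sapply(others, function(v) cosine_sim(R[u, ], R[v, ]))
  top <- order(sims, decreasing = TRUE)[seq_len(min(k, length(others)))]
  sum(sims[top] * R[others[top], i]) / sum(sims[top])
}

predict_ubcf(ratings, u = 1, i = 3)
```

User u4 rated the movies u1 has seen exactly as u1 did (cosine similarity 1), so u4's low rating of m3 pulls the prediction down to about 1.51. IBCF applies the same weighting idea to columns, comparing movies instead of users.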

Singular Value Decomposition (SVD)

Figure

Figure

Figure

Figure
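The SVD approach can likewise be sketched in base R with `svd()`: fill in the missing ratings, factorize the matrix, keep only the k largest singular values, and read predictions off the low-rank reconstruction. The toy matrix, the column-mean imputation and k = 2 are assumptions chosen for illustration; this is not recommenderlab's internal code, which has its own preprocessing and defaults.

```r
# Toy user x movie ratings; NA = not rated (hypothetical data)
R <- matrix(c(5, 3, NA, 1,
              4, NA, NA, 1,
              1, 1, NA, 5,
              1, NA, NA, 4,
              NA, 1, 5, 4), nrow = 5, byrow = TRUE)

# 1. Fill each missing rating with that movie's (column) mean rating
item_means <- colMeans(R, na.rm = TRUE)
R_filled <- R
for (j in seq_len(ncol(R))) R_filled[is.na(R[, j]), j] <- item_means[j]

# 2. Factorize R_filled = U D V^T and keep only the k largest singular values
k <- 2
s <- svd(R_filled)
R_hat <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])

# 3. Predicted ratings for the cells that were originally missing
round(R_hat[is.na(R)], 2)
```

Keeping few singular values forces the reconstruction to capture only the dominant taste patterns; by the Eckart-Young theorem, the truncated product is the best rank-k approximation of the filled matrix in the least-squares (Frobenius) sense.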

Dataset

The dataset I chose for this project is MovieLense. It ships with the recommenderlab package, so we use it directly and explore it first before building the SVD (singular value decomposition) and collaborative filtering models.

library(recommenderlab)
library(ggplot2)
library(tidyverse)
library(pander)

Data Exploration

Let's load the dataset first.

data(MovieLense)
movielense <- MovieLense # load the movie rating dataset (a realRatingMatrix)
print(paste0("Dimensions of the dataset (users x movies): ", nrow(movielense), " x ", ncol(movielense)))
## [1] "Dimensions of the dataset (users x movies): 943 x 1664"

Let's look at the distinct rating values that users have given.

Movie Ratings Histogram

mvector <- as.vector(movielense@data)   # flatten the sparse rating matrix
mvector <- mvector[mvector != 0]        # 0 marks a missing rating, so drop it
unique(mvector)
## [1] 5 4 3 1 2
mvector <- factor(mvector)

qplot(mvector, fill = I("blue"), col = I("red")) + ggtitle("Histogram of Ratings") +
  xlab("Ratings") + ylab("Count")

### Top Ten Movies

movie_watched <- data.frame(
    movie_name = names(colCounts(movielense)),
    watched_times = colCounts(movielense)   # number of ratings per movie
)
top_ten_movies <- movie_watched[order(movie_watched$watched_times, decreasing = TRUE), ][1:10, ]
# reorder() sorts the bars by count instead of alphabetically
ggplot(top_ten_movies) + aes(x = reorder(movie_name, -watched_times), y = watched_times) +
  geom_bar(stat = "identity", fill = "firebrick4", color = "dodgerblue2") + xlab("Movie Title") + ylab("Count") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))

### Average Movie Rating Histogram

# Histogram of each movie's average rating, using 0.25-wide bins
ggplot(data.frame(avg_rating = colMeans(movielense)), aes(x = avg_rating)) +
  geom_histogram(binwidth = 0.25, fill = "blue", col = "red") +
  coord_cartesian(xlim = c(0, 5)) +   # zoom to the rating scale without dropping bins
  xlab("Average Rating") + ylab("Count") +
  ggtitle("Average Ratings Counts Histogram")

Next, let's take a peek at the data itself: the ratings of the first ten users for the first ten movies, the first and last rows of the movie metadata, and the movies rated by the first user.

library(kableExtra)
# View the data as a 10 by 10 example (0 means "not rated")
y <- as.matrix(movielense@data[1:10, 1:10])
y %>% kable(caption = "DataExample") %>% kable_styling("striped", full_width = TRUE)
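DataExample

| Toy Story (1995) | GoldenEye (1995) | Four Rooms (1995) | Get Shorty (1995) | Copycat (1995) | Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | Twelve Monkeys (1995) | Babe (1995) | Dead Man Walking (1995) | Richard III (1995) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 3 | 4 | 3 | 3 | 5 | 4 | 1 | 5 | 3 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 4 | 0 |
| 0 | 0 | 0 | 5 | 0 | 0 | 5 | 5 | 5 | 4 |
| 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 5 | 4 | 0 | 0 | 0 |
| 4 | 0 | 0 | 4 | 0 | 0 | 4 | 0 | 4 | 0 |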
# Movie metadata (title, year, IMDb URL and genre flags) that ships with the MovieLense dataset
moviemeta <- MovieLenseMeta
pander(head(moviemeta),caption = "First few Rows within Movie Meta Data ")
First few Rows within Movie Meta Data

| title | year | url |
|---|---|---|
| Toy Story (1995) | 1995 | http://us.imdb.com/M/title-exact?Toy%20Story%20(1995) |
| GoldenEye (1995) | 1995 | http://us.imdb.com/M/title-exact?GoldenEye%20(1995) |
| Four Rooms (1995) | 1995 | http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995) |
| Get Shorty (1995) | 1995 | http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995) |
| Copycat (1995) | 1995 | http://us.imdb.com/M/title-exact?Copycat%20(1995) |
| Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | 1995 | http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995) |

| title | unknown | Action | Adventure | Animation | Children’s | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Toy Story (1995) | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GoldenEye (1995) | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Four Rooms (1995) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Get Shorty (1995) | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Copycat (1995) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
pander(tail(moviemeta), caption = "Last few Rows within Movie Meta Data")
Last few Rows within Movie Meta Data

| row | title | year | url |
|---|---|---|---|
| 1676 | War at Home, The (1996) | 1996 | http://us.imdb.com/M/title-exact?War%20at%20Home%2C%20The%20%281996%29 |
| 1677 | Sweet Nothing (1995) | 1996 | http://us.imdb.com/M/title-exact?Sweet%20Nothing%20(1995) |
| 1678 | Mat’ i syn (1997) | 1998 | http://us.imdb.com/M/title-exact?Mat%27+i+syn+(1997) |
| 1679 | B. Monkey (1998) | 1998 | http://us.imdb.com/M/title-exact?B%2E+Monkey+(1998) |
| 1681 | You So Crazy (1994) | 1994 | http://us.imdb.com/M/title-exact?You%20So%20Crazy%20(1994) |
| 1682 | Scream of Stone (Schrei aus Stein) (1991) | 1996 | http://us.imdb.com/M/title-exact?Schrei%20aus%20Stein%20(1991) |

| row | unknown | Action | Adventure | Animation | Children’s | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1677 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1678 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1679 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1681 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1682 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
# Look at the first 8 ratings of the first user
head(as(movielense[1,], "list")[[1]], 8)
##                                     Toy Story (1995) 
##                                                    5 
##                                     GoldenEye (1995) 
##                                                    3 
##                                    Four Rooms (1995) 
##                                                    4 
##                                    Get Shorty (1995) 
##                                                    3 
##                                       Copycat (1995) 
##                                                    3 
## Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) 
##                                                    5 
##                                Twelve Monkeys (1995) 
##                                                    4 
##                                          Babe (1995) 
##                                                    1
# look at the last 4 ratings of the first user
tail(as(movielense[1,], "list")[[1]], 4)
##   Full Monty, The (1997)           Gattaca (1997) Starship Troopers (1997) 
##                        5                        5                        2 
## Good Will Hunting (1997) 
##                        3

Data preparation

Because the dataset is large and sparse, we shrink it by keeping only users who have rated more than 30 movies and movies that have been rated by more than 60 users.

movielense <- movielense [rowCounts(movielense) > 30, colCounts(movielense) > 60]
print(paste0("Number of Rows after filtering : ", nrow(movielense)))
## [1] "Number of Rows after filtering : 726"
print(paste0("Number of Columns after filtering : ", ncol(movielense)))
## [1] "Number of Columns after filtering : 529"

Training and Testing Data

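set.seed(2020)  # seed as year
n_folds <- 10   # 10-fold cross-validation over users
to_keep <- 15   # ratings per test user given to the model ("known")
threshold <- 3  # ratings >= 3 count as good ratings in top-N evaluation
e <- evaluationScheme(movielense, method = "cross-validation", k = n_folds, train = 0.8, given = to_keep, goodRating = threshold)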
print(e)
## Evaluation scheme with 15 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.000000
## Data set: 726 x 529 rating matrix of class 'realRatingMatrix' with 74956 ratings.
training <- getData(e, "train")
known <- getData(e, "known")
unknown <- getData(e, "unknown")
print(paste0("Training data has ", nrow(training), " rows"))
## [1] "Training data has 648 rows"
print(paste0("Known Testing data has ", nrow(known)," rows"))
## [1] "Known Testing data has 78 rows"
print(paste0("Unknown Testing data has ", nrow(unknown)," rows"))
## [1] "Unknown Testing data has 78 rows"
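The evaluation scheme splits users into folds; for every test user, `given = 15` of their ratings form the known part that the model sees, and the remaining ratings form the unknown part used for scoring. The base-R sketch below mimics that given-n split on a toy matrix. `split_given` is a hypothetical helper written for illustration, not a recommenderlab function, and it assumes every test user has more than `given` ratings (which the filtering step above guarantees here, since every remaining user has more than 30 ratings).

```r
# Hypothetical helper: given-n split of a user x item matrix (NA = not rated)
split_given <- function(R, test_users, given, seed = 2020) {
  set.seed(seed)                       # reproducible choice of known ratings
  known <- unknown <- R[test_users, , drop = FALSE]
  for (u in seq_along(test_users)) {
    rated <- which(!is.na(known[u, ]))
    keep <- rated[sample.int(length(rated), given)]  # ratings the model may see
    known[u, setdiff(rated, keep)] <- NA             # hide the rest from the model
    unknown[u, keep] <- NA                           # score only held-out ratings
  }
  list(train = R[-test_users, , drop = FALSE], known = known, unknown = unknown)
}

# Usage on a toy 6 users x 8 movies matrix
R <- matrix(rep(1:5, length.out = 48), nrow = 6)
R[1, 1:2] <- NA
sp <- split_given(R, test_users = c(5, 6), given = 3)
rowSums(!is.na(sp$known))   # each test user keeps exactly 3 visible ratings
```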

Singular Value Decomposition

Our first model is singular value decomposition (SVD), which we train on the training set and then use to recommend movies.

training_time <- system.time({
    model_svd <- Recommender(data = training, method = "SVD") 
})
print("Model training time : ")
## [1] "Model training time : "
print(training_time)
##    user  system elapsed 
##    0.06    0.00    0.06
print(model_svd)
## Recommender of type 'SVD' for 'realRatingMatrix' 
## learned using 648 users.

SVD Prediction

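# Top-10 recommendations for each test user, based on their 15 known ratings
predicted_top_ten_movies_svd <- predict(object = model_svd, newdata = known, n = 10)
# One row per (user, recommended item); lengths() also handles users with fewer than 10 items
predicted_top_ten_movies_df_svd <- data.frame(users = rep(seq_along(predicted_top_ten_movies_svd@items),
                                                          lengths(predicted_top_ten_movies_svd@items)),
                                              ratings = unlist(predicted_top_ten_movies_svd@ratings),
                                              index = unlist(predicted_top_ten_movies_svd@items))
predicted_top_ten_movies_df_svd$title <- predicted_top_ten_movies_svd@itemLabels[predicted_top_ten_movies_df_svd$index]
# Match the year by title: item indices refer to the 529 filtered columns, not to the
# rows of MovieLenseMeta, so indexing MovieLenseMeta by position returns the wrong year
predicted_top_ten_movies_df_svd$year <- MovieLenseMeta$year[match(predicted_top_ten_movies_df_svd$title, MovieLenseMeta$title)]
# Keep each user's 5 highest predicted ratings (top_n() keeps ties)
predicted_top_ten_movies_df_svd <- predicted_top_ten_movies_df_svd %>% group_by(users) %>% top_n(5, ratings)

predicted_top_ten_movies_df_svd[predicted_top_ten_movies_df_svd$users %in% (1:10), ]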
## # A tibble: 50 x 5
## # Groups:   users [10]
##    users ratings index title                         year
##    <int>   <dbl> <int> <chr>                        <dbl>
##  1     1    3.51    43 Pulp Fiction (1994)           1994
##  2     1    3.44    79 Fargo (1996)                  1993
##  3     1    3.42   104 2001: A Space Odyssey (1968)  1996
##  4     1    3.40    37 Star Wars (1977)              1994
##  5     1    3.39   225 Leaving Las Vegas (1995)      1996
##  6     2    4.25    43 Pulp Fiction (1994)           1994
##  7     2    4.23   166 Back to the Future (1985)     1986
##  8     2    4.22   176 Field of Dreams (1989)        1986
##  9     2    4.22    19 Braveheart (1995)             1995
## 10     2    4.21   196 Jerry Maguire (1996)          1989
## # ... with 40 more rows

Accuracy Metrics SVD

svd_prediction <- predict(object = model_svd, newdata = known, n = 10, type = "ratings")
print("Accuracy Metrics SVD:")
## [1] "Accuracy Metrics SVD:"
print(calcPredictionAccuracy(x = svd_prediction, data = unknown, byUser = FALSE))
##      RMSE       MSE       MAE 
## 0.9986768 0.9973553 0.7918828
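`calcPredictionAccuracy()` compares the predicted ratings with the held-out ratings in `unknown`. The three reported errors can be computed by hand, as in this base-R sketch on made-up numbers. `accuracy_metrics` is a hypothetical helper; it pools all rated cells together, which is how we read `byUser = FALSE` (averaging over all predictions rather than per user).

```r
# Toy predicted vs. held-out ratings; NA = nothing to compare (hypothetical data)
predicted <- c(3.5, 4.2, NA, 2.0, 4.9)
actual    <- c(4,   4,   5,  1,   5)

accuracy_metrics <- function(pred, obs) {
  ok  <- !is.na(pred) & !is.na(obs)   # score only cells that have both values
  err <- pred[ok] - obs[ok]
  c(RMSE = sqrt(mean(err^2)),          # root mean squared error
    MSE  = mean(err^2),                # mean squared error
    MAE  = mean(abs(err)))             # mean absolute error
}

m <- accuracy_metrics(predicted, actual)
m
```

For a single split, MSE is simply RMSE squared (0.9987² ≈ 0.9974 above), so the two carry the same information; MAE is less sensitive to a few large errors.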

Item Based Collaborative Filtering (Cosine)

training_time <- system.time({
    model_ibcf_cosine <- Recommender(data = training, method = "IBCF", parameter = list(method = "Cosine"))
})
print("Model training time : ")
## [1] "Model training time : "
print(training_time)
##    user  system elapsed 
##    1.00    0.15    1.20
print(model_ibcf_cosine)
## Recommender of type 'IBCF' for 'realRatingMatrix' 
## learned using 648 users.
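Item-based filtering turns the neighbourhood idea around: it precomputes similarities between movies (columns) and predicts a user's rating for a movie from that user's own ratings of the most similar movies. The sketch below is a simplified base-R illustration on toy data; `item_cosine` and `predict_ibcf` are hypothetical helpers, and missing ratings are treated as 0 when computing cosine similarity. recommenderlab's IBCF also normalizes ratings and, per its documentation, keeps only the k most similar items for each item in the model.

```r
# Toy ratings: rows = users, columns = movies, NA = not rated (hypothetical data)
ratings <- matrix(c(5, 4, NA, 1,
                    4, 5, 1, NA,
                    1, NA, 5, 4,
                    5, 4, 2, 1),
                  nrow = 4, byrow = TRUE)

# Item-item cosine similarity, treating missing ratings as 0
item_cosine <- function(R) {
  X <- R
  X[is.na(X)] <- 0
  norms <- sqrt(colSums(X^2))
  S <- crossprod(X) / outer(norms, norms)
  diag(S) <- 0   # a movie is not its own neighbour
  S
}

# Predict user u's rating of movie i from the k movies most similar to i
# that u has already rated (similarity-weighted average)
predict_ibcf <- function(R, S, u, i, k = 2) {
  rated <- which(!is.na(R[u, ]))
  nn <- rated[order(S[i, rated], decreasing = TRUE)][seq_len(min(k, length(rated)))]
  sum(S[i, nn] * R[u, nn]) / sum(S[i, nn])
}

S <- item_cosine(ratings)            # computed once, at "training" time
predict_ibcf(ratings, S, u = 1, i = 3)
```

Precomputing the item-item matrix is consistent with the timings in this report: IBCF spends about a second per fold on training but predicts quickly, while UBCF trains almost instantly and does its neighbour search at prediction time.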

IBCF Prediction

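# Top-10 recommendations for each test user, based on their 15 known ratings
predicted_top_ten_movies_ibcf_cosine <- predict(object = model_ibcf_cosine, newdata = known, n = 10)
# One row per (user, recommended item); lengths() also handles users with fewer than 10 items
predicted_top_ten_movies_df_ibcf_cosine <- data.frame(users = rep(seq_along(predicted_top_ten_movies_ibcf_cosine@items),
                                                                  lengths(predicted_top_ten_movies_ibcf_cosine@items)),
                                                      ratings = unlist(predicted_top_ten_movies_ibcf_cosine@ratings),
                                                      index = unlist(predicted_top_ten_movies_ibcf_cosine@items))
predicted_top_ten_movies_df_ibcf_cosine$title <- predicted_top_ten_movies_ibcf_cosine@itemLabels[predicted_top_ten_movies_df_ibcf_cosine$index]
# Match the year by title; item indices refer to the filtered matrix, not to MovieLenseMeta rows
predicted_top_ten_movies_df_ibcf_cosine$year <- MovieLenseMeta$year[match(predicted_top_ten_movies_df_ibcf_cosine$title, MovieLenseMeta$title)]
# Keep each user's 5 highest predicted ratings; top_n() keeps ties, so users with
# many predictions tied at 5 get more than 5 rows (93 rows for 10 users below)
predicted_top_ten_movies_df_ibcf_cosine <- predicted_top_ten_movies_df_ibcf_cosine %>% group_by(users) %>% top_n(5, ratings)

predicted_top_ten_movies_df_ibcf_cosine[predicted_top_ten_movies_df_ibcf_cosine$users %in% (1:10), ]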
## # A tibble: 93 x 5
## # Groups:   users [10]
##    users ratings index title                                year
##    <int>   <dbl> <int> <chr>                               <dbl>
##  1     1       5     3 Four Rooms (1995)                    1995
##  2     1       5    14 Mr. Holland's Opus (1995)            1994
##  3     1       5    29 Net, The (1995)                      1995
##  4     1       5    38 Legends of the Fall (1994)           1995
##  5     1       5    51 While You Were Sleeping (1995)       1994
##  6     1       5    59 Firm, The (1993)                     1994
##  7     1       5    67 Sleepless in Seattle (1993)          1994
##  8     1       5    80 Heavy Metal (1981)                   1993
##  9     1       5    85 Truth About Cats & Dogs, The (1996)  1994
## 10     1       5    88 Rock, The (1996)                     1993
## # ... with 83 more rows

Accuracy Metrics IBCF

ibcf_prediction <- predict(object = model_ibcf_cosine, newdata = known, n = 10, type = "ratings")
print("Accuracy Metrics IBCF:")
## [1] "Accuracy Metrics IBCF:"
print(calcPredictionAccuracy(x = ibcf_prediction, data = unknown, byUser = FALSE))
##     RMSE      MSE      MAE 
## 1.444221 2.085774 1.092323

User Based Collaborative Filtering (Cosine)

training_time <- system.time({
    model_ubcf_cosine <- Recommender(data = training, method = "UBCF", parameter = list(method = "Cosine"))
})
print("Model training time : ")
## [1] "Model training time : "
print(training_time)
##    user  system elapsed 
##       0       0       0
print(model_ubcf_cosine)
## Recommender of type 'UBCF' for 'realRatingMatrix' 
## learned using 648 users.

UBCF Prediction

# Top-10 recommendations for each test user, based on their 15 known ratings
predicted_top_ten_movies_ubcf_cosine <- predict(object = model_ubcf_cosine, newdata = known, n = 10)
# One row per (user, recommended item); lengths() also handles users with fewer than 10 items
predicted_top_ten_movies_df_ubcf_cosine <- data.frame(users = rep(seq_along(predicted_top_ten_movies_ubcf_cosine@items),
                                                                  lengths(predicted_top_ten_movies_ubcf_cosine@items)),
                                                      ratings = unlist(predicted_top_ten_movies_ubcf_cosine@ratings),
                                                      index = unlist(predicted_top_ten_movies_ubcf_cosine@items))
predicted_top_ten_movies_df_ubcf_cosine$title <- predicted_top_ten_movies_ubcf_cosine@itemLabels[predicted_top_ten_movies_df_ubcf_cosine$index]
# Match the year by title; item indices refer to the filtered matrix, not to MovieLenseMeta rows
predicted_top_ten_movies_df_ubcf_cosine$year <- MovieLenseMeta$year[match(predicted_top_ten_movies_df_ubcf_cosine$title, MovieLenseMeta$title)]
# Keep each user's 5 highest predicted ratings (top_n() keeps ties)
predicted_top_ten_movies_df_ubcf_cosine <- predicted_top_ten_movies_df_ubcf_cosine %>% group_by(users) %>% top_n(5, ratings)

predicted_top_ten_movies_df_ubcf_cosine[predicted_top_ten_movies_df_ubcf_cosine$users %in% (1:10), ]
## # A tibble: 50 x 5
## # Groups:   users [10]
##    users ratings index title                             year
##    <int>   <dbl> <int> <chr>                            <dbl>
##  1     1    3.78    79 Fargo (1996)                      1993
##  2     1    3.76    37 Star Wars (1977)                  1994
##  3     1    3.63    43 Pulp Fiction (1994)               1994
##  4     1    3.62   143 Return of the Jedi (1983)         1965
##  5     1    3.54   149 Godfather: Part II, The (1974)    1996
##  6     2    4.57    37 Star Wars (1977)                  1994
##  7     2    4.54    79 Fargo (1996)                      1993
##  8     2    4.48   143 Return of the Jedi (1983)         1965
##  9     2    4.46    77 Silence of the Lambs, The (1991)  1993
## 10     2    4.44   212 Contact (1997)                    1988
## # ... with 40 more rows

Accuracy Metrics UBCF

ubcf_prediction <- predict(object = model_ubcf_cosine, newdata = known, n = 10, type = "ratings")
print("Accuracy Metrics UBCF:")
## [1] "Accuracy Metrics UBCF:"
print(calcPredictionAccuracy(x = ubcf_prediction, data = unknown, byUser = FALSE))
##      RMSE       MSE       MAE 
## 0.9970938 0.9941960 0.7911661

Model Comparison

models_evaluation <- list( 
SVD = list(name = "SVD"),
IBCF = list(name = "IBCF", param = list(method = "cosine")),  
UBCF = list(name = "UBCF", param = list(method = "cosine"))
)
lerror <- evaluate(x = e, method = models_evaluation, type = "ratings")
## SVD run fold/sample [model time/prediction time]
##   1  [0.07sec/0.01sec] 
##   2  [0.22sec/0sec] 
##   3  [0.2sec/0.02sec] 
##   4  [0.21sec/0sec] 
##   5  [0.25sec/0.02sec] 
##   6  [0.05sec/0.02sec] 
##   7  [0.05sec/0.02sec] 
##   8  [0.06sec/0.01sec] 
##   9  [0.04sec/0.01sec] 
##   10  [0.04sec/0.02sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [1.04sec/0.02sec] 
##   2  [1.05sec/0.01sec] 
##   3  [1.08sec/0sec] 
##   4  [1.27sec/0.02sec] 
##   5  [0.96sec/0.19sec] 
##   6  [1.07sec/0.19sec] 
##   7  [1.08sec/0.03sec] 
##   8  [1.1sec/0.02sec] 
##   9  [1.18sec/0.02sec] 
##   10  [0.84sec/0sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.01sec/0.13sec] 
##   2  [0.01sec/0.13sec] 
##   3  [0sec/0.13sec] 
##   4  [0sec/0.14sec] 
##   5  [0.02sec/0.12sec] 
##   6  [0sec/0.15sec] 
##   7  [0sec/0.17sec] 
##   8  [0.02sec/0.12sec] 
##   9  [0sec/0.14sec] 
##   10  [0.02sec/0.3sec]
mdlcmp <- as.data.frame(sapply(avg(lerror), rbind))
cmpMdl <- as.data.frame(t(as.matrix(mdlcmp)))
colnames(cmpMdl) <- c("RMSE", "MSE", "MAE")
pander(cmpMdl, caption = "Model Comparison")
Model Comparison

| Model | RMSE | MSE | MAE |
|---|---|---|---|
| SVD | 1.037 | 1.077 | 0.8232 |
| IBCF | 1.469 | 2.162 | 1.116 |
| UBCF | 1.034 | 1.07 | 0.8197 |
rmse_ubcf<- calcPredictionAccuracy(x = ubcf_prediction, data = unknown, byUser = FALSE)
print (rmse_ubcf)
##      RMSE       MSE       MAE 
## 0.9970938 0.9941960 0.7911661
rmse_ibcf <- calcPredictionAccuracy(x = ibcf_prediction, data = unknown, byUser = FALSE)
print(rmse_ibcf)
##     RMSE      MSE      MAE 
## 1.444221 2.085774 1.092323
rmse_svd <- calcPredictionAccuracy(x = svd_prediction, data = unknown, byUser = FALSE)
print (rmse_svd)
##      RMSE       MSE       MAE 
## 0.9986768 0.9973553 0.7918828
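# Metrics from the single train/test split computed above, one row per model
comparison <- rbind(rmse_ibcf, rmse_ubcf, rmse_svd)
comparison <- data.frame(comparison, row.names = NULL)
comparison <- cbind(model = c("IBCF", "UBCF", "SVD"), comparison)

# Reshape to long format (one row per model and metric) and draw side-by-side bars
comparison %>%
  gather("measure", "value", -model) %>%
  ggplot(aes(x = measure, y = value, fill = model)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  ggtitle("Prediction Error by Model")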

Item-based collaborative filtering performs the worst, with the largest RMSE (root mean squared error), MSE and MAE. SVD and user-based collaborative filtering perform similarly: in the 10-fold cross-validated comparison their RMSEs are 1.037 and 1.034, against 1.469 for IBCF.

n_recommendations = c(1,3,5,8,10,15,20, 25)
results = evaluate (x=e, method = models_evaluation, n = n_recommendations)
## SVD run fold/sample [model time/prediction time]
##   1  [0.05sec/0.03sec] 
##   2  [0.06sec/0.02sec] 
##   3  [0.05sec/0.01sec] 
##   4  [0.05sec/0.03sec] 
##   5  [0.04sec/0.03sec] 
##   6  [0.04sec/0.03sec] 
##   7  [0.06sec/0.01sec] 
##   8  [0.04sec/0.04sec] 
##   9  [0.04sec/0.04sec] 
##   10  [0.05sec/0.03sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.91sec/0.01sec] 
##   2  [1sec/0.03sec] 
##   3  [0.84sec/0.02sec] 
##   4  [1.13sec/0.03sec] 
##   5  [1.12sec/0.02sec] 
##   6  [1sec/0.03sec] 
##   7  [1.15sec/0.03sec] 
##   8  [1.06sec/0.04sec] 
##   9  [1.07sec/0.03sec] 
##   10  [0.88sec/0.03sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.16sec] 
##   2  [0.02sec/0.15sec] 
##   3  [0sec/0.33sec] 
##   4  [0sec/0.15sec] 
##   5  [0sec/0.15sec] 
##   6  [0.01sec/0.14sec] 
##   7  [0sec/0.16sec] 
##   8  [0sec/0.14sec] 
##   9  [0sec/0.16sec] 
##   10  [0.02sec/0.14sec]
plot(results, y="ROC", annotate = 1, legend ="topleft")
title ("ROC Curve")

plot (results, y ='prec/rec', annotate=1)
title ("Precision-Recall")

The ROC (receiver operating characteristic) curve shows that SVD has the largest area under the curve, followed by user-based collaborative filtering, while item-based collaborative filtering has the smallest. The precision-recall plot gives the same ranking: SVD performs best and IBCF worst.
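Both plots are built from top-N lists: for each list length n (the values in `n_recommendations`), a recommended movie counts as a hit if the user's held-out rating is at least `goodRating = 3`. The base-R sketch below computes the underlying quantities for one hypothetical user and one list. `topn_metrics` is a hypothetical helper, and it assumes every recommended movie has a held-out rating; recommenderlab's own bookkeeping, and its averaging over users and folds, may differ in detail.

```r
# One test user's held-out ratings (hypothetical); "good" means rating >= 3
heldout <- c(m1 = 5, m2 = 2, m3 = 4, m4 = 1, m5 = 3, m6 = 2)
recommended <- c("m1", "m2", "m6")   # a hypothetical top-3 list

topn_metrics <- function(recommended, heldout, good = 3) {
  relevant   <- names(heldout)[heldout >= good]
  irrelevant <- setdiff(names(heldout), relevant)
  TP <- length(intersect(recommended, relevant))     # good movies recommended
  FP <- length(intersect(recommended, irrelevant))   # bad movies recommended
  c(precision = TP / length(recommended),
    recall    = TP / length(relevant),     # also the TPR, the ROC y-axis
    FPR       = FP / length(irrelevant))   # the ROC x-axis
}

m <- topn_metrics(recommended, heldout)
m
```

Repeating this for each n from 1 to 25 and averaging over users gives one point per n on each curve; a model whose points sit higher (more recall at the same false-positive rate, or more precision at the same recall) has the better curve.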

Conclusion

In this movie setting, SVD matches UBCF on rating-prediction error and outperforms both collaborative filtering methods (UBCF and IBCF) on top-N recommendation quality, as shown by the ROC and precision-recall curves. It is therefore not surprising that SVD-style matrix factorization has long been used in the recommendation systems of the large technology companies shown below.

Figure

Figure