MovieLense Recommendation Overview2

Research Approach: There are a few R packages for building recommender systems, and the most commonly used one is recommenderlab. I have used recommenderlab throughout this course and have applied it to three methods: item-based collaborative filtering (IBCF), user-based collaborative filtering (UBCF), and singular value decomposition (SVD). In this project, I apply all three methods to the MovieLense dataset and compare their performance.

Background

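Collaborative Filtering

User-Based (UBCF) vs Item-Based (IBCF)

The figure below illustrates the most commonly used recommender-system model: collaborative filtering. Collaborative filtering answers the question "What items do users with interests similar to yours like?" User-based CF (UBCF) looks for users whose past ratings resemble yours and recommends the movies they rated highly. Item-based CF (IBCF) looks for movies that are rated similarly to the ones you already rated highly.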

Figure
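The following base-R sketch makes the user-based vs item-based distinction concrete. It uses a made-up 4 x 4 rating matrix in which 0 means "not rated".

- The same cosine function compares users when it is applied to rows (UBCF) and movies when it is applied to columns (IBCF).
- The UBCF prediction is a similarity-weighted average of the ratings that the most similar users gave the movie.

The helper names (`cosine_sim`, `predict_ubcf`) and the neighbourhood size k = 2 are illustrative assumptions. recommenderlab's UBCF also normalizes ratings and uses a larger neighbourhood, so its numbers will differ.

```r
# Made-up ratings: rows = users, columns = movies, 0 = not rated
ratings <- matrix(c(5, 3, 0, 1,
                    4, 0, 0, 1,
                    1, 1, 0, 5,
                    0, 1, 5, 4),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))

# Cosine similarity between every pair of rows of m (unrated cells count as 0)
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

user_sim <- cosine_sim(ratings)     # UBCF: user x user similarities
item_sim <- cosine_sim(t(ratings))  # IBCF: movie x movie similarities

# UBCF-style prediction: weighted average of the ratings given to `item`
# by the k users most similar to `user` who actually rated it
predict_ubcf <- function(r, sim, user, item, k = 2) {
  raters <- setdiff(which(r[, item] > 0), user)
  nb <- head(raters[order(sim[user, raters], decreasing = TRUE)], k)
  sum(sim[user, nb] * r[nb, item]) / sum(abs(sim[user, nb]))
}

round(predict_ubcf(ratings, user_sim, user = 2, item = 2), 2)  # 2.34
```

User 2 has not rated movie 2. The two most similar users who did rate it are u1 (similarity about 0.86, rating 3) and u3 (about 0.42, rating 1). The prediction therefore lands between those two ratings, closer to u1's.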

Singular Value Decomposition (SVD)

Figure
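The sketch below is a rough illustration of the SVD approach:

1. Fill the missing cells of a made-up rating matrix with column (movie) means.
2. Factorize the result with base R's `svd()`.
3. Keep only the k largest singular values, so that R ≈ U_k D_k V_k^T.

The reconstructed values in the originally missing cells serve as predicted ratings. The mean-imputation step and k = 2 are assumptions for illustration. recommenderlab's SVD recommender follows the same general idea, but its normalization and default number of factors may differ.

```r
# Made-up ratings: rows = users, columns = movies, NA = not rated
r <- matrix(c( 5,  4, NA,  1,
               4, NA,  1,  1,
               1,  1, NA,  5,
              NA,  1,  5,  4,
               5,  5,  1, NA),
            nrow = 5, byrow = TRUE)

# 1. Impute each missing rating with that movie's mean observed rating
miss <- which(is.na(r), arr.ind = TRUE)
filled <- r
filled[miss] <- colMeans(r, na.rm = TRUE)[miss[, "col"]]

# 2. Rank-k approximation: keep only the k largest singular values
k <- 2
s <- svd(filled)
approx_k <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])

# 3. The approximation's values in the missing cells are the predictions
predicted <- approx_k[miss]
round(predicted, 2)
```

Truncating to k factors is what lets the model generalize. With all singular values kept, the reconstruction would return the imputed means unchanged.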

Dataset

The dataset I chose for this project is MovieLense, which ships with the recommenderlab package. We will explore it first and then apply SVD (singular value decomposition) and the two collaborative filtering methods.

library(recommenderlab)

library(ggplot2)
library(tidyverse)

library(pander)

Data Exploration

Let's load the dataset first.

data(MovieLense)
movielense <- MovieLense # Loading the movie dataset
print(paste0("The dimensions of dataset (Users x Movies): ", nrow(movielense), " x ", ncol(movielense)))

## [1] "The dimensions of dataset (Users x Movies): 943 x 1664"

Let's look at the different ratings given by users.

Movie Ratings Histogram

mvector <- as.vector(movielense@data)
mvector <- mvector[mvector != 0] 
unique(mvector)

## [1] 5 4 3 1 2

mvector <- factor(mvector)

qplot(mvector, fill = I("blue"), col = I("red")) + ggtitle("Histogram of Ratings") +
  xlab("Ratings") + ylab("Count")
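The histogram above only makes sense after dropping the zeros. In the sparse matrix behind a realRatingMatrix, a cell that was never rated comes back as 0 when it is converted to a dense vector.

The base-R sketch below applies the same filtering step to a made-up 3 x 4 matrix. It also computes the share of empty cells (the sparsity). For the real data, recommenderlab's `getRatings()` should return the observed ratings directly, although this analysis does not use it.

```r
# Made-up ratings: rows = users, columns = movies, 0 = not rated
m <- matrix(c(5, 3, 0, 0,
              4, 0, 0, 2,
              0, 1, 5, 0),
            nrow = 3, byrow = TRUE)

rated <- m[m != 0]                                   # drop the "not rated" cells
rating_counts <- table(factor(rated, levels = 1:5))  # one count per rating value 1..5
sparsity <- 1 - length(rated) / length(m)            # share of empty cells

rating_counts
sparsity  # 0.5
```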

### Top Ten Movies

movie_watched <- data.frame(
    movie_name = names(colCounts(movielense)),
    watched_times = colCounts(movielense)  # number of users who rated each movie
)
top_ten_movies <- movie_watched[order(movie_watched$watched_times, decreasing = TRUE), ][1:10, ]
# reorder() sorts the bars by count instead of alphabetically
ggplot(top_ten_movies) + aes(x = reorder(movie_name, -watched_times), y = watched_times) +
  geom_bar(stat = "identity", fill = "firebrick4", color = "dodgerblue2") + xlab("Movie Title") + ylab("Count") +
  theme(axis.text = element_text(angle = 40, hjust = 1))

### Average Movie Rating Histogram

qplot(colMeans(movielense)) + stat_bin(binwidth =0.25,fill=I("blue"), col=I("red")) +
  xlim(0,5)+
  xlab("Average Rating") + ylab("Count") + 
ggtitle("Average Ratings Counts Histogram")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 2 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_bar).
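One detail matters when reading this histogram: a movie's average should be taken over the ratings it actually received, not over all 943 users. As we understand it, recommenderlab's `colMeans()` on a realRatingMatrix already averages over the observed ratings only.

On a plain matrix where 0 means "not rated", that average has to be computed by hand, as in this made-up example.

```r
# Made-up ratings: rows = users, columns = movies, 0 = not rated
m <- matrix(c(5, 3, 0,
              4, 0, 0,
              0, 1, 5),
            nrow = 3, byrow = TRUE)

naive    <- colMeans(m)                   # wrongly counts "not rated" as a 0 rating
observed <- colSums(m) / colSums(m != 0)  # mean over observed ratings only (NaN if a movie has none)

rbind(naive, observed)
```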

Next, we take a peek at the data. We look at the ratings of the first 10 users for the first 10 movies, the first and last rows of the movie metadata (MovieLenseMeta), and the movies rated by the first user.

library(kableExtra)

## View the first 10 users x 10 movies (0 = not rated)
y <- as.matrix(movielense@data[1:10, 1:10])
y %>% kable(caption = "DataExample") %>% kable_styling("striped", full_width = TRUE)

DataExample
Toy Story (1995)	GoldenEye (1995)	Four Rooms (1995)	Get Shorty (1995)	Copycat (1995)	Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)	Twelve Monkeys (1995)	Babe (1995)	Dead Man Walking (1995)	Richard III (1995)
5	3	4	3	3	5	4	1	5	3
4	0	0	0	0	0	0	0	0	2
0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0
4	3	0	0	0	0	0	0	0	0
4	0	0	0	0	0	2	4	4	0
0	0	0	5	0	0	5	5	5	4
0	0	0	0	0	0	3	0	0	0
0	0	0	0	0	5	4	0	0	0
4	0	0	4	0	0	4	0	4	0

moviemeta <- MovieLenseMeta
pander(head(moviemeta),caption = "First few Rows within Movie Meta Data ")

First few Rows within Movie Meta Data (continued below)
title	year
Toy Story (1995)	1995
GoldenEye (1995)	1995
Four Rooms (1995)	1995
Get Shorty (1995)	1995
Copycat (1995)	1995
Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)	1995

Table continues below
url	unknown
http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)	0
http://us.imdb.com/M/title-exact?GoldenEye%20(1995)	0
http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)	0
http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)	0
http://us.imdb.com/M/title-exact?Copycat%20(1995)	0
http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)	0

Table continues below
Action	Adventure	Animation	Children’s	Comedy	Crime
0	0	1	1	1	0
1	1	0	0	0	0
0	0	0	0	0	0
1	0	0	0	1	0
0	0	0	0	0	1
0	0	0	0	0	0

Table continues below
Drama	Fantasy	Film-Noir	Horror	Musical	Mystery	Romance	Sci-Fi
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0

Thriller	War	Western
0	0	0
1	0	0
1	0	0
0	0	0
1	0	0
0	0	0

pander(tail(moviemeta), caption = "Last few Rows within Movie Meta Data")

Last few Rows within Movie Meta Data (continued below)
	title	year
1676	War at Home, The (1996)	1996
1677	Sweet Nothing (1995)	1996
1678	Mat’ i syn (1997)	1998
1679	B. Monkey (1998)	1998
1681	You So Crazy (1994)	1994
1682	Scream of Stone (Schrei aus Stein) (1991)	1996

Table continues below
	url
1676	http://us.imdb.com/M/title-exact?War%20at%20Home%2C%20The%20%281996%29
1677	http://us.imdb.com/M/title-exact?Sweet%20Nothing%20(1995)
1678	http://us.imdb.com/M/title-exact?Mat%27+i+syn+(1997)
1679	http://us.imdb.com/M/title-exact?B%2E+Monkey+(1998)
1681	http://us.imdb.com/M/title-exact?You%20So%20Crazy%20(1994)
1682	http://us.imdb.com/M/title-exact?Schrei%20aus%20Stein%20(1991)

Table continues below
	unknown	Action	Adventure	Animation	Children’s	Comedy
1676	0	0	0	0	0	0
1677	0	0	0	0	0	0
1678	0	0	0	0	0	0
1679	0	0	0	0	0	0
1681	0	0	0	0	0	1
1682	0	0	0	0	0	0

Table continues below
	Crime	Documentary	Drama	Fantasy	Film-Noir	Horror
1676	0	0	1	0	0	0
1677	0	0	1	0	0	0
1678	0	0	1	0	0	0
1679	0	0	0	0	0	0
1681	0	0	0	0	0	0
1682	0	0	1	0	0	0

	Romance	Thriller
1676	0	0
1677	0	0
1678	0	0
1679	1	1
1681	0	0
1682	0	0

## look at the first 8 ratings of the first user
head(as(movielense[1,], "list")[[1]], 8)

##                                     Toy Story (1995) 
##                                                    5 
##                                     GoldenEye (1995) 
##                                                    3 
##                                    Four Rooms (1995) 
##                                                    4 
##                                    Get Shorty (1995) 
##                                                    3 
##                                       Copycat (1995) 
##                                                    3 
## Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) 
##                                                    5 
##                                Twelve Monkeys (1995) 
##                                                    4 
##                                          Babe (1995) 
##                                                    1

# look at the last 4 ratings of the first user
tail(as(movielense[1,], "list")[[1]], 4)

##   Full Monty, The (1997)           Gattaca (1997) Starship Troopers (1997) 
##                        5                        5                        2 
## Good Will Hunting (1997) 
##                        3

Data preparation

The rating matrix is large and very sparse, so we shrink it to users who have rated more than 30 movies and movies that have been rated by more than 60 users.

movielense <- movielense [rowCounts(movielense) > 30, colCounts(movielense) > 60]
print(paste0("Number of Rows after filtering : ", nrow(movielense)))

## [1] "Number of Rows after filtering : 726"

print(paste0("Number of Columns after filtering : ", ncol(movielense)))

## [1] "Number of Columns after filtering : 529"
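Both conditions in the filtering step above are evaluated on the unfiltered matrix. As a result, a user who passes the first filter can fall back to 30 or fewer ratings once the less popular movies are removed.

The made-up base-R example below shows this, with thresholds scaled down to "more than 2 ratings" per user and per movie. Here `rowSums(m != 0)` and `colSums(m != 0)` stand in for recommenderlab's `rowCounts()` and `colCounts()`.

```r
# Made-up ratings: rows = users, columns = movies, 0 = not rated
m <- matrix(c(5, 0, 2, 4,
              4, 1, 0, 2,
              0, 1, 5, 0,
              3, 4, 0, 5),
            nrow = 4, byrow = TRUE)
min_user  <- 2  # keep users with more than 2 ratings (stands in for 30)
min_movie <- 2  # keep movies with more than 2 ratings (stands in for 60)

# Both masks are computed on the original matrix
keep_users  <- rowSums(m != 0) > min_user
keep_movies <- colSums(m != 0) > min_movie
filtered <- m[keep_users, keep_movies, drop = FALSE]

rowSums(filtered != 0)  # 2 3 3: the first kept user no longer has more than 2 ratings
```

If both thresholds must hold exactly, one option is to repeat the filter until the matrix stops changing.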

Training and Testing Data

set.seed(2020)#seed as year
n_folds <- 10
to_keep <- 15
threshold <- 3
e <- evaluationScheme(movielense, method="cross-validation",k = n_folds, train=0.8, given=to_keep,  goodRating=threshold)
print(e)

## Evaluation scheme with 15 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.000000
## Data set: 726 x 529 rating matrix of class 'realRatingMatrix' with 74956 ratings.

training <- getData(e, "train")
known <- getData(e, "known")
unknown <- getData(e, "unknown")
print(paste0("Training data has ", nrow(training), " rows"))

## [1] "Training data has 648 rows"

print(paste0("Known Testing data has ", nrow(known)," rows"))

## [1] "Known Testing data has 78 rows"

print(paste0("Unknown Testing data has ", nrow(unknown)," rows"))

## [1] "Unknown Testing data has 78 rows"
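For each test user, `given = 15` splits their ratings in two:

- 15 ratings are handed to the model as "known" input.
- All remaining ratings are held back as "unknown" for scoring.

recommenderlab does this split internally. The base-R sketch below only mirrors the idea for one made-up user, with `given = 2`.

```r
set.seed(2020)
# Made-up ratings of a single test user (movie -> rating)
user_ratings <- c(m1 = 5, m2 = 3, m3 = 4, m4 = 1, m5 = 2, m6 = 5)
given <- 2

known_idx    <- sample(seq_along(user_ratings), given)
known_part   <- user_ratings[known_idx]   # what the model sees (cf. getData(e, "known"))
unknown_part <- user_ratings[-known_idx]  # what predictions are scored against (cf. getData(e, "unknown"))

known_part
unknown_part
```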

Singular Value Decomposition

The first model we train is SVD, which we then use to recommend movies.

training_time <- system.time({
    model_svd <- Recommender(data = training, method = "SVD") 
})
print("Model training time : ")

## [1] "Model training time : "

print(training_time)

##    user  system elapsed 
##    0.06    0.00    0.06

print(model_svd)

## Recommender of type 'SVD' for 'realRatingMatrix' 
## learned using 648 users.

SVD Prediction

predicted_top_ten_movies_svd <- predict(object = model_svd, newdata = known, n = 10)
# One row per (user, recommended movie); lengths() also copes with users who receive fewer than n items
predicted_top_ten_movies_df_svd <- data.frame(
  users   = rep(seq_along(predicted_top_ten_movies_svd@items), lengths(predicted_top_ten_movies_svd@items)),
  ratings = unlist(predicted_top_ten_movies_svd@ratings),
  index   = unlist(predicted_top_ten_movies_svd@items)
)

# @items indexes the columns of the filtered matrix, not the rows of MovieLenseMeta,
# so take the title from itemLabels and read the year from its "(YYYY)" suffix
predicted_top_ten_movies_df_svd$title <- predicted_top_ten_movies_svd@itemLabels[predicted_top_ten_movies_df_svd$index]
predicted_top_ten_movies_df_svd$year <- as.numeric(sub(".*\\(([0-9]{4})\\)$", "\\1", predicted_top_ten_movies_df_svd$title))
predicted_top_ten_movies_df_svd <- predicted_top_ten_movies_df_svd %>% group_by(users) %>% top_n(5, ratings)

predicted_top_ten_movies_df_svd[predicted_top_ten_movies_df_svd$users %in% (1:10), ]

## # A tibble: 50 x 5
## # Groups:   users [10]
##    users ratings index title                         year
##    <int>   <dbl> <int> <chr>                        <dbl>
##  1     1    3.51    43 Pulp Fiction (1994)           1994
##  2     1    3.44    79 Fargo (1996)                  1996
##  3     1    3.42   104 2001: A Space Odyssey (1968)  1968
##  4     1    3.40    37 Star Wars (1977)              1977
##  5     1    3.39   225 Leaving Las Vegas (1995)      1995
##  6     2    4.25    43 Pulp Fiction (1994)           1994
##  7     2    4.23   166 Back to the Future (1985)     1985
##  8     2    4.22   176 Field of Dreams (1989)        1989
##  9     2    4.22    19 Braveheart (1995)             1995
## 10     2    4.21   196 Jerry Maguire (1996)          1996
## # ... with 40 more rows

Accuracy Metrics SVD

svd_prediction <- predict(object = model_svd, newdata = known, n = 10, type = "ratings")
print("Accuracy metrics SVD:")

## [1] "Accuracy metrics SVD:"

print(calcPredictionAccuracy(x = svd_prediction, data = unknown, byUser = FALSE))

##      RMSE       MSE       MAE 
## 0.9986768 0.9973553 0.7918828
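The three numbers are computed over the withheld ("unknown") ratings. With `byUser = FALSE`, the withheld ratings of all test users are pooled.

The sketch below shows the usual definitions on made-up values:

- MSE is the mean of the squared errors.
- RMSE is the square root of MSE.
- MAE is the mean absolute error.

The output above is consistent with these definitions, since 0.9986768² ≈ 0.9973553.

```r
# Made-up withheld ratings and a model's predictions for them
actual    <- c(4, 3, 5, 2, 4)
predicted <- c(3.5, 3.2, 4.1, 3.0, 4.4)

err  <- predicted - actual
mse  <- mean(err^2)     # mean squared error
rmse <- sqrt(mse)       # root mean squared error, in rating units
mae  <- mean(abs(err))  # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

RMSE punishes a few large misses more than MAE does, which is why RMSE is always at least as large as MAE.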

Item Based Collaborative Filtering (Cosine)

training_time <- system.time({
    model_ibcf_cosine <- Recommender(data = training, method = "IBCF", parameter = list(method = "Cosine"))
})
print("Model training time : ")

## [1] "Model training time : "

print(training_time)

##    user  system elapsed 
##    1.00    0.15    1.20

print(model_ibcf_cosine)

## Recommender of type 'IBCF' for 'realRatingMatrix' 
## learned using 648 users.
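IBCF does its expensive work at training time: it builds the item-item similarity matrix, which is why training takes roughly a second above. At prediction time it scores a movie for a user as a similarity-weighted average of that user's own ratings of the most similar movies.

The base-R sketch below uses made-up data and a hypothetical neighbourhood of k = 2. It shows one side effect: if the nearest movies a user rated all got a 5, the prediction is exactly 5.

That may be part of why so many IBCF predictions below are 5. As far as we know, recommenderlab's IBCF also normalizes ratings and keeps only a limited number of neighbours per item, and this sketch leaves both out.

```r
# Made-up ratings: rows = users, columns = movies, 0 = not rated
r <- matrix(c(5, 5, 0, 1,
              4, 5, 4, 1,
              1, 2, 1, 5,
              5, 4, 5, 2),
            nrow = 4, byrow = TRUE)

# "Training": cosine similarity between movies (columns)
norms <- sqrt(colSums(r^2))
item_sim <- crossprod(r) / (norms %o% norms)  # crossprod(r) is t(r) %*% r
diag(item_sim) <- 0                           # a movie is not its own neighbour

# "Prediction": weighted average of the user's ratings of the k most similar rated movies
predict_ibcf <- function(r, sim, user, item, k = 2) {
  rated <- which(r[user, ] > 0)
  nb <- head(rated[order(sim[item, rated], decreasing = TRUE)], k)
  sum(sim[item, nb] * r[user, nb]) / sum(abs(sim[item, nb]))
}

predict_ibcf(r, item_sim, user = 1, item = 3)  # 5: both neighbours were rated 5
```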

IBCF Prediction

predicted_top_ten_movies_ibcf_cosine <- predict(object = model_ibcf_cosine, newdata = known, n = 10)
# One row per (user, recommended movie); lengths() also copes with users who receive fewer than n items
predicted_top_ten_movies_df_ibcf_cosine <- data.frame(
  users   = rep(seq_along(predicted_top_ten_movies_ibcf_cosine@items), lengths(predicted_top_ten_movies_ibcf_cosine@items)),
  ratings = unlist(predicted_top_ten_movies_ibcf_cosine@ratings),
  index   = unlist(predicted_top_ten_movies_ibcf_cosine@items)
)

# @items indexes the columns of the filtered matrix, not the rows of MovieLenseMeta,
# so take the title from itemLabels and read the year from its "(YYYY)" suffix
predicted_top_ten_movies_df_ibcf_cosine$title <- predicted_top_ten_movies_ibcf_cosine@itemLabels[predicted_top_ten_movies_df_ibcf_cosine$index]
predicted_top_ten_movies_df_ibcf_cosine$year <- as.numeric(sub(".*\\(([0-9]{4})\\)$", "\\1", predicted_top_ten_movies_df_ibcf_cosine$title))
predicted_top_ten_movies_df_ibcf_cosine <- predicted_top_ten_movies_df_ibcf_cosine %>% group_by(users) %>% top_n(5, ratings)

predicted_top_ten_movies_df_ibcf_cosine[predicted_top_ten_movies_df_ibcf_cosine$users %in% (1:10), ]

## # A tibble: 93 x 5
## # Groups:   users [10]
##    users ratings index title                                year
##    <int>   <dbl> <int> <chr>                               <dbl>
##  1     1       5     3 Four Rooms (1995)                    1995
##  2     1       5    14 Mr. Holland's Opus (1995)            1995
##  3     1       5    29 Net, The (1995)                      1995
##  4     1       5    38 Legends of the Fall (1994)           1994
##  5     1       5    51 While You Were Sleeping (1995)       1995
##  6     1       5    59 Firm, The (1993)                     1993
##  7     1       5    67 Sleepless in Seattle (1993)          1993
##  8     1       5    80 Heavy Metal (1981)                   1981
##  9     1       5    85 Truth About Cats & Dogs, The (1996)  1996
## 10     1       5    88 Rock, The (1996)                     1996
## # ... with 83 more rows

IBCF predicts a rating of exactly 5 for many movies, and top_n(5, ratings) keeps every tied row. That is why this table has 93 rows for 10 users instead of 50.

Accuracy Metrics IBCF

ibcf_prediction <- predict(object = model_ibcf_cosine, newdata = known, n = 10, type = "ratings")
print("Accuracy metrics IBCF:")

## [1] "Accuracy metrics IBCF:"

print(calcPredictionAccuracy(x = ibcf_prediction, data = unknown, byUser = FALSE))

##     RMSE      MSE      MAE 
## 1.444221 2.085774 1.092323

User Based Collaborative Filtering (Cosine)

training_time <- system.time({
    model_ubcf_cosine <- Recommender(data = training, method = "UBCF", parameter = list(method = "Cosine"))
})
print("Model training time : ")

## [1] "Model training time : "

print(training_time)

##    user  system elapsed 
##       0       0       0

print(model_ubcf_cosine)

## Recommender of type 'UBCF' for 'realRatingMatrix' 
## learned using 648 users.

UBCF Prediction

predicted_top_ten_movies_ubcf_cosine <- predict(object = model_ubcf_cosine, newdata = known, n = 10)
# One row per (user, recommended movie); lengths() also copes with users who receive fewer than n items
predicted_top_ten_movies_df_ubcf_cosine <- data.frame(
  users   = rep(seq_along(predicted_top_ten_movies_ubcf_cosine@items), lengths(predicted_top_ten_movies_ubcf_cosine@items)),
  ratings = unlist(predicted_top_ten_movies_ubcf_cosine@ratings),
  index   = unlist(predicted_top_ten_movies_ubcf_cosine@items)
)

# @items indexes the columns of the filtered matrix, not the rows of MovieLenseMeta,
# so take the title from itemLabels and read the year from its "(YYYY)" suffix
predicted_top_ten_movies_df_ubcf_cosine$title <- predicted_top_ten_movies_ubcf_cosine@itemLabels[predicted_top_ten_movies_df_ubcf_cosine$index]
predicted_top_ten_movies_df_ubcf_cosine$year <- as.numeric(sub(".*\\(([0-9]{4})\\)$", "\\1", predicted_top_ten_movies_df_ubcf_cosine$title))
predicted_top_ten_movies_df_ubcf_cosine <- predicted_top_ten_movies_df_ubcf_cosine %>% group_by(users) %>% top_n(5, ratings)

predicted_top_ten_movies_df_ubcf_cosine[predicted_top_ten_movies_df_ubcf_cosine$users %in% (1:10), ]

## # A tibble: 50 x 5
## # Groups:   users [10]
##    users ratings index title                             year
##    <int>   <dbl> <int> <chr>                            <dbl>
##  1     1    3.78    79 Fargo (1996)                      1996
##  2     1    3.76    37 Star Wars (1977)                  1977
##  3     1    3.63    43 Pulp Fiction (1994)               1994
##  4     1    3.62   143 Return of the Jedi (1983)         1983
##  5     1    3.54   149 Godfather: Part II, The (1974)    1974
##  6     2    4.57    37 Star Wars (1977)                  1977
##  7     2    4.54    79 Fargo (1996)                      1996
##  8     2    4.48   143 Return of the Jedi (1983)         1983
##  9     2    4.46    77 Silence of the Lambs, The (1991)  1991
## 10     2    4.44   212 Contact (1997)                    1997
## # ... with 40 more rows

Accuracy Metrics UBCF

ubcf_prediction <- predict(object = model_ubcf_cosine, newdata = known, n = 10, type = "ratings")
print("Accuracy metrics UBCF:")

## [1] "Accuracy metrics UBCF:"

print(calcPredictionAccuracy(x = ubcf_prediction, data = unknown, byUser = FALSE))

##      RMSE       MSE       MAE 
## 0.9970938 0.9941960 0.7911661

Model Comparison

models_evaluation <- list( 
SVD = list(name = "SVD"),
IBCF = list(name = "IBCF", param = list(method = "cosine")),  
UBCF = list(name = "UBCF", param = list(method = "cosine"))
)

lerror <- evaluate(x = e, method = models_evaluation, type = "ratings")

## SVD run fold/sample [model time/prediction time]
##   1  [0.07sec/0.01sec] 
##   2  [0.22sec/0sec] 
##   3  [0.2sec/0.02sec] 
##   4  [0.21sec/0sec] 
##   5  [0.25sec/0.02sec] 
##   6  [0.05sec/0.02sec] 
##   7  [0.05sec/0.02sec] 
##   8  [0.06sec/0.01sec] 
##   9  [0.04sec/0.01sec] 
##   10  [0.04sec/0.02sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [1.04sec/0.02sec] 
##   2  [1.05sec/0.01sec] 
##   3  [1.08sec/0sec] 
##   4  [1.27sec/0.02sec] 
##   5  [0.96sec/0.19sec] 
##   6  [1.07sec/0.19sec] 
##   7  [1.08sec/0.03sec] 
##   8  [1.1sec/0.02sec] 
##   9  [1.18sec/0.02sec] 
##   10  [0.84sec/0sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0.01sec/0.13sec] 
##   2  [0.01sec/0.13sec] 
##   3  [0sec/0.13sec] 
##   4  [0sec/0.14sec] 
##   5  [0.02sec/0.12sec] 
##   6  [0sec/0.15sec] 
##   7  [0sec/0.17sec] 
##   8  [0.02sec/0.12sec] 
##   9  [0sec/0.14sec] 
##   10  [0.02sec/0.3sec]

mdlcmp <- as.data.frame(sapply(avg(lerror), rbind))

cmpMdl <- as.data.frame(t(as.matrix(mdlcmp)))
colnames(cmpMdl) <- c("RMSE", "MSE", "MAE")
pander(cmpMdl, caption = "Model Comparison")

Model Comparison
	RMSE	MSE	MAE
SVD	1.037	1.077	0.8232
IBCF	1.469	2.162	1.116
UBCF	1.034	1.07	0.8197

rmse_ubcf<- calcPredictionAccuracy(x = ubcf_prediction, data = unknown, byUser = FALSE)
print (rmse_ubcf)

##      RMSE       MSE       MAE 
## 0.9970938 0.9941960 0.7911661

rmse_ibcf <- calcPredictionAccuracy(x = ibcf_prediction, data = unknown, byUser = FALSE)
print(rmse_ibcf)

##     RMSE      MSE      MAE 
## 1.444221 2.085774 1.092323

rmse_svd <- calcPredictionAccuracy(x = svd_prediction, data = unknown, byUser = FALSE)
print (rmse_svd)

##      RMSE       MSE       MAE 
## 0.9986768 0.9973553 0.7918828

library(ggplot2)
library(dplyr)
comparison <- rbind(rmse_ibcf, rmse_ubcf, rmse_svd)
comparison <- data.frame(comparison, row.names = NULL)
comparison <- cbind(model = c("IBCF", "UBCF", "SVD"), comparison)

# Long format (one row per model x metric) for a grouped bar chart
comparison %>% gather("measure", "value", -1) %>%
  ggplot(aes(x = measure, y = value, fill = model)) +
  geom_bar(stat = "identity", position = position_dodge())

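Item-based collaborative filtering performs worst, with the largest RMSE (root mean squared error), MSE and MAE. SVD and user-based collaborative filtering perform about the same. UBCF's RMSE is marginally lower both on the single hold-out split (0.997 vs 0.999) and in the 10-fold averages (1.034 vs 1.037).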

n_recommendations = c(1,3,5,8,10,15,20, 25)
results = evaluate (x=e, method = models_evaluation, n = n_recommendations)

## SVD run fold/sample [model time/prediction time]
##   1  [0.05sec/0.03sec] 
##   2  [0.06sec/0.02sec] 
##   3  [0.05sec/0.01sec] 
##   4  [0.05sec/0.03sec] 
##   5  [0.04sec/0.03sec] 
##   6  [0.04sec/0.03sec] 
##   7  [0.06sec/0.01sec] 
##   8  [0.04sec/0.04sec] 
##   9  [0.04sec/0.04sec] 
##   10  [0.05sec/0.03sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.91sec/0.01sec] 
##   2  [1sec/0.03sec] 
##   3  [0.84sec/0.02sec] 
##   4  [1.13sec/0.03sec] 
##   5  [1.12sec/0.02sec] 
##   6  [1sec/0.03sec] 
##   7  [1.15sec/0.03sec] 
##   8  [1.06sec/0.04sec] 
##   9  [1.07sec/0.03sec] 
##   10  [0.88sec/0.03sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.16sec] 
##   2  [0.02sec/0.15sec] 
##   3  [0sec/0.33sec] 
##   4  [0sec/0.15sec] 
##   5  [0sec/0.15sec] 
##   6  [0.01sec/0.14sec] 
##   7  [0sec/0.16sec] 
##   8  [0sec/0.14sec] 
##   9  [0sec/0.16sec] 
##   10  [0.02sec/0.14sec]

plot(results, y="ROC", annotate = 1, legend ="topleft")
title ("ROC Curve")

plot (results, y ='prec/rec', annotate=1)
title ("Precision-Recall")

The ROC (receiver operating characteristic) curve shows that singular value decomposition has the largest area under the curve, followed by user-based collaborative filtering. Item-based collaborative filtering has the smallest. The precision-recall plot gives the same ordering: SVD ranks best and IBCF worst.
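Each point on these curves comes from a confusion matrix computed for one list length n. As we understand recommenderlab's top-N evaluation, a withheld rating at or above `goodRating` (3 here) marks a movie as relevant. The counts are then averaged over test users for each n in `n_recommendations`.

The sketch below makes that assumption and computes precision, recall (the true positive rate) and the false positive rate for a single made-up user and one top-3 list. `n_candidates` is a hypothetical stand-in for the number of movies that user could have been recommended.

```r
# Made-up test user: withheld ratings, and a top-3 list returned by some model
withheld <- c(m1 = 5, m2 = 2, m3 = 4, m4 = 1, m5 = 3, m6 = 2)
top_list <- c("m1", "m2", "m7")  # m7 has no withheld rating, so it counts as not relevant
good_rating  <- 3                # ratings >= 3 count as "good", as with goodRating = 3
n_candidates <- 10               # hypothetical number of recommendable movies

relevant <- names(withheld)[withheld >= good_rating]  # "m1" "m3" "m5"

tp <- length(intersect(top_list, relevant))  # recommended and relevant
fp <- length(top_list) - tp                  # recommended but not relevant
fn <- length(relevant) - tp                  # relevant but not recommended
tn <- n_candidates - tp - fp - fn            # neither

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)  # also the true positive rate (ROC y-axis)
fpr       <- fp / (fp + tn)  # false positive rate (ROC x-axis)
c(precision = precision, recall = recall, FPR = fpr)
```

Longer lists usually raise recall and the false positive rate together, which traces out each model's curve as n grows.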

Conclusion

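In this movie setting, singular value decomposition performs at least as well as the collaborative filtering family (UBCF and IBCF). It gives the best top-N recommendations in the ROC and precision-recall comparisons, and its rating error is on par with UBCF. Both SVD and UBCF clearly beat IBCF. This is in line with the wide use of SVD-style matrix factorization in the recommender systems of the large tech companies shown below.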

Figure

Gracie Hui Han