Project 2 | Content-Based and Collaborative Filtering

Project Objectives
Libraries
Data Preparation and Exploration
Citation
Data Visualization
Data Normalization
Recommendation algorithms
- Split the dataset into training set (80%) and testing set (20%):
Item-Item Collaborative Filtering
User-User Collaborative Filtering
- Training the model
- Recommendations using test set
Comparison of Recommender Models
Summary

Project Objectives

For assignment 2, start with an existing dataset of user-item ratings, such as our toy books dataset, MovieLens, Jester [http://eigentaste.berkeley.edu/dataset/] or another dataset of your choosing.

Implement at least two of these recommendation algorithms:
• Content-Based Filtering
• User-User Collaborative Filtering
• Item-Item Collaborative Filtering

You should evaluate and compare different approaches, using different algorithms, normalization techniques, similarity methods, neighborhood sizes, etc. You don’t need to be exhaustive—these are just some suggested possibilities.

You may use the course text’s recommenderlab or any other library that you want. Please provide at least one graph, and a textual summary of your findings and recommendations.

Libraries

library(recommenderlab)  # realRatingMatrix, Recommender(), evaluationScheme()
library(reshape2)
library(RCurl)
library(ggplot2)
library(knitr)
library(kableExtra)
library(dplyr)
library(tidyr)           # spread()

Data Preparation and Exploration

We gathered data from the section "recommended for education and development" of the site https://grouplens.org/datasets/movielens/. The site provides two links. We chose the smaller file because the larger one (named Full) is too large to upload to GitHub. The data is described as follows:

This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018. The dataset consists of 4 .csv files, of which we use two, movies.csv and ratings.csv, for our downstream analysis.

Citation

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872

Load Data

movies <- read.csv("https://raw.githubusercontent.com/forhadakbar/data612summer2020/master/Project%2002/movies.csv")
ratings <- read.csv("https://raw.githubusercontent.com/forhadakbar/data612summer2020/master/Project%2002/ratings.csv")

Preview data

#Preview movies data
kable(head(movies, n = 10L)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

movieId	title	genres
1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
2	Jumanji (1995)	Adventure|Children|Fantasy
3	Grumpier Old Men (1995)	Comedy|Romance
4	Waiting to Exhale (1995)	Comedy|Drama|Romance
5	Father of the Bride Part II (1995)	Comedy
6	Heat (1995)	Action|Crime|Thriller
7	Sabrina (1995)	Comedy|Romance
8	Tom and Huck (1995)	Adventure|Children
9	Sudden Death (1995)	Action
10	GoldenEye (1995)	Action|Adventure|Thriller

#Preview ratings data
kable(head(ratings, n = 10L)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

userId	movieId	rating	timestamp
1	1	4	964982703
1	3	4	964981247
1	6	4	964982224
1	47	5	964983815
1	50	5	964982931
1	70	3	964982400
1	101	5	964980868
1	110	4	964982176
1	151	5	964984041
1	157	5	964984100

Combine Data

Join movies with ratings on movieId

movie_ratings <- merge(ratings, movies, by="movieId")
#Preview merged movie_ratings data
kable(head(movie_ratings, n = 10L)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

movieId	userId	rating	timestamp	title	genres
1	1	4.0	964982703	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	555	4.0	978746159	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	232	3.5	1076955621	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	590	4.0	1258420408	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	601	4.0	1521467801	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	179	4.0	852114051	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	606	2.5	1349082950	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	328	5.0	1494210665	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	206	5.0	850763267	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	468	4.0	831400444	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy

Create Matrix

Unlike the MovieLense dataset that ships with recommenderlab, our dataset is not an object of class realRatingMatrix. In the following code chunk we therefore create a realRatingMatrix called moviematrix. Storing moviematrix as a realRatingMatrix lets us apply recommenderlab's functions to it (see page 33 of Building a Recommendation System with R).

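# Pivot to one row per user and one column per movie (NA = not rated)
moviematrix <- ratings %>% select(-timestamp) %>% spread(movieId, rating)

# Use userId as the row names, then drop the userId column
row.names(moviematrix) <- moviematrix[, 1]
moviematrix <- moviematrix[-c(1)]

# Coerce to recommenderlab's sparse rating-matrix class
moviematrix <- as(as.matrix(moviematrix), "realRatingMatrix")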

moviematrix

## 610 x 9724 rating matrix of class 'realRatingMatrix' with 100836 ratings.
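The `spread()` call above pivots the long `ratings` table (one row per user-movie pair) into a wide user × movie matrix, with `NA` wherever a user has not rated a movie. Here is a minimal base-R sketch of the same reshaping. It runs on a hypothetical three-user table: `toy_ratings`, `toy_users`, `toy_movies` and `wide` are made-up names for illustration only.

```r
# Hypothetical toy data in the same long format as ratings.csv (timestamp omitted)
toy_ratings <- data.frame(
  userId  = c(1, 1, 2, 3, 3, 3),
  movieId = c(1, 3, 3, 1, 2, 3),
  rating  = c(4, 5, 3.5, 2, 4.5, 5)
)

toy_users  <- sort(unique(toy_ratings$userId))
toy_movies <- sort(unique(toy_ratings$movieId))

# Start from an all-NA user x movie matrix; NA marks "not rated"
wide <- matrix(NA_real_, nrow = length(toy_users), ncol = length(toy_movies),
               dimnames = list(as.character(toy_users), as.character(toy_movies)))

# Fill each observed rating, indexing by (row name, column name) pairs
wide[cbind(as.character(toy_ratings$userId),
           as.character(toy_ratings$movieId))] <- toy_ratings$rating

wide
```

Unrated cells stay `NA`. The coercion to `realRatingMatrix` treats these cells as missing, which is why the object above reports exactly the 100836 observed ratings.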

At this point, we'll take stock of the important characteristics of moviematrix.

dim(moviematrix)

## [1]  610 9724

head(getData.frame(moviematrix))

##      user item rating
## 1       1    1      4
## 326     1    3      4
## 434     1    6      4
## 2108    1   47      5
## 2380    1   50      5
## 2860    1   70      3

Data Visualization

Exploring the rating values

# Vectorize and create unique vector.
vector_ratings <- as.vector(moviematrix@data)
unique(vector_ratings)

##  [1] 4.0 0.0 4.5 2.5 3.5 3.0 5.0 0.5 2.0 1.5 1.0

# Ratings run from 0.5 to 5 in half-star steps; 0 marks a missing rating. Count the occurrences of each value.
table_ratings <- table(vector_ratings)
kable(table_ratings)

vector_ratings	Freq
0	5830804
0.5	1370
1	2811
1.5	1791
2	7551
2.5	5550
3	20047
3.5	13136
4	26818
4.5	8551
5	13211

A rating of 0 represents a missing value, so we remove the zeros from vector_ratings before plotting.

vector_ratings <- vector_ratings[vector_ratings != 0]
vector_ratings <- factor(vector_ratings)
qplot(vector_ratings) + ggtitle("Distribution of the ratings")

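Exploring which movies have been viewed

The dataset records ratings rather than views, so we use the number of ratings per movie as a proxy for the number of views.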

views_per_movie <- colCounts(moviematrix)

table_views <- data.frame(
  movie = names(views_per_movie),
  views = views_per_movie
)
names(table_views)[names(table_views) == "movie"] <- "movieId"
table_views <- merge(table_views, movies, by = "movieId")
table_views <- table_views[order(table_views$views, decreasing = TRUE), ]

ggplot(table_views[1:6, ], aes(x = title, y = views)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Number of views of the top movies")

Exploring the average ratings

average_ratings <- colMeans(moviematrix)
qplot(average_ratings, fill=..count.., geom="histogram",binwidth = 0.1, main= "Distribution of average movie rating", xlab = "Average Rating", ylab = "Count")

Selecting the most relevant data

When we explored the data, we noticed that the table contains:

• Movies that have been rated only a few times. Their ratings might be biased, so we keep only movies that have been rated more than 50 times.
• Users who rated only a few movies. Their ratings might be biased too, so we keep only users who have rated more than 50 movies.

# rowCounts(): number of ratings per user (rows)
# colCounts(): number of ratings per movie (columns)
(moviematrix <- moviematrix[rowCounts(moviematrix) > 50, colCounts(moviematrix) > 50])

## 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.

dim(moviematrix)

## [1] 378 436

Now we have 378 users and 436 items with 36214 ratings.

Let’s build the chart again:

average_ratings <- colMeans(moviematrix)
# Histogram: average rating per movie (colMeans averages each movie's column)
qplot(average_ratings, fill=..count.., geom="histogram",binwidth = 0.1, main= "Histogram of average rating per movie", xlab = "Average Rating", ylab = "Number of movies")

Data Normalization

movie_Normalization <- normalize(moviematrix)  # default: center each user's ratings on that user's mean
avg <- round(rowMeans(movie_Normalization), 5)  # every user's mean should now be 0
table(avg)

## avg
##   0 
## 378
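`normalize()` centers by default: it subtracts each user's mean rating from that user's ratings, which is why all 378 row means round to 0 above. The following minimal base-R sketch applies row centering to a hypothetical toy matrix (`r` and `centered` are illustrative names). Missing ratings are `NA`, and each user's mean is taken over the rated movies only.

```r
# Hypothetical 3-user x 4-movie rating matrix; NA = not rated
r <- matrix(c(5, 3, NA, 4,
              2, NA, 1, 3,
              4, 4, 5, NA),
            nrow = 3, byrow = TRUE)

# Each user's mean over the movies they actually rated
user_means <- rowMeans(r, na.rm = TRUE)

# Subtract each user's mean from every rating in that row (NA stays NA)
centered <- sweep(r, 1, user_means, FUN = "-")

round(rowMeans(centered, na.rm = TRUE), 5)
```

After centering, a positive value means "liked more than this user's average" and a negative value means "liked less". This removes differences between generous and strict raters before similarities are computed.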

# min_Items: 95th percentile of ratings per user; min_Users: 95th percentile of ratings per movie
min_Items <- quantile(rowCounts(moviematrix), 0.95)
min_Users <- quantile(colCounts(moviematrix), 0.95)

image(moviematrix[rowCounts(moviematrix) > min_Items, colCounts(moviematrix) > min_Users], 
    main = "Heatmap of the Top Users and Movies (Non-Normalized)")

image(movie_Normalization[rowCounts(movie_Normalization) > min_Items, colCounts(movie_Normalization) > 
    min_Users], main = "Heatmap of the Top Users and Movies (Normalized)")

Recommendation algorithms

Split the dataset into training set (80%) and testing set (20%):

set.seed(80)
# Assign each user to the training set with probability 0.8, so the split is approximately 80/20
train_set <- sample(x = c(TRUE, FALSE), size = nrow(moviematrix), replace = TRUE, 
    prob = c(0.8, 0.2))

movie_Train <- moviematrix[train_set, ]
movie_Test <- moviematrix[!train_set, ]
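Because `sample()` draws TRUE/FALSE independently for each user, the split is only approximately 80/20. Here it produced 302 training users and 76 test users. If an exact proportion were required, one alternative is to sample a fixed number of row indices. This is not what the analysis above does. In the minimal sketch below, `n_users` is a hypothetical stand-in for `nrow(moviematrix)`.

```r
set.seed(80)
n_users <- 378  # stand-in for nrow(moviematrix) after filtering

# Draw exactly round(0.8 * n_users) distinct users for training
train_idx <- sample(seq_len(n_users), size = round(0.8 * n_users))

# Logical flag in the same shape as train_set above
train_flag <- seq_len(n_users) %in% train_idx
c(train = sum(train_flag), test = sum(!train_flag))
```

The resulting `train_flag` could be used like `train_set`, e.g. `moviematrix[train_flag, ]`. recommenderlab's `evaluationScheme(..., method = "split", train = 0.8)` is another option.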

Item-Item Collaborative Filtering

Item-item collaborative filtering computes the similarity between items from users' ratings of those items. It then recommends items similar to the ones a user has already rated highly. The similarities between items are computed with a similarity measure (cosine by default in recommenderlab). The similarity values are then used to predict ratings for user-item pairs that are missing from the data.
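To make this concrete, here is a minimal base-R sketch of item-item filtering on a hypothetical 4 × 4 toy matrix. It has three steps:

• compute cosine similarity between item columns, with missing ratings treated as 0;
• prune each item to its k most similar neighbors;
• predict a user's rating for an unrated item as the similarity-weighted average of that user's ratings on those neighbors.

This is a simplified illustration of the technique, not recommenderlab's exact `IBCF` code, which also normalizes ratings. The names `r`, `sim` and `predict_ibcf` are hypothetical.

```r
# Hypothetical 4-user x 4-item rating matrix; 0 = not rated
r <- matrix(c(5, 4, 0, 1,
              4, 5, 1, 0,
              1, 0, 5, 4,
              0, 1, 4, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("i", 1:4)))

# Cosine similarity between item columns: dot products divided by norm products
dots  <- crossprod(r)                  # t(r) %*% r, an item x item matrix
norms <- sqrt(diag(dots))
sim   <- dots / outer(norms, norms)
diag(sim) <- 0                         # an item is not its own neighbor

# Keep only the k most similar items in each row (ties broken by column order)
k <- 2
for (i in seq_len(nrow(sim))) {
  sim[i, rank(-sim[i, ], ties.method = "first") > k] <- 0
}

# Predict user u's rating of item j as a similarity-weighted average of
# u's ratings on the items u has rated (only j's kept neighbors get weight)
predict_ibcf <- function(u, j) {
  rated <- r[u, ] > 0
  w <- sim[j, rated]
  if (sum(w) == 0) return(NA_real_)
  sum(w * r[u, rated]) / sum(w)
}

predict_ibcf("u1", "i3")   # 85/49, about 1.73
```

User u1 rated i4, the nearest neighbor of i3, only 1, so the predicted rating for i3 is low. Raising `k` keeps more neighbors per item, so more unrated items can receive a prediction.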

Training the model

In the step below, we train the model with k = 30, which is the default: for each item, only its 30 most similar items are kept.

(model <- Recommender(movie_Train, method = "IBCF", parameter = list(k = 30)))

## Recommender of type 'IBCF' for 'realRatingMatrix' 
## learned using 302 users.


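Examining the Similarity Matrix

Because the model keeps only the k = 30 most similar items for each item, counting the non-zero entries in each column of the similarity matrix shows how often a movie appears among other movies' nearest neighbors. The ten movies that appear most often are listed below.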

similarityMatrix <- getModel(model)$sim
which_max <- order(colSums(similarityMatrix > 0), decreasing = TRUE)[1:10]
topMovies <- as.data.frame(as.integer(rownames(similarityMatrix)[which_max]))
colnames(topMovies) <- c("movieId")

data <- topMovies %>% inner_join(movies, by = "movieId") %>% select(Movie = "title")

kable((data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

Movie
Disclosure (1994)
Piano, The (1993)
City Slickers II: The Legend of Curly’s Gold (1994)
Congo (1995)
Broken Arrow (1996)
Wild Wild West (1999)
First Knight (1995)
Eraser (1996)
Coneheads (1993)
Beverly Hills Cop III (1994)

Recommendations using test set

(pred <- predict(model, newdata = movie_Test, n = 6))

## Recommendations as 'topNList' with n = 6 for 76 users.

# Movies already rated by the first test user (non-zero entries of row 1)
user1 <- as.data.frame(movie_Test@data[1, movie_Test@data[1, ] > 0])
colnames(user1) <- c("Rating")
user1[c("movieId")] <- as.integer(rownames(user1))

data <- movies %>% inner_join(user1, by = "movieId") %>% select(Movie = "title", Rating, genres) %>% arrange(desc(Rating))

kable((data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

Movie	Rating	genres
Shawshank Redemption, The (1994)	5.0	Crime|Drama
Forrest Gump (1994)	5.0	Comedy|Drama|Romance|War
Blade Runner (1982)	5.0	Action|Sci-Fi|Thriller
One Flew Over the Cuckoo's Nest (1975)	5.0	Drama
Hook (1991)	5.0	Adventure|Comedy|Fantasy
Kill Bill: Vol. 2 (2004)	5.0	Action|Drama|Thriller
Casino Royale (2006)	5.0	Action|Adventure|Thriller
Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)	4.5	Action|Drama|Romance
Traffic (2000)	4.5	Crime|Drama|Thriller
Mulholland Drive (2001)	4.5	Crime|Drama|Film-Noir|Mystery|Thriller
Bowling for Columbine (2002)	4.5	Documentary
Interview with the Vampire: The Vampire Chronicles (1994)	4.0	Drama|Horror
Gladiator (2000)	4.0	Action|Adventure|Drama
Chicken Run (2000)	4.0	Animation|Children|Comedy
Best in Show (2000)	4.0	Comedy
Lost in Translation (2003)	4.0	Comedy|Drama|Romance
Mystic River (2003)	4.0	Crime|Drama|Mystery
Kill Bill: Vol. 1 (2003)	4.0	Action|Crime|Thriller
Incredibles, The (2004)	4.0	Action|Adventure|Animation|Children|Comedy
Prestige, The (2006)	4.0	Drama|Mystery|Sci-Fi|Thriller
No Country for Old Men (2007)	4.0	Crime|Drama
Inglourious Basterds (2009)	4.0	Action|Drama|War
Fight Club (1999)	3.5	Action|Crime|Drama|Thriller
Monsters, Inc. (2001)	3.5	Adventure|Animation|Children|Comedy|Fantasy
Royal Tenenbaums, The (2001)	3.5	Comedy|Drama
Beautiful Mind, A (2001)	3.5	Drama|Romance
Bourne Identity, The (2002)	3.5	Action|Mystery|Thriller
Finding Nemo (2003)	3.5	Adventure|Animation|Children|Comedy
Eternal Sunshine of the Spotless Mind (2004)	3.5	Drama|Romance|Sci-Fi
Superbad (2007)	3.5	Comedy
Avatar (2009)	3.5	Action|Adventure|Sci-Fi|IMAX
Godfather, The (1972)	3.0	Crime|Drama
Memento (2000)	3.0	Mystery|Thriller
Shrek (2001)	3.0	Adventure|Animation|Children|Comedy|Fantasy|Romance
Dark Knight, The (2008)	3.0	Action|Crime|Drama|IMAX
O Brother, Where Art Thou? (2000)	2.5	Adventure|Comedy|Crime
Pirates of the Caribbean: The Curse of the Black Pearl (2003)	2.5	Action|Adventure|Comedy|Fantasy
Batman Begins (2005)	2.5	Action|Crime|IMAX
Departed, The (2006)	2.5	Crime|Drama|Thriller
Bourne Ultimatum, The (2007)	2.5	Action|Crime|Thriller
Million Dollar Baby (2004)	2.0	Drama
WALL·E (2008)	2.0	Adventure|Animation|Children|Romance|Sci-Fi
Up (2009)	2.0	Adventure|Animation|Children|Drama
Donnie Darko (2001)	1.5	Drama|Mystery|Sci-Fi|Thriller
28 Days Later (2002)	1.5	Action|Horror|Sci-Fi
Pan's Labyrinth (Laberinto del fauno, El) (2006)	1.5	Drama|Fantasy|Thriller
Ratatouille (2007)	1.5	Animation|Children|Drama
Lord of the Rings: The Fellowship of the Ring, The (2001)	1.0	Adventure|Fantasy
Lord of the Rings: The Return of the King, The (2003)	1.0	Action|Adventure|Drama|Fantasy
High Fidelity (2000)	0.5	Comedy|Drama|Romance
Requiem for a Dream (2000)	0.5	Drama
Harry Potter and the Chamber of Secrets (2002)	0.5	Adventure|Fantasy
Big Fish (2003)	0.5	Drama|Fantasy|Romance
V for Vendetta (2006)	0.5	Action|Sci-Fi|Thriller|IMAX
Juno (2007)	0.5	Comedy|Drama|Romance
Iron Man (2008)	0.5	Action|Adventure|Sci-Fi
Slumdog Millionaire (2008)	0.5	Crime|Drama|Romance
Star Trek (2009)	0.5	Action|Adventure|Sci-Fi|IMAX
Hangover, The (2009)	0.5	Comedy|Crime
District 9 (2009)	0.5	Mystery|Sci-Fi|Thriller

# Top-6 IBCF recommendations for the first test user
recommended <- pred@itemLabels[pred@items[[1]]]
recommended <- as.data.frame(as.integer(recommended))
colnames(recommended) <- c("movieId")

data <- recommended %>% inner_join(movies, by = "movieId") %>% select(Movie = "title",genres)

kable((data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

Movie	genres
Stargate (1994)	Action|Adventure|Sci-Fi
Robin Hood: Men in Tights (1993)	Comedy
Schindler's List (1993)	Drama|War
Alien (1979)	Horror|Sci-Fi
The Devil's Advocate (1997)	Drama|Mystery|Thriller
Big (1988)	Comedy|Drama|Fantasy|Romance

User-User Collaborative Filtering

Training the model

(model <- Recommender(movie_Train, method = "UBCF"))

## Recommender of type 'UBCF' for 'realRatingMatrix' 
## learned using 302 users.

Recommendations using test set

(pred <- predict(model, newdata = movie_Test, n = 6))

## Recommendations as 'topNList' with n = 6 for 76 users.

# Recommendations for the first user
recommended <- pred@itemLabels[pred@items[[1]]]
recommended <- as.data.frame(as.integer(recommended))
colnames(recommended) <- c("movieId")

data <- recommended %>% inner_join(movies, by = "movieId") %>% select(Movie = "title",genres)

kable((data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
    scroll_box(width = "100%", height = "200px")

Movie	genres
Babe (1995)	Children|Drama
Fugitive, The (1993)	Thriller
Braveheart (1995)	Action|Drama|War
Lion King, The (1994)	Adventure|Animation|Children|Drama|Musical|IMAX
Star Wars: Episode IV - A New Hope (1977)	Action|Adventure|Sci-Fi
Aladdin (1992)	Adventure|Animation|Children|Comedy|Musical

Comparison of Recommender Models

set.seed(101)

minimum = min(rowCounts(moviematrix))  # 'given' in evaluationScheme() must not exceed this
minimum

## [1] 11
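The minimum is 11 even though the earlier filter kept only users with more than 50 ratings. Both `rowCounts()` and `colCounts()` in that subsetting step were computed on the full matrix. As a result, a user with more than 50 ratings overall can be left with far fewer once the less-rated movies are dropped. A small base-R sketch shows the effect. It uses a hypothetical toy matrix, with the threshold scaled down from 50 to 2.

```r
# Hypothetical 4-user x 5-movie matrix; NA = not rated
r <- matrix(c(5,  4,  3,  NA, NA,
              4,  NA, NA, 2,  1,
              3,  5,  4,  4,  NA,
              NA, 4,  5,  NA, NA),
            nrow = 4, byrow = TRUE)

row_counts <- rowSums(!is.na(r))   # ratings per user, on the full matrix
col_counts <- colSums(!is.na(r))   # ratings per movie, on the full matrix

# Filter users and movies in one step, both with threshold "> 2"
kept <- r[row_counts > 2, col_counts > 2, drop = FALSE]

rowSums(!is.na(kept))   # 3 1 3: the second user now has only 1 rating
```

Re-applying the filter until no more rows or columns are dropped would make both thresholds hold at once, at the cost of a smaller matrix.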

evaluation = evaluationScheme(data = moviematrix, method = "cross-validation", k = 10, given = 10, goodRating = 3.5)
evaluation

## Evaluation scheme with 10 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.500000
## Data set: 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.
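With `given = 10`, each test user's ratings are split into two parts. The 10 "known" ratings are passed to the recommender. The remaining "unknown" ratings are held out for scoring. Held-out ratings of at least `goodRating = 3.5` count as relevant when precision and recall are computed. recommenderlab picks the known ratings at random. The minimal base-R sketch below uses one hypothetical user with made-up ratings and simply takes the first 10, so the result is reproducible.

```r
# Hypothetical ratings of one test user (12 movies), named by movieId
user_ratings <- c(`1` = 4, `50` = 5, `110` = 3, `260` = 4.5, `296` = 2, `318` = 5,
                  `356` = 3.5, `480` = 4, `527` = 1, `589` = 4, `593` = 5, `2571` = 3)
given       <- 10    # ratings revealed to the recommender
good_rating <- 3.5   # threshold for a "good" (relevant) rating

# Deterministic stand-in for the random choice of known ratings
known   <- user_ratings[seq_len(given)]     # what the model sees
unknown <- user_ratings[-seq_len(given)]    # held out for scoring

# Held-out movies rated at or above the threshold count as relevant
relevant <- names(unknown)[unknown >= good_rating]
relevant
```

In the scheme above, `getData(evaluation, "known")` and `getData(evaluation, "unknown")` return these two parts for all test users at once.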

ev_train = getData(evaluation, "train")
ev_known = getData(evaluation, "known")
ev_unknown = getData(evaluation, "unknown")

# Item
item_model = Recommender(data = ev_train, method = "IBCF", parameter = list(method = "Cosine"))
item_model_pred = predict(object = item_model, newdata = ev_known, n = 10, type = "ratings")
item = calcPredictionAccuracy(x = item_model_pred, data = ev_unknown, byUser = FALSE)

# User
user_model = Recommender(data = ev_train, method = "UBCF", parameter = list(method = "Cosine"))
user_model_pred = predict(object = user_model, newdata = ev_known, n = 10, type = "ratings")
user = calcPredictionAccuracy(x = user_model_pred, data = ev_unknown, byUser = FALSE)

# Comparison
kable(rbind(item, user))%>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width   = F,position = "left",font_size = 12) %>%
  row_spec(0, background ="gray")

Model	RMSE	MSE	MAE
item	1.2937300	1.6737372	0.955398
user	0.9555293	0.9130362	0.734025
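`calcPredictionAccuracy()` compares the predicted ratings with the held-out ratings. The three columns follow the usual definitions, which the minimal base-R sketch below makes explicit. The `actual` and `predicted` vectors are hypothetical.

```r
# Hypothetical held-out ratings and the corresponding predictions
actual    <- c(4, 3.5, 5, 2, 4.5)
predicted <- c(3.5, 3.5, 4, 3, 4)

err  <- predicted - actual
mse  <- mean(err^2)          # mean squared error
rmse <- sqrt(mse)            # root mean squared error, in rating units
mae  <- mean(abs(err))       # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

With `byUser = FALSE`, all predictions are pooled, so RMSE is the square root of MSE. The table above reflects this (0.9555293² ≈ 0.9130). UBCF has the lower error on all three measures.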

eval_sets = evaluationScheme(data = moviematrix, method = "cross-validation", k = 4, given = 10, goodRating = 3.5)
I_results = evaluate(x = eval_sets, method = "IBCF", n = seq(10, 100, 10))

## IBCF run fold/sample [model time/prediction time]
##   1  [0.35sec/0.03sec] 
##   2  [0.33sec/0.03sec] 
##   3  [0.34sec/0.03sec] 
##   4  [0.39sec/0.03sec]

kable(head(getConfusionMatrix(I_results)[[1]]))%>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width   = F,position = "left",font_size = 12) %>%
  row_spec(0, background ="gray")

n	TP	FP	FN	TN	precision	recall	TPR	FPR
10	1.416667	8.583333	59.61458	356.3854	0.1416667	0.0291171	0.0291171	0.0237326
20	2.947917	17.052083	58.08333	347.9167	0.1473958	0.0564288	0.0564288	0.0469233
30	4.645833	25.354167	56.38542	339.6146	0.1548611	0.0836695	0.0836695	0.0695461
40	6.031250	33.968750	55.00000	331.0000	0.1507812	0.1065546	0.1065546	0.0933082
50	7.489583	42.510417	53.54167	322.4583	0.1497917	0.1313989	0.1313989	0.1167724
60	8.572917	51.427083	52.45833	313.5417	0.1428819	0.1471139	0.1471139	0.1413827
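For a top-n list, the confusion-matrix terms are defined as follows:

• TP: a recommended movie the user rated at least `goodRating` in the held-out data.
• FP: a recommended movie that was not rated that highly.
• FN: a relevant held-out movie that was not recommended.
• TN: any remaining movie.

From these, precision is TP / (TP + FP), recall (equal to TPR) is TP / (TP + FN), and FPR is FP / (FP + TN). In the table above, TP + FP equals n in every row, so precision is TP / n. The counts are averages over test users. Recomputing recall from the averaged TP and FN columns does not exactly reproduce the recall column, which suggests the ratios are averaged per user. The minimal base-R sketch below works through one hypothetical user with made-up counts.

```r
# Hypothetical counts for one user and a top-10 list
tp <- 3; fp <- 7; fn <- 12; tn <- 404

precision <- tp / (tp + fp)   # share of recommendations that were relevant
recall    <- tp / (tp + fn)   # share of relevant movies that were recommended (= TPR)
fpr       <- fp / (fp + tn)   # share of non-relevant movies that were recommended

round(c(precision = precision, recall = recall, FPR = fpr), 4)
```

The ROC curves below plot TPR against FPR as the list length n grows from 10 to 100.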

U_results = evaluate(x = eval_sets, method = "UBCF", n = seq(10, 100, 10))

## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.12sec] 
##   2  [0sec/0.14sec] 
##   3  [0sec/0.13sec] 
##   4  [0sec/0.14sec]

kable(head(getConfusionMatrix(U_results)[[1]]))%>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width   = F,position = "left",font_size = 12) %>%
  row_spec(0, background ="gray")

n	TP	FP	FN	TN	precision	recall	TPR	FPR
10	2.854167	7.145833	58.17708	357.8229	0.2854167	0.0570086	0.0570086	0.0190046
20	4.875000	15.125000	56.15625	349.8438	0.2437500	0.0948137	0.0948137	0.0406465
30	6.687500	23.312500	54.34375	341.6562	0.2229167	0.1221262	0.1221262	0.0627652
40	8.645833	31.354167	52.38542	333.6146	0.2161458	0.1522087	0.1522087	0.0844114
50	10.510417	39.489583	50.52083	325.4792	0.2102083	0.1786933	0.1786933	0.1062689
60	12.197917	47.802083	48.83333	317.1667	0.2032986	0.2051133	0.2051133	0.1289054

plot(U_results, annotate = TRUE, main = "ROC curve of UBCF")

plot(I_results, annotate = TRUE, main = "ROC curve of IBCF")

mult_models = list(
  IBCF_cos = list(name = "IBCF", param = list(method = "Cosine")),
  IBCF_pearson = list(name = "IBCF", param = list(method = "pearson")),
  UBCF_cos = list(name = "UBCF", param = list(method = "Cosine")),
  UBCF_pearson = list(name = "UBCF", param = list(method = "pearson")),
  Random = list(name = "RANDOM", param = NULL),
  Popular = list(name = "POPULAR", param = NULL)
)

# Testing models
models = evaluate(eval_sets, mult_models, n= c(1, 5, seq(10, 100, 10)))

## IBCF run fold/sample [model time/prediction time]
##   1  [0.42sec/0.05sec] 
##   2  [0.41sec/0.02sec] 
##   3  [0.57sec/0.01sec] 
##   4  [0.35sec/0.03sec] 
## IBCF run fold/sample [model time/prediction time]
##   1  [0.41sec/0.03sec] 
##   2  [0.41sec/0.03sec] 
##   3  [0.42sec/0.03sec] 
##   4  [0.41sec/0.03sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.15sec] 
##   2  [0.02sec/0.12sec] 
##   3  [0sec/0.12sec] 
##   4  [0.02sec/0.12sec] 
## UBCF run fold/sample [model time/prediction time]
##   1  [0sec/0.16sec] 
##   2  [0sec/0.14sec] 
##   3  [0sec/0.14sec] 
##   4  [0.02sec/0.12sec] 
## RANDOM run fold/sample [model time/prediction time]
##   1  [0sec/0.03sec] 
##   2  [0sec/0.03sec] 
##   3  [0sec/0.03sec] 
##   4  [0sec/0.04sec] 
## POPULAR run fold/sample [model time/prediction time]
##   1  [0sec/0.18sec] 
##   2  [0sec/0.28sec] 
##   3  [0sec/0.17sec] 
##   4  [0sec/0.17sec]

# Plotting models
plot(models, annotate = T, legend="topleft")

plot(models, "prec/rec", annotate = F, main="Precision/Recall", legend="topright")

Summary

Building the movie recommender system gave us a better understanding of how such systems work. The textbook "Building a Recommendation System with R" is unclear in some places, so we had to search online for some of the implementation details.

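The pros and cons of the User-based Collaborative Filtering (UBCF) and Item-based Collaborative Filtering (IBCF) approaches are as follows:

• UBCF recommends movies liked by similar users, so its recommendations can complement, rather than merely resemble, the movies a user has interacted with. Since users might not be looking for direct substitutes for a movie, this can make UBCF recommendations more useful than IBCF ones. In our evaluation, UBCF also had lower RMSE, MSE and MAE, and higher precision at every list length in the confusion matrices shown above.
• UBCF is memory intensive, because it keeps the rating matrix and searches for neighbors at prediction time. With a very large number of users, processing time would be high.
• UBCF relies on users' historical choices to make future recommendations, so it assumes that users' preferences remain largely constant over time.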
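The run-time point shows in the evaluation logs above. UBCF's model time is essentially zero, while its prediction time is several times IBCF's. This is because user-user filtering does its neighbor search when a prediction is requested. The minimal base-R sketch below illustrates that prediction step on a hypothetical toy matrix:

• compute cosine similarity between the active user and every training user;
• among the users who rated the target movie, take the `nn` most similar;
• add the similarity-weighted average of their mean-centered ratings to the active user's mean.

This illustrates the technique only; recommenderlab's `UBCF` has its own defaults and details. The names `train`, `active` and `predict_ubcf` are hypothetical.

```r
# Hypothetical training ratings: 4 users x 4 movies, NA = not rated
train <- matrix(c(5, 4, NA, 1,
                  4, 5, 2, NA,
                  1, NA, 5, 4,
                  2, 1, 4, 5),
                nrow = 4, byrow = TRUE)
active <- c(5, NA, 1, 2)   # active user's ratings; movie 2 is unknown

# Cosine similarity between two rating vectors, missing ratings treated as 0
cosine <- function(a, b) {
  a[is.na(a)] <- 0
  b[is.na(b)] <- 0
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

predict_ubcf <- function(active, train, item, nn = 2) {
  sims <- apply(train, 1, cosine, b = active)      # similarity to every training user
  cand <- which(!is.na(train[, item]))             # neighbors must have rated the item
  nbrs <- cand[order(sims[cand], decreasing = TRUE)][seq_len(min(nn, length(cand)))]
  # Centre each neighbor's rating on that neighbor's own mean
  dev <- train[nbrs, item] - rowMeans(train[nbrs, , drop = FALSE], na.rm = TRUE)
  w <- sims[nbrs]
  mean(active, na.rm = TRUE) + sum(w * dev) / sum(abs(w))
}

predict_ubcf(active, train, item = 2)
```

Every prediction scans all training users, so the cost grows with the number of users. This matches the memory and processing-time concern in the list above.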