Project 2 | Content-Based and Collaborative Filtering
Project Objectives
For assignment 2, start with an existing dataset of user-item ratings, such as our toy books dataset, MovieLens, Jester [http://eigentaste.berkeley.edu/dataset/] or another dataset of your choosing.
Implement at least two of these recommendation algorithms:
• Content-Based Filtering
• User-User Collaborative Filtering
• Item-Item Collaborative Filtering
You should evaluate and compare different approaches, using different algorithms, normalization techniques, similarity methods, neighborhood sizes, etc. You don’t need to be exhaustive—these are just some suggested possibilities.
You may use the course text’s recommenderlab or any other library that you want. Please provide at least one graph, and a textual summary of your findings and recommendations.
Libraries
Data Preparation and Exploration
We took the data from the “recommended for education and development” section of https://grouplens.org/datasets/movielens/. The site offers two downloads; we chose the smaller file because the larger one (labelled Full) is too large to store on GitHub. The dataset is described as follows:
This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018. There are 4 *.csv files, of which we use two, movies.csv and ratings.csv, for our downstream analysis.
Citation
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872
Load Data
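The loading chunk is not shown in the rendered report. A minimal sketch of how the two files could be read, assuming they were unzipped into a local folder; the helper name `load_movielens` and the `data_dir` argument are hypothetical and not part of the original analysis:

```r
# Hypothetical helper: read the two MovieLens files used in this report.
# `data_dir` is an assumption; point it at the folder holding the CSV files.
load_movielens <- function(data_dir = ".") {
  movies  <- read.csv(file.path(data_dir, "movies.csv"),  stringsAsFactors = FALSE)
  ratings <- read.csv(file.path(data_dir, "ratings.csv"), stringsAsFactors = FALSE)
  list(movies = movies, ratings = ratings)
}

# Usage, once the files are in place:
# ml      <- load_movielens("ml-latest-small")
# movies  <- ml$movies
# ratings <- ml$ratings
```

Keeping titles as character strings (rather than factors) makes the later joins on movieId and the title lookups simpler.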
Preview data
#Preview movies data
kable(head(movies, n = 10L)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| movieId | title | genres |
|---|---|---|
| 1 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 2 | Jumanji (1995) | Adventure|Children|Fantasy |
| 3 | Grumpier Old Men (1995) | Comedy|Romance |
| 4 | Waiting to Exhale (1995) | Comedy|Drama|Romance |
| 5 | Father of the Bride Part II (1995) | Comedy |
| 6 | Heat (1995) | Action|Crime|Thriller |
| 7 | Sabrina (1995) | Comedy|Romance |
| 8 | Tom and Huck (1995) | Adventure|Children |
| 9 | Sudden Death (1995) | Action |
| 10 | GoldenEye (1995) | Action|Adventure|Thriller |
#Preview ratings data
kable(head(ratings, n = 10L)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| userId | movieId | rating | timestamp |
|---|---|---|---|
| 1 | 1 | 4 | 964982703 |
| 1 | 3 | 4 | 964981247 |
| 1 | 6 | 4 | 964982224 |
| 1 | 47 | 5 | 964983815 |
| 1 | 50 | 5 | 964982931 |
| 1 | 70 | 3 | 964982400 |
| 1 | 101 | 5 | 964980868 |
| 1 | 110 | 4 | 964982176 |
| 1 | 151 | 5 | 964984041 |
| 1 | 157 | 5 | 964984100 |
Combine Data
Join movies with ratings on movieId
movie_ratings <- merge(ratings, movies, by="movieId")
#Preview combined data
kable(head(movie_ratings, n = 10L)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| movieId | userId | rating | timestamp | title | genres |
|---|---|---|---|---|---|
| 1 | 1 | 4.0 | 964982703 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 555 | 4.0 | 978746159 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 232 | 3.5 | 1076955621 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 590 | 4.0 | 1258420408 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 601 | 4.0 | 1521467801 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 179 | 4.0 | 852114051 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 606 | 2.5 | 1349082950 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 328 | 5.0 | 1494210665 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 206 | 5.0 | 850763267 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
| 1 | 468 | 4.0 | 831400444 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
Create Matrix
Unlike the MovieLense dataset bundled with recommenderlab, our data is not already an object of class realRatingMatrix. In the following code chunk we therefore build one, called moviematrix. Storing the ratings as a realRatingMatrix lets us apply recommenderlab's helper functions to moviematrix (see page 33 of Building a Recommendation System with R).
moviematrix <- ratings %>% select(-timestamp) %>% spread(movieId, rating)
row.names(moviematrix) <- moviematrix[, 1]
moviematrix <- moviematrix[-c(1)]
moviematrix <- as(as.matrix(moviematrix), "realRatingMatrix")
moviematrix
## 610 x 9724 rating matrix of class 'realRatingMatrix' with 100836 ratings.
At this point, we’ll take stock of the important characteristics of moviematrix.
## [1] 610 9724
## user item rating
## 1 1 1 4
## 326 1 3 4
## 434 1 6 4
## 2108 1 47 5
## 2380 1 50 5
## 2860 1 70 3
Data Visualization
Exploring the values of the rating
# Vectorize and create unique vector.
vector_ratings <- as.vector(moviematrix@data)
unique(vector_ratings)
## [1] 4.0 0.0 4.5 2.5 3.5 3.0 5.0 0.5 2.0 1.5 1.0
# The ratings are in the range 0-5. Let's count the occurrences of each of them.
table_ratings <- table(vector_ratings)
kable(table_ratings)
| vector_ratings | Freq |
|---|---|
| 0 | 5830804 |
| 0.5 | 1370 |
| 1 | 2811 |
| 1.5 | 1791 |
| 2 | 7551 |
| 2.5 | 5550 |
| 3 | 20047 |
| 3.5 | 13136 |
| 4 | 26818 |
| 4.5 | 8551 |
| 5 | 13211 |
A rating of 0 represents a missing value, so we remove the zeros from vector_ratings.
vector_ratings <- vector_ratings[vector_ratings != 0]
vector_ratings <- factor(vector_ratings)
qplot(vector_ratings) + ggtitle("Distribution of the ratings")
Exploring which movies have been viewed
# Number of ratings per movie, used as a proxy for views.
# Assumed definition: the chunk that created views_per_movie is not shown in the report.
views_per_movie <- colCounts(moviematrix)
table_views <- data.frame(
movie = names(views_per_movie),
views = views_per_movie
)
names(table_views)[names(table_views) == "movie"] <- "movieId"
table_views <- merge(table_views, movies, by="movieId")
table_views <- table_views[order(table_views$views, decreasing = TRUE), ]
ggplot(table_views[1:6, ], aes(x = title, y = views)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ggtitle("Number of views of the top movies")
Exploring the average ratings
average_ratings <- colMeans(moviematrix)
qplot(average_ratings, fill=..count.., geom="histogram",binwidth = 0.1, main= "Distribution of average movie rating", xlab = "Average Rating", ylab = "Count")
Selecting the most relevant data
When we explored the data, we noticed that the table contains:
- Movies that have been viewed only a few times. Their average ratings may be biased because so few ratings support them, so we keep only movies that have been rated more than 50 times.
- Users who rated only a few movies. Their ratings may be biased too, so we keep only users who have rated more than 50 movies.
# rowCounts gives the number of ratings per user
# colCounts gives the number of ratings per movie
(moviematrix <- moviematrix[rowCounts(moviematrix) > 50, colCounts(moviematrix) > 50])
## 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.
## [1] 378 436
Now we have 378 users and 436 items with 36214 ratings.
Let’s build the chart again:
average_ratings <- colMeans(moviematrix)
# Histogram: average rating per movie (colMeans averages over the movie columns)
qplot(average_ratings, fill=..count.., geom="histogram",binwidth = 0.1, main= "Histogram of average rating per movie", xlab = "Average Rating", ylab = "No. of Movies")
Data Normalization
movie_Normalization <- normalize(moviematrix)
avg <- round(rowMeans(movie_Normalization), 5)
table(avg)
## avg
## 0
## 378
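The table above shows that all 378 users end up with a mean normalized rating of 0. This follows from row-centering, the default behaviour of `normalize()`: each user's mean rating is subtracted from that user's ratings. A small base-R sketch on a made-up matrix (the values and the names `toy`, `user_means`, and `toy_centered` are illustrative assumptions) shows the same effect:

```r
# Toy user x item matrix; NA marks an unrated item (values are illustrative).
toy <- matrix(c(5, 3, NA,
                2, NA, 4,
                1, 1, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("u", 1:3), paste0("m", 1:3)))

# Mean rating of each user over the items that user actually rated.
user_means <- rowMeans(toy, na.rm = TRUE)

# Subtract each user's mean from that user's row.
toy_centered <- sweep(toy, 1, user_means, FUN = "-")

# After centering, every user's mean rating is 0 (up to floating-point error).
round(rowMeans(toy_centered, na.rm = TRUE), 5)
```

Centering removes the difference between users who rate generously and users who rate harshly, so that similarity is driven by relative preferences.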
min_Items <- quantile(rowCounts(moviematrix), 0.95)
min_Users <- quantile(colCounts(moviematrix), 0.95)
image(moviematrix[rowCounts(moviematrix) > min_Items, colCounts(moviematrix) > min_Users],
main = "Heatmap of the Top Users and Movies (Non-Normalized)")
image(movie_Normalization[rowCounts(movie_Normalization) > min_Items, colCounts(movie_Normalization) >
min_Users], main = "Heatmap of the Top Users and Movies (Normalized)")
Recommendation algorithms
Split the dataset into training set (80%) and testing set (20%):
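The splitting chunk is hidden in the rendered report. One way to obtain an 80/20 split of the 378 users, sketched here under assumptions: the helper `split_users`, the seed, and the name `movie_Train` are hypothetical, and only `movie_Test` appears later in the report. Taking floor(0.8 * 378) users for training gives 302 training and 76 test users, which matches the model output further down.

```r
# Hypothetical helper: split user row indices into training and test sets.
split_users <- function(n_users, train_frac = 0.8, seed = 1) {
  set.seed(seed)  # fixed seed so the split is reproducible
  train <- sort(sample(seq_len(n_users), size = floor(train_frac * n_users)))
  test  <- setdiff(seq_len(n_users), train)
  list(train = train, test = test)
}

idx <- split_users(378)
length(idx$train)  # 302
length(idx$test)   # 76

# Applied to the rating matrix (movie_Train is a hypothetical name):
# movie_Train <- moviematrix[idx$train, ]
# movie_Test  <- moviematrix[idx$test, ]
```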
Item-Item Collaborative Filtering
This filtering method computes the similarity between items from the ratings that users gave them, and then recommends items similar to those a user has already selected. The similarities between items are computed with one of the similarity measures (for example cosine or Pearson), and the similarity values are then used to predict ratings for user-item pairs that are absent from the data.
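To make the two steps concrete, here is a minimal base-R sketch on a made-up matrix. It is a simplification for illustration, not recommenderlab's implementation: the data, the function names `item_cosine` and `predict_rating`, and the choice to use every rated item as a neighbour (rather than only the k most similar) are all assumptions.

```r
# Toy user x item rating matrix; NA marks an unrated item (illustrative values).
ratings_toy <- matrix(
  c(5, 4, NA,
    4, 5, 1,
    1, 2, 5,
    NA, 1, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("i", 1:3))
)

# Step 1: cosine similarity between two item columns,
# using only the users who rated both items.
item_cosine <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

items <- colnames(ratings_toy)
item_sim <- matrix(NA_real_, length(items), length(items),
                   dimnames = list(items, items))
for (i in items) {
  for (j in items) {
    item_sim[i, j] <- item_cosine(ratings_toy[, i], ratings_toy[, j])
  }
}

# Step 2: predict a missing rating as the similarity-weighted average
# of the ratings the same user gave to other items.
predict_rating <- function(user, item) {
  rated <- names(which(!is.na(ratings_toy[user, ])))
  w <- item_sim[item, rated]
  sum(w * ratings_toy[user, rated]) / sum(w)
}

pred_u1_i3 <- predict_rating("u1", "i3")
pred_u1_i3
```

Because the prediction is a weighted average of the user's own ratings, it always lies between the lowest and highest rating that user has given to the neighbouring items.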
Training model
In the step below we train the model with k = 30, which is the default.
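The training chunk itself is hidden in the rendered report. A hedged sketch of what such a call can look like, using the MovieLense data bundled with recommenderlab as a stand-in so that the example runs on its own; the object names `train_demo`, `ibcf_demo`, and `model_details` are hypothetical, and the filtering thresholds are only there to keep the demo small:

```r
library(recommenderlab)

# MovieLense ships with recommenderlab; it stands in for the training data here.
data("MovieLense")
train_demo <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]

# k is the number of most-similar items kept for each item.
ibcf_demo <- Recommender(data = train_demo, method = "IBCF",
                         parameter = list(k = 30))

model_details <- getModel(ibcf_demo)
model_details$k          # neighbourhood size stored in the model
dim(model_details$sim)   # item x item similarity matrix
```

A smaller k keeps the stored similarity matrix sparser, while a larger k lets more neighbouring items contribute to each prediction.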
## Recommender of type 'IBCF' for 'realRatingMatrix'
## learned using 302 users.
Examining the Similarity Matrix
similarityMatrix <- getModel(model)$sim
which_max <- order(colSums(similarityMatrix > 0), decreasing = TRUE)[1:10]
topMovies <- as.data.frame(as.integer(rownames(similarityMatrix)[which_max]))
colnames(topMovies) <- c("movieId")
data <- topMovies %>% inner_join(movies, by = "movieId") %>% select(Movie = "title")
kable((data)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| Movie |
|---|
| Disclosure (1994) |
| Piano, The (1993) |
| City Slickers II: The Legend of Curly’s Gold (1994) |
| Congo (1995) |
| Broken Arrow (1996) |
| Wild Wild West (1999) |
| First Knight (1995) |
| Eraser (1996) |
| Coneheads (1993) |
| Beverly Hills Cop III (1994) |
Recommendations using test set
## Recommendations as 'topNList' with n = 6 for 76 users.
user1 <- as.data.frame(movie_Test@data[1, movie_Test@data[1, ] > 0])
colnames(user1) <- c("Rating")
user1[c("movieId")] <- as.integer(rownames(user1))
data <- movies %>% inner_join(user1, by = "movieId") %>% select(Movie = "title", Rating, genres) %>% arrange(desc(Rating))
kable((data)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| Movie | Rating | genres |
|---|---|---|
| Shawshank Redemption, The (1994) | 5.0 | Crime|Drama |
| Forrest Gump (1994) | 5.0 | Comedy|Drama|Romance|War |
| Blade Runner (1982) | 5.0 | Action|Sci-Fi|Thriller |
| One Flew Over the Cuckoo’s Nest (1975) | 5.0 | Drama |
| Hook (1991) | 5.0 | Adventure|Comedy|Fantasy |
| Kill Bill: Vol. 2 (2004) | 5.0 | Action|Drama|Thriller |
| Casino Royale (2006) | 5.0 | Action|Adventure|Thriller |
| Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000) | 4.5 | Action|Drama|Romance |
| Traffic (2000) | 4.5 | Crime|Drama|Thriller |
| Mulholland Drive (2001) | 4.5 | Crime|Drama|Film-Noir|Mystery|Thriller |
| Bowling for Columbine (2002) | 4.5 | Documentary |
| Interview with the Vampire: The Vampire Chronicles (1994) | 4.0 | Drama|Horror |
| Gladiator (2000) | 4.0 | Action|Adventure|Drama |
| Chicken Run (2000) | 4.0 | Animation|Children|Comedy |
| Best in Show (2000) | 4.0 | Comedy |
| Lost in Translation (2003) | 4.0 | Comedy|Drama|Romance |
| Mystic River (2003) | 4.0 | Crime|Drama|Mystery |
| Kill Bill: Vol. 1 (2003) | 4.0 | Action|Crime|Thriller |
| Incredibles, The (2004) | 4.0 | Action|Adventure|Animation|Children|Comedy |
| Prestige, The (2006) | 4.0 | Drama|Mystery|Sci-Fi|Thriller |
| No Country for Old Men (2007) | 4.0 | Crime|Drama |
| Inglourious Basterds (2009) | 4.0 | Action|Drama|War |
| Fight Club (1999) | 3.5 | Action|Crime|Drama|Thriller |
| Monsters, Inc. (2001) | 3.5 | Adventure|Animation|Children|Comedy|Fantasy |
| Royal Tenenbaums, The (2001) | 3.5 | Comedy|Drama |
| Beautiful Mind, A (2001) | 3.5 | Drama|Romance |
| Bourne Identity, The (2002) | 3.5 | Action|Mystery|Thriller |
| Finding Nemo (2003) | 3.5 | Adventure|Animation|Children|Comedy |
| Eternal Sunshine of the Spotless Mind (2004) | 3.5 | Drama|Romance|Sci-Fi |
| Superbad (2007) | 3.5 | Comedy |
| Avatar (2009) | 3.5 | Action|Adventure|Sci-Fi|IMAX |
| Godfather, The (1972) | 3.0 | Crime|Drama |
| Memento (2000) | 3.0 | Mystery|Thriller |
| Shrek (2001) | 3.0 | Adventure|Animation|Children|Comedy|Fantasy|Romance |
| Dark Knight, The (2008) | 3.0 | Action|Crime|Drama|IMAX |
| O Brother, Where Art Thou? (2000) | 2.5 | Adventure|Comedy|Crime |
| Pirates of the Caribbean: The Curse of the Black Pearl (2003) | 2.5 | Action|Adventure|Comedy|Fantasy |
| Batman Begins (2005) | 2.5 | Action|Crime|IMAX |
| Departed, The (2006) | 2.5 | Crime|Drama|Thriller |
| Bourne Ultimatum, The (2007) | 2.5 | Action|Crime|Thriller |
| Million Dollar Baby (2004) | 2.0 | Drama |
| WALL·E (2008) | 2.0 | Adventure|Animation|Children|Romance|Sci-Fi |
| Up (2009) | 2.0 | Adventure|Animation|Children|Drama |
| Donnie Darko (2001) | 1.5 | Drama|Mystery|Sci-Fi|Thriller |
| 28 Days Later (2002) | 1.5 | Action|Horror|Sci-Fi |
| Pan’s Labyrinth (Laberinto del fauno, El) (2006) | 1.5 | Drama|Fantasy|Thriller |
| Ratatouille (2007) | 1.5 | Animation|Children|Drama |
| Lord of the Rings: The Fellowship of the Ring, The (2001) | 1.0 | Adventure|Fantasy |
| Lord of the Rings: The Return of the King, The (2003) | 1.0 | Action|Adventure|Drama|Fantasy |
| High Fidelity (2000) | 0.5 | Comedy|Drama|Romance |
| Requiem for a Dream (2000) | 0.5 | Drama |
| Harry Potter and the Chamber of Secrets (2002) | 0.5 | Adventure|Fantasy |
| Big Fish (2003) | 0.5 | Drama|Fantasy|Romance |
| V for Vendetta (2006) | 0.5 | Action|Sci-Fi|Thriller|IMAX |
| Juno (2007) | 0.5 | Comedy|Drama|Romance |
| Iron Man (2008) | 0.5 | Action|Adventure|Sci-Fi |
| Slumdog Millionaire (2008) | 0.5 | Crime|Drama|Romance |
| Star Trek (2009) | 0.5 | Action|Adventure|Sci-Fi|IMAX |
| Hangover, The (2009) | 0.5 | Comedy|Crime |
| District 9 (2009) | 0.5 | Mystery|Sci-Fi|Thriller |
recommended <- pred@itemLabels[pred@items[[1]]]
recommended <- as.data.frame(as.integer(recommended))
colnames(recommended) <- c("movieId")
data <- recommended %>% inner_join(movies, by = "movieId") %>% select(Movie = "title",genres)
kable((data)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| Movie | genres |
|---|---|
| Stargate (1994) | Action|Adventure|Sci-Fi |
| Robin Hood: Men in Tights (1993) | Comedy |
| Schindler’s List (1993) | Drama|War |
| Alien (1979) | Horror|Sci-Fi |
| The Devil’s Advocate (1997) | Drama|Mystery|Thriller |
| Big (1988) | Comedy|Drama|Fantasy|Romance |
User-User Collaborative Filtering
Training the model
## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 302 users.
Recommendations using test set
## Recommendations as 'topNList' with n = 6 for 76 users.
# Recommendations for the first user
recommended <- pred@itemLabels[pred@items[[1]]]
recommended <- as.data.frame(as.integer(recommended))
colnames(recommended) <- c("movieId")
data <- recommended %>% inner_join(movies, by = "movieId") %>% select(Movie = "title",genres)
kable((data)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#fc5e5e") %>%
scroll_box(width = "100%", height = "200px")
| Movie | genres |
|---|---|
| Babe (1995) | Children|Drama |
| Fugitive, The (1993) | Thriller |
| Braveheart (1995) | Action|Drama|War |
| Lion King, The (1994) | Adventure|Animation|Children|Drama|Musical|IMAX |
| Star Wars: Episode IV - A New Hope (1977) | Action|Adventure|Sci-Fi |
| Aladdin (1992) | Adventure|Animation|Children|Comedy|Musical |
Comparison of Recommender Models
## [1] 11
evaluation = evaluationScheme(data = moviematrix, method = "cross-validation", k = 10, given = 10, goodRating = 3.5)
evaluation
## Evaluation scheme with 10 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.500000
## Data set: 378 x 436 rating matrix of class 'realRatingMatrix' with 36214 ratings.
ev_train = getData(evaluation, "train")
ev_known = getData(evaluation, "known")
ev_unknown = getData(evaluation, "unknown")
# Item
item_model = Recommender(data = ev_train, method = "IBCF", parameter = list(method = "Cosine"))
item_model_pred = predict(object = item_model, newdata = ev_known, n = 10, type = "ratings")
item = calcPredictionAccuracy(x = item_model_pred, data = ev_unknown, byUser = FALSE)
# User
user_model = Recommender(data = ev_train, method = "UBCF", parameter = list(method = "Cosine"))
user_model_pred = predict(object = user_model, newdata = ev_known, n = 10, type = "ratings")
user = calcPredictionAccuracy(x = user_model_pred, data = ev_unknown, byUser = FALSE)
# Comparison
kable(rbind(item, user))%>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
| | RMSE | MSE | MAE |
|---|---|---|---|
| item | 1.2937300 | 1.6737372 | 0.955398 |
| user | 0.9555293 | 0.9130362 | 0.734025 |
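For reference, the three error measures in the table can be computed from predicted and actual ratings as sketched below. The rating vectors are made up for illustration and are not taken from the evaluation above. Note that MSE is simply RMSE squared, which the table confirms (1.29373² ≈ 1.67374 for IBCF).

```r
# Illustrative values only; not taken from the evaluation above.
actual    <- c(4.0, 3.5, 5.0, 2.0, 4.5)
predicted <- c(3.5, 3.0, 4.0, 3.0, 4.5)

errors <- predicted - actual

mse  <- mean(errors^2)      # mean squared error
rmse <- sqrt(mse)           # root mean squared error
mae  <- mean(abs(errors))   # mean absolute error

c(RMSE = rmse, MSE = mse, MAE = mae)
```

RMSE penalizes large errors more heavily than MAE, so a model with a few badly wrong predictions shows a wider gap between the two.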
eval_sets = evaluationScheme(data = moviematrix, method = "cross-validation", k = 4, given = 10, goodRating = 3.5)
I_results = evaluate(x = eval_sets, method = "IBCF", n = seq(10, 100, 10))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.35sec/0.03sec]
## 2 [0.33sec/0.03sec]
## 3 [0.34sec/0.03sec]
## 4 [0.39sec/0.03sec]
kable(head(getConfusionMatrix(I_results)[[1]]))%>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
| | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|
| 10 | 1.416667 | 8.583333 | 59.61458 | 356.3854 | 0.1416667 | 0.0291171 | 0.0291171 | 0.0237326 |
| 20 | 2.947917 | 17.052083 | 58.08333 | 347.9167 | 0.1473958 | 0.0564288 | 0.0564288 | 0.0469233 |
| 30 | 4.645833 | 25.354167 | 56.38542 | 339.6146 | 0.1548611 | 0.0836695 | 0.0836695 | 0.0695461 |
| 40 | 6.031250 | 33.968750 | 55.00000 | 331.0000 | 0.1507812 | 0.1065546 | 0.1065546 | 0.0933082 |
| 50 | 7.489583 | 42.510417 | 53.54167 | 322.4583 | 0.1497917 | 0.1313989 | 0.1313989 | 0.1167724 |
| 60 | 8.572917 | 51.427083 | 52.45833 | 313.5417 | 0.1428819 | 0.1471139 | 0.1471139 | 0.1413827 |
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.12sec]
## 2 [0sec/0.14sec]
## 3 [0sec/0.13sec]
## 4 [0sec/0.14sec]
kable(head(getConfusionMatrix(U_results)[[1]]))%>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
| | TP | FP | FN | TN | precision | recall | TPR | FPR |
|---|---|---|---|---|---|---|---|---|
| 10 | 2.854167 | 7.145833 | 58.17708 | 357.8229 | 0.2854167 | 0.0570086 | 0.0570086 | 0.0190046 |
| 20 | 4.875000 | 15.125000 | 56.15625 | 349.8438 | 0.2437500 | 0.0948137 | 0.0948137 | 0.0406465 |
| 30 | 6.687500 | 23.312500 | 54.34375 | 341.6562 | 0.2229167 | 0.1221262 | 0.1221262 | 0.0627652 |
| 40 | 8.645833 | 31.354167 | 52.38542 | 333.6146 | 0.2161458 | 0.1522087 | 0.1522087 | 0.0844114 |
| 50 | 10.510417 | 39.489583 | 50.52083 | 325.4792 | 0.2102083 | 0.1786933 | 0.1786933 | 0.1062689 |
| 60 | 12.197917 | 47.802083 | 48.83333 | 317.1667 | 0.2032986 | 0.2051133 | 0.2051133 | 0.1289054 |
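The ratio columns in these tables follow the usual confusion-matrix definitions, sketched below with a hypothetical helper `topn_metrics` and made-up counts for a single user. The values in the tables appear to be averaged over users, so recall and FPR cannot be recomputed exactly from the averaged counts; precision can, because TP + FP equals the list length n for every user.

```r
# Hypothetical helper: ranking metrics from confusion-matrix counts.
topn_metrics <- function(tp, fp, fn, tn) {
  c(precision = tp / (tp + fp),   # share of recommended items that are good
    recall    = tp / (tp + fn),   # share of good items that were recommended (same as TPR)
    FPR       = fp / (fp + tn))   # share of non-good items that were recommended
}

# Counts for one imaginary user and a top-10 list (illustrative values only).
one_user <- topn_metrics(tp = 3, fp = 7, fn = 57, tn = 358)
one_user

# Precision of the n = 10 UBCF row, recomputed from its averaged TP and FP.
ubcf_precision_10 <- 2.854167 / (2.854167 + 7.145833)
ubcf_precision_10
```

As n grows, recall rises because more good items fit into the list, while precision tends to fall, which is the pattern visible in both tables.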
mult_models = list(
IBCF_cos = list(name = "IBCF", param = list(method = "Cosine")),
IBCF_pearson = list(name = "IBCF", param = list(method = "pearson")),
UBCF_cos = list(name = "UBCF", param = list(method = "Cosine")),
UBCF_pearson = list(name = "UBCF", param = list(method = "pearson")),
Random = list(name = "RANDOM", param = NULL),
Popular = list(name = "POPULAR", param = NULL)
)
# Testing models
models = evaluate(eval_sets, mult_models, n= c(1, 5, seq(10, 100, 10)))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.42sec/0.05sec]
## 2 [0.41sec/0.02sec]
## 3 [0.57sec/0.01sec]
## 4 [0.35sec/0.03sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.41sec/0.03sec]
## 2 [0.41sec/0.03sec]
## 3 [0.42sec/0.03sec]
## 4 [0.41sec/0.03sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.15sec]
## 2 [0.02sec/0.12sec]
## 3 [0sec/0.12sec]
## 4 [0.02sec/0.12sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.16sec]
## 2 [0sec/0.14sec]
## 3 [0sec/0.14sec]
## 4 [0.02sec/0.12sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.03sec]
## 2 [0sec/0.03sec]
## 3 [0sec/0.03sec]
## 4 [0sec/0.04sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0sec/0.18sec]
## 2 [0sec/0.28sec]
## 3 [0sec/0.17sec]
## 4 [0sec/0.17sec]
Summary
Building the movie recommender system gave us a better understanding of how such systems work. The textbook “Building a Recommendation System with R” is unclear in places, so we had to search online for some of the implementation details.
Pros and cons of the user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF) approaches:
UBCF recommendations complement the item the user was interacting with. Since users might not be looking for direct substitutes for a movie, UBCF provides better recommendations than IBCF. In our evaluation UBCF also had the lower prediction error (RMSE of about 0.96 versus 1.29 for IBCF) and the higher precision at every list length shown.
UBCF is memory intensive, so with a very large number of users the processing time would be high.
UBCF relies on users' historical choices to make future recommendations, so it assumes that users' preferences stay largely constant.