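# Install each package if it is missing, then attach it.
if (!require("knitr")) { install.packages("knitr"); library(knitr) }
if (!require("tidyverse")) { install.packages("tidyverse"); library(tidyverse) }
if (!require("kableExtra")) { install.packages("kableExtra"); library(kableExtra) }
if (!require("dplyr")) { install.packages("dplyr"); library(dplyr) }
if (!require("ggrepel")) { install.packages("ggrepel"); library(ggrepel) }
if (!require("recommenderlab")) { install.packages("recommenderlab"); library(recommenderlab) }
if (!require("tictoc")) { install.packages("tictoc"); library(tictoc) }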
I used the MovieLens small dataset: roughly 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. The first rows of the ratings and movies tables look like this:
| userId | movieId | rating | timestamp |
|---|---|---|---|
| 1 | 1 | 4 | 964982703 |
| 1 | 3 | 4 | 964981247 |
| 1 | 6 | 4 | 964982224 |
| 1 | 47 | 5 | 964983815 |
| 1 | 50 | 5 | 964982931 |
| 1 | 70 | 3 | 964982400 |
| movieId | title | genres |
|---|---|---|
| 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 5 | Father of the Bride Part II (1995) | Comedy |
| 6 | Heat (1995) | Action\|Crime\|Thriller |
I used the realRatingMatrix class from ‘recommenderlab’ to convert the ratings table into a user-item rating matrix.
ratings_data$userId <- as.factor(ratings_data$userId)
UI <- as(ratings_data, "realRatingMatrix")
dim(UI@data)
## [1]  610 9724
The matrix has 610 rows (users) and 9,724 columns (movies).
I split the dataset into a training set (80%) and a test set (20%).
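The split code itself is not shown in this write-up. A minimal sketch of how such a split is commonly done with recommenderlab's `evaluationScheme()` follows. The toy matrix, the seed, `given = 5`, and `goodRating = 4` are assumptions for illustration, not values taken from this analysis. The names `train`, `known`, and `unknown` mirror the objects used in the modelling code that follows.

```r
library(recommenderlab)

set.seed(42)  # assumed seed, only to make the toy example repeatable

# Toy stand-in for the user-item matrix: 50 users x 30 movies, about half missing.
toy <- matrix(
  sample(c(NA, 1, 2, 3, 4, 5), 50 * 30, replace = TRUE,
         prob = c(0.5, rep(0.1, 5))),
  nrow = 50,
  dimnames = list(paste0("u", 1:50), paste0("m", 1:30))
)
toy <- as(toy, "realRatingMatrix")

# Keep users with at least 10 ratings so that 5 can be exposed and the rest withheld.
toy <- toy[rowCounts(toy) >= 10, ]

# 80% of users go to training. For each test user, 5 ratings are exposed
# to the model ("known") and the remainder are held out ("unknown").
scheme <- evaluationScheme(toy, method = "split", train = 0.8,
                           given = 5, goodRating = 4)

train   <- getData(scheme, "train")    # users used to fit the model
known   <- getData(scheme, "known")    # test users, exposed ratings
unknown <- getData(scheme, "unknown")  # test users, withheld ratings
```

With this layout, a model is fitted on `train`, asked to predict from `known`, and scored against `unknown`.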
Using the training data, I build a user-based collaborative filtering model with the UBCF method.
tic("UBCF Model - Training")
UBCF_model <- Recommender(train, method = "UBCF", parameter = NULL)
toc(log = TRUE, quiet = TRUE)
tic("UBCF Model - Predicting")
pred_UBCF <- predict(UBCF_model, newdata = known, n = 6, type = "ratings")
toc(log = TRUE, quiet = TRUE)
(accUBCF <- calcPredictionAccuracy(pred_UBCF, unknown))
##      RMSE       MSE       MAE
## 0.9269575 0.8592501 0.7216235
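For reference, the three numbers reported by `calcPredictionAccuracy()` are standard error metrics computed over the withheld ratings. A minimal base-R sketch on made-up vectors (the `actual` and `predicted` values are hypothetical, not taken from this dataset):

```r
# Hypothetical withheld ratings and the corresponding model predictions.
actual    <- c(4.0, 3.5, 5.0, 2.0, 4.5)
predicted <- c(3.5, 4.0, 4.5, 3.0, 4.5)

err <- predicted - actual

mse  <- mean(err^2)     # mean squared error
rmse <- sqrt(mse)       # root mean squared error, in rating units
mae  <- mean(abs(err))  # mean absolute error

round(c(RMSE = rmse, MSE = mse, MAE = mae), 4)
```

The same relationship holds in the output above: the reported MSE is the square of the RMSE (0.9269575^2 ≈ 0.85925).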
Next, I build an SVD model to compare against the UBCF model.
tic("SVD Model - Training")
SVD_model <- Recommender(train, method = "SVD", parameter = list(k = 20))
toc(log = TRUE, quiet = TRUE)
tic("SVD Model - Predicting")
pred_SVD <- predict(SVD_model, newdata = known, type = "ratings")
toc(log = TRUE, quiet = TRUE)
(accSVD <- calcPredictionAccuracy(pred_SVD, unknown))
##      RMSE       MSE       MAE
## 0.9311356 0.8670135 0.7249104
The SVD model's RMSE (0.931) is very close to the UBCF model's (0.927), so the two models are about equally accurate on this split.
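For intuition about what the `k = 20` parameter controls, here is a rough sketch of a rank-k approximation using base R's `svd()`. It is a simplified illustration on a small, fully observed toy matrix (the sizes, seed, and noise level are assumptions), not recommenderlab's actual implementation, which among other things has to deal with missing ratings.

```r
set.seed(1)  # assumed seed for a repeatable toy example

# Toy dense "ratings" with low-rank structure plus noise: 8 users x 6 items.
user_factors <- matrix(rnorm(8 * 2), nrow = 8)
item_factors <- matrix(rnorm(6 * 2), nrow = 6)
R <- user_factors %*% t(item_factors) + matrix(rnorm(8 * 6, sd = 0.1), nrow = 8)

# Keep only the k largest singular values and their singular vectors.
rank_k_approx <- function(m, k) {
  s <- svd(m)
  s$u[, 1:k, drop = FALSE] %*%
    diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
}

# Root mean squared reconstruction error for k = 1..6.
err <- sapply(1:6, function(k) sqrt(mean((R - rank_k_approx(R, k))^2)))
round(err, 4)
```

The reconstruction error never increases as k grows and reaches zero (up to floating-point error) at full rank. A smaller k gives a more compressed, smoother model of the ratings.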
Now let us inspect the predictions for a particular user. First, here are the movies that user 17 has already rated, sorted by rating.
mov_rated <- as.data.frame(UI@data[c("17"), ])
colnames(mov_rated) <- c("rating")
mov_rated$movieId <- as.integer(rownames(mov_rated))
mov_rated <- mov_rated %>% filter(rating != 0) %>%
inner_join (movie_data, by="movieId") %>%
arrange(rating) %>%
select(Movie = "title", rating)
knitr::kable(mov_rated, format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Movie | rating |
|---|---|
| Sling Blade (1996) | 3.0 |
| Donnie Brasco (1997) | 3.0 |
| Mortal Kombat (1995) | 3.5 |
| Apollo 13 (1995) | 3.5 |
| Léon: The Professional (a.k.a. The Professional) (Léon) (1994) | 3.5 |
| Blade Runner (1982) | 3.5 |
| Singin’ in the Rain (1952) | 3.5 |
| Some Like It Hot (1959) | 3.5 |
| Days of Thunder (1990) | 3.5 |
| Princess Bride, The (1987) | 3.5 |
| Once Upon a Time in the West (C’era una volta il West) (1968) | 3.5 |
| Star Trek III: The Search for Spock (1984) | 3.5 |
| Seven Samurai (Shichinin no samurai) (1954) | 3.5 |
| Ghostbusters II (1989) | 3.5 |
| Double Indemnity (1944) | 3.5 |
| Mad Max (1979) | 3.5 |
| Mystic River (2003) | 3.5 |
| WALL·E (2008) | 3.5 |
| Wrestler, The (2008) | 3.5 |
| Exit Through the Gift Shop (2010) | 3.5 |
| Fighter, The (2010) | 3.5 |
| Seven (a.k.a. Se7en) (1995) | 4.0 |
| Taxi Driver (1976) | 4.0 |
| Independence Day (a.k.a. ID4) (1996) | 4.0 |
| Rear Window (1954) | 4.0 |
| Die Hard (1988) | 4.0 |
| Reservoir Dogs (1992) | 4.0 |
| Doors, The (1991) | 4.0 |
| Sting, The (1973) | 4.0 |
| Chinatown (1974) | 4.0 |
| Shining, The (1980) | 4.0 |
| Stand by Me (1986) | 4.0 |
| Deer Hunter, The (1978) | 4.0 |
| Indiana Jones and the Last Crusade (1989) | 4.0 |
| Beavis and Butt-Head Do America (1996) | 4.0 |
| Labyrinth (1986) | 4.0 |
| Goonies, The (1985) | 4.0 |
| Untouchables, The (1987) | 4.0 |
| Sixth Sense, The (1999) | 4.0 |
| American Beauty (1999) | 4.0 |
| White Men Can’t Jump (1992) | 4.0 |
| Outlaw Josey Wales, The (1976) | 4.0 |
| Memento (2000) | 4.0 |
| Bill & Ted’s Excellent Adventure (1989) | 4.0 |
| Professional, The (Le professionnel) (1981) | 4.0 |
| My Neighbor Totoro (Tonari no Totoro) (1988) | 4.0 |
| Enter the Dragon (1973) | 4.0 |
| Old Boy (2003) | 4.0 |
| Batman Begins (2005) | 4.0 |
| Departed, The (2006) | 4.0 |
| Inglourious Basterds (2009) | 4.0 |
| Toy Story 3 (2010) | 4.0 |
| Social Network, The (2010) | 4.0 |
| True Grit (2010) | 4.0 |
| Toy Story (1995) | 4.5 |
| Usual Suspects, The (1995) | 4.5 |
| Braveheart (1995) | 4.5 |
| Jurassic Park (1993) | 4.5 |
| Schindler’s List (1993) | 4.5 |
| Terminator 2: Judgment Day (1991) | 4.5 |
| Dances with Wolves (1990) | 4.5 |
| Batman (1989) | 4.5 |
| Silence of the Lambs, The (1991) | 4.5 |
| Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964) | 4.5 |
| Alice in Wonderland (1951) | 4.5 |
| Platoon (1986) | 4.5 |
| One Flew Over the Cuckoo’s Nest (1975) | 4.5 |
| Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981) | 4.5 |
| Goodfellas (1990) | 4.5 |
| Godfather: Part II, The (1974) | 4.5 |
| Full Metal Jacket (1987) | 4.5 |
| Raging Bull (1980) | 4.5 |
| Back to the Future (1985) | 4.5 |
| Saving Private Ryan (1998) | 4.5 |
| American History X (1998) | 4.5 |
| Office Space (1999) | 4.5 |
| Fight Club (1999) | 4.5 |
| Trading Places (1983) | 4.5 |
| Green Mile, The (1999) | 4.5 |
| For a Few Dollars More (Per qualche dollaro in più) (1965) | 4.5 |
| Requiem for a Dream (2000) | 4.5 |
| Lord of the Rings: The Fellowship of the Ring, The (2001) | 4.5 |
| City of God (Cidade de Deus) (2002) | 4.5 |
| Lord of the Rings: The Return of the King, The (2003) | 4.5 |
| Howl’s Moving Castle (Hauru no ugoku shiro) (2004) | 4.5 |
| Pan’s Labyrinth (Laberinto del fauno, El) (2006) | 4.5 |
| Dark Knight, The (2008) | 4.5 |
| Star Trek (2009) | 4.5 |
| Inception (2010) | 4.5 |
| Star Wars: Episode IV - A New Hope (1977) | 5.0 |
| Pulp Fiction (1994) | 5.0 |
| Shawshank Redemption, The (1994) | 5.0 |
| Forrest Gump (1994) | 5.0 |
| Godfather, The (1972) | 5.0 |
| Star Wars: Episode V - The Empire Strikes Back (1980) | 5.0 |
| Good, the Bad and the Ugly, The (Buono, il brutto, il cattivo, Il) (1966) | 5.0 |
| Clockwork Orange, A (1971) | 5.0 |
| Star Wars: Episode VI - Return of the Jedi (1983) | 5.0 |
| Terminator, The (1984) | 5.0 |
| Big Lebowski, The (1998) | 5.0 |
| Matrix, The (1999) | 5.0 |
| Spirited Away (Sen to Chihiro no kamikakushi) (2001) | 5.0 |
| Lord of the Rings: The Two Towers, The (2002) | 5.0 |
| Laputa: Castle in the Sky (Tenkû no shiro Rapyuta) (1986) | 5.0 |
| Nausicaä of the Valley of the Wind (Kaze no tani no Naushika) (1984) | 5.0 |
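Here are the six highest-predicted movies for row 17 of each prediction matrix, first for UBCF and then for SVD. Note that `[17, ]` selects the 17th user in the test set by position, which is not necessarily the user whose userId is 17.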
# Top six UBCF predictions for the 17th row (positional) of the test set.
UBCF_recommend <- as.data.frame(pred_UBCF@data[17, ])
colnames(UBCF_recommend) <- c("Rating")
UBCF_recommend$movieId <- as.integer(rownames(UBCF_recommend))
UBCF_recommend <- UBCF_recommend %>% arrange(desc(Rating)) %>% head(6) %>%
  inner_join(movie_data, by = "movieId") %>%
  select(Movie = "title")
knitr::kable(UBCF_recommend, format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Movie |
|---|
| Matrix, The (1999) |
| Lord of the Rings: The Fellowship of the Ring, The (2001) |
| Shawshank Redemption, The (1994) |
| Lord of the Rings: The Two Towers, The (2002) |
| Reservoir Dogs (1992) |
| Silence of the Lambs, The (1991) |
SVD_recommend <- as.data.frame(pred_SVD@data[17, ])
colnames(SVD_recommend) <- c("rating")
SVD_recommend$movieId <- as.integer(rownames(SVD_recommend))
SVD_recommend <- SVD_recommend %>% arrange(desc(rating)) %>% head(6) %>%
inner_join (movie_data, by="movieId") %>%
select(Movie = "title")
knitr::kable(SVD_recommend, format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Movie |
|---|
| Terminator 2: Judgment Day (1991) |
| Matrix, The (1999) |
| Star Wars: Episode III - Revenge of the Sith (2005) |
| Star Wars: Episode II - Attack of the Clones (2002) |
| Star Wars: Episode VI - Return of the Jedi (1983) |
| Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981) |
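One thing to keep in mind when reading these tables: indexing a matrix with `[17, ]` picks the 17th row, whereas `["17", ]` picks the row named "17". The following base-R sketch, on a small made-up prediction matrix (all names and values are hypothetical), shows a simple top-N helper that looks a user up by ID and skips missing predictions.

```r
# Hypothetical predicted ratings: rows are test users (named by userId),
# columns are movieIds. NA marks items with no prediction.
pred <- matrix(
  c(4.2, NA,  3.1, 4.8,
    NA,  3.9, 4.5, 2.7,
    3.3, 4.1, NA,  4.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("5", "17", "42"), c("1", "50", "260", "2571"))
)

# Return the n highest-predicted movieIds for one user, looked up by name.
top_n_for_user <- function(pred, user_id, n = 2) {
  scores <- pred[as.character(user_id), ]
  scores <- scores[!is.na(scores)]
  names(sort(scores, decreasing = TRUE))[seq_len(min(n, length(scores)))]
}

top_n_for_user(pred, 17)  # looks up the row named "17"
rownames(pred)[3]         # positional row 3 belongs to user "42"
```

Looking rows up by name avoids silently reporting recommendations for a different user when the test set is a shuffled subset of the full matrix.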
One major difference between the SVD and UBCF models is their run time.
The tictoc log recorded training and prediction time separately for each model, so we can compare them step by step.
# Named run_log rather than log to avoid masking base R's log() function.
run_log <- as.data.frame(unlist(tic.log(format = TRUE)))
colnames(run_log) <- c("Run Time")
knitr::kable(run_log, format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Run Time |
|---|
| UBCF Model - Training: 0.04 sec elapsed |
| UBCF Model - Predicting: 5.42 sec elapsed |
| SVD Model - Training: 1.57 sec elapsed |
| SVD Model - Predicting: 0.57 sec elapsed |
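The log shows opposite cost profiles for the two models.
UBCF trains almost instantly (0.04 s) but is slow to predict (5.42 s), because a user-based model mostly stores the training data and defers the neighbour search to prediction time. SVD is the reverse: training takes longer (1.57 s) because the factorization is computed up front, while prediction is fast (0.57 s).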
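To compare the timings numerically rather than by eye, the formatted log lines can be parsed into a small data frame. This is a base-R sketch that uses the four log strings from the run-time table above. The regular expressions assume the exact "label: N sec elapsed" format shown there.

```r
# Log lines copied from the run-time table above.
log_lines <- c(
  "UBCF Model - Training: 0.04 sec elapsed",
  "UBCF Model - Predicting: 5.42 sec elapsed",
  "SVD Model - Training: 1.57 sec elapsed",
  "SVD Model - Predicting: 0.57 sec elapsed"
)

# Split each line into model name, phase, and elapsed seconds.
timings <- data.frame(
  model   = sub(" Model - .*$", "", log_lines),
  phase   = sub("^.* - ([A-Za-z]+):.*$", "\\1", log_lines),
  seconds = as.numeric(sub("^.*: ([0-9.]+) sec elapsed$", "\\1", log_lines))
)

# Total time per model (training + predicting).
totals <- tapply(timings$seconds, timings$model, sum)
totals
```

On these numbers UBCF is slower end to end (5.46 s versus 2.14 s for SVD), almost entirely because of the prediction step.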