The goal of this assignment is for you to try out different ways of implementing and configuring a recommender, and to evaluate your different approaches.
Implement at least two of these recommendation algorithms:
• Content-Based Filtering
• User-User Collaborative Filtering
• Item-Item Collaborative Filtering
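# packages assumed by the code in this report
library(recommenderlab) # MovieLense, Recommender(), rowCounts(), normalize()
library(ggplot2)        # qplot(), ggtitle()
library(magrittr)       # %>% pipe
data("MovieLense")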
dim(MovieLense@data)
## [1] 943 1664
Let's do collaborative filtering, starting with the similarity between the first 10 users and between the first 10 movies:
\[\text{User-User Similarity}\]
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| 0.0000000 | 0.9605820 | 0.8339504 | 0.9192637 | 0.9326136 | 0.9541710 | 0.9446653 | 0.9775049 | 0.9764039 | 0.9683044 |
| 0.9605820 | 0.0000000 | 0.9268716 | 0.9370341 | 0.9848027 | 0.9543931 | 0.9670869 | 0.9586588 | 0.9400556 | 0.9826137 |
| 0.8339504 | 0.9268716 | 0.0000000 | 0.9130323 | 1.0000000 | 0.8668857 | 0.8738971 | 0.8958898 | 0.9191450 | 0.9024436 |
| 0.9192637 | 0.9370341 | 0.9130323 | 0.0000000 | 0.9946918 | 0.9238226 | 0.8823323 | 0.9765327 | 0.9938837 | 0.9691166 |
| 0.9326136 | 0.9848027 | 1.0000000 | 0.9946918 | 0.0000000 | 0.9336269 | 0.9081632 | 0.9412276 | 0.8809189 | 0.9401144 |
| 0.9541710 | 0.9543931 | 0.8668857 | 0.9238226 | 0.9336269 | 0.0000000 | 0.9605997 | 0.9775950 | 0.9427809 | 0.9784868 |
| 0.9446653 | 0.9670869 | 0.8738971 | 0.8823323 | 0.9081632 | 0.9605997 | 0.0000000 | 0.9561888 | 0.9394858 | 0.9780770 |
| 0.9775049 | 0.9586588 | 0.8958898 | 0.9765327 | 0.9412276 | 0.9775950 | 0.9561888 | 0.0000000 | 0.8990017 | 0.9752306 |
| 0.9764039 | 0.9400556 | 0.9191450 | 0.9938837 | 0.8809189 | 0.9427809 | 0.9394858 | 0.8990017 | 0.0000000 | 0.9880545 |
| 0.9683044 | 0.9826137 | 0.9024436 | 0.9691166 | 0.9401144 | 0.9784868 | 0.9780770 | 0.9752306 | 0.9880545 | 0.0000000 |
\[\text{Item-Item Similarity}\]
| | Toy Story (1995) | GoldenEye (1995) | Four Rooms (1995) | Get Shorty (1995) | Copycat (1995) | Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | Twelve Monkeys (1995) | Babe (1995) | Dead Man Walking (1995) | Richard III (1995) |
|---|---|---|---|---|---|---|---|---|---|---|
| Toy Story (1995) | 0.0000000 | 0.9487374 | 0.9132997 | 0.9429069 | 0.9613638 | 0.9551194 | 0.9489155 | 0.9600459 | 0.9387445 | 0.9430394 |
| GoldenEye (1995) | 0.9487374 | 0.0000000 | 0.9088797 | 0.9394926 | 0.9426876 | 0.9550903 | 0.9411770 | 0.9499076 | 0.9145017 | 0.9389799 |
| Four Rooms (1995) | 0.9132997 | 0.9088797 | 0.0000000 | 0.8991940 | 0.9424719 | 0.9683641 | 0.9208737 | 0.8787096 | 0.9084892 | 0.9269418 |
| Get Shorty (1995) | 0.9429069 | 0.9394926 | 0.8991940 | 0.0000000 | 0.8919936 | 0.9190369 | 0.9484601 | 0.9539981 | 0.9497018 | 0.9582736 |
| Copycat (1995) | 0.9613638 | 0.9426876 | 0.9424719 | 0.8919936 | 0.0000000 | 0.9962406 | 0.9359823 | 0.9452349 | 0.9340369 | 0.9041944 |
| Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) | 0.9551194 | 0.9550903 | 0.9683641 | 0.9190369 | 0.9962406 | 0.0000000 | 0.9072989 | 0.8613908 | 0.9517965 | 0.9405701 |
| Twelve Monkeys (1995) | 0.9489155 | 0.9411770 | 0.9208737 | 0.9484601 | 0.9359823 | 0.9072989 | 0.0000000 | 0.9595148 | 0.9503334 | 0.9494393 |
| Babe (1995) | 0.9600459 | 0.9499076 | 0.8787096 | 0.9539981 | 0.9452349 | 0.8613908 | 0.9595148 | 0.0000000 | 0.9611934 | 0.9681040 |
| Dead Man Walking (1995) | 0.9387445 | 0.9145017 | 0.9084892 | 0.9497018 | 0.9340369 | 0.9517965 | 0.9503334 | 0.9611934 | 0.0000000 | 0.9476115 |
| Richard III (1995) | 0.9430394 | 0.9389799 | 0.9269418 | 0.9582736 | 0.9041944 | 0.9405701 | 0.9494393 | 0.9681040 | 0.9476115 | 0.0000000 |
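The code that produced the two similarity tables is not shown. A minimal sketch of how such tables can be computed with `similarity()` from recommenderlab follows; the choice of cosine similarity and of the first 10 users and movies is an assumption inferred from the tables, not something the report states.

```r
library(recommenderlab)
data("MovieLense")

# Assumption: cosine similarity on the first 10 users (rows)
user_similarity <- similarity(MovieLense[1:10, ], method = "cosine", which = "users")

# Assumption: cosine similarity on the first 10 movies (columns)
item_similarity <- similarity(MovieLense[, 1:10], method = "cosine", which = "items")

# Convert the distance-like objects to plain matrices for display
user_matrix <- as.matrix(user_similarity)
item_matrix <- as.matrix(item_similarity)
```

Each result is a 10 x 10 matrix; row and column `i` describe the same user or movie, so the matrices are symmetric.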
ratings <- as.vector(MovieLense@data)
unique(ratings)
## [1] 5 4 0 3 1 2
table_ratings <- table(ratings)
table_ratings
## ratings
## 0 1 2 3 4 5
## 1469760 6059 11307 27002 33947 21077
#remove 0s since these are missing data
ratings <- ratings[ratings != 0]
ratings <- factor(ratings)
qplot(ratings) + ggtitle("Distribution of Ratings")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Let's apply a threshold of 100 ratings per movie, so that the average ratings only include movies that have been rated often enough to be relevant:
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
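The chunk behind this plot is not shown. A minimal sketch of one way to build it, assuming the threshold refers to the number of ratings per movie; the variable names and the bin width of 0.1 are illustrative choices.

```r
library(recommenderlab)
library(ggplot2)
data("MovieLense")

# Number of ratings and mean rating per movie (missing ratings are ignored)
ratings_per_movie <- colCounts(MovieLense)
average_ratings <- colMeans(MovieLense)

# Keep only movies with more than 100 ratings
relevant_averages <- average_ratings[ratings_per_movie > 100]

# An explicit binwidth also avoids the `stat_bin()` message
ggplot(data.frame(average_rating = relevant_averages), aes(x = average_rating)) +
  geom_histogram(binwidth = 0.1) +
  ggtitle("Distribution of Average Ratings (movies with more than 100 ratings)")
```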
working_data <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]
#Normalize the data
working_data <- normalize(working_data)
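#split training and test data
set.seed(100)
train_index <- sample(x = c(TRUE, FALSE), size = nrow(working_data), replace = TRUE, prob = c(0.75, 0.25))
#set train and test sets
#note: the rows are taken from the full MovieLense matrix, not from the filtered and normalized working_data
train <- MovieLense[train_index,]
test <- MovieLense[!train_index,]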
#k = 30
#create IBCF recommender
rec_IBCF <- Recommender(data = train, method = 'IBCF', parameter = list(k = 30))
#predict
predict_IBCF <- predict(object = rec_IBCF, newdata = test, n=5)
#recommendations for the first 5 people in test set
predict_IBCF %>% as("list") %>% head(5)
## $`7`
## [1] "Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)"
## [2] "Flipper (1996)"
## [3] "Shall We Dance? (1996)"
## [4] "Pillow Book, The (1995)"
## [5] "In the Company of Men (1997)"
##
## $`12`
## [1] "Brother Minister: The Assassination of Malcolm X (1994)"
## [2] "Maya Lin: A Strong Clear Vision (1994)"
## [3] "Blue Angel, The (Blaue Engel, Der) (1930)"
## [4] "Wishmaster (1997)"
## [5] "Leave It to Beaver (1997)"
##
## $`15`
## [1] "Dolores Claiborne (1994)" "Madness of King George, The (1994)"
## [3] "Welcome to the Dollhouse (1995)" "20,000 Leagues Under the Sea (1954)"
## [5] "Bedknobs and Broomsticks (1971)"
##
## $`27`
## [1] "North (1994)"
## [2] "Losing Chase (1996)"
## [3] "Brother Minister: The Assassination of Malcolm X (1994)"
## [4] "Horseman on the Roof, The (Hussard sur le toit, Le) (1995)"
## [5] "Miracle on 34th Street (1994)"
##
## $`28`
## [1] "Playing God (1997)" "Across the Sea of Time (1995)"
## [3] "Boys Life (1995)" "Orlando (1993)"
## [5] "Rent-a-Kid (1995)"
#create UBCF recommender
rec_UBCF <- Recommender(data = train, method = 'UBCF')
#predict
predict_UBCF <- predict(rec_UBCF, newdata = test,n=5)
#recommendations for the first 5 people in test set
predict_UBCF %>% as("list") %>% head(5)
## $`7`
## [1] "Titanic (1997)" "Good Will Hunting (1997)"
## [3] "L.A. Confidential (1997)" "As Good As It Gets (1997)"
## [5] "Apt Pupil (1998)"
##
## $`12`
## [1] "Titanic (1997)" "Scream (1996)"
## [3] "Good Will Hunting (1997)" "As Good As It Gets (1997)"
## [5] "Full Monty, The (1997)"
##
## $`15`
## [1] "Alien (1979)" "Silence of the Lambs, The (1991)"
## [3] "Fargo (1996)" "Empire Strikes Back, The (1980)"
## [5] "Graduate, The (1967)"
##
## $`27`
## [1] "Contact (1997)" "Good Will Hunting (1997)"
## [3] "L.A. Confidential (1997)" "Game, The (1997)"
## [5] "Aliens (1986)"
##
## $`28`
## [1] "Full Monty, The (1997)" "Wag the Dog (1997)"
## [3] "Sense and Sensibility (1995)" "Cold Comfort Farm (1995)"
## [5] "Evita (1996)"
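The top-5 lists above show what each recommender suggests, but they do not measure how well it predicts. A minimal sketch of one way to compare the two approaches with recommenderlab's evaluation helpers follows. The values `given = 15` and `goodRating = 4`, and the use of the filtered matrix, are assumptions made for illustration and are not taken from the report.

```r
library(recommenderlab)
data("MovieLense")

# Filtered ratings; not normalized here, because IBCF and UBCF
# apply their own normalization by default
eval_data <- MovieLense[rowCounts(MovieLense) > 50, colCounts(MovieLense) > 100]

set.seed(100)
# 75/25 split; for each test user, 15 ratings are given to the model
# and the remaining ratings are held out for scoring
scheme <- evaluationScheme(eval_data, method = "split", train = 0.75,
                           given = 15, goodRating = 4)

models <- list(
  IBCF = Recommender(getData(scheme, "train"), method = "IBCF",
                     parameter = list(k = 30)),
  UBCF = Recommender(getData(scheme, "train"), method = "UBCF")
)

# Predict the held-out ratings and compute RMSE, MSE and MAE per model
accuracy <- sapply(models, function(model) {
  predicted <- predict(model, getData(scheme, "known"), type = "ratings")
  calcPredictionAccuracy(predicted, getData(scheme, "unknown"))
})
accuracy <- t(accuracy) # one row per model, one column per error measure
```

Lower values of RMSE, MSE and MAE indicate more accurate rating predictions, so the rows of `accuracy` can be compared directly.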
One problem with USER-USER similarity is that user preferences change over time: a user who liked an item a year ago may not like the same item today or in the future. One workaround is to use only recent ratings, say, ratings covering at least the most recent 3 months. The drawback of using only recent data is that the USER-ITEM matrix becomes even sparser.
In contrast, a key advantage of ITEM-ITEM similarity is that the ratings on a given item do not change significantly after an initial period.
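Item ratings do not change much over time after the initial period, and ITEM-ITEM similarity tends to be the better choice when users outnumber items. Note that the full MovieLense matrix has 943 users and 1664 movies, so the second condition only holds for a filtered set with fewer movies than users. On balance, an ITEM-ITEM similarity based Recommender System is preferable to a USER-USER based Recommender System for this project.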
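The recency workaround discussed above, and its effect on sparsity, can be illustrated with a small example. The `MovieLense` object used in this report is a rating matrix without timestamps, so the sketch below uses a hypothetical long-format ratings log; all column names, dates and the 90-day window are made up for illustration.

```r
# Hypothetical ratings log (not part of the MovieLense object)
ratings_log <- data.frame(
  user = c("u1", "u1", "u2", "u2", "u3", "u3"),
  item = c("A", "B", "A", "C", "B", "C"),
  rating = c(5, 3, 4, 2, 1, 5),
  rated_on = as.Date(c("2023-01-10", "2023-11-20", "2023-02-05",
                       "2023-12-01", "2023-03-15", "2023-12-10"))
)

# Keep only ratings from the 90 days before a chosen reference date
reference_date <- as.Date("2023-12-31")
recent_log <- ratings_log[ratings_log$rated_on >= reference_date - 90, ]

# Share of empty user-item cells, measured on the original user/item grid
sparsity <- function(log, users, items) {
  filled <- nrow(unique(log[, c("user", "item")]))
  1 - filled / (length(users) * length(items))
}

all_users <- unique(ratings_log$user)
all_items <- unique(ratings_log$item)
sparsity_all <- sparsity(ratings_log, all_users, all_items)
sparsity_recent <- sparsity(recent_log, all_users, all_items)
```

In this toy log, half of the ratings fall outside the window, so `sparsity_recent` is larger than `sparsity_all`: the same trade-off between freshness and coverage described above.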