MyAnimeList, also known as MAL, is an anime and manga social networking website with a database where users can organize anime into personal lists. Once an anime is on a list, the user can give it a rating after watching it. These ratings help in finding users who have similar tastes. This project explores the contents of this dataset to gain insights. Later on, an item-item collaborative filtering recommender system is built to recommend anime and predict ratings for users. The recommender system is then analyzed and evaluated to see how well it performs when recommending items.
The data was obtained from Kaggle.com and contains information from 73,516 users who may have rated any of 12,294 anime items. The scores/ratings range from 1 - 10 with 10 being the best. A rating of -1 means that the user did not provide a rating for that item.
The goal of this project is to recommend and make predictions about a user’s taste, specifically what a user will want to watch or buy in the future. Making such predictions requires large amounts of user data in order to find patterns and associate prior tastes with future choices. It is often difficult to provide good recommendations when information about users is limited. Explicit ratings are the best source of information, but users provide far fewer of them than we would like, so the rating data is sparse. To produce meaningful recommendations despite this, I propose three techniques: (1) Item-item collaborative filtering, (2) Singular Value Decomposition (SVD) and (3) Hybrid Recommender System. The system will be implemented in R using a training and test set with a ratio of 80%:20% respectively. The error for each model will be reported as root mean square error (RMSE) as a measure of performance.
anime_ratings <- read.csv("anime-recommendations-database/rating.csv", header = T)
glimpse(anime_ratings)
## Observations: 7,813,737
## Variables: 3
## $ user_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ anime_id <int> 20, 24, 79, 226, 241, 355, 356, 442, 487, 846, 936, 1...
## $ rating <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -...
anime_names <- read.csv("anime-recommendations-database/anime.csv", header = T)
glimpse(anime_names)
## Observations: 12,294
## Variables: 7
## $ anime_id <int> 32281, 5114, 28977, 9253, 9969, 32935, 11061, 820, 15...
## $ name <fct> Kimi no Na wa., Fullmetal Alchemist: Brotherhood, Gin...
## $ genre <fct> "Drama, Romance, School, Supernatural", "Action, Adve...
## $ type <fct> Movie, TV, TV, TV, TV, TV, TV, OVA, Movie, TV, TV, Mo...
## $ episodes <fct> 1, 64, 51, 24, 51, 10, 148, 110, 1, 13, 24, 1, 201, 2...
## $ rating <dbl> 9.37, 9.26, 9.25, 9.17, 9.16, 9.15, 9.13, 9.11, 9.10,...
## $ members <int> 200630, 793665, 114262, 673572, 151266, 93351, 425855...
According to the description found with the data, the ratings range from 1 - 10. Notice that if a user did not rate an item, the item received a rating of -1. For simplicity, I will change -1 to NA to indicate that the rating is missing. I will also change the data type of some variables.
anime_ratings$rating[anime_ratings$rating == -1] <- NA
anime_sp <- anime_ratings
anime_ratings$user_id <- as.factor(anime_ratings$user_id)
anime_ratings$anime_id <- as.factor(anime_ratings$anime_id)
anime_names$anime_id <- as.factor(anime_names$anime_id)
anime_names$name <- as.character(anime_names$name)
anime_names$type <- as.character(anime_names$type)
anime_names$genre <- as.character(anime_names$genre)
Before we create a matrix to build the recommenders, let’s gather some insights from the data.
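# top_n() is called without wt, so it ranks by the last column (members):
# this shows the 7 anime with the most members, ordered by rating
anime_names %>% arrange(desc(rating)) %>%
top_n(7) %>% kable() %>% kable_styling("striped", font_size = 10, full_width = F)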
## Selecting by members
anime_id | name | genre | type | episodes | rating | members |
---|---|---|---|---|---|---|
5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Military, Shounen | TV | 64 | 9.26 | 793665 |
1575 | Code Geass: Hangyaku no Lelouch | Action, Mecha, Military, School, Sci-Fi, Super Power | TV | 25 | 8.83 | 715151 |
1535 | Death Note | Mystery, Police, Psychological, Supernatural, Thriller | TV | 37 | 8.71 | 1013917 |
16498 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | TV | 25 | 8.54 | 896229 |
6547 | Angel Beats! | Action, Comedy, Drama, School, Supernatural | TV | 13 | 8.39 | 717796 |
11757 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | TV | 25 | 7.83 | 893100 |
20 | Naruto | Action, Comedy, Martial Arts, Shounen, Super Power | TV | 220 | 7.81 | 683297 |
anime_names %>% count(type)%>%
ggplot(aes(x = type, y = n)) +
geom_bar(stat = "identity", fill = "darkgreen" ) +
geom_text(aes(label=n), vjust= -0.6, color="black", size=3.5) +
theme_minimal()
The type is unknown for about 25 anime items. Among the rest, TV is the most common category.
Note
ONA - Original Net Animation (ONA) is an anime that is directly released onto the Internet
OVA - Original Video Animation (OVA) is an animated film or series made specially for release in home-video formats
# the 10 anime with the most members; top_n() without wt ranks by the last column (members)
anime_names %>% arrange(desc(members)) %>%
top_n(10) %>% kable() %>% kable_styling("striped", font_size = 10, full_width = F)
## Selecting by members
anime_id | name | genre | type | episodes | rating | members |
---|---|---|---|---|---|---|
1535 | Death Note | Mystery, Police, Psychological, Supernatural, Thriller | TV | 37 | 8.71 | 1013917 |
16498 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | TV | 25 | 8.54 | 896229 |
11757 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | TV | 25 | 7.83 | 893100 |
5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Military, Shounen | TV | 64 | 9.26 | 793665 |
6547 | Angel Beats! | Action, Comedy, Drama, School, Supernatural | TV | 13 | 8.39 | 717796 |
1575 | Code Geass: Hangyaku no Lelouch | Action, Mecha, Military, School, Sci-Fi, Super Power | TV | 25 | 8.83 | 715151 |
20 | Naruto | Action, Comedy, Martial Arts, Shounen, Super Power | TV | 220 | 7.81 | 683297 |
9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
10620 | Mirai Nikki (TV) | Action, Mystery, Psychological, Shounen, Supernatural, Thriller | TV | 26 | 8.07 | 657190 |
4224 | Toradora! | Comedy, Romance, School, Slice of Life | TV | 25 | 8.45 | 633817 |
Let’s move on to creating a User-Item matrix
## 73515 x 11200 rating matrix of class 'realRatingMatrix' with 7813730 ratings.
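The chunk that builds the matrix is not shown in this report. As a rough illustration of the idea, here is a minimal base R sketch that turns long-format ratings into a user-item matrix. The toy data and the names `toy`, `rated` and `user_item` are made up for this example; with recommenderlab the usual route would be a coercion such as `as(anime_ratings, "realRatingMatrix")`, which I assume is what produced the output above.

```r
# Toy long-format ratings in the same shape as rating.csv (values are made up).
toy <- data.frame(
  user_id  = c(1, 1, 2, 3, 3),
  anime_id = c(20, 24, 20, 24, 79),
  rating   = c(8, NA, 7, 9, 10)
)

# Drop missing ratings, then index rows by user and columns by anime.
rated <- toy[!is.na(toy$rating), ]
users <- sort(unique(rated$user_id))
items <- sort(unique(rated$anime_id))

# Start from an all-NA matrix and fill in the observed (user, item) cells.
user_item <- matrix(
  NA_real_, nrow = length(users), ncol = length(items),
  dimnames = list(as.character(users), as.character(items))
)
cells <- cbind(match(rated$user_id, users), match(rated$anime_id, items))
user_item[cells] <- rated$rating
user_item
```

Every cell that stays NA is a user-item pair with no rating, which is where the sparsity comes from.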
The matrix is very sparse, yet it still uses a lot of memory. For instance, the size of this matrix is about 99 MB.
## 99233736 bytes
I will cut the matrix down so that it only contains users who rated at least 500 anime shows and shows that were rated at least 1000 times.
## 1843 x 1720 rating matrix of class 'realRatingMatrix' with 967727 ratings.
## 11850056 bytes
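The filtering code is also not shown, so the following is only a sketch of the kind of filter described above, written in base R on a toy matrix. The helper `filter_dense` and its arguments are hypothetical names; the thresholds in the report would be 500 and 1000 rather than 2.

```r
# Hypothetical helper: keep users with at least `min_user` ratings and
# items with at least `min_item` ratings (NA marks a missing rating).
filter_dense <- function(m, min_user, min_item) {
  keep_users <- rowSums(!is.na(m)) >= min_user
  keep_items <- colSums(!is.na(m)) >= min_item
  m[keep_users, keep_items, drop = FALSE]
}

toy <- matrix(
  c(8,  7, NA,
    9, NA, NA,
    6,  5, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("a1", "a2", "a3"))
)

dense <- filter_dense(toy, min_user = 2, min_item = 2)
dense
```

Note that both counts are taken on the original matrix, so a kept user can end up with fewer ratings among the kept items than the user threshold.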
a_ratings <- as.data.frame(table(as.vector(a_mat@data@x)))
ggplot(a_ratings, aes(x = Var1, y = Freq, fill = Var1)) +
geom_bar(stat = "identity") +
ggtitle("Distribution of Ratings for Anime Items") +
geom_text(aes(label=Freq), vjust= -0.6, color="black", size=3.5) +
theme(legend.position="none") + xlab("Rating Score") + ylab("Frequency")
Judging by these users' ratings, the shows are well liked: the majority of ratings are 8 or higher.
Summary of ratings
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 7.00 8.00 7.61 9.00 10.00 185510
I will now normalize the data to remove per-user rating bias, so that each user's average rating becomes 0.
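Centering on rows simply subtracts each user's mean rating from that user's ratings. A minimal base R sketch on a toy matrix (the helper name `center_rows` is made up for this example; in the report the centering is done by recommenderlab):

```r
# Subtract each user's mean rating from that user's row, ignoring NAs.
center_rows <- function(m) {
  sweep(m, 1, rowMeans(m, na.rm = TRUE), FUN = "-")
}

toy <- matrix(
  c( 8,  6, NA,
    10, NA,  4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("u1", "u2"), c("a1", "a2", "a3"))
)

centered <- center_rows(toy)
centered
rowMeans(centered, na.rm = TRUE)  # every user now averages 0
```

After centering, a positive value means the user liked the show more than their own average, regardless of how generous or strict they are overall.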
avg_anime_ratings <- data.frame("avg_rating" = colMeans(a_mat)) %>%
ggplot(aes(x = avg_rating)) +
geom_histogram(color = "black", fill = "steelblue") +
theme( axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))+
ggtitle("Distribution of Average Ratings for Anime Shows")
avg_anime_ratings
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The distribution is left-skewed, with most of the average ratings concentrated around 0.
Similarity among the first 100 users
simA <- similarity(a_mat[1:100, ], method = "cosine", which = "users")
image(as.matrix(simA), main = "User Similarity")
Similarity among the first 100 anime items
simB <- similarity(a_mat[, 1:100], method = "cosine", which = "items")
image(as.matrix(simB), main = "Item Similarity")
Based on the similarity plots, items have more in common than users do with each other.
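For reference, cosine similarity between two rating vectors is their dot product divided by the product of their lengths. The sketch below computes it in base R over the items both vectors have ratings for. That handling of missing values is an assumption of this example, and recommenderlab's own treatment may differ in detail; `cosine_sim` is a made-up helper name.

```r
# Cosine similarity over the positions where both vectors are observed.
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)  # nothing in common to compare
  sum(x[both] * y[both]) /
    (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Two centered rating vectors that agree in direction on their shared items.
a <- c(1, -1, NA, 2)
b <- c(2, -2, 3, NA)
cosine_sim(a, b)
```

Values near 1 mean the two users (or items) were rated in the same direction, and values near 0 mean there is little relationship.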
Recommender systems aim to predict users’ interests and recommend items that are likely to interest them. They help users make decisions by discovering new and relevant items. As mentioned earlier, we will look at the way three types of recommenders work.
First we will divide the data into training and test sets so that the recommender algorithms can learn from the training data and then try to predict relevant outcomes for the test data.
# min(rowCounts(a_mat)) is 4, so at most 4 items per user can be given to the recommender
anime_eval <- evaluationScheme(data = a_mat, method = "split", train = 0.8, given = 4, goodRating = 5, k = 4)
anime_eval
## Evaluation scheme with 4 items given
## Method: 'split' with 4 run(s).
## Training set proportion: 0.800
## Good ratings: >=5.000000
## Data set: 1843 x 1720 rating matrix of class 'realRatingMatrix' with 967727 ratings.
## Normalized using center on rows.
## Warning in .local(x, ...): x was already normalized by row!
## Recommender of type 'IBCF' for 'realRatingMatrix'
## learned using 1474 users.
anime_pred <- predict(object = anime_item_recc, newdata = getData(anime_eval, "known"), n = 10)
anime_predr <- predict(object = anime_item_recc, newdata = getData(anime_eval, "known"), type = "ratings")
Let’s look at the recommendations for the first 8 users.
## $`201`
## [1] 451 479 893 968 978 982 1624 5 77 124
##
## $`392`
## [1] 3 5 10 15 19 22 32 54 56 59
##
## $`446`
## [1] 17 21 26 30 62 63 91 99 114 115
##
## $`661`
## [1] 28 168 250 366 480 982 1008 1016 1061 1191
##
## $`771`
## [1] 1 3 21 22 45 70 79 81 99 123
##
## $`917`
## integer(0)
##
## $`1522`
## [1] 31 71 100 834 965 1072 1237 1277 1380 1387
##
## $`1530`
## [1] 807 1005 1039 1064 1094 1167 1284 1384 25 29
Notice that for some users (user 917 above), no items were recommended. Here we have the cold start problem: the recommender does not have adequate information about a user or an item to make relevant predictions. In this evaluation only 4 ratings per test user are given to the recommender, which leaves item-based filtering little to work with. This happens often with collaborative filtering recommender systems, and it reduces performance. A new user who has rated few or no items has an almost empty profile, so their taste is not known to the system.
Let’s see what was actually recommended for some of the users.
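To make the mechanism concrete, here is a minimal sketch of item-based scoring in base R. It is not recommenderlab's implementation: the helper `predict_ibcf`, the similarity values and the single-user profile are all made up. Each unrated item is scored as a similarity-weighted average of the user's known ratings, and an item with no similar rated neighbor gets no score at all.

```r
# Score unrated items as a similarity-weighted average of the known ratings.
predict_ibcf <- function(sim, ratings) {
  rated <- !is.na(ratings)
  scores <- rep(NA_real_, length(ratings))
  names(scores) <- names(ratings)
  for (j in which(!rated)) {
    w <- sim[j, rated]               # similarity of item j to each rated item
    if (sum(abs(w)) > 0) {           # skip items with no similar rated neighbor
      scores[j] <- sum(w * ratings[rated]) / sum(abs(w))
    }
  }
  scores
}

items <- c("a1", "a2", "a3")
sim <- matrix(
  c(1.0, 0.8, 0.0,
    0.8, 1.0, 0.0,
    0.0, 0.0, 1.0),
  nrow = 3, byrow = TRUE, dimnames = list(items, items)
)

user <- c(a1 = 9, a2 = NA, a3 = NA)  # a user with a single known rating
scores <- predict_ibcf(sim, user)
scores
```

Item a3 shares no similarity with anything the user has rated, so it cannot be scored. With very few known ratings per user, this can happen for every candidate item, which leaves the list empty.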
# function to match anime id with names of anime items
item_recc_anime <- function(i){
p <- anime_pred@items[[i]]
p <- data.frame("guess" = as.factor(p))
p <- inner_join(p, anime_names, by = c("guess" = "anime_id")) %>% select(name, type)
return(as.data.frame(p))
}
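One caveat about this lookup: in recommenderlab, the `items` slot of a top-N list stores column positions in the rating matrix rather than the item labels. If the columns of `a_mat` are labelled with `anime_id` (an assumption, since the chunk that builds `a_mat` is not shown), joining the positions directly against `anime_id` can match the wrong titles or drop rows. A hedged sketch of a lookup that goes through the labels first, using a hypothetical helper `positions_to_names` and toy data:

```r
# Hypothetical helper: translate column positions into item labels,
# then look the labels up in the names table.
positions_to_names <- function(positions, item_labels, names_df) {
  ids  <- item_labels[positions]
  rows <- match(ids, as.character(names_df$anime_id))
  names_df[rows, c("name", "type")]
}

# Toy stand-ins: column labels of a rating matrix and a small names table.
item_labels <- c("20", "5114", "1535")
toy_names <- data.frame(
  anime_id = c(1535, 20, 5114),
  name = c("Death Note", "Naruto", "Fullmetal Alchemist: Brotherhood"),
  type = c("TV", "TV", "TV"),
  stringsAsFactors = FALSE
)

res <- positions_to_names(c(3, 1), item_labels, toy_names)
res
```

With the objects in this report, the call would look something like `positions_to_names(anime_pred@items[[i]], colnames(a_mat), anime_names)`, again assuming the column names of `a_mat` are the anime ids.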
## [[1]]
## name type
## 1 Tenshi Kinryouku OVA
## 2 School Rumble TV
## 3 Ai Yori Aoshi TV
## 4 Mobile Suit Gundam ZZ TV
## 5 Mobile Suit Gundam Wing TV
## 6 Futakoi TV
## 7 Tokyo Underground TV
## 8 Angel Heart TV
## 9 Grappler Baki (TV) TV
## 10 Ace wo Nerae! 2 OVA
##
## [[2]]
## name type
## 1 Hungry Heart: Wild Striker TV
## 2 One Piece TV
## 3 Texhnolyze TV
## 4 Neon Genesis Evangelion TV
## 5 D.C.: Da Capo TV
## 6 DearS TV
## 7 Mobile Suit Gundam Wing: Endless Waltz OVA
## 8 Mai-Otome TV
## 9 Sakigake!! Cromartie Koukou TV
## 10 El Hazard: The Alternative World TV
##
## [[3]]
## name type
## 1 Cowboy Bebop: Tengoku no Tobira Movie
## 2 Eyeshield 21 TV
## 3 Monster TV
## 4 Prince of Tennis TV
## 5 Neon Genesis Evangelion: The End of Evangelion Movie
## 6 Appleseed (Movie) Movie
## 7 Avenger TV
## 8 Chobits TV
##
## [[4]]
## name type
## 1 Mahou Shoujo Lyrical Nanoha TV
## 2 Shakugan no Shana TV
## 3 Burn Up! OVA
## 4 Street Fighter II V TV
## 5 Ginga Densetsu Weed TV
## 6 The Third: Aoi Hitomi no Shoujo TV
## 7 Tokyo Babylon OVA
## 8 Blame! ONA
## 9 Melty Lancer OVA
## Warning in .local(x, ...): x was already normalized by row!
## Recommender of type 'SVD' for 'realRatingMatrix'
## learned using 1474 users.
anime_svd_pred <- predict(object = anime_SVD_recc, newdata = getData(anime_eval, "known"), n = 10)
anime_svd_predr <- predict(object = anime_SVD_recc, newdata = getData(anime_eval, "known"), type = "ratings")
Let’s see what SVD recommends.
## $`201`
## [1] 689 343 787 536 660 742 1149 728 732 646
##
## $`392`
## [1] 467 9 458 761 70 335 159 482 590 133
##
## $`446`
## [1] 482 590 41 467 558 146 574 22 79 154
##
## $`661`
## [1] 208 759 1024 1415 570 973 495 490 852 135
##
## $`771`
## [1] 467 590 482 852 869 698 70 1030 1041 1102
##
## $`917`
## [1] 852 482 590 698 759 551 1158 1102 1115 185
##
## $`1522`
## [1] 1431 1359 590 1158 1262 1186 1326 1233 1230 1268
##
## $`1530`
## [1] 590 1262 482 996 1230 852 1268 1233 1166 1179
# function to match anime id with names of anime items
svd_recc_anime <- function(i){
p <- anime_svd_pred@items[[i]]
p <- data.frame("guess" = as.factor(p))
p <- inner_join(p, anime_names, by = c("guess" = "anime_id")) %>% select(name, type)
return(as.data.frame(p))
}
Unlike the item-based recommender, the SVD algorithm provided recommendations for every user. In general, SVD is a commonly used method for estimating missing values in a data matrix. Since recommender systems are essentially trying to estimate users' missing ratings, the use of SVD makes sense. Compared with IBCF, some of the recommended items are the same.
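As an illustration of how SVD can estimate missing ratings, the sketch below fills the gaps with column means, takes a truncated SVD with base R's `svd()`, and reads the estimates off the low-rank reconstruction. The column-mean imputation, the helper name `low_rank_fill` and the toy matrix are assumptions of this example, not a description of the model trained above.

```r
# Hypothetical helper: impute NAs with column means, then keep the
# first k singular values/vectors to build a rank-k approximation.
low_rank_fill <- function(m, k) {
  col_means <- colMeans(m, na.rm = TRUE)
  filled <- m
  missing <- which(is.na(m), arr.ind = TRUE)
  filled[missing] <- col_means[missing[, "col"]]

  s <- svd(filled)
  approx <- s$u[, 1:k, drop = FALSE] %*%
    diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
  dimnames(approx) <- dimnames(m)
  approx
}

toy <- matrix(
  c( 9,  8, NA,
     8, NA,  2,
     2,  1,  9,
    NA,  2,  8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("a", 1:3))
)

est <- low_rank_fill(toy, k = 2)
round(est, 2)
```

Because the reconstruction has a value in every cell, every user gets a score for every item, which is consistent with SVD returning a full list for each user.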
Now let’s have a look at what the numbers match to.
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## [[1]]
## name type
## 1 Hanbun no Tsuki ga Noboru Sora TV
## 2 Naruto TV
## 3 Musekinin Kanchou Tylor TV
## 4 Mousou Dairinin TV
## 5 Kidou Senkan Nadesico: The Prince of Darkness Movie
## 6 Mousou Kagaku Series: Wandaba Style TV
## 7 Boukyaku no Senritsu TV
##
## [[2]]
## name type
## 1 Yu☆Gi☆Oh!: Duel Monsters GX TV
## 2 Kage kara Mamoru! TV
## 3 Ghost in the Shell: Stand Alone Complex TV
## 4 Major S2 TV
## 5 Kono Minikuku mo Utsukushii Sekai TV
## 6 Rean no Tsubasa ONA
## 7 Prince of Tennis TV
## 8 Shuffle! TV
## 9 Shaman King TV
##
## [[3]]
## name type
## 1 Ghost in the Shell: Stand Alone Complex TV
## 2 Buttobi!! CPU OVA
## 3 Naruto: Akaki Yotsuba no Clover wo Sagase Special
## 4 Matantei Loki Ragnarok TV
## 5 Boukyaku no Senritsu TV
## 6 Yu☆Gi☆Oh!: Duel Monsters GX TV
## 7 Kage kara Mamoru! TV
## 8 Green Green TV
##
## [[4]]
## name type
## 1 Gunslinger Girl TV
## 2 Geobreeders: File-X Chibi Neko Dakkan OVA
## 3 Lemon Angel Project TV
## 4 Yu☆Gi☆Oh!: Duel Monsters GX TV
## 5 Burn Up! W OVA
## 6 Macross Flash Back 2012 OVA
## 7 Kage kara Mamoru! TV
## 8 Mujin Wakusei Survive TV
The final model is a hybrid recommender that combines item-item CF with popular options, items the user previously liked, and random items for diversity.
anime_hybrid_recc <- HybridRecommender(
Recommender(data = getData(anime_eval, "train"), method = "IBCF"),
Recommender(data = getData(anime_eval, "train"), method = "POPULAR"),
Recommender(data = getData(anime_eval, "train"), method = "RERECOMMEND"),
Recommender(data = getData(anime_eval, "train"), method = "RANDOM"), #diversity
weights = c(0.5, 0.3, 0.1, 0.1)
)
## Warning in .local(x, ...): x was already normalized by row!
## Warning in .local(x, ...): x was already normalized by row!
## Recommender of type 'HYBRID' for 'ratingMatrix'
## learned using NA users.
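The general idea behind the weights can be sketched as a weighted average of each component's scores. This is only an illustration in base R with made-up scores on a common 0 to 1 scale. The helper `combine_scores` is hypothetical, and the exact way recommenderlab combines its component models may differ.

```r
# Weighted average of per-item scores from several recommenders.
combine_scores <- function(score_list, weights) {
  stopifnot(length(score_list) == length(weights))
  weights <- weights / sum(weights)      # make the weights sum to 1
  scores <- do.call(cbind, score_list)   # items in rows, models in columns
  total <- as.vector(scores %*% weights)
  names(total) <- rownames(scores)
  total
}

# Made-up scores for three items from four component models.
ibcf        <- c(a1 = 0.9, a2 = 0.1, a3 = 0.4)
popular     <- c(a1 = 0.2, a2 = 0.8, a3 = 0.5)
rerecommend <- c(a1 = 0.0, a2 = 1.0, a3 = 0.0)
random      <- c(a1 = 0.5, a2 = 0.5, a3 = 0.5)

combined <- combine_scores(
  list(ibcf, popular, rerecommend, random),
  weights = c(0.5, 0.3, 0.1, 0.1)
)
sort(combined, decreasing = TRUE)
```

With a weight of 0.5, the item-based scores dominate the ranking, while the other components mostly break ties and fill in where the item-based model has nothing to say.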
anime_hybrid_pred <- predict(object = anime_hybrid_recc, newdata = getData(anime_eval, "known"), n = 10)
anime_hybrid_predr <- predict(object = anime_hybrid_recc, newdata = getData(anime_eval, "known"), type = "ratings")
## $`201`
## [1] 451 1624 479 893 968 982 798 677 349 978
##
## $`392`
## [1] 378 19 1125 1423 154 1161 754 5 823 458
##
## $`446`
## [1] 823 555 588 21 773 1080 567 209 1623 671
##
## $`661`
## [1] 982 1399 168 1421 996 1717 1481 1536 627 1474
##
## $`771`
## [1] 154 99 21 996 214 1290 1024 529 1523 1
##
## $`917`
## [1] 1125 761 810 482 555 1290 773 1029 45 1185
##
## $`1522`
## [1] 1583 965 71 1072 100 834 1387 1656 1446 1380
##
## $`1530`
## [1] 807 1284 1167 1064 1005 1039 1094 1384 378 1038
Some of the items recommended by IBCF and SVD also appear in the hybrid recommender's lists.
Let’s see the actual items recommended.
# function to match anime id with names of anime items
hybrid_recc_anime <- function(i){
p <- anime_hybrid_pred@items[[i]]
p <- data.frame("guess" = as.factor(p))
p <- inner_join(p, anime_names, by = c("guess" = "anime_id")) %>% select(name, type)
return(as.data.frame(p))
}
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector
## [[1]]
## name type
## 1 Odin: Koushi Hansen Starlight Movie
## 2 Kyattou Ninden Teyandee TV
## 3 Ace wo Nerae! 2 OVA
## 4 School Rumble TV
## 5 Genma Taisen Movie
## 6 Green Legend Ran OVA
## 7 Gift: Eternal Rainbow TV
## 8 Aria The Animation TV
## 9 Harlock Saga: Nibelung no Yubiwa OVA
##
## [[2]]
## name type
## 1 Virgin Night OVA
## 2 Koutetsu Tenshi Kurumi 2 TV
## 3 Itsudatte My Santa! OVA
## 4 One Piece TV
## 5 Tenamonya Voyagers OVA
## 6 Sentou Yousei Yukikaze OVA
## 7 The Big O TV
## 8 R.O.D the TV TV
## 9 G-On Riders TV
## 10 Lemon Angel Project TV
##
## [[3]]
## name type
## 1 eX-Driver the Movie Movie
## 2 Monster TV
## 3 Lupin III: Napoleon no Jisho wo Ubae Special
## 4 Shaman King TV
## 5 Maze☆Bakunetsu Jikuu (TV) TV
## 6 Yuki no Joou TV
## 7 Cowboy Bebop: Tengoku no Tobira Movie
## 8 Virgin Night OVA
## 9 Buttobi!! CPU OVA
##
## [[4]]
## name type
## 1 Melty Lancer OVA
## 2 Katekyo Hitman Reborn! TV
## 3 Macross Flash Back 2012 OVA
## 4 Blame! ONA
## 5 The Third: Aoi Hitomi no Shoujo TV
## 6 Lupin III: Ikiteita Majutsushi OVA
## 7 Haru no Ashioto The Movie: Ourin Dakkan Movie
## 8 Ginga Densetsu Weed TV
## 9 Street Fighter II V TV
ITEM
anime_item_acc1 <- calcPredictionAccuracy(x = anime_pred, data = getData(anime_eval, "unknown"), given = 4, goodRating = 5)
anime_item_acc2 <- calcPredictionAccuracy(x = anime_predr, data = getData(anime_eval, "unknown"))
SVD
anime_svd_acc1 <- calcPredictionAccuracy(x = anime_svd_pred, data = getData(anime_eval, "unknown"), given = 4, goodRating = 5)
anime_svd_acc2 <- calcPredictionAccuracy(x = anime_svd_predr, data = getData(anime_eval, "unknown"))
HYBRID
anime_hy_acc1 <- calcPredictionAccuracy(x = anime_hybrid_pred, data = getData(anime_eval, "unknown"), given = 4, goodRating = 5)
anime_hy_acc2 <- calcPredictionAccuracy(x = anime_hybrid_predr, data = getData(anime_eval, "unknown"))
TopN
model | TP | FP | FN | TN | precision | recall | TPR | FPR |
---|---|---|---|---|---|---|---|---|
anime_item_acc1 | 0.3035230 | 7.967480 | 114.7751 | 1592.954 | 0.0385451 | 0.0035823 | 0.0035823 | 0.0047436 |
anime_svd_acc1 | 1.4336043 | 8.566396 | 113.6450 | 1592.355 | 0.1433604 | 0.0156471 | 0.0156471 | 0.0051708 |
anime_hy_acc1 | 0.9403794 | 9.059621 | 114.1382 | 1591.862 | 0.0940379 | 0.0053911 | 0.0053911 | 0.0060231 |
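The ratio columns follow from the confusion counts in the usual way: precision is TP / (TP + FP), recall (also shown as TPR) is TP / (TP + FN), and FPR is FP / (FP + TN). A small sketch, with a made-up helper `topn_metrics`:

```r
# Standard top-N metrics from confusion-matrix counts.
topn_metrics <- function(tp, fp, fn, tn) {
  c(
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn),
    fpr       = fp / (fp + tn)
  )
}

# Plugging in the averaged SVD counts from the table.
svd_metrics <- topn_metrics(tp = 1.4336043, fp = 8.566396,
                            fn = 113.6450, tn = 1592.355)
round(svd_metrics, 4)
```

The values in the table appear to be averages over test users, so plugging the averaged counts into these formulas will not always reproduce the averaged ratios. It works for SVD precision because every user receives exactly 10 recommendations, but not for recall, where the denominator varies from user to user.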
Ratings
model | RMSE | MSE | MAE |
---|---|---|---|
anime_item_acc2 | 1.596660 | 2.549324 | 1.107453 |
anime_svd_acc2 | 1.688743 | 2.851854 | 1.235027 |
anime_hy_acc2 | 1.583035 | 2.506001 | 1.124243 |
To sum up this table: for all three error measures, the lower the number, the better the performance of the model.
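For reference, the three measures can be computed from predicted and actual ratings as in the sketch below. The helper `error_metrics` and the toy vectors are made up; pairs with a missing value on either side are skipped.

```r
# RMSE, MSE and MAE over the pairs where both values are present.
error_metrics <- function(predicted, actual) {
  err <- predicted - actual
  err <- err[!is.na(err)]
  c(
    RMSE = sqrt(mean(err^2)),
    MSE  = mean(err^2),
    MAE  = mean(abs(err))
  )
}

m <- error_metrics(predicted = c(8, 6, 9, NA), actual = c(9, 6, 7, 10))
m
```

RMSE is the square root of MSE, so the two always rank models the same way. MAE weighs large errors less heavily and can therefore rank models differently.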
models_to_evaluate <- list(
IBCF = list(name = "IBCF", param = list(method = "cosine")),
SVD = list(name = "SVD", param = list(k = 30)),
POPULAR = list(name = "POPULAR", param = NULL),
RANDOM = list(name = "RANDOM", param = NULL)
)
results <- evaluate(anime_eval, method = models_to_evaluate, n = c(1, 3, 5, 15, 20))
## IBCF run fold/sample [model time/prediction time]
## 1
## Warning in .local(x, ...): x was already normalized by row!
## [60.31sec/0.15sec]
## 2
## Warning in .local(x, ...): x was already normalized by row!
## [57.41sec/0.17sec]
## 3
## Warning in .local(x, ...): x was already normalized by row!
## [58.83sec/0.19sec]
## 4
## Warning in .local(x, ...): x was already normalized by row!
## [56.03sec/0.2sec]
## SVD run fold/sample [model time/prediction time]
## 1
## Warning in .local(x, ...): x was already normalized by row!
## [0.94sec/0.56sec]
## 2
## Warning in .local(x, ...): x was already normalized by row!
## [1.01sec/0.54sec]
## 3
## Warning in .local(x, ...): x was already normalized by row!
## [0.92sec/0.6sec]
## 4
## Warning in .local(x, ...): x was already normalized by row!
## [1.15sec/0.58sec]
## POPULAR run fold/sample [model time/prediction time]
## 1
## Warning in .local(x, ...): x was already normalized by row!
## [0.02sec/2.43sec]
## 2
## Warning in .local(x, ...): x was already normalized by row!
## [0.03sec/2.14sec]
## 3
## Warning in .local(x, ...): x was already normalized by row!
## [0.03sec/2sec]
## 4
## Warning in .local(x, ...): x was already normalized by row!
## [0.03sec/2.06sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0.01sec/0.64sec]
## 2 [0sec/0.7sec]
## 3 [0.01sec/0.61sec]
## 4 [0.01sec/0.63sec]
ROC Curve
The closer the curve is to the top left, the better the performance.
Precision-Recall
The closer the curve is to the top right, the better the performance. In this case, the Singular Value Decomposition algorithm performed best.
Overall, the Hybrid Recommender performed best on rating prediction because it had the lowest RMSE (1.583), although IBCF had a slightly lower MAE and SVD had the best top-N precision and recall. This was expected because, in a hybrid recommender, the algorithms make up for each other's shortcomings. As mentioned earlier, the item-based recommender had trouble recommending items for some users. This is a known problem for collaborative filtering recommenders: when users have rated only a few of the items available in the database, there is not enough information to locate good neighbors, and the resulting recommendations are weak.
To conclude, recommender systems open new opportunities for retrieving personalized information on the web. They also help to alleviate information overload, which is a very common problem in information retrieval systems, and they give users access to products and services that they would not readily find on their own. This project discussed three recommendation techniques and highlighted their strengths and weaknesses. Several evaluation metrics were used to measure the quality and performance of the resulting models.