DATA612 Final Project - Anime Recommendation System

1. Introduction

MyAnimeList (MAL) is an anime and manga social networking website with a database in which users can organize anime into personal lists. Users can rate the items on their lists after watching them, which makes it possible to find users with similar tastes. This project first explores the dataset to gain insights, then builds an item-item collaborative filtering recommender system to predict ratings and recommend anime to users. The recommender is then evaluated to see how well it performs when recommending items.

The data was obtained from Kaggle.com and contains information from 73,516 users who may have given a rating to one of 12,294 anime items. The scores/ratings range from 1 - 10 with 10 being the best. If the rating is -1, it means that the user did not provide a rating for that item.

2. Objective / Motivation

The goal of this project is to predict a user’s taste, specifically what a user will want to watch or buy in the future, and to make recommendations accordingly. Such predictions require large amounts of user data in order to find patterns and associate prior tastes with future choices. It is often difficult to provide good recommendations when information about users is limited. Explicit ratings are the most useful signal, but users provide far fewer of them than we would like, so the user-item data is sparse. To produce meaningful recommendations despite this, I propose three techniques: (1) item-item collaborative filtering, (2) Singular Value Decomposition (SVD) and (3) a hybrid recommender system. The system will be implemented in R using training and test sets in an 80%:20% ratio. The error of each model will be reported as root mean square error (RMSE) as a measure of performance.

3. My Anime List Recommender System

3.1 Data Pre-processing

3.1.1 Import files

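# Load the packages used throughout (glimpse/%>%, plotting, tables, recommenders)
invisible(lapply(c("dplyr", "ggplot2", "knitr", "kableExtra", "recommenderlab"), library, character.only = TRUE))
anime_ratings <- read.csv("anime-recommendations-database/rating.csv", header = TRUE)
glimpse(anime_ratings)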

## Observations: 7,813,737
## Variables: 3
## $ user_id  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ anime_id <int> 20, 24, 79, 226, 241, 355, 356, 442, 487, 846, 936, 1...
## $ rating   <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -...

anime_names <- read.csv("anime-recommendations-database/anime.csv", header = T)
glimpse(anime_names)

## Observations: 12,294
## Variables: 7
## $ anime_id <int> 32281, 5114, 28977, 9253, 9969, 32935, 11061, 820, 15...
## $ name     <fct> Kimi no Na wa., Fullmetal Alchemist: Brotherhood, Gin...
## $ genre    <fct> "Drama, Romance, School, Supernatural", "Action, Adve...
## $ type     <fct> Movie, TV, TV, TV, TV, TV, TV, OVA, Movie, TV, TV, Mo...
## $ episodes <fct> 1, 64, 51, 24, 51, 10, 148, 110, 1, 13, 24, 1, 201, 2...
## $ rating   <dbl> 9.37, 9.26, 9.25, 9.17, 9.16, 9.15, 9.13, 9.11, 9.10,...
## $ members  <int> 200630, 793665, 114262, 673572, 151266, 93351, 425855...

3.1.2 Clean Data

According to the description that accompanies the data, ratings range from 1 to 10. If a user did not rate an item, the item received a rating of -1. For simplicity, I will change -1 to NA to indicate that the rating is missing. I will also change the data type of some variables.

anime_ratings$rating[anime_ratings$rating == -1] <- NA
anime_sp <- anime_ratings
anime_ratings$user_id <- as.factor(anime_ratings$user_id)
anime_ratings$anime_id <- as.factor(anime_ratings$anime_id)
anime_names$anime_id <- as.factor(anime_names$anime_id)
anime_names$name <- as.character(anime_names$name)
anime_names$type <- as.character(anime_names$type)
anime_names$genre <- as.character(anime_names$genre)

3.2 Exploratory Data Analysis

Before we create a matrix to build the recommenders, let’s gather some insights from the data.

3.2.1 Highest rated items

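# Note: top_n(7) without a `wt` argument ranks by the last column (members),
# so this lists the 7 titles with the most members, ordered by rating
anime_names %>% arrange(desc(rating)) %>% 
  top_n(7) %>% kable() %>% kable_styling("striped", font_size = 10, full_width = F)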

## Selecting by members

anime_id	name	genre	type	episodes	rating	members
5114	Fullmetal Alchemist: Brotherhood	Action, Adventure, Drama, Fantasy, Magic, Military, Shounen	TV	64	9.26	793665
1575	Code Geass: Hangyaku no Lelouch	Action, Mecha, Military, School, Sci-Fi, Super Power	TV	25	8.83	715151
1535	Death Note	Mystery, Police, Psychological, Supernatural, Thriller	TV	37	8.71	1013917
16498	Shingeki no Kyojin	Action, Drama, Fantasy, Shounen, Super Power	TV	25	8.54	896229
6547	Angel Beats!	Action, Comedy, Drama, School, Supernatural	TV	13	8.39	717796
11757	Sword Art Online	Action, Adventure, Fantasy, Game, Romance	TV	25	7.83	893100
20	Naruto	Action, Comedy, Martial Arts, Shounen, Super Power	TV	220	7.81	683297

3.2.2 Most watched type of show

anime_names %>% count(type)%>% 
  ggplot(aes(x = type, y = n)) + 
  geom_bar(stat = "identity", fill = "darkgreen" ) + 
  geom_text(aes(label=n), vjust= -0.6, color="black", size=3.5) +
  theme_minimal()

About 25 anime items have no type recorded, and most items fall under the TV category.

Note

ONA - Original Net Animation (ONA) is an anime that is directly released onto the Internet
OVA - Original Video Animation (OVA) is an animated film or series made specially for release in home-video formats

3.2.3 Anime with the most members

anime_names %>% arrange(desc(members)) %>% 
  top_n(10) %>% kable() %>% kable_styling("striped", font_size = 10, full_width = F)

## Selecting by members

anime_id	name	genre	type	episodes	rating	members
1535	Death Note	Mystery, Police, Psychological, Supernatural, Thriller	TV	37	8.71	1013917
16498	Shingeki no Kyojin	Action, Drama, Fantasy, Shounen, Super Power	TV	25	8.54	896229
11757	Sword Art Online	Action, Adventure, Fantasy, Game, Romance	TV	25	7.83	893100
5114	Fullmetal Alchemist: Brotherhood	Action, Adventure, Drama, Fantasy, Magic, Military, Shounen	TV	64	9.26	793665
6547	Angel Beats!	Action, Comedy, Drama, School, Supernatural	TV	13	8.39	717796
1575	Code Geass: Hangyaku no Lelouch	Action, Mecha, Military, School, Sci-Fi, Super Power	TV	25	8.83	715151
20	Naruto	Action, Comedy, Martial Arts, Shounen, Super Power	TV	220	7.81	683297
9253	Steins;Gate	Sci-Fi, Thriller	TV	24	9.17	673572
10620	Mirai Nikki (TV)	Action, Mystery, Psychological, Shounen, Supernatural, Thriller	TV	26	8.07	657190
4224	Toradora!	Comedy, Romance, School, Slice of Life	TV	25	8.45	633817

Let’s move on to creating a user-item matrix.

3.3 User-Item Matrix

#convert anime matrix to a real rating matrix
a_mat <- as(anime_ratings, "realRatingMatrix")
a_mat

## 73515 x 11200 rating matrix of class 'realRatingMatrix' with 7813730 ratings.

The matrix is very sparse, yet it still uses a lot of memory: its size is about 99 MB.

object.size(a_mat)

## 99233736 bytes

I will reduce the matrix so that it contains only users who rated more than 500 anime shows and shows that were rated more than 1,000 times.

a_mat <- a_mat[rowCounts(a_mat) > 500, colCounts(a_mat) > 1000]
a_mat

## 1843 x 1720 rating matrix of class 'realRatingMatrix' with 967727 ratings.

object.size(a_mat)

## 11850056 bytes
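To put the reduction in perspective, the share of filled cells can be computed directly from the dimensions and rating counts printed above. This is a small base R sketch; `rating_density` is a hypothetical helper, not part of recommenderlab. The counts are the ones the matrices report, and since the summary further down lists some stored entries as NA, the share of actual ratings is somewhat lower.

```r
# Share of user-item cells that hold a stored rating
# (rating_density is an illustrative helper, not a recommenderlab function)
rating_density <- function(n_ratings, n_users, n_items) {
  n_ratings / (n_users * n_items)
}

# Figures copied from the two matrix printouts above
full_density    <- rating_density(7813730, 73515, 11200)
reduced_density <- rating_density(967727, 1843, 1720)

round(c(full = full_density, reduced = reduced_density), 4)
```

The full matrix is filled to roughly 1%, while the reduced matrix is filled to roughly 30%, which is why the filtered data is much easier to work with.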

How ratings are distributed

a_ratings <- as.data.frame(table(as.vector(a_mat@data@x)))


ggplot(a_ratings, aes(x = Var1, y = Freq, fill = Var1)) + 
  geom_bar(stat = "identity") + 
  ggtitle("Distribution of Ratings for Anime Items") +
  geom_text(aes(label=Freq), vjust= -0.6, color="black", size=3.5) +
  theme(legend.position="none") + xlab("Rating Score") + ylab("Frequency")

Most ratings are 8 or higher, which suggests that the users in this subset rate the shows they watch favorably.

Summary of ratings

summary(a_mat@data@x)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    7.00    8.00    7.61    9.00   10.00  185510

I will now normalize the data by centering each user’s ratings, so that every user’s average rating is 0. This removes the bias of users who consistently rate high or low.

#normalize
a_mat <- normalize(a_mat)
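As a rough illustration of what row centering does, the sketch below applies it by hand to a toy matrix in base R. The matrix and its values are made up for illustration; it shows the general idea rather than recommenderlab's internal code.

```r
# Toy ratings: 3 users x 3 items, NA = not rated (values are invented)
r <- matrix(c(8, 10, NA,
              5, NA,  7,
              9,  9,  6),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("user", 1:3), paste0("item", 1:3)))

# Each user's mean over the items they actually rated
user_means <- rowMeans(r, na.rm = TRUE)

# Subtract every user's mean from that user's row; NAs stay NA
centered <- sweep(r, 1, user_means, "-")

rowMeans(centered, na.rm = TRUE)
```

After centering, a positive value means the user liked the item more than their own average, regardless of how generous that user is overall.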

image(a_mat[1:100, 1:100], main = "First 100 users and anime items: Top Anime")

avg_anime_ratings <- data.frame("avg_rating" = colMeans(a_mat)) %>% 
  ggplot(aes(x = avg_rating)) + 
  geom_histogram(color = "black", fill = "steelblue") + 
  theme( axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))+
  ggtitle("Distribution of Average Ratings for Anime Shows")
avg_anime_ratings

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The distribution is left-skewed, with most average ratings concentrated around 0.

3.4 Similarity

Similarity among the first 100 users

simA <- similarity(a_mat[1:100, ], method = "cosine", which = "users")
image(as.matrix(simA), main = "User Similarity")

Similarity among the first 100 anime items

simB <- similarity(a_mat[, 1:100], method = "cosine", which = "items")
image(as.matrix(simB), main = "Item Similarity")

Based on the similarity plots, items have more in common than users do with each other.
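For reference, cosine similarity between two rating vectors can be sketched in base R as follows. `cosine_sim` is a hypothetical helper that only uses the positions where both vectors have a value; recommenderlab's own handling of missing values may differ.

```r
# Cosine similarity over the co-rated positions of two vectors
# (illustrative helper; not recommenderlab's implementation)
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)   # nothing in common to compare
  sum(x[both] * y[both]) /
    (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Centered ratings of three items by four users (invented values)
item_a <- c(1, -2, NA, 0.5)
item_b <- c(2, -4,  1, 1)
item_c <- c(-1, 2,  3, NA)

cosine_sim(item_a, item_b)  # same direction on the co-rated users
cosine_sim(item_a, item_c)  # opposite direction on the co-rated users
```

Values near 1 mean two items are rated alike by the same users, values near -1 mean they are rated in opposite ways.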

3.5 Building Recommender Systems

Recommender systems aim to predict users’ interests and recommend items that are likely to interest them. They help users make decisions by surfacing new and relevant items. As mentioned earlier, we will look at how three types of recommenders work.

First we will divide the data into training and test sets, so that the recommender algorithms can learn from the training data and then try to predict relevant outcomes for the test data.

Training and Test sets

# min(rowCounts(a_mat)) = 4, so we can keep 4 items per user
anime_eval <- evaluationScheme(data = a_mat, method = "split", train = 0.8, given = 4, goodRating = 5, k = 4) 
anime_eval

## Evaluation scheme with 4 items given
## Method: 'split' with 4 run(s).
## Training set proportion: 0.800
## Good ratings: >=5.000000
## Data set: 1843 x 1720 rating matrix of class 'realRatingMatrix' with 967727 ratings.
## Normalized using center on rows.
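The idea behind a split scheme with `given = 4` can be sketched by hand in base R: 80% of users go to training, and for each test user 4 ratings are shown to the model ("known") while the rest are held back for scoring ("unknown"). The toy data and variable names below are assumptions made for illustration, not recommenderlab internals.

```r
set.seed(612)

# Toy data: 10 users x 8 items, every cell rated (invented values)
n_users <- 10
n_items <- 8
ratings <- matrix(sample(1:10, n_users * n_items, replace = TRUE),
                  nrow = n_users)

# 80% of the users form the training set, the rest the test set
train_idx <- sample(n_users, size = round(0.8 * n_users))
train <- ratings[train_idx, , drop = FALSE]
test  <- ratings[-train_idx, , drop = FALSE]

# For each test user keep `given` ratings as known and hide the others
given   <- 4
known   <- test
unknown <- test
for (u in seq_len(nrow(test))) {
  rated <- which(!is.na(test[u, ]))
  keep  <- sample(rated, given)
  known[u, setdiff(rated, keep)] <- NA   # hidden from the model
  unknown[u, keep] <- NA                 # used only for evaluation
}
```

Predictions are made from `known` and compared against `unknown`, so the model is never scored on ratings it has already seen.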

3.5.1 Item-Item Collaborative Filtering

Item based recommender

anime_item_recc <- Recommender(data = getData(anime_eval, "train"), method = "IBCF")

## Warning in .local(x, ...): x was already normalized by row!

anime_item_recc

## Recommender of type 'IBCF' for 'realRatingMatrix' 
## learned using 1474 users.
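To make the item-based idea concrete, the sketch below predicts one user's rating for a target item as a similarity-weighted average of that user's ratings on the most similar items. `predict_ibcf`, the toy similarity matrix and `k = 2` are assumptions for illustration; recommenderlab's IBCF uses its own defaults and normalization.

```r
# Predict a rating for `target` from the user's ratings on the k most
# similar items (illustrative helper, not recommenderlab's implementation)
predict_ibcf <- function(user_ratings, sim, target, k = 2) {
  rated <- setdiff(which(!is.na(user_ratings)), target)
  if (length(rated) == 0) return(NA_real_)   # nothing to base a prediction on
  s   <- sim[target, rated]
  top <- order(s, decreasing = TRUE)[seq_len(min(k, length(s)))]
  sum(s[top] * user_ratings[rated[top]]) / sum(abs(s[top]))
}

# Invented item-item similarities for three items
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.5,
                0.1, 0.5, 1.0), nrow = 3, byrow = TRUE)

user <- c(9, NA, 5)                      # item 2 has not been rated
predict_ibcf(user, sim, target = 2, k = 2)
```

The prediction leans toward the rating of item 1 because item 1 is more similar to the target than item 3. A user with no usable ratings gets `NA`, since there is nothing to average.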

Predict

anime_pred <- predict(object = anime_item_recc, newdata = getData(anime_eval, "known"), n = 10)

anime_predr <- predict(object = anime_item_recc, newdata = getData(anime_eval, "known"), type = "ratings")

Let’s see the recommendations for the first 8 users.

# recommendations for the first 8 users
anime_pred@items[1:8]

## $`201`
##  [1]  451  479  893  968  978  982 1624    5   77  124
## 
## $`392`
##  [1]  3  5 10 15 19 22 32 54 56 59
## 
## $`446`
##  [1]  17  21  26  30  62  63  91  99 114 115
## 
## $`661`
##  [1]   28  168  250  366  480  982 1008 1016 1061 1191
## 
## $`771`
##  [1]   1   3  21  22  45  70  79  81  99 123
## 
## $`917`
## integer(0)
## 
## $`1522`
##  [1]   31   71  100  834  965 1072 1237 1277 1380 1387
## 
## $`1530`
##  [1]  807 1005 1039 1064 1094 1167 1284 1384   25   29

Notice that some users received no recommendations. This is the cold-start problem: the recommender does not have adequate information about a user or an item to make relevant predictions. It occurs often with collaborative filtering recommender systems and reduces their performance. The profile of a new user or item is empty or nearly empty because few or no ratings exist, so the system does not know that user’s taste.

Let’s see what was actually recommended for some users.

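# function to match anime id with names of anime items
item_recc_anime <- function(i){
  # @items stores column positions in the rating matrix, not anime_id values,
  # so translate them to ids via @itemLabels before joining
  # (the listing below was produced by joining the raw positions)
  p <- anime_pred@itemLabels[anime_pred@items[[i]]]
  p <- data.frame("guess" = as.factor(p))
  p <- inner_join(p, anime_names, by = c("guess" = "anime_id")) %>% select(name, type)
  return(as.data.frame(p))
}

for_users <- c(20, 3, 2, 10)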
lapply(for_users, item_recc_anime)

## [[1]]
##                       name type
## 1         Tenshi Kinryouku  OVA
## 2            School Rumble   TV
## 3            Ai Yori Aoshi   TV
## 4    Mobile Suit Gundam ZZ   TV
## 5  Mobile Suit Gundam Wing   TV
## 6                  Futakoi   TV
## 7        Tokyo Underground   TV
## 8              Angel Heart   TV
## 9       Grappler Baki (TV)   TV
## 10         Ace wo Nerae! 2  OVA
## 
## [[2]]
##                                      name type
## 1              Hungry Heart: Wild Striker   TV
## 2                               One Piece   TV
## 3                              Texhnolyze   TV
## 4                 Neon Genesis Evangelion   TV
## 5                           D.C.: Da Capo   TV
## 6                                   DearS   TV
## 7  Mobile Suit Gundam Wing: Endless Waltz  OVA
## 8                               Mai-Otome   TV
## 9             Sakigake!! Cromartie Koukou   TV
## 10       El Hazard: The Alternative World   TV
## 
## [[3]]
##                                             name  type
## 1                Cowboy Bebop: Tengoku no Tobira Movie
## 2                                   Eyeshield 21    TV
## 3                                        Monster    TV
## 4                               Prince of Tennis    TV
## 5 Neon Genesis Evangelion: The End of Evangelion Movie
## 6                              Appleseed (Movie) Movie
## 7                                        Avenger    TV
## 8                                        Chobits    TV
## 
## [[4]]
##                              name type
## 1     Mahou Shoujo Lyrical Nanoha   TV
## 2               Shakugan no Shana   TV
## 3                        Burn Up!  OVA
## 4             Street Fighter II V   TV
## 5             Ginga Densetsu Weed   TV
## 6 The Third: Aoi Hitomi no Shoujo   TV
## 7                   Tokyo Babylon  OVA
## 8                          Blame!  ONA
## 9                    Melty Lancer  OVA

3.5.2 Singular Value Decomposition

anime_SVD_recc <- Recommender(data = getData(anime_eval, "train"), method = "SVD")

## Warning in .local(x, ...): x was already normalized by row!

anime_SVD_recc

## Recommender of type 'SVD' for 'realRatingMatrix' 
## learned using 1474 users.

Predict

anime_svd_pred <- predict(object = anime_SVD_recc, newdata = getData(anime_eval, "known"), n = 10) 

anime_svd_predr <- predict(object = anime_SVD_recc, newdata = getData(anime_eval, "known"), type = "ratings")

Let’s see what SVD recommends.

# recommendations for the first 8 users
anime_svd_pred@items[1:8]

## $`201`
##  [1]  689  343  787  536  660  742 1149  728  732  646
## 
## $`392`
##  [1] 467   9 458 761  70 335 159 482 590 133
## 
## $`446`
##  [1] 482 590  41 467 558 146 574  22  79 154
## 
## $`661`
##  [1]  208  759 1024 1415  570  973  495  490  852  135
## 
## $`771`
##  [1]  467  590  482  852  869  698   70 1030 1041 1102
## 
## $`917`
##  [1]  852  482  590  698  759  551 1158 1102 1115  185
## 
## $`1522`
##  [1] 1431 1359  590 1158 1262 1186 1326 1233 1230 1268
## 
## $`1530`
##  [1]  590 1262  482  996 1230  852 1268 1233 1166 1179

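# function to match anime id with names of anime items
svd_recc_anime <- function(i){
  # @items stores column positions in the rating matrix, not anime_id values,
  # so translate them to ids via @itemLabels before joining
  # (the listing below was produced by joining the raw positions)
  p <- anime_svd_pred@itemLabels[anime_svd_pred@items[[i]]]
  p <- data.frame("guess" = as.factor(p))
  p <- inner_join(p, anime_names, by = c("guess" = "anime_id")) %>% select(name, type)
  return(as.data.frame(p))
}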

Unlike the item-based recommender, the SVD algorithm provided recommendations for every user. SVD is commonly used to estimate missing values in a data matrix, and since a recommender system is essentially estimating users’ missing ratings, it is a natural fit. Compared with the IBCF lists, some of the recommended items are the same.
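The way SVD fills in missing ratings can be sketched with base R's `svd()`: center the ratings, impute the gaps, keep only the strongest `k` factors and read predictions off the low-rank reconstruction. The toy matrix, the zero imputation and `k = 2` are assumptions chosen for illustration; recommenderlab's SVD method makes its own choices for these steps.

```r
# Toy ratings: 4 users x 3 items, NA = not rated (invented values)
r <- matrix(c( 8,  9, NA,
               7, NA,  6,
              NA, 10,  9,
               5,  6,  4), nrow = 4, byrow = TRUE)

# Center each user's row; a matrix minus a vector of length nrow
# subtracts element i from row i
user_means <- rowMeans(r, na.rm = TRUE)
centered   <- r - user_means
centered[is.na(centered)] <- 0     # impute gaps with 0, i.e. the user's mean

# Keep the k largest singular values and rebuild the matrix
k <- 2
s <- svd(centered)
low_rank <- s$u[, 1:k, drop = FALSE] %*%
  diag(s$d[1:k], k, k) %*%
  t(s$v[, 1:k, drop = FALSE])

# Add the user means back to get predictions on the original scale
predicted <- low_rank + user_means
predicted
```

Because every cell of the reconstruction has a value, this approach can score every item for every user, which is consistent with SVD returning a full list for each user above.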

Now let’s look at the titles these numbers map to.

for_users <- c(20, 3, 2, 10)
lapply(for_users, svd_recc_anime)

## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector

## [[1]]
##                                            name  type
## 1                Hanbun no Tsuki ga Noboru Sora    TV
## 2                                        Naruto    TV
## 3                       Musekinin Kanchou Tylor    TV
## 4                               Mousou Dairinin    TV
## 5 Kidou Senkan Nadesico: The Prince of Darkness Movie
## 6           Mousou Kagaku Series: Wandaba Style    TV
## 7                          Boukyaku no Senritsu    TV
## 
## [[2]]
##                                      name type
## 1             Yu☆Gi☆Oh!: Duel Monsters GX   TV
## 2                       Kage kara Mamoru!   TV
## 3 Ghost in the Shell: Stand Alone Complex   TV
## 4                                Major S2   TV
## 5       Kono Minikuku mo Utsukushii Sekai   TV
## 6                         Rean no Tsubasa  ONA
## 7                        Prince of Tennis   TV
## 8                                Shuffle!   TV
## 9                             Shaman King   TV
## 
## [[3]]
##                                        name    type
## 1   Ghost in the Shell: Stand Alone Complex      TV
## 2                             Buttobi!! CPU     OVA
## 3 Naruto: Akaki Yotsuba no Clover wo Sagase Special
## 4                    Matantei Loki Ragnarok      TV
## 5                      Boukyaku no Senritsu      TV
## 6               Yu☆Gi☆Oh!: Duel Monsters GX      TV
## 7                         Kage kara Mamoru!      TV
## 8                               Green Green      TV
## 
## [[4]]
##                                    name type
## 1                       Gunslinger Girl   TV
## 2 Geobreeders: File-X Chibi Neko Dakkan  OVA
## 3                   Lemon Angel Project   TV
## 4           Yu☆Gi☆Oh!: Duel Monsters GX   TV
## 5                            Burn Up! W  OVA
## 6               Macross Flash Back 2012  OVA
## 7                     Kage kara Mamoru!   TV
## 8                 Mujin Wakusei Survive   TV

3.5.3 Hybrid Recommender

The hybrid recommender combines item-item CF (weight 0.5) with popular items (0.3), re-recommendation of items the user previously liked (0.1) and random items for diversity (0.1).

anime_hybrid_recc <- HybridRecommender(
  Recommender(data = getData(anime_eval, "train"), method = "IBCF"),
  Recommender(data = getData(anime_eval, "train"), method = "POPULAR"),
  Recommender(data = getData(anime_eval, "train"), method = "RERECOMMEND"),
  Recommender(data = getData(anime_eval, "train"), method = "RANDOM"), #diversity
  weights = c(0.5, 0.3, 0.1, 0.1)
)

## Warning in .local(x, ...): x was already normalized by row!

anime_hybrid_recc

## Recommender of type 'HYBRID' for 'ratingMatrix' 
## learned using NA users.
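One way to picture how weights such as 0.5/0.3/0.1/0.1 blend several recommenders is a weighted average of their per-item scores. The sketch below is a simplified base R illustration with invented scores; `hybrid_score` is a hypothetical helper, and skipping missing scores by renormalizing the weights is a choice made here, not necessarily what `HybridRecommender` does internally.

```r
weights <- c(IBCF = 0.5, POPULAR = 0.3, RERECOMMEND = 0.1, RANDOM = 0.1)

# Invented scores: one row per recommender, one column per item;
# NA means that recommender has no score for the item
scores <- rbind(
  IBCF        = c(0.9,  NA, 0.2),
  POPULAR     = c(0.5, 0.8, 0.1),
  RERECOMMEND = c( NA,  NA, 1.0),
  RANDOM      = c(0.3, 0.6, 0.4)
)
colnames(scores) <- c("item_a", "item_b", "item_c")

# Weighted average per item, using only the recommenders that scored it
hybrid_score <- function(scores, weights) {
  w <- matrix(weights, nrow = nrow(scores), ncol = ncol(scores))
  w[is.na(scores)] <- 0
  colSums(w * replace(scores, is.na(scores), 0)) / colSums(w)
}

hybrid_score(scores, weights)
```

With these numbers `item_b` edges out `item_a` even though IBCF has no score for it, because the popular and random components both rate it highly.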

anime_hybrid_pred <- predict(object = anime_hybrid_recc, newdata = getData(anime_eval, "known"), n = 10) 

anime_hybrid_predr <- predict(object = anime_hybrid_recc, newdata = getData(anime_eval, "known"), type = "ratings")

# recommendations for the first 8 users
anime_hybrid_pred@items[1:8]

## $`201`
##  [1]  451 1624  479  893  968  982  798  677  349  978
## 
## $`392`
##  [1]  378   19 1125 1423  154 1161  754    5  823  458
## 
## $`446`
##  [1]  823  555  588   21  773 1080  567  209 1623  671
## 
## $`661`
##  [1]  982 1399  168 1421  996 1717 1481 1536  627 1474
## 
## $`771`
##  [1]  154   99   21  996  214 1290 1024  529 1523    1
## 
## $`917`
##  [1] 1125  761  810  482  555 1290  773 1029   45 1185
## 
## $`1522`
##  [1] 1583  965   71 1072  100  834 1387 1656 1446 1380
## 
## $`1530`
##  [1]  807 1284 1167 1064 1005 1039 1094 1384  378 1038

Some of the items recommended by IBCF and SVD appear again in the hybrid recommender’s lists.

Let’s see the actual items recommended.

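# function to match anime id with names of anime items
hybrid_recc_anime <- function(i){
  # @items stores column positions in the rating matrix, not anime_id values,
  # so translate them to ids via @itemLabels before joining
  # (the listing below was produced by joining the raw positions)
  p <- anime_hybrid_pred@itemLabels[anime_hybrid_pred@items[[i]]]
  p <- data.frame("guess" = as.factor(p))
  p <- inner_join(p, anime_names, by = c("guess" = "anime_id")) %>% select(name, type)
  return(as.data.frame(p))
}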

for_users <- c(20, 3, 2, 10)
lapply(for_users, hybrid_recc_anime)

## Warning: Column `guess`/`anime_id` joining factors with different levels,
## coercing to character vector

## [[1]]
##                               name  type
## 1    Odin: Koushi Hansen Starlight Movie
## 2          Kyattou Ninden Teyandee    TV
## 3                  Ace wo Nerae! 2   OVA
## 4                    School Rumble    TV
## 5                     Genma Taisen Movie
## 6                 Green Legend Ran   OVA
## 7            Gift: Eternal Rainbow    TV
## 8               Aria The Animation    TV
## 9 Harlock Saga: Nibelung no Yubiwa   OVA
## 
## [[2]]
##                        name type
## 1              Virgin Night  OVA
## 2  Koutetsu Tenshi Kurumi 2   TV
## 3       Itsudatte My Santa!  OVA
## 4                 One Piece   TV
## 5        Tenamonya Voyagers  OVA
## 6    Sentou Yousei Yukikaze  OVA
## 7                 The Big O   TV
## 8              R.O.D the TV   TV
## 9               G-On Riders   TV
## 10      Lemon Angel Project   TV
## 
## [[3]]
##                                   name    type
## 1                  eX-Driver the Movie   Movie
## 2                              Monster      TV
## 3 Lupin III: Napoleon no Jisho wo Ubae Special
## 4                          Shaman King      TV
## 5            Maze☆Bakunetsu Jikuu (TV)      TV
## 6                         Yuki no Joou      TV
## 7      Cowboy Bebop: Tengoku no Tobira   Movie
## 8                         Virgin Night     OVA
## 9                        Buttobi!! CPU     OVA
## 
## [[4]]
##                                      name  type
## 1                            Melty Lancer   OVA
## 2                  Katekyo Hitman Reborn!    TV
## 3                 Macross Flash Back 2012   OVA
## 4                                  Blame!   ONA
## 5         The Third: Aoi Hitomi no Shoujo    TV
## 6          Lupin III: Ikiteita Majutsushi   OVA
## 7 Haru no Ashioto The Movie: Ourin Dakkan Movie
## 8                     Ginga Densetsu Weed    TV
## 9                     Street Fighter II V    TV

3.6 Evaluation

ITEM

anime_item_acc1 <- calcPredictionAccuracy(x = anime_pred, data = getData(anime_eval, "unknown"), given = 4, goodRating = 5)
anime_item_acc2 <- calcPredictionAccuracy(x = anime_predr, data = getData(anime_eval, "unknown"))

SVD

anime_svd_acc1 <- calcPredictionAccuracy(x = anime_svd_pred, data = getData(anime_eval, "unknown"), given = 4, goodRating = 5)
anime_svd_acc2 <- calcPredictionAccuracy(x = anime_svd_predr, data = getData(anime_eval, "unknown"))

HYBRID

anime_hy_acc1 <- calcPredictionAccuracy(x = anime_hybrid_pred, data = getData(anime_eval, "unknown"), given = 4, goodRating = 5)
anime_hy_acc2 <- calcPredictionAccuracy(x = anime_hybrid_predr, data = getData(anime_eval, "unknown"))

TopN

kable(rbind(anime_item_acc1, anime_svd_acc1, anime_hy_acc1)) %>% kable_styling(c("striped", "hover", "bordered"), font_size = 12, full_width = F) %>% add_header_above(c("Recommender", "TopN Accuracy" = 8))

Recommender	TopN Accuracy
	TP	FP	FN	TN	precision	recall	TPR	FPR
anime_item_acc1	0.3035230	7.967480	114.7751	1592.954	0.0385451	0.0035823	0.0035823	0.0047436
anime_svd_acc1	1.4336043	8.566396	113.6450	1592.355	0.1433604	0.0156471	0.0156471	0.0051708
anime_hy_acc1	0.9403794	9.059621	114.1382	1591.862	0.0940379	0.0053911	0.0053911	0.0060231
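The columns of this table follow from a confusion matrix computed per user and then averaged. As a minimal base R sketch for a single user (with `topn_metrics` and the item ids being hypothetical, chosen only for illustration):

```r
# Confusion-matrix counts and derived rates for one user's top-N list
# (illustrative helper; recommenderlab averages such figures over users)
topn_metrics <- function(recommended, relevant, n_items) {
  tp <- length(intersect(recommended, relevant))  # recommended and liked
  fp <- length(setdiff(recommended, relevant))    # recommended, not liked
  fn <- length(setdiff(relevant, recommended))    # liked, not recommended
  tn <- n_items - tp - fp - fn                    # everything else
  c(TP = tp, FP = fp, FN = fn, TN = tn,
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn),   # identical to TPR
    FPR       = fp / (fp + tn))
}

# Invented example: 5 recommendations, 4 relevant items, 20 candidate items
m <- topn_metrics(recommended = c(1, 2, 3, 4, 5),
                  relevant    = c(2, 5, 9, 11),
                  n_items     = 20)
m
```

Because the table reports averages of per-user values, its precision and recall columns need not equal the ratios of the averaged TP, FP and FN columns.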

Ratings

kable(rbind(anime_item_acc2, anime_svd_acc2, anime_hy_acc2)) %>% kable_styling(c("striped", "hover", "bordered"), font_size = 12, full_width = F) %>% add_header_above(c("Recommender", "Ratings Accuracy" = 3))

Recommender	Ratings Accuracy
	RMSE	MSE	MAE
anime_item_acc2	1.596660	2.549324	1.107453
anime_svd_acc2	1.688743	2.851854	1.235027
anime_hy_acc2	1.583035	2.506001	1.124243

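For all three error measures, lower values mean better performance. The hybrid recommender has the lowest RMSE (1.583) and IBCF the lowest MAE (1.107), while SVD has the highest error on every measure.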
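The three error measures can be written out in a few lines of base R. `rating_errors` is a hypothetical helper and the numbers are invented; it compares predictions only where a held-out rating exists.

```r
# RMSE, MSE and MAE over the positions where both values are present
# (illustrative helper with invented data)
rating_errors <- function(predicted, actual) {
  err <- (predicted - actual)[!is.na(predicted) & !is.na(actual)]
  mse <- mean(err^2)
  c(RMSE = sqrt(mse), MSE = mse, MAE = mean(abs(err)))
}

e <- rating_errors(predicted = c(7, 8, 9, 5),
                   actual    = c(8, 6, 9, NA))
e
```

RMSE is the square root of MSE, which can be checked against the table: 1.59666 squared is about 2.5493. Squaring penalizes large misses more heavily than MAE does, so the two measures can rank models differently.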

Comparing Models

models_to_evaluate <- list(
  IBCF = list(name = "IBCF", param = list(method = "cosine")),
  SVD = list(name = "SVD", param = list(k = 30)),
  POPULAR = list(name = "POPULAR", param = NULL),
  RANDOM = list(name = "RANDOM", param = NULL)
)
results <- evaluate(anime_eval, method = models_to_evaluate, n = c(1, 3, 5, 15, 20))

## Warning in .local(x, ...): x was already normalized by row! (emitted on every IBCF, SVD and POPULAR run)

## IBCF run fold/sample [model time/prediction time]
##   1  [60.31sec/0.15sec] 
##   2  [57.41sec/0.17sec] 
##   3  [58.83sec/0.19sec] 
##   4  [56.03sec/0.2sec] 
## SVD run fold/sample [model time/prediction time]
##   1  [0.94sec/0.56sec] 
##   2  [1.01sec/0.54sec] 
##   3  [0.92sec/0.6sec] 
##   4  [1.15sec/0.58sec] 
## POPULAR run fold/sample [model time/prediction time]
##   1  [0.02sec/2.43sec] 
##   2  [0.03sec/2.14sec] 
##   3  [0.03sec/2sec] 
##   4  [0.03sec/2.06sec] 
## RANDOM run fold/sample [model time/prediction time]
##   1  [0.01sec/0.64sec] 
##   2  [0sec/0.7sec] 
##   3  [0.01sec/0.61sec] 
##   4  [0.01sec/0.63sec]

ROC Curve

plot(results, annotate = T, legend = "topleft") 
title("ROC Curve")

The closer the curve is to the top left (high true positive rate at a low false positive rate), the better the performance.

Precision-Recall

plot(results, "prec/rec", annotate = T, legend = "bottomright")
title("Precision-Recall")

The closer the curve is to the top right (high precision at high recall), the better the performance. In this case, the Singular Value Decomposition algorithm performed best.

4. Conclusion

Overall, the hybrid recommender performed best on rating prediction, with the lowest RMSE. This was expected because in a hybrid recommender the component algorithms compensate for one another’s shortcomings. As mentioned earlier, the item-based recommender had trouble recommending items for some new users. This is a common problem for collaborative filtering recommenders: when users have rated only a few of the items available in the database, there is not enough information to locate useful neighbors, and the resulting recommendations are weak.

To conclude, recommender systems open new opportunities for retrieving personalized information on the web. They also help alleviate information overload, a very common problem in information retrieval systems, and give users access to products and services they would not readily find on their own. This project discussed three recommendation techniques and highlighted their strengths and weaknesses, and used several evaluation metrics to measure the quality and performance of the resulting models.

5. References

- https://www.scottfreitas.com/assets/papers/Recommender_System.pdf
- https://www.sciencedirect.com/science/article/pii/S1110866515000341
- Gorakala, Suresh K., and Michele Usuelli. Building a Recommendation System with R: Learn the Art of Building Robust and Powerful Recommendation Engines Using R. Packt Publishing, 2015.

Javern Wilson

July 8, 2019
