We will create a collaborative filtering and content-based recommendation system for anime programming (TV shows, movies, online releases, etc.). Anime is defined by the Merriam-Webster dictionary as “a style of animation originating in Japan that is characterized by stark colorful graphics depicting vibrant characters in action-filled plots often with fantastic or futuristic themes.”
Anime is an industry worth more than USD 19.1 billion. Already huge in Japan, anime is experiencing a surge of popularity in the West, which is contributing to its record sales (https://www.nerdly.co.uk/2020/03/12/9-reasons-why-anime-is-experiencing-a-huge-growth-in-popularity/).
The dataset we will use was extracted from the website https://myanimelist.net/, which is the world’s largest anime and manga community.
This dataset is available on Kaggle: https://www.kaggle.com/CooperUnion/anime-recommendations-database.
There are two files: one containing user preference data (73,516 users on 12,294 anime shows) and one containing the details of each anime show. Users add anime shows they have watched to their “completed list” and then rate them. The dataset is a compilation of those ratings.
Anime.csv
Rating.csv
We will use the ratings for collaborative filtering and genre plus type for content-based recommendation.
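As a rough illustration of the content-based side, the sketch below scores item-to-item similarity from genre and type alone, using Jaccard overlap of feature sets. It is written in Python for brevity and is not the planned R implementation. The titles, genre strings, and helper names (`item_features`, `jaccard`, `similar_items`) are hypothetical, and the comma-separated genre format is an assumption rather than something confirmed against the files.

```python
# Hypothetical catalog: title -> (comma-separated genres, type).
# These rows are made up for illustration; they are not from the dataset.
CATALOG = {
    "Show A": ("Action, Sci-Fi", "TV"),
    "Show B": ("Action, Sci-Fi, Drama", "TV"),
    "Show C": ("Romance, Comedy", "Movie"),
    "Show D": ("Action, Comedy", "Movie"),
}


def item_features(genres, kind):
    """Turn a genre string and a type into one set of tagged features."""
    feats = {"genre:" + g.strip().lower() for g in genres.split(",") if g.strip()}
    feats.add("type:" + kind.strip().lower())
    return feats


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(title, catalog, k=2):
    """Return the k items most similar to `title`, best first."""
    target = item_features(*catalog[title])
    scored = [
        (other, jaccard(target, item_features(*meta)))
        for other, meta in catalog.items()
        if other != title
    ]
    # Sort by descending score, breaking ties alphabetically by title.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    print(similar_items("Show A", CATALOG))
```

Treating type as one more feature alongside the genres keeps the sketch simple. Weighting genre and type differently would be a natural variation to explore.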
\(\textbf{Software:}\)
We plan to store the dataset in Spark, leveraging the work done in Project 5. We will be using RStudio and packages such as recommenderlab and sparklyr.
\(\textbf{Methodologies:}\)
We plan to use any or all of IBCF, UBCF, SVD, and ALS. We will explore multiple similarity methods, including Pearson, Jaccard, and cosine. We will also explore normalization methods such as centering and z-score.
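To make the similarity and normalization options concrete, here is a minimal Python sketch of each, computed on sparse per-user rating dictionaries. It illustrates the definitions only and does not show how recommenderlab computes them. The users and ratings are made up, the function names are hypothetical, and two choices are assumptions of this sketch: restricting cosine and Pearson to co-rated items, and using the population standard deviation for the z-score.

```python
import math


def center(ratings):
    """Centering: subtract the user's mean rating from every rating."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: r - mean for item, r in ratings.items()}


def z_score(ratings):
    """Z-score: center, then divide by the population standard deviation."""
    centered = center(ratings)
    std = math.sqrt(sum(d * d for d in centered.values()) / len(centered))
    if std == 0:
        return {item: 0.0 for item in centered}
    return {item: d / std for item, d in centered.items()}


def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = u.keys() & v.keys()
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation: cosine of the ratings centered on co-rated items."""
    common = u.keys() & v.keys()
    if len(common) < 2:
        return 0.0
    return cosine(center({i: u[i] for i in common}),
                  center({i: v[i] for i in common}))


def jaccard(u, v):
    """Jaccard similarity: overlap of the rated item sets, ignoring values."""
    union = u.keys() | v.keys()
    return len(u.keys() & v.keys()) / len(union) if union else 0.0


# Made-up ratings for two hypothetical users.
alice = {"A": 8, "B": 6, "C": 10}
bob = {"A": 4, "B": 3, "C": 5, "D": 9}

if __name__ == "__main__":
    print(cosine(alice, bob), pearson(alice, bob), jaccard(alice, bob))
```

In this toy data bob's ratings on the shared items are exactly half of alice's, so cosine and Pearson both come out at (approximately) 1, while Jaccard is 0.75 because it only looks at which items were rated.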
\(\textbf{Visualization:}\)
If time allows, we will build a Shiny app.