MyAnimeList, often abbreviated as MAL, is an anime and manga social networking and social cataloging application website. The site provides its users with a list-like system to organize and score anime and manga. It facilitates finding users who share similar tastes and provides a large database on anime and manga.
Anime without rankings or popularity scores were excluded. Producers, genre, and studio were converted from lists to tidy observations, so there will be repetitions of shows with multiple producers, genres, etc.
This analysis examines the factors that influence the popularity or rank of a particular anime.
The data was cleaned and reshaped to support that analysis and to draw conclusions from it.
library(tidyr)
library(DT)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(kableExtra)
library(lubridate)
library(readxl)
library(highcharter)
library(scales)
library(RColorBrewer)
library(wesanderson)
library(plotly)
library(shiny)
| Package | Description |
|---|---|
| library(tidyr) | For reshaping data sets into the tidy format |
| library(DT) | For HTML display of data |
| library(ggplot2) | For customizable graphical representation |
| library(dplyr) | For data manipulation |
| library(tidyverse) | Collection of R packages designed for data science that works harmoniously with other packages |
| library(kableExtra) | For building and styling formatted tables |
| library(lubridate) | Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not |
| library(readxl) | The readxl package makes it easy to get data out of Excel and into R |
| library(highcharter) | Highcharter is an R wrapper for the Highcharts JavaScript library and its modules |
| library(scales) | The idea of the scales package is to implement scales in a way that is graphics system agnostic |
| library(RColorBrewer) | RColorBrewer is an R package that allows users to create colourful graphs with pre-made color palettes that visualize data in a clear and distinguishable manner |
| library(wesanderson) | Wes Anderson color palettes for R |
| library(plotly) | Plotly’s R graphing library makes interactive, publication-quality graphs |
| library(shiny) | Shiny is an R package that makes it easy to build interactive web apps straight from R |
The data used in the analysis can be found here
The original dataset used for this project can be found here
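Because producers, genre, and studio were expanded into one row per value, a show with several genres appears several times, which can bias per-show summaries. The following is a minimal sketch of that effect on made-up data; the toy values and the use of `separate_rows()` are assumptions for illustration, not a description of how the dataset was actually built.

```r
library(dplyr)
library(tidyr)

# Made-up example: one row per show, genres stored as a comma-separated list
toy <- data.frame(
  anime_id = c(1, 2),
  name = c("Show A", "Show B"),
  genre = c("Action, Comedy", "Drama"),
  score = c(8.1, 7.4),
  stringsAsFactors = FALSE
)

# Tidy form: one row per (show, genre) pair, so "Show A" now appears twice
toy_tidy <- toy %>% separate_rows(genre, sep = ", ")

# A naive mean counts "Show A" twice
naive_mean <- mean(toy_tidy$score)

# Keeping one row per show first gives the per-show mean
per_show <- toy_tidy %>%
  distinct(anime_id, .keep_all = TRUE) %>%
  summarise(mean_score = mean(score))
```

When summarising per show rather than per genre, producer, or studio, deduplicating on the show identifier first avoids counting the same show more than once.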
The Premiered column combines the season and the year. We split it into two columns, Premiered_Season and Premiered_Year, so that each column describes a single attribute and the analysis can be done by season as well as by year.
For the same reason, we split the Broadcast column into Day_of_Week and Time.
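# Split the combined season and year in premiered into two columns
anime_clean <- tidy_anime %>%
  separate(premiered, c("Premiered_Season", "Premiered_Year")) %>%
  # Split broadcast on spaces; only the day and the time are kept
  separate(broadcast, c("Day_of_Week", "Not_Needed1", "Time", "Not_Needed_2"), sep = " ") %>%
  select(-c(Not_Needed1, Not_Needed_2))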
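To make the effect of the split concrete, here is a small sketch on made-up rows. The value formats ("Fall 2016", "Saturdays at 23:00 (JST)") are assumptions inferred from the four-way split used here, and the `fill` and `extra` arguments are added only to silence the warnings `separate()` raises when a value has fewer or more pieces than expected.

```r
library(dplyr)
library(tidyr)

# Made-up rows; the real columns may contain other patterns
toy <- data.frame(
  name = c("Show A", "Show B", "Show C"),
  premiered = c("Fall 2016", "Spring 2018", NA),
  broadcast = c("Saturdays at 23:00 (JST)", "Unknown", "Mondays at 01:30 (JST)"),
  stringsAsFactors = FALSE
)

toy_split <- toy %>%
  separate(premiered, c("Premiered_Season", "Premiered_Year"), sep = " ") %>%
  # Short values such as "Unknown" are padded with NA on the right
  separate(broadcast, c("Day_of_Week", "filler_1", "Time", "filler_2"),
           sep = " ", fill = "right", extra = "drop") %>%
  select(-filler_1, -filler_2)

toy_split
```

Note that both new premiered columns are character at this point, and that a value like "Unknown" ends up in Day_of_Week with a missing Time.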
Many columns do not provide useful information for our analysis. Going forward, it is better to drop those columns and restrict the analysis to the columns that provide valuable insights.
For this, we are removing the following columns from our dataset:
anime_final <- select(anime_clean, -c(title_english, title_japanese, title_synonyms, background, synopsis, related,status,end_date))
After removing the unnecessary columns, we rename the remaining columns with descriptive snake_case names (MPAA_rating keeps its capitalized acronym).
names(anime_final) <- c("anime_id", "anime_name", "anime_type", "source", "producers", "genre", "studio", "no_of_episodes", "airing_status", "start_date", "episode_duration", "MPAA_rating", "viewers_rating", "rated_by_no_of_viewers", "rankings", "popularity_index", "wishlisted_members", "favorites", "premiered_season", "premiered_year", "broadcast_day", "broadcast_time")
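Assigning to `names()` relies on the columns being in exactly the expected order. As a hedged alternative, the sketch below renames by name with `dplyr::rename()`, which fails loudly if a column is missing. The old column names used here (`name`, `type`, `episodes`) are hypothetical stand-ins, not confirmed names from the dataset.

```r
library(dplyr)

# Made-up data frame with hypothetical original column names
toy <- data.frame(
  name = "Show A",
  type = "TV",
  episodes = 12,
  stringsAsFactors = FALSE
)

# rename() uses new_name = old_name pairs and does not depend on column order
toy_renamed <- toy %>%
  rename(
    anime_name = name,
    anime_type = type,
    no_of_episodes = episodes
  )

names(toy_renamed)
```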
Next, we fill in the missing values of premiered_season. We extract the month from start_date and map it to one of the four seasons: March to May is Spring, June to August is Summer, September to November is Fall, and December to February is Winter. Where start_date is missing as well, the season remains NA.
# Month of the start date; as.Date() accepts Date or ISO-formatted character input
start_month <- month(as.Date(anime_final$start_date))
season_from_date <- case_when(
  start_month %in% c(3, 4, 5) ~ "Spring",
  start_month %in% c(6, 7, 8) ~ "Summer",
  start_month %in% c(9, 10, 11) ~ "Fall",
  start_month %in% c(12, 1, 2) ~ "Winter"
)
# Fill only the missing seasons; rows without a start date stay NA
anime_final$premiered_season <- coalesce(anime_final$premiered_season, season_from_date)
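The month-to-season mapping described above can be checked in isolation. The following base R sketch uses a hypothetical helper, `month_to_season()`, and made-up dates; it assumes ISO-formatted date strings and fills a season only where none is already present.

```r
# Hypothetical helper (not part of the original analysis): map dates to seasons
# using the month grouping described above
month_to_season <- function(dates) {
  m <- as.integer(format(as.Date(dates), "%m"))
  season <- rep(NA_character_, length(m))
  season[m %in% c(3, 4, 5)] <- "Spring"
  season[m %in% c(6, 7, 8)] <- "Summer"
  season[m %in% c(9, 10, 11)] <- "Fall"
  season[m %in% c(12, 1, 2)] <- "Winter"
  season
}

# Made-up start dates, including a missing one
start_dates <- c("2016-04-03", "2017-07-10", "2015-10-01", "2018-01-12", NA)
derived <- month_to_season(start_dates)

# Existing season values are kept; only the gaps are filled from the date
existing <- c(NA, "Summer", NA, NA, NA)
filled <- ifelse(is.na(existing), derived, existing)
filled
```

A missing start date yields a missing season, so no value is invented where the data is insufficient.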
After checking the summary of the data, we observe that we need to encode the Unknown values of Anime Type to NA
anime_final$anime_type[anime_final$anime_type == "Unknown"] <- NA
For the broadcast day column, we need to encode the values Not and Unknown as NA. Not is presumably a fragment left behind when the broadcast text was split on spaces.
anime_final$broadcast_day[anime_final$broadcast_day == "Not"] <- NA
anime_final$broadcast_day[anime_final$broadcast_day == "Unknown"] <- NA
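The same recoding can be sketched on a plain vector. This example uses made-up values and `%in%` so that several placeholder strings are handled in one step; `%in%` also treats existing NA entries as non-matches, so they are simply left as they are.

```r
# Made-up broadcast days, including placeholders and an existing NA
broadcast_day <- c("Saturdays", "Not", "Unknown", "Mondays", NA)

# Placeholder strings that carry no information about the day
placeholders <- c("Not", "Unknown")

# Recode every placeholder to NA in one assignment
broadcast_day[broadcast_day %in% placeholders] <- NA

# Count the remaining values, showing NA as its own category
table(broadcast_day, useNA = "ifany")
```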
It is better for our analysis to treat the following variables as categorical (factor) variables instead of character variables:

* Type
* Genre
* Rating
* Premiered Season
* Day of Week
# Assign the result back; otherwise the factor conversion is discarded
anime_final <- anime_final %>% mutate_at(.vars = c("anime_type", "genre", "MPAA_rating", "premiered_season", "broadcast_day"), .funs = as.factor)
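A quick way to confirm that a factor conversion took effect is to inspect the column classes afterwards. The sketch below does this on made-up data and uses `across()`, which assumes dplyr 1.0.0 or later; the result has to be assigned back to the data frame for the change to persist.

```r
library(dplyr)

# Made-up data with two character columns and one numeric column
toy <- data.frame(
  anime_type = c("TV", "Movie", "TV"),
  genre = c("Action", "Drama", "Comedy"),
  score = c(8.1, 7.4, 6.9),
  stringsAsFactors = FALSE
)

# Convert the selected columns to factors and keep the result
toy <- toy %>%
  mutate(across(c(anime_type, genre), as.factor))

# Inspect the classes and the factor levels
sapply(toy, class)
levels(toy$anime_type)
```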
The start_date column is a character variable, and premiered_year is also stored as character after the split. Converting them to Date and numeric variables, respectively, helps in further analysis.
anime_final$start_date <- as.Date(anime_final$start_date)
anime_final$premiered_year <- as.numeric(anime_final$premiered_year)
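As a small illustration of these conversions, the base R sketch below works on made-up values and assumes ISO-formatted date strings (`YYYY-MM-DD`). Once the types are converted, date arithmetic and numeric comparisons become possible.

```r
# Made-up character values, including missing entries
start_date <- c("2016-04-03", "1998-10-24", NA)
premiered_year <- c("2016", "1998", NA)

# Convert to Date and numeric; NA values stay NA
start_date <- as.Date(start_date)
premiered_year <- as.numeric(premiered_year)

# Date arithmetic now works, e.g. the gap in days between the two shows
gap_days <- as.numeric(start_date[1] - start_date[2])

# Cross-check: the year taken from the date against the premiered year
year_matches <- as.numeric(format(start_date, "%Y")) == premiered_year
```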
The final cleaned dataset can be found below in an interactive table.
datatable(anime_final, filter = 'top')