MyAnimeList, often abbreviated as MAL, is an anime and manga social cataloging website. The site gives its users a list-like system to organize and score anime and manga. It helps users find others who share similar tastes and provides a large database of anime and manga.
Anime without rankings or popularity scores were excluded. Producers, genre, and studio were converted from lists to tidy observations, so shows with multiple producers, genres, or studios appear on multiple rows.
This analysis investigates the factors that influence the popularity or rank of a given anime.
The data was cleaned and reshaped to support the analysis and the conclusions drawn from it.
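Because a show with several genres, producers, or studios occupies several rows, row counts and show counts differ. The following is a minimal sketch of that effect on hypothetical toy data; the `shows` table, its column values, and the comma-separated storage are assumptions for illustration, not the actual structure of the source file.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per show, genres stored as one string
shows <- tibble(
  anime_id = c(1, 2),
  name     = c("Show A", "Show B"),
  genre    = c("Action, Comedy", "Drama")
)

# One row per (show, genre) pair, similar in shape to a tidied dataset
tidy_shows <- shows %>%
  separate_rows(genre, sep = ",\\s*")

n_rows  <- nrow(tidy_shows)                 # rows, including repeated shows
n_shows <- n_distinct(tidy_shows$anime_id)  # unique shows
```

When summarising per show rather than per genre, counting distinct identifiers (or de-duplicating first) avoids weighting multi-genre shows more heavily.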
library(tidyr)
library(DT)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(kableExtra)
library(lubridate)
library(readxl)
library(highcharter)
library(scales)
library(RColorBrewer)
library(wesanderson)
library(plotly)
library(shiny)
| Package | Description |
|---|---|
| library(tidyr) | For changing the layout of your data sets, to convert data into the tidy format |
| library(DT) | For HTML display of data |
| library(ggplot2) | For customizable graphical representation |
| library(dplyr) | For data manipulation |
| library(tidyverse) | Collection of R packages designed for data science that work together harmoniously |
| library(kableExtra) | For building and styling tables for display |
| library(lubridate) | Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not |
| library(readxl) | The readxl package makes it easy to get data out of Excel and into R |
| library(highcharter) | Highcharter is an R wrapper for the Highcharts JavaScript library and its modules |
| library(scales) | The idea of the scales package is to implement scales in a way that is graphics system agnostic |
| library(RColorBrewer) | RColorBrewer is an R package that allows users to create colourful graphs with pre-made color palettes that visualize data in a clear and distinguishable manner |
| library(wesanderson) | Wes Anderson color palettes for R |
| library(plotly) | Plotly’s R graphing library makes interactive, publication-quality graphs |
| library(shiny) | Shiny is an R package that makes it easy to build interactive web apps straight from R |
The original dataset used for this project can be found here.
The start_date column combines the day, month, and year. We extract the year into a new column, premiered_year, so that the analysis can be done by year.
For a similar reason, we split the broadcast column into Day_of_Week and Time.
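anime_clean <- tidy_anime %>%
  # Extract the premiere year from start_date
  mutate(premiered_year = year(mdy(start_date))) %>%
  # Split broadcast on spaces; the second and fourth pieces are discarded
  separate(broadcast, c("Day_of_Week", "Not_Needed1", "Time", "Not_Needed_2"), sep = " ") %>%
  select(-c(Not_Needed1, Not_Needed_2))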
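To see what these two steps do, here is a small sketch on made-up rows. The `toy` data frame and its values are hypothetical, and the month/day/year text format and the "day at time (zone)" broadcast layout are assumptions inferred from the code above. The `fill` and `extra` arguments are added here only so that entries with fewer or more than four words are handled without warnings.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical rows in the assumed formats
toy <- tibble(
  start_date = c("4/3/1998", "10/20/1999"),
  broadcast  = c("Saturdays at 01:00 (JST)", "Unknown")
)

toy_clean <- toy %>%
  # Parse month/day/year text and keep only the year
  mutate(premiered_year = year(mdy(start_date))) %>%
  # Split on spaces; pad short entries with NA, drop surplus pieces
  separate(broadcast,
           into = c("Day_of_Week", "Not_Needed1", "Time", "Not_Needed_2"),
           sep = " ", fill = "right", extra = "drop") %>%
  select(-c(Not_Needed1, Not_Needed_2))
```

In this sketch the second row has no time component, so its Time value is NA while Day_of_Week holds the single word "Unknown".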
Many of the columns do not provide information that is useful for our investigation. Going forward, it is better to remove them and limit the analysis to the columns that offer meaningful insight.
We therefore remove the following columns from our dataset:
anime_final <- select(anime_clean, -c(title_english, title_japanese, title_synonyms, background, synopsis, premiered, related, status, end_date))
After removing the unnecessary columns, we give the remaining columns more descriptive names, mostly in snake_case.
names(anime_final) <- c("anime_id", "anime_name", "anime_type", "source", "producers", "genre", "studio", "no_of_episodes", "airing_status", "start_date", "episode_duration", "MPAA_rating", "viewers_rating", "rated_by_no_of_viewers", "rankings", "popularity_index", "wishlisted_members", "favorites", "broadcast_day", "broadcast_time", "premiered_year")
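Assigning a full vector to `names()` relies on the columns being in exactly the expected order. As an alternative sketch, `dplyr::rename()` maps each new name to an old one explicitly, so a reordered column cannot silently receive the wrong label. The old column names used here (`mal_id`, `name`, `type`) are hypothetical placeholders, not necessarily those of the real dataset.

```r
library(dplyr)

# Hypothetical toy data with placeholder original column names
toy <- tibble(
  mal_id = 1:2,
  name   = c("Show A", "Show B"),
  type   = c("TV", "Movie")
)

# rename() uses new_name = old_name pairs and errors if an old name is missing
toy_renamed <- toy %>%
  rename(anime_id = mal_id, anime_name = name, anime_type = type)
```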
After checking the summary of the data, we observe that the Unknown values of anime type need to be recoded as NA.
anime_final$anime_type[anime_final$anime_type == "Unknown"] <- NA
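For the broadcast day column, we need to recode the values Not and Unknown as NA. Not is a fragment left over from splitting the broadcast text on spaces rather than a real day of the week.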
anime_final$broadcast_day[anime_final$broadcast_day == "Not"] <- NA
anime_final$broadcast_day[anime_final$broadcast_day == "Unknown"] <- NA
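The same recoding can be written as a single pipeline step. This is a sketch on a hypothetical `toy` data frame; the set of placeholder strings is taken from the two assignments above.

```r
library(dplyr)

# Hypothetical toy column containing real days and placeholder values
toy <- tibble(broadcast_day = c("Sundays", "Not", "Unknown", "Fridays"))

# Replace every placeholder with NA in one step
toy_na <- toy %>%
  mutate(broadcast_day = if_else(broadcast_day %in% c("Not", "Unknown"),
                                 NA_character_,
                                 broadcast_day))

n_missing <- sum(is.na(toy_na$broadcast_day))
```

Keeping the placeholder values in one vector makes it easier to extend the list if further non-day values turn up in the summary.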
It would be helpful in our analysis to convert the following variables from character to categorical (factor) variables:
* Type
* Genre
* Rating
* Day of Week
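# Assign the result back; mutate_at() returns a new data frame and does not modify anime_final in place
anime_final <- anime_final %>% mutate_at(.vars = c("anime_type", "genre", "MPAA_rating", "broadcast_day"), .funs = as.factor)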
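dplyr verbs return a modified copy, so a conversion only takes effect once the result is stored. The sketch below shows this on hypothetical toy data, using `across()`, which recent dplyr versions (1.0 and later) offer as the successor to `mutate_at()`.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  anime_type     = c("TV", "Movie", "TV"),
  genre          = c("Action", "Drama", "Action"),
  viewers_rating = c(8.1, 7.4, 6.9)
)

# Convert the selected character columns to factors and keep the result
toy_fct <- toy %>%
  mutate(across(c(anime_type, genre), as.factor))

original_is_character <- is.character(toy$anime_type)  # the input is unchanged
converted_is_factor   <- is.factor(toy_fct$anime_type)
```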
The start_date column is a character variable. Converting it to a Date variable helps in further analysis.
# Parse as month/day/year, matching the mdy() call used to derive premiered_year
anime_final$start_date <- mdy(anime_final$start_date)
anime_final$premiered_year <- as.numeric(anime_final$premiered_year)
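After a date conversion it is worth checking how many values failed to parse, since unparseable text becomes NA. This is a small sketch with made-up strings, assuming month/day/year text as in the `mdy()` call used earlier; `quiet = TRUE` only suppresses the parse-failure warning.

```r
library(lubridate)

# Hypothetical raw values: one valid date and one unparseable entry
raw <- c("4/3/1998", "not a date")

parsed   <- mdy(raw, quiet = TRUE)  # failures become NA
n_failed <- sum(is.na(parsed))      # number of values that did not parse
```

Comparing the number of failures with the number of NA values in the raw column shows whether the conversion lost any information.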
The final cleaned dataset can be found below in an interactive table.
datatable(anime_final, filter = 'top')