2026-03-27

Introduction

  • Anime ratings vary widely between shows and depend on multiple factors
  • This project explores which factors could influence these ratings

Dataset Overview

  • Dataset: MyAnimeList Anime Dataset
  • MyAnimeList is a website used to track, categorize, rate, and discuss anime
  • The dataset contains information on anime ratings, popularity, and other characteristics
  • The dataset was found on Kaggle
  • Key variables used in this analysis:
    • score (rating)
    • episodes (number of episodes)
    • members (popularity)
    • type (e.g., TV, Movie, OVA, ONA)

Data Cleaning

library(tidyverse)
library(knitr)


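data <- read_csv("anime.csv")

# Clean the dataset: keep the variables used in the analysis, keep only
# entries flagged as safe for work, exclude music entries and specials,
# and remove rows with missing values
data <- data %>%
  select(title, score, episodes, members, type, sfw) %>%
  filter(sfw) %>%
  select(-sfw) %>%
  filter(!type %in% c("music", "special")) %>%
  drop_na()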

Quick Preview of the Data

  • The “members” column shows how many users have the anime on their list
    • A user can place an anime in one of the following list categories: Watching, Completed, On Hold, Dropped, or Plan to Watch
Preview of the Dataset
title                               score  episodes  members  type
Fullmetal Alchemist: Brotherhood     9.10        64  3206028  tv
Hunter x Hunter (2011)               9.04       148  2688079  tv
Shingeki no Kyojin Season 3 Part 2   9.05        10  2133927  tv
Steins;Gate                          9.07        24  2463954  tv

Score by Type

  • ONA: Original Net Animation (released straight to streaming services)
  • OVA: Original Video Animation (released directly to home video; it may contain extra episodes that are not from the source material, or it may not be based on existing media at all)

Insights from the Boxplot

  • OVAs and ONAs have a slightly lower average rating
    • For OVAs, this could be because they are often not adapted from canon material and may not be as compelling as the original story
    • For ONAs, it could be due to a lack of marketing, since fewer viewers also means fewer ratings
  • OVAs and ONAs also have outliers at both the top and the bottom, showing that some OVAs and ONAs are exceptionally good (and some exceptionally poor)
  • Movies and TV shows may rate higher because they typically have larger budgets and more backing from production companies
  • The highest-rated titles are TV shows, possibly because they can run much longer and allow for more character development
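The boxplot comparison can also be put into numbers with a small helper. This is a sketch under stated assumptions: `summarize_by_type()` and the `toy` values are hypothetical, and base R is used instead of the tidyverse only so the snippet runs without extra packages. It reports, for each type, the count, the mean, and the quartiles from `quantile()` defaults, which approximate what a boxplot displays (boxplot implementations can compute quartiles slightly differently).

```r
# Hypothetical helper: per-type count, mean, and quartiles of score,
# sorted from the highest to the lowest median.
summarize_by_type <- function(df) {
  groups <- split(df$score, df$type)
  rows <- lapply(names(groups), function(t) {
    s <- groups[[t]]
    q <- quantile(s, probs = c(0.25, 0.5, 0.75), names = FALSE)
    data.frame(type = t, n = length(s), mean = mean(s),
               q1 = q[1], median = q[2], q3 = q[3])
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$median, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Toy data (hypothetical values, not from the dataset)
toy <- data.frame(
  type  = c("tv", "tv", "tv", "movie", "movie", "ova", "ova", "ona"),
  score = c(8.0, 7.0, 9.0, 7.5, 6.5, 6.0, 7.0, 6.2)
)
summarize_by_type(toy)
```

Calling `summarize_by_type(data)` on the cleaned dataset would show how large the gap between OVAs/ONAs and TV shows/movies actually is; results for the real data are not reproduced here.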

Score Distribution of Ratings

Insights from the Score Distribution

  • The data is approximately normal, skewing slightly positive
  • Since the peak lies above 5, users may be more generous when rating media they like
  • The very small number of scores below 5 could be due to loss of interest
    • If an anime is not that good, viewers may drop it entirely and move on to something else instead of rating it
  • The lack of ratings below 3.5 could suggest selection bias
    • Viewers are much less likely to watch low-rated anime, which could cause low scores to be underrepresented
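One way to check the direction of the skew described above is the sample skewness coefficient g1: the mean cubed deviation divided by the cubed standard deviation (both with a 1/n denominator). The helper `sample_skewness()` is a hypothetical base R sketch; a positive value means a longer right tail (toward high scores) and a negative value means a longer left tail (toward low scores).

```r
# Sample skewness g1 = m3 / m2^(3/2), where m2 and m3 are the second and
# third central moments computed with a 1/n denominator.
# Positive: longer right tail. Negative: longer left tail. Zero: symmetric.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]        # ignore missing scores
  dev <- x - mean(x)
  m2 <- mean(dev^2)
  m3 <- mean(dev^3)
  m3 / m2^(3 / 2)
}

# Toy checks (hypothetical values, not from the dataset)
sample_skewness(c(1, 2, 3, 4, 5))    # symmetric
sample_skewness(c(1, 1, 1, 10))      # long right tail
sample_skewness(c(1, 10, 10, 10))    # long left tail
```

Running `sample_skewness(data$score)` on the cleaned data would confirm the sign of the skew numerically, and `hist(data$score)` shows the same shape visually.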

Scatterplot of Number of Episodes vs Score

Scatterplot Analysis

  • The graph shows that having more episodes does not lead to a higher score
  • Many anime with episode counts in the hundreds have only average scores
  • A few outliers have more than 1000 episodes, but these are very long-running anime from franchises such as Pokemon and Doraemon (a popular kids' show in Asia)
  • These are mostly kids' shows that have been airing for decades
  • Overall, the graph suggests that episode count alone does not determine a score; the people behind an anime likely matter at least as much
  • The right team can make a groundbreaking anime whether it runs only 12 episodes or 200+
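The scatterplot's message can also be checked numerically. Because a handful of titles with 1000+ episodes sit far from the rest, a rank-based Spearman correlation (a swapped-in technique, not one used in the original analysis) is less affected by them than Pearson's r. The helper `compare_correlations()` is a hypothetical base R sketch built on `cor()`, and the toy vectors are illustrative only.

```r
# Hypothetical helper: Pearson (linear, outlier-sensitive) and Spearman
# (rank-based) correlation between two numeric vectors, using only the
# pairs where both values are present.
compare_correlations <- function(x, y) {
  c(
    pearson  = cor(x, y, method = "pearson",  use = "complete.obs"),
    spearman = cor(x, y, method = "spearman", use = "complete.obs")
  )
}

# Toy illustration (hypothetical values): y always increases with x, but
# not linearly, and one x value is extreme.
x <- c(12, 24, 50, 100, 1000)
y <- log(x)
compare_correlations(x, y)
```

In the toy case the Spearman value is exactly 1 because the ranks agree perfectly, while Pearson's r is pulled below 1 by the single extreme point. `compare_correlations(data$episodes, data$score)` would apply the same comparison to the cleaned dataset.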

3D Plot: Episodes vs Score vs Members

Pearson Correlation between the Score and Members

#Pearson correlation between score and members
cor_test <- cor.test(data$score, data$members, method = "pearson")

cor_test
## 
##  Pearson's product-moment correlation
## 
## data:  data$score and data$members
## t = 46.162, df = 10865, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3890906 0.4205304
## sample estimates:
##       cor 
## 0.4049301

Analysis of Pearson Correlation

  • The coefficient of 0.405 indicates a moderate positive correlation
  • Significance: the p-value (< 2.2e-16) is far below 0.001, so the correlation is statistically significant
  • The 95% confidence interval (0.389 to 0.421) lies entirely above 0, so I am confident that the true correlation is positive
  • Interpretation: More popular anime tend to have higher ratings, but the relationship is moderate (r² ≈ 0.16, so popularity accounts for only about 16% of the variance in scores), meaning that other factors also affect scores
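The statistics in the `cor.test()` output can be reproduced from r and the sample size alone, which shows where the test and the confidence interval come from: the t statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), and the interval is built on Fisher's z-transform, atanh(r), whose standard error is 1 / sqrt(n - 3). The function name `pearson_summary()` is hypothetical, and n = 10867 is inferred from the reported df = 10865 (df = n - 2).

```r
# Hypothetical helper: reproduce the t statistic, p-value, and confidence
# interval that cor.test() reports for a Pearson correlation.
pearson_summary <- function(r, n, conf_level = 0.95) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)     # t test for H0: rho = 0
  p_value <- 2 * pt(-abs(t_stat), df)        # two-sided p-value
  z <- atanh(r)                              # Fisher z-transform
  se <- 1 / sqrt(n - 3)                      # standard error of z
  crit <- qnorm(1 - (1 - conf_level) / 2)    # about 1.96 for 95%
  ci <- tanh(z + c(-1, 1) * crit * se)       # back-transform to the r scale
  list(t = t_stat, df = df, p_value = p_value, conf_int = ci)
}

# Values from the cor.test() output: r = 0.4049301, df = 10865, so n = 10867
res <- pearson_summary(0.4049301, 10867)
res$t          # should match the reported t = 46.162
res$conf_int   # should match the reported interval 0.389 to 0.421
```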

Thank You