2026-03-27

Dataset Overview and Source

Manga Dataset

The analysis is based on manga data pulled from MyAnimeList, a popular anime and manga site. The dataset includes information on the top 10,000 manga entries in 2024.

Data Source: Kaggle

Introduction: In this project, I explore the types of manga that exist, who manga is written for, the factors that drive manga scores, and how well-loved even small manga are.

Dataset Overview and Source (cont.)

Key Variables:

  • Title: Title of the Manga, in both Japanese and English
  • Score: Score or rating given by users out of 10
  • Members: Number of community members who added the manga to their reading or recommended lists.
  • Favorite: Number of community members who added the manga to their favorites
  • Chapters: Number of chapters
  • Status: Publishing status, such as Finished, Publishing, On Hiatus, etc.
  • Genres: Genres of the manga
  • Demographics: Target audience of the manga

What Types of Manga Exist?

To start off, we will look at the kinds of manga that exist by exploring the most common manga genres and the percentages of manga at different publishing statuses.

To prepare the data for the following plot_ly plots, I split the list-valued Genres column into individual genres and filtered out entries with an empty publishing status.
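The preparation step described above can be sketched in base R. This is a minimal illustration rather than the code behind the slides: the toy data frame is made up, and the assumption that Genres is stored as a string such as "['Action', 'Drama']" may not match the actual Kaggle file.

```r
# Toy stand-in for the real data. Column names follow the Key Variables list;
# the bracketed-string format of Genres is an assumption.
manga <- data.frame(
  Title  = c("A", "B", "C"),
  Genres = c("['Action', 'Drama']", "['Drama']", "['Comedy', 'Action']"),
  Status = c("Finished", "", "Publishing"),
  stringsAsFactors = FALSE
)

# Drop rows with a missing or empty publishing status.
manga_status <- manga[!is.na(manga$Status) & manga$Status != "", ]

# Strip brackets and quotes, then split on commas: one genre per element.
genre_list <- strsplit(gsub("\\[|\\]|'", "", manga$Genres), ",\\s*")

# Flatten to one row per (title, genre) pair.
manga_genres <- data.frame(
  Title = rep(manga$Title, lengths(genre_list)),
  Genre = unlist(genre_list),
  stringsAsFactors = FALSE
)

# Genre counts (most common first) and status percentages for plotting.
genre_counts <- sort(table(manga_genres$Genre), decreasing = TRUE)
status_pct   <- round(100 * prop.table(table(manga_status$Status)), 1)
```

The resulting genre_counts and status_pct tables are in the shape a bar chart and a pie chart would need.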

Top Genres by Count

Status Breakdown

What Affects a Manga’s Score?

Next, we will look at how manga scores are distributed across the demographics that manga target.

We will also compare score with the number of chapters a manga has, to see whether longer manga score better or whether readers lose interest over time.

The 3D plot of Score x Members x Favorite will give us insight into how popularity compares with quality and whether the most popular manga are actually the most highly rated.
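One way to put a number on the score and chapter-count relationship is a rank correlation. The sketch below is an assumption, not the method used for the plot: the data are made up, and Spearman's rho is chosen here only because chapter counts tend to be heavily skewed.

```r
# Toy stand-in data; Chapters is NA for one title to mimic an entry
# with no recorded chapter count (an assumption about the real data).
manga <- data.frame(
  Score    = c(7.1, 7.4, 8.2, 7.0, 7.9, 7.3),
  Chapters = c(12, 45, 300, 8, NA, 60)
)

# Keep only titles with a known chapter count.
manga_ch <- manga[!is.na(manga$Chapters), ]

# Spearman's rho: positive if longer manga tend to score higher.
rho <- cor(manga_ch$Score, manga_ch$Chapters, method = "spearman")
```

A value near zero would suggest length has little to do with score, while a clearly positive or negative value would support one of the two hypotheses above.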

Score Distribution

Score Distribution - ANOVA

## 
## Call:
## lm(formula = Score ~ Demographics, data = manga_demo)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.51760 -0.27556 -0.09556  0.18240  2.05320 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.31676    0.01950 375.217  < 2e-16 ***
## DemographicsKids    -0.01373    0.06789  -0.202    0.840    
## DemographicsSeinen   0.10005    0.02158   4.635 3.64e-06 ***
## DemographicsShoujo  -0.03119    0.02101  -1.484    0.138    
## DemographicsShounen  0.10084    0.02138   4.717 2.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3736 on 6117 degrees of freedom
## Multiple R-squared:  0.02808,    Adjusted R-squared:  0.02745 
## F-statistic: 44.19 on 4 and 6117 DF,  p-value: < 2.2e-16

ANOVA Analysis

One-Way ANOVA: Score by Demographics
               Df   Sum Sq   Mean Sq   F value   p value
Demographics    4    24.67      6.17    44.188   < 2.2e-16 ***
Residuals    6117   853.64      0.14

Commentary: The large F-value and very small p-value show that demographics significantly predict score, with Seinen and Shounen manga scoring around 0.10 points higher on average than Josei, the reference group. Kids and Shoujo manga do not differ significantly from Josei. However, the model explains only about 2.8% of the variance in score (adjusted R-squared of 2.75%), which suggests that much more than demographics plays into the score a title receives.
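As a quick check, the F-statistic and R-squared values can be recomputed from the sums of squares in the ANOVA table. The variable names below are illustrative; the numbers are copied from the table.

```r
# Values copied from the one-way ANOVA table.
ss_demo  <- 24.67;  df_demo  <- 4
ss_resid <- 853.64; df_resid <- 6117

# F = mean square between groups / mean square within groups.
f_value <- (ss_demo / df_demo) / (ss_resid / df_resid)

# R-squared = share of the total sum of squares explained by Demographics.
r_squared <- ss_demo / (ss_demo + ss_resid)

# Adjusted R-squared penalizes for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (df_demo + df_resid) / df_resid

round(c(F = f_value, R2 = r_squared, adj_R2 = adj_r_squared), 4)
```

Up to rounding of the table entries, these agree with the F-statistic, Multiple R-squared, and Adjusted R-squared reported in the lm summary.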

Score vs Chapter

Score x Members x Favorite

Hidden Gems

For the last plot, I wanted to explore the top 15 manga ranked by the ratio of members who favorited a manga to members who added it to their lists, as an indicator of how well-loved these titles are.

These hidden gems may sit lower in overall ranking or score, but their high favoriting rates show that they are well-loved by their readers.
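The ranking can be sketched as follows. The data are made up, the name fav_ratio is illustrative, and the direction of the ratio (Favorite divided by Members) is an assumption about how the plot was built.

```r
# Toy stand-in data; Members and Favorite follow the Key Variables list.
manga <- data.frame(
  Title    = c("A", "B", "C", "D"),
  Members  = c(200000, 1500, 40000, 0),
  Favorite = c(10000, 300, 400, 0),
  stringsAsFactors = FALSE
)

# Drop titles with no members to avoid dividing by zero.
manga <- manga[manga$Members > 0, ]

# Share of list members who also marked the title as a favorite.
manga$fav_ratio <- manga$Favorite / manga$Members

# Top 15 titles by favoriting rate (fewer if the data has fewer rows).
hidden_gems <- head(manga[order(manga$fav_ratio, decreasing = TRUE), ], 15)
```

Note that a very small title can top this ranking with only a handful of favorites, so a minimum Members threshold may be worth considering when reading the bar plot.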

Hidden Gems - Bar Plot