library(rvest)
library(tidyverse)
library(dplyr)
Stuff for Weebs
Question
How does the popularity (measured by total members) of different manga types (e.g., Manga, Light Novel, Manhwa) compare on MyAnimeList’s top manga rankings?
Why this?
As a manga lover, I know there are a lot of different types in the manga industry, and I want to know which type dominates the community.
Ethical Stuff
MyAnimeList won’t let you scrape unless you pass yourself off as a bot.
Some Packages Used
Methodology
We’ll scrape data from MyAnimeList’s “Top Manga” pages, which give us ranking information for different manga titles, including their scores, member counts, and volumes.
Approach:
Web Scraping: I used the rvest library to collect title, score, type, and member count information for the top-ranked manga from the MyAnimeList pages.
Data Wrangling: I cleaned the scraped data and made sure all the values are consistent (e.g., numeric conversion for member counts and volumes).
Analysis: I grouped the data by manga type, calculated the total members for each type, and found each type's proportion of the overall total.
Visualization: I created a bar chart to visualize the proportion of members for each manga type.
Data Wrangling
Scraping Process: I wrote a for loop that goes through multiple pages of the MyAnimeList rankings (50 titles per page, 10 pages), capturing:
Titles: Name of the manga.
Scores: Average rating scores.
Information: Text containing the manga type, volumes, and member count.
Manga <- data.frame()
base_url <- "https://myanimelist.net/topmanga.php?limit="
for (i in seq(0, 450, by = 50)) {
  page_url <- paste0(base_url, i)
  weeb_page <- read_html(page_url)
  titles <- weeb_page %>%
    html_elements("a.hoverinfo_trigger.fs14.fw-b") %>%
    html_text2()
  scores <- weeb_page %>%
    html_elements("div.js-top-ranking-score-col.di-ib.al") %>%
    html_text2()
  information <- weeb_page %>%
    html_elements("div.information.di-ib.mt4") %>%
    html_text2()
  data <- data.frame(
    Title = titles,
    Score = scores,
    Information = information,
    stringsAsFactors = FALSE
  ) %>%
    mutate(
      Type = sub(" \\(.*", "", Information), # Extract type (e.g., Manga, Novel)
      Volumes = sub(".*\\((.*? vols)\\).*", "\\1", Information), # Extract volumes
      Members = str_trim(str_extract(Information, "\\d{1,3}(,\\d{3})* members")) # Extract members using regex
    ) %>%
    select(-Information) # Drop unnecessary columns
  # Combine the current page data with the main data frame
  Manga <- bind_rows(Manga, data)
  print(paste("Page with limit:", i, "has been fully scraped"))
}
[1] "Page with limit: 0 has been fully scraped"
[1] "Page with limit: 50 has been fully scraped"
[1] "Page with limit: 100 has been fully scraped"
[1] "Page with limit: 150 has been fully scraped"
[1] "Page with limit: 200 has been fully scraped"
[1] "Page with limit: 250 has been fully scraped"
[1] "Page with limit: 300 has been fully scraped"
[1] "Page with limit: 350 has been fully scraped"
[1] "Page with limit: 400 has been fully scraped"
[1] "Page with limit: 450 has been fully scraped"
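To see what the three extraction patterns in the loop actually do, here is a small sketch that runs them on a single made-up Information string. The sample text is an assumption for illustration only (it is not copied from the site), including the guess that html_text2() separates the three parts with newlines.

```r
library(stringr)

# Made-up example of one scraped "Information" string (not real site data).
# Assumption: the type/volumes, dates, and member count are separated by newlines.
info <- "Manga (41 vols)\nAug 1989 - ?\n700,000 members"

# Drop everything from the first " (" onward, leaving only the type
type <- sub(" \\(.*", "", info)

# Keep only the "<n> vols" text found inside the parentheses
volumes <- sub(".*\\((.*? vols)\\).*", "\\1", info)

# Pull out a comma-grouped number followed by " members"
members <- str_trim(str_extract(info, "\\d{1,3}(,\\d{3})* members"))

print(c(Type = type, Volumes = volumes, Members = members))
```

At this point all three values are still text. Members and Volumes only become numbers in the cleaning step.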
Data Cleaning:
Extract manga type, volumes, and members from the “Information” column.
Handle missing values and format issues, e.g., replacing “?” with NA, removing commas, and ensuring numeric conversions.
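These rules can be tried on a few made-up values before touching the scraped data. This is only a sketch in base R: the vectors members_raw and volumes_raw are hypothetical, and the "?" case is handled with an explicit comparison here, which is a swapped-in alternative to passing NA as the replacement in gsub().

```r
# Hypothetical raw values, shaped like the scraped Members and Volumes columns
members_raw <- c("700,000 members", "12,345 members", NA)
volumes_raw <- c("41 vols", "? vols", "3 vols")

# Remove the commas and the " members" suffix in one pass, then convert
members <- as.numeric(gsub(",| members", "", members_raw))

# Remove the " vols" suffix, mark unknown counts ("?") as missing, then convert
volumes_chr <- sub(" vols", "", volumes_raw)
volumes_chr[volumes_chr == "?"] <- NA
volumes <- as.numeric(volumes_chr)

print(members)
print(volumes)
```

Setting "?" to NA before calling as.numeric() avoids the "NAs introduced by coercion" warning that a direct conversion of "?" would raise.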
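# Strip the thousands separators and the " members" suffix, then convert to numeric
Manga$Members <- Manga$Members %>%
  gsub(",", "", .) %>%
  str_replace(" members", "") %>%
  as.numeric()
# gsub() with NA as the replacement turns every entry containing "?" into NA;
# the remaining entries lose their " vols" suffix and are converted to numeric
Manga$Volumes <- Manga$Volumes %>%
  gsub("\\?", NA, .) %>%
  str_replace(" vols", "") %>%
  as.numeric()
Manga$Score <- as.numeric(Manga$Score)
Analysis & Visualization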
Analysis:
The dataset is grouped by manga type. For each type:
Total Members: Sum up the number of members.
Proportion: Compute the share of members relative to the total across all types.
Visualization:
The bar chart shows the proportion of members per manga type.
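As a quick sanity check of the grouping logic, here is a sketch on a tiny made-up data frame. The toy numbers and the by_type name are hypothetical; only the group, sum, and divide steps mirror the analysis described above.

```r
library(dplyr)

# Toy data with invented member counts (not scraped values)
toy <- data.frame(
  Type = c("Manga", "Manga", "Manhwa", "Light Novel"),
  Members = c(600, 200, 150, 50)
)

# Sum members within each type, then divide by the grand total
by_type <- toy %>%
  group_by(Type) %>%
  summarize(Total_Members = sum(Members, na.rm = TRUE)) %>%
  mutate(Proportion = Total_Members / sum(Total_Members))

print(by_type)
```

Because every type's total is divided by the same grand total, the Proportion column should always add up to 1, which is an easy thing to check before plotting.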
# Note: this overwrites Manga with the per-type summary table
Manga <- Manga %>%
  group_by(Type) %>%
  summarize(Total_Members = sum(Members, na.rm = TRUE)) %>%
  mutate(Proportion = Total_Members / sum(Total_Members))
# Plot proportions
Manga %>%
  ggplot(aes(x = Type, y = Proportion)) +
  geom_col(fill = "skyblue") + # same as geom_bar(stat = "identity")
  theme_minimal() +
  labs(
    title = "Proportion of Members per Type of Manga",
    x = "Type",
    y = "Proportion of Members"
  ) +
  scale_y_continuous(labels = scales::percent_format())
The results are kind of underwhelming. I was expecting a bigger share for manhwa, but I guess it makes sense: most anime originates from manga, which makes manga more mainstream.