Stuff for Weebs

Question

How does the popularity (measured by total members) of different manga types (e.g., Manga, Light Novel, Manhwa) compare on MyAnimeList’s top manga rankings?

Why this?

As a manga lover, I know there are a lot of different types in the manga industry. I want to know which type dominates the community.

Ethical Stuff

MyAnimeList does not allow scraping unless the scraper identifies itself as a bot.

Some Packages Used

library(rvest)
library(tidyverse)
library(dplyr)

Methodology

I scraped data from MyAnimeList’s “Top Manga” pages, which give ranking information for different manga titles, including their scores, member counts, and volumes.

Approach:

  1. Web Scraping: I used the rvest library to collect the title, score, type, and member count for the top-ranked manga from the MyAnimeList pages.

  2. Data Wrangling: I cleaned the scraped data and made sure all the values were consistent (e.g., numeric conversion for member counts and volumes).

  3. Analysis: I grouped the data by manga type, calculated the total members for each type, and found each type's proportion of the overall total.

  4. Visualization: I created a bar chart to visualize the proportion of members for each manga type.

Data Wrangling

  • Scraping Process: I wrote a for loop that goes through multiple pages of the MyAnimeList rankings, capturing:

    • Titles: Name of the manga.

    • Scores: Average rating scores.

    • Information: Text containing the manga type, volumes, and member count.
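Before running the selectors against the live site, they can be tried on a small hand-written HTML fragment. The sketch below is only an illustration: the class names are taken from the selectors used in this write-up, while the markup layout, the titles, and the numbers are made up and are assumptions about how a ranking row looks.

```r
library(rvest)

# Hand-written fragment imitating two ranking rows.
# Assumption: simplified markup; titles and numbers are invented.
sample_page <- minimal_html('
  <a class="hoverinfo_trigger fs14 fw-b" href="#">Example Title A</a>
  <div class="information di-ib mt4">Manga (41 vols)<br>Aug 1989 - ?<br>665,300 members</div>
  <a class="hoverinfo_trigger fs14 fw-b" href="#">Example Title B</a>
  <div class="information di-ib mt4">Manhwa (? vols)<br>Jan 2018 - ?<br>1,234 members</div>
')

# Same CSS selectors as the scraping loop, applied to the fragment
titles <- sample_page %>%
  html_elements("a.hoverinfo_trigger.fs14.fw-b") %>%
  html_text2()

information <- sample_page %>%
  html_elements("div.information.di-ib.mt4") %>%
  html_text2()

print(titles)
print(information)
```

Each selector should return one element per row, so titles and information end up the same length and can go into one data frame.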

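# Empty data frame that accumulates the rows from every page
Manga <- data.frame()

base_url <- "https://myanimelist.net/topmanga.php?limit="

# Each page lists 50 titles, so limit = 0, 50, ..., 450 covers ranks 1 to 500
for (i in seq(0, 450, by = 50)) {
  page_url <- paste0(base_url, i)

  weeb_page <- read_html(page_url)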
  
  titles <- weeb_page %>% 
    html_elements("a.hoverinfo_trigger.fs14.fw-b") %>% 
    html_text2()
  
  scores <- weeb_page %>% 
    html_elements("div.js-top-ranking-score-col.di-ib.al") %>% 
    html_text2()
  
  information <- weeb_page %>% 
    html_elements("div.information.di-ib.mt4") %>% 
    html_text2()
  
  data <- data.frame(
    Title = titles,
    Score = scores,
    Information = information,
    stringsAsFactors = FALSE
  ) %>%
    mutate(
      Type = sub(" \\(.*", "", Information),  # Extract type (e.g., Manga, Novel)
      Volumes = sub(".*\\((.*? vols)\\).*", "\\1", Information),  # Extract volumes
      Members = str_trim(str_extract(Information, "\\d{1,3}(,\\d{3})* members"))  # Extract members using regex
    ) %>%
    select(-Information)  # Drop unnecessary columns
  
  # Combine the current page data with the main data frame
  Manga <- bind_rows(Manga, data)
  
  print(paste("Page with limit:", i, "has been fully scraped"))

  # Pause briefly between requests to go easy on the server
  Sys.sleep(1)
  
}
[1] "Page with limit: 0 has been fully scraped"
[1] "Page with limit: 50 has been fully scraped"
[1] "Page with limit: 100 has been fully scraped"
[1] "Page with limit: 150 has been fully scraped"
[1] "Page with limit: 200 has been fully scraped"
[1] "Page with limit: 250 has been fully scraped"
[1] "Page with limit: 300 has been fully scraped"
[1] "Page with limit: 350 has been fully scraped"
[1] "Page with limit: 400 has been fully scraped"
[1] "Page with limit: 450 has been fully scraped"
  • Data Cleaning:

    • Extract manga type, volumes, and members from the “Information” column.

    • Handle missing values and format issues, e.g., replacing “?” with NA, removing commas, and ensuring numeric conversions.
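To make the extraction and cleaning steps concrete, here is a minimal sketch that runs them on two hard-coded strings. The sample strings are hypothetical (they only imitate the layout of the scraped “Information” text), and the sketch uses base R rather than the stringr calls in this write-up, so it runs without any packages.

```r
# Hypothetical "Information" strings; not real scraped data
info <- c(
  "Manga (41 vols)\nAug 1989 - ?\n665,300 members",
  "Manhwa (? vols)\nJan 2018 - ?\n1,234 members"
)

# Type: the text before the opening parenthesis
type <- trimws(regmatches(info, regexpr("^[^(\n]+", info)))

# Volumes: digits or "?" followed by " vols"; "?" means unknown, so it becomes NA
volumes_raw <- regmatches(info, regexpr("[0-9?]+ vols", info))
volumes_raw <- sub(" vols", "", volumes_raw)
volumes_raw[volumes_raw == "?"] <- NA
volumes <- as.numeric(volumes_raw)

# Members: grab "<number> members", then keep only the digits
members_raw <- regmatches(info, regexpr("[0-9,]+ members", info))
members <- as.numeric(gsub("[^0-9]", "", members_raw))

print(data.frame(Type = type, Volumes = volumes, Members = members))
```

Note that regmatches() silently drops elements with no match, so this shortcut is only safe when every string contains all three pieces, as the two samples here do.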

Manga$Members <- Manga$Members %>%
  str_remove_all(",") %>%        # drop thousands separators
  str_remove(" members") %>%     # drop the unit label
  as.numeric()

Manga$Volumes <- Manga$Volumes %>%
  str_remove(" vols") %>%        # drop the unit label
  na_if("?") %>%                 # unknown volume counts become NA
  as.numeric()

Manga$Score <- as.numeric(Manga$Score)

Analysis & Visualization

Analysis:
The dataset is grouped by manga type. For each type:

  • Total Members: Sum up the number of members.

  • Proportion: Compute the share of members relative to the total across all types.
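As a quick worked example of the two steps above, the sketch below computes totals and proportions on a toy table. The numbers are invented for illustration and are not MyAnimeList figures; it uses base R, so it is a stand-in for the grouping logic and not the exact pipeline used in this report.

```r
# Toy data with made-up member counts
toy <- data.frame(
  Type = c("Manga", "Manga", "Manhwa", "Light Novel"),
  Members = c(600, 200, 150, 50)
)

# Total members per type (named numeric vector, names sorted alphabetically)
totals <- sapply(split(toy$Members, toy$Type), sum)

# Each type's share of all members; the shares add up to 1
proportions <- totals / sum(totals)

print(totals)
print(proportions)
```

Here Manga holds 800 of the 1,000 toy members, so its proportion is 0.8, with 0.15 for Manhwa and 0.05 for Light Novel.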

Visualization:
The bar chart shows the proportion of members per manga type.

# Summarize into a new object so the scraped data in Manga stays intact
type_summary <- Manga %>%
  group_by(Type) %>%
  summarize(Total_Members = sum(Members, na.rm = TRUE)) %>%
  mutate(Proportion = Total_Members / sum(Total_Members))

# Plot proportions
type_summary %>%
  ggplot(aes(x = Type, y = Proportion)) +
  geom_col(fill = "skyblue") +  # geom_col() plots the values as given
  theme_minimal() +
  labs(
    title = "Proportion of Members per Type of Manga",
    x = "Type",
    y = "Proportion of Members"
  ) +
  scale_y_continuous(labels = scales::percent_format())

The results are kind of underwhelming. I was expecting more for manhwa, but I guess it makes sense: most anime originates from manga, which makes manga more mainstream.