My goal in this analysis is to explore the growth in the number of yuri manga titles (series or one-shots) over time.
Readers not familiar with the R statistical software, or with the additional Tidyverse software I use to manipulate and plot data, can check out the various ways to learn more about the Tidyverse.
I load the following R libraries, for the purposes listed:
library("tidyverse")  # data manipulation (dplyr, readr) and plotting (ggplot2)
library("tools")      # md5sum() for checking the data file
I use a local copy of a data file containing the number of yuri manga titles released in each year through 2022. See the “References” section below for more information.
I check the MD5 hash value of the file, and stop if its contents are not what is expected.
stopifnot(md5sum("yuri-manga-per-year.csv") == "bcf4da8233ec0bc93e3e5e0e8bfdf443")
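As a minimal sketch of how this kind of check behaves, the following uses a throwaway temporary file rather than the real data file. The helper name `verify_md5` and the file contents are assumptions made for illustration; only `md5sum()` from the `tools` package comes from the analysis itself.

```r
library(tools)

# Hypothetical stand-in for the real data file, written to a temp location
path <- tempfile(fileext = ".csv")
writeLines(c("Year,Num_Manga", "2000,1"), path)

# md5sum() returns a character vector named by file path;
# unname() keeps just the 32-character hex digest
expected <- unname(md5sum(path))

# Hypothetical helper: like the stopifnot() check, but with a more
# descriptive error message when the file is missing or has changed
verify_md5 <- function(file, expected_hash) {
  actual <- unname(md5sum(file))
  if (is.na(actual)) {
    stop("Cannot read file: ", file)
  }
  if (actual != expected_hash) {
    stop("MD5 mismatch for ", file, ": expected ", expected_hash,
         ", got ", actual)
  }
  invisible(TRUE)
}

verify_md5(path, expected)
```

The one-line `stopifnot()` form is enough for a script like this one; a helper of this sort mainly helps when the error message needs to say which hash was actually found.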
I load the data from the CSV file (filtering out any rows with invalid data) and sort the data in ascending order by year.
mpy_tb <- read_csv("yuri-manga-per-year.csv", col_types = "ii") %>%
  filter(!is.na(Year) & !is.na(Num_Manga)) %>%
  arrange(Year)
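To illustrate what the filtering step does, here is a small sketch that reads made-up CSV text instead of the real file. The three data rows are invented for the example. With `col_types = "ii"` both columns are parsed as integers, so a value that is not an integer becomes `NA` and its row is then dropped by `filter()`.

```r
library(readr)
library(dplyr)

# Invented CSV content (not from the real dataset); I() tells read_csv()
# to treat the string as literal data rather than a file path
csv_text <- I("Year,Num_Manga\n2001,3\n1999,1\n2000,n/a\n")

# "n/a" cannot be parsed as an integer, so it is read as NA
# (readr reports this as a parsing warning, suppressed here)
toy_tb <- suppressWarnings(read_csv(csv_text, col_types = "ii")) %>%
  filter(!is.na(Year) & !is.na(Num_Manga)) %>%  # drop the invalid row
  arrange(Year)                                 # oldest year first

print(toy_tb)
```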
I do some basic exploratory data analysis, starting with the total number of yuri manga in the dataset, the earliest year covered, and the maximum number of manga released in any year.
total_manga <- sum(mpy_tb$Num_Manga)
earliest_year <- min(mpy_tb$Year)
max_in_year <- max(mpy_tb$Num_Manga)
year_of_max <- mpy_tb$Year[mpy_tb$Num_Manga == max_in_year]
There are a total of 1,704 yuri manga in the dataset. The earliest year for which there is data is 1957. The most yuri manga released in any year was 192, in 2018.
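One detail of the year-of-maximum calculation: comparing against the maximum with `==` returns every year that ties for the maximum, so the result can have more than one element. The sketch below uses invented numbers to show this, along with `which.max()` as one way to get only the first such year.

```r
# Invented toy data with a tie for the maximum (not from the real dataset)
toy <- data.frame(
  Year = 2000:2004,
  Num_Manga = c(3L, 7L, 5L, 7L, 2L)
)

max_in_year <- max(toy$Num_Manga)

# Same approach as in the analysis: all years matching the maximum
years_of_max <- toy$Year[toy$Num_Manga == max_in_year]

# Alternative: which.max() returns the index of the first maximum only
first_year_of_max <- toy$Year[which.max(toy$Num_Manga)]

print(years_of_max)
print(first_year_of_max)
```

In the real dataset the maximum is reported for a single year, so the two approaches would agree there.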
Now that I have my dataset of interest, I can continue my analysis, this time by plotting the number of yuri manga released in each year.
mpy_tb %>%
  ggplot(mapping = aes(x = Year, y = Num_Manga)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = c(1960, 1970, 1980, 1990, 2000, 2010, 2020)) +
  scale_y_continuous(labels = scales::label_comma()) +
  xlab("Year") +
  ylab("New Yuri Manga") +
  labs(
    title = "New Yuri Manga Titles by Year (All Time)",
    subtitle = "Number of Series or One-Shots First Released in Year",
    caption = "Data source: Manga tagged GL on Anime Planet site"
  ) +
  theme_gray() +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
Since very few yuri manga were published until the 1990s, I re-do the plot to focus on the time period from 1990 on.
mpy_tb %>%
  filter(Year >= 1990) %>%
  ggplot(mapping = aes(x = Year, y = Num_Manga)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = c(1990, 1995, 2000, 2005, 2010, 2015, 2020)) +
  scale_y_continuous(labels = scales::label_comma()) +
  xlab("Year") +
  ylab("New Yuri Manga") +
  labs(
    title = "New Yuri Manga Titles by Year (Since 1990)",
    subtitle = "Number of Series or One-Shots First Released in Year",
    caption = "Data source: Manga tagged GL on Anime Planet site"
  ) +
  theme_gray() +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
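The plots in this analysis share almost all of their code, differing only in the data subset, the y variable, the labels, and the axis breaks. One possible way to reduce the repetition is a small helper function. The sketch below is an illustration, not part of the original analysis: the function name `plot_by_year`, its arguments, and the toy data are all assumptions.

```r
library(ggplot2)

# Hypothetical helper: builds the line-and-point plot with the shared
# scales, caption, and theme adjustments used throughout the analysis
plot_by_year <- function(data, y_col, y_label, title, subtitle, breaks) {
  ggplot(data, mapping = aes(x = Year, y = .data[[y_col]])) +
    geom_point() +
    geom_line() +
    scale_x_continuous(breaks = breaks) +
    scale_y_continuous(labels = scales::label_comma()) +
    labs(
      x = "Year",
      y = y_label,
      title = title,
      subtitle = subtitle,
      caption = "Data source: Manga tagged GL on Anime Planet site"
    ) +
    theme_gray() +
    theme(
      axis.title.x = element_text(margin = margin(t = 5)),
      axis.title.y = element_text(margin = margin(r = 10)),
      plot.caption = element_text(margin = margin(t = 15), hjust = 0)
    )
}

# Invented toy data (not from the real dataset)
toy <- data.frame(Year = 1990:1995, Num_Manga = c(1L, 0L, 2L, 3L, 5L, 8L))

p <- plot_by_year(
  toy,
  y_col = "Num_Manga",
  y_label = "New Yuri Manga",
  title = "New Yuri Manga Titles by Year (Toy Data)",
  subtitle = "Number of Series or One-Shots First Released in Year",
  breaks = 1990:1995
)
```

Printing `p` draws the plot. Passing the column name as a string and looking it up with `.data[[y_col]]` lets one function serve plots of different y variables.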
Another interesting statistic to look at is the cumulative number of yuri manga released by year.
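mpy_tb <- mpy_tb %>%
  mutate(Cumul_Manga = cumsum(Num_Manga))
# cumul_NN is the last year before the cumulative count first exceeds
# (100 - NN)% of the total, so at least NN% of titles came after that year.
cumul_90 <- min(mpy_tb$Year[mpy_tb$Cumul_Manga > 0.1 * total_manga]) - 1
cumul_75 <- min(mpy_tb$Year[mpy_tb$Cumul_Manga > 0.25 * total_manga]) - 1
cumul_50 <- min(mpy_tb$Year[mpy_tb$Cumul_Manga > 0.5 * total_manga]) - 1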
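The threshold calculation is easier to follow on a small example. The sketch below uses invented numbers and two hypothetical helper functions (`year_before_exceeding` and `share_after`) to apply the same logic and then check what share of titles was released after each cutoff year.

```r
# Invented toy data (not from the real dataset)
toy <- data.frame(Year = 2000:2004, Num_Manga = c(1L, 2L, 3L, 6L, 8L))
toy$Cumul_Manga <- cumsum(toy$Num_Manga)  # 1, 3, 6, 12, 20
total <- sum(toy$Num_Manga)               # 20

# Hypothetical helper: the last year before the cumulative count
# first exceeds the fraction `frac` of the total
year_before_exceeding <- function(frac) {
  min(toy$Year[toy$Cumul_Manga > frac * total]) - 1
}

# Hypothetical helper: share of all titles released strictly after `year`
share_after <- function(year) {
  sum(toy$Num_Manga[toy$Year > year]) / total
}

cutoffs <- sapply(c(0.1, 0.25, 0.5), year_before_exceeding)
shares <- sapply(cutoffs, share_after)

print(cutoffs)  # 2000 2001 2002
print(shares)   # 0.95 0.85 0.70
```

Through each cutoff year the cumulative count is still at or below the chosen fraction, so the share released afterwards is at least 90%, 75%, and 50% respectively.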
I plot this statistic, again looking at the period from 1990 on.
mpy_tb %>%
  filter(Year >= 1990) %>%
  ggplot(mapping = aes(x = Year, y = Cumul_Manga)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(1990, 1995, 2000, 2005, 2010, 2015, 2020)) +
  scale_y_continuous(labels = scales::label_comma()) +
  xlab("Year") +
  ylab("Cumulative Yuri Manga") +
  labs(
    title = "Cumulative Yuri Manga Titles by Year",
    subtitle = "Number of Series or One-Shots First Released in or before Year",
    caption = "Data source: Manga tagged GL on Anime Planet site"
  ) +
  theme_gray() +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
More than 90% of yuri manga titles were published after 2005, more than three-quarters after 2009, and more than half after 2015.
The accuracy of the dataset, and thus of the analysis, depends on the following factors:
In particular, the source of the data uses a scheme devised by Western fans in which the tag “yuri” is used to mark works more sexual in nature and the tag “shoujo-ai” is used to mark works focused more on emotional relationships. The tag “GL” should include both of these categories, but I have not confirmed this.
Data on the number of yuri manga titles released per year were obtained from the Anime Planet web site using the following procedure:
I used the following R environment in doing the analysis above:
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] tools stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10 purrr_1.0.1
## [5] readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0
## [9] tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] lubridate_1.9.0 assertthat_0.2.1 digest_0.6.31
## [4] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
## [7] backports_1.4.1 reprex_2.0.2 evaluate_0.19
## [10] highr_0.10 httr_1.4.4 pillar_1.8.1
## [13] rlang_1.0.6 googlesheets4_1.0.1 readxl_1.4.1
## [16] rstudioapi_0.14 jquerylib_0.1.4 rmarkdown_2.19
## [19] labeling_0.4.2 googledrive_2.0.0 bit_4.0.5
## [22] munsell_0.5.0 broom_1.0.2 compiler_4.2.1
## [25] modelr_0.1.10 xfun_0.36 pkgconfig_2.0.3
## [28] htmltools_0.5.4 tidyselect_1.2.0 fansi_1.0.3
## [31] crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1
## [34] withr_2.5.0 grid_4.2.1 jsonlite_1.8.4
## [37] gtable_0.3.1 lifecycle_1.0.3 DBI_1.1.3
## [40] magrittr_2.0.3 scales_1.2.1 cli_3.6.0
## [43] stringi_1.7.12 vroom_1.6.0 cachem_1.0.6
## [46] farver_2.1.1 fs_1.5.2 xml2_1.3.3
## [49] bslib_0.4.2 ellipsis_0.3.2 generics_0.1.3
## [52] vctrs_0.5.1 bit64_4.0.5 glue_1.6.2
## [55] hms_1.1.2 parallel_4.2.1 fastmap_1.1.0
## [58] yaml_2.3.6 timechange_0.2.0 colorspace_2.0-3
## [61] gargle_1.2.1 rvest_1.0.3 knitr_1.41
## [64] haven_2.5.1 sass_0.4.4
The source code for this analysis can be found in the public code repository https://gitlab.com/frankhecker/misc-analysis in the manga subdirectory.
This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.