Final Project

Intro

For my final project, I focused on manga: Japanese comic books or graphic novels. Every year, many manga come out of Japan, either brand-new or continuing from prior years. While some manga succeed or keep succeeding, many others are cancelled or put on hiatus early in their run. This could be due to poor writing or story; however, it can also be caused by factors outside the authors' control. Using data, my goal is to find out which variables make one manga sell better than another and to see if I can predict how successful a new manga will be.

My motivation for this project is that I am a big manga reader, and while I love and read some of the most popular titles, I prefer to find up-and-coming or new manga. However, this has led to many disappointments when a series gets cancelled well before its story can unfold. With this analysis, I hope to be able to predict the success of a manga, or the likelihood of it being cancelled prematurely, so I can pick manga to read and get invested in.

Data

I will be using three datasets from two sources. The first source is from Kaggle (https://www.kaggle.com/datasets/andreuvallhernndez/myanimelist). It is published by Andreu Vall Hernàndez and contains two datasets, one for anime and one for manga, taken from MyAnimeList: a website known in the western manga/anime community as the best and biggest database for anime and manga, and where a lot of people go to rate them. The manga dataset has 64,833 rows, and I mostly used it to get more specific information on each manga, e.g. genres, themes, start_date, etc. The second source is also from Kaggle (https://www.kaggle.com/datasets/drahulsingh/best-selling-manga) by D Rahulsingh. This dataset holds the best-selling manga of all time (187 series) and contains sales numbers.

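# packages used below: stringr (str_extract), dplyr (joins, sample_n, %>%), ggplot2 (plots)
library(stringr)
library(dplyr)
library(ggplot2)
myanimelist_manga <- read.csv("myanimelist/manga.csv")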
head(myanimelist_manga, 1)

##   manga_id   title  type score scored_by               status volumes chapters
## 1        2 Berserk manga  9.47    319696 currently_publishing      NA       NA
##   start_date end_date members favorites  sfw approved         created_at_before
## 1 1989-08-25           643969    119470 True     True 2007-07-17 20:14:45+00:00
##                  updated_at real_start_date real_end_date
## 1 2023-04-01 00:19:31+00:00      1989-08-25              
##                                                                                   genres
## 1 ['Action', 'Adventure', 'Award Winning', 'Drama', 'Fantasy', 'Horror', 'Supernatural']
##                                               themes demographics
## 1 ['Gore', 'Military', 'Mythology', 'Psychological']   ['Seinen']
##                                                                                                                                                             authors
## 1 [{'id': 1868, 'first_name': 'Kentarou', 'last_name': 'Miura', 'role': 'Story & Art'}, {'id': 49592, 'first_name': '', 'last_name': 'Studio Gaga', 'role': 'Art'}]
##     serializations
## 1 ['Young Animal']
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         synopsis
## 1 Guts, a former mercenary now known as the "Black Swordsman," is out for revenge. After a tumultuous childhood, he finally finds someone he respects and believes he can trust, only to have everything fall apart when this person takes away everything important to Guts for the purpose of fulfilling his own desires. Now marked for death, Guts becomes condemned to a fate in which he is relentlessly pursued by demonic beings.\n\nSetting out on a dreadful quest riddled with misfortune, Guts, armed with a massive sword and monstrous strength, will let nothing stop him, not even death itself, until he is finally able to take the head of the one who stripped him—and his loved one—of their humanity.\n\n[Written by MAL Rewrite]\n\nIncluded one-shot:\nVolume 14: Berserk: The Prototype
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     background
## 1 Berserk won the Award for Excellence at the sixth installment of Tezuka Osamu Cultural Prize in 2002. The series has over 50 million copies in print worldwide and has been published in English by Dark Horse since November 4, 2003. It is also published in Italy, Germany, Spain, France, Brazil, South Korea, Hong Kong, Taiwan, Thailand, Poland, México and Turkey. In May 2021, the author Kentaro Miura suddenly died at the age of 54. Chapter 364 of Berserk was published posthumously on September 10, 2021. Miura would often share details about the series' story with his childhood friend and fellow mangaka Kouji Mori. Berserk resumed on June 24, 2022, with Studio Gaga handling the art and Kouji Mori's supervision.
##                                             main_picture
## 1 https://cdn.myanimelist.net/images/manga/1/157897l.jpg
##                                       url title_english title_japanese
## 1 https://myanimelist.net/manga/2/Berserk       Berserk     ベルセルク
##               title_synonyms
## 1 ['Berserk: The Prototype']

colnames(myanimelist_manga)

##  [1] "manga_id"          "title"             "type"             
##  [4] "score"             "scored_by"         "status"           
##  [7] "volumes"           "chapters"          "start_date"       
## [10] "end_date"          "members"           "favorites"        
## [13] "sfw"               "approved"          "created_at_before"
## [16] "updated_at"        "real_start_date"   "real_end_date"    
## [19] "genres"            "themes"            "demographics"     
## [22] "authors"           "serializations"    "synopsis"         
## [25] "background"        "main_picture"      "url"              
## [28] "title_english"     "title_japanese"    "title_synonyms"

nrow(myanimelist_manga)

## [1] 64833

myanimelist_anime <- read.csv("myanimelist/anime.csv")
head(myanimelist_anime, 1)

##   anime_id                            title type score scored_by
## 1     5114 Fullmetal Alchemist: Brotherhood   tv   9.1   2037075
##            status episodes start_date   end_date source members favorites
## 1 finished_airing       64 2009-04-05 2010-07-04  manga 3206028    219036
##   episode_duration  total_duration rating  sfw approved
## 1  0 days 00:24:20 1 days 01:57:20      r True     True
##                  created_at                updated_at start_year start_season
## 1 2008-08-21 03:35:22+00:00 2023-04-02 18:07:03+00:00       2009       spring
##   real_start_date real_end_date broadcast_day broadcast_time
## 1      2009-04-05    2010-07-04        sunday       17:00:00
##                                        genres       themes demographics
## 1 ['Action', 'Adventure', 'Drama', 'Fantasy'] ['Military']  ['Shounen']
##     studios
## 1 ['Bones']
##                                                                      producers
## 1 ['Aniplex', 'Square Enix', 'Mainichi Broadcasting System', 'Studio Moriken']
##                              licensors
## 1 ['Funimation', 'Aniplex of America']
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            synopsis
## 1 After a horrific alchemy experiment goes wrong in the Elric household, brothers Edward and Alphonse are left in a catastrophic new reality. Ignoring the alchemical principle banning human transmutation, the boys attempted to bring their recently deceased mother back to life. Instead, they suffered brutal personal loss: Alphonse's body disintegrated while Edward lost a leg and then sacrificed an arm to keep Alphonse's soul in the physical realm by binding it to a hulking suit of armor.\n\nThe brothers are rescued by their neighbor Pinako Rockbell and her granddaughter Winry. Known as a bio-mechanical engineering prodigy, Winry creates prosthetic limbs for Edward by utilizing "automail," a tough, versatile metal used in robots and combat armor. After years of training, the Elric brothers set off on a quest to restore their bodies by locating the Philosopher's Stone—a powerful gem that allows an alchemist to defy the traditional laws of Equivalent Exchange.\n\nAs Edward becomes an infamous alchemist and gains the nickname "Fullmetal," the boys' journey embroils them in a growing conspiracy that threatens the fate of the world.\n\n[Written by MAL Rewrite]
##   background                                             main_picture
## 1            https://cdn.myanimelist.net/images/anime/1208/94745l.jpg
##                                                                   url
## 1 https://myanimelist.net/anime/5114/Fullmetal_Alchemist__Brotherhood
##                                   trailer_url                    title_english
## 1 https://www.youtube.com/watch?v=--IcmZkvL0Q Fullmetal Alchemist: Brotherhood
##                     title_japanese
## 1 鋼の錬金術師 FULLMETAL ALCHEMIST
##                                                                                   title_synonyms
## 1 ['Hagane no Renkinjutsushi: Fullmetal Alchemist', 'Fullmetal Alchemist (2009)', 'FMA', 'FMAB']

colnames(myanimelist_anime)

##  [1] "anime_id"         "title"            "type"             "score"           
##  [5] "scored_by"        "status"           "episodes"         "start_date"      
##  [9] "end_date"         "source"           "members"          "favorites"       
## [13] "episode_duration" "total_duration"   "rating"           "sfw"             
## [17] "approved"         "created_at"       "updated_at"       "start_year"      
## [21] "start_season"     "real_start_date"  "real_end_date"    "broadcast_day"   
## [25] "broadcast_time"   "genres"           "themes"           "demographics"    
## [29] "studios"          "producers"        "licensors"        "synopsis"        
## [33] "background"       "main_picture"     "url"              "trailer_url"     
## [37] "title_english"    "title_japanese"   "title_synonyms"

nrow(myanimelist_anime)

## [1] 24985

bestsellingmanga <- read.csv("best-selling-manga.csv")
head(bestsellingmanga, 1)

##   Manga.series    Author.s. Publisher Demographic No..of.collected.volumes
## 1    One Piece Eiichiro Oda  Shueisha      Shōnen                      104
##     Serialized Approximate.sales.in.million.s.
## 1 1997–present                           516.6
##   Average.sales.per.volume.in.million.s.
## 1                                   4.97

colnames(bestsellingmanga)

## [1] "Manga.series"                          
## [2] "Author.s."                             
## [3] "Publisher"                             
## [4] "Demographic"                           
## [5] "No..of.collected.volumes"              
## [6] "Serialized"                            
## [7] "Approximate.sales.in.million.s."       
## [8] "Average.sales.per.volume.in.million.s."

nrow(bestsellingmanga)

## [1] 187

Data Cleaning/Tidying

I first loaded the MyAnimeList datasets into myanimelist_manga and myanimelist_anime and cleaned them up, which included dropping columns that were unnecessary for my analysis. I initially wanted to join them to get a list of which manga had anime adaptions. When I tried to join them by title, I got a ‘many to many’ relationship error, which was still there even after cleaning up the duplicate names. Then I decided that all I needed for my analysis was a yes-or-no column: does this manga have an anime adaption? So I created myanimelist, which is myanimelist_manga plus an anime_adaption column. I ran into problems with for loops timing out while setting the yes/no values in the column. Finally, I found out that what I wanted to achieve could be done in a single vectorized expression using %in%.

myanimelist_manga <- subset(myanimelist_manga, select=c(1,2,3,4,5,6,7,8,9,10,11,12,13,19,20,21,23))
colnames(myanimelist_manga)

##  [1] "manga_id"       "title"          "type"           "score"         
##  [5] "scored_by"      "status"         "volumes"        "chapters"      
##  [9] "start_date"     "end_date"       "members"        "favorites"     
## [13] "sfw"            "genres"         "themes"         "demographics"  
## [17] "serializations"

#clean columns: end_date (blank -> NA), demographics, serializations
myanimelist_manga$end_date[myanimelist_manga$end_date==""] <- NA
#keep the first capitalized word of each list-formatted string, e.g. "['Seinen']" -> "Seinen"
myanimelist_manga$demographics <- str_extract(myanimelist_manga$demographics, "[A-Z]+[a-z]+")
#note: multi-word magazine names are cut to their first word, e.g. "['Young Animal']" -> "Young"
myanimelist_manga$serializations <- str_extract(myanimelist_manga$serializations, "[A-Z]+[a-z]+")
#normalize titles that are spelled differently in the best-selling dataset
myanimelist_manga$title[myanimelist_manga$title=="One Punch-Man"] <- "One Punch Man"
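The blank-to-NA step for end_date can also be handled while the file is read. The following is a minimal sketch, assuming a tiny inline CSV (the two rows are toy data, not the real file) and using the na.strings argument of read.csv so that empty fields arrive as NA.

```r
# Toy CSV standing in for manga.csv; only two columns are included here.
csv_text <- "title,end_date
Berserk,
Naruto,2014-11-10"

# na.strings lists the field values that should be read as NA.
toy_manga <- read.csv(text = csv_text, na.strings = c("", "NA"))

# The blank end_date for the first row is now NA instead of "".
print(toy_manga)
```

With this option there is no need for a separate recoding line for each blank column, although every column in the file is affected, which may or may not be what is wanted.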

#myanimelist_anime clean up - this was not used too much after
myanimelist_anime <- subset(myanimelist_anime, select=c(1,2,3,4,5,6,7,8,9,10,11,12,15,29))
colnames(myanimelist_anime)

##  [1] "anime_id"   "title"      "type"       "score"      "scored_by" 
##  [6] "status"     "episodes"   "start_date" "end_date"   "source"    
## [11] "members"    "favorites"  "rating"     "studios"

#clean columns: end_date (fix blanks -> NA), studios
#end date add NA
myanimelist_anime$end_date[myanimelist_anime$end_date==""] <- NA
myanimelist_anime$studios <- str_extract(myanimelist_anime$studios, "[A-Z]+[a-z]+")
#rename columns that share a name with myanimelist_manga so they do not collide in a join
colnames(myanimelist_anime)[3] = "type_anime"
colnames(myanimelist_anime)[4] = "score_anime"
colnames(myanimelist_anime)[5] = "scored_by_anime"
colnames(myanimelist_anime)[6] = "status_anime"
colnames(myanimelist_anime)[8] = "start_date_anime"
colnames(myanimelist_anime)[9] = "end_date_anime"
colnames(myanimelist_anime)[11] = "members_anime"
colnames(myanimelist_anime)[12] = "favorites_anime"
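The eight renames above all add the same suffix, so they can be written as one step. This is a small sketch on a toy one-row data frame; the object names anime_toy and shared are illustrative and not part of the original code.

```r
# Toy stand-in for myanimelist_anime with a few of its columns.
anime_toy <- data.frame(
  anime_id = 5114,
  title    = "Fullmetal Alchemist: Brotherhood",
  type     = "tv",
  score    = 9.1
)

# Columns that also exist in the manga table and need a suffix.
shared <- c("type", "score")

# Append "_anime" to every shared column name in one assignment.
idx <- names(anime_toy) %in% shared
names(anime_toy)[idx] <- paste0(names(anime_toy)[idx], "_anime")

print(names(anime_toy))
```

Selecting by name rather than by position also keeps the renaming correct if the column order changes.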

myanimelist <- myanimelist_manga
myanimelist$anime_adaption <- 'No'
#a row-by-row for loop timed out here; a single vectorized %in% test sets the flag for every row at once
myanimelist$anime_adaption[myanimelist$title %in% myanimelist_anime$title] <- "Yes"
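To illustrate why the %in% version replaces the loop, here is a minimal sketch on toy title vectors (the titles are placeholders, not rows pulled from the datasets). Both versions produce the same flags, but the vectorized one does the membership test for all titles in a single call.

```r
manga_titles <- c("Berserk", "One Piece", "Hypothetical Manga")
anime_titles <- c("One Piece", "Berserk", "Fullmetal Alchemist: Brotherhood")

# Loop version: scans anime_titles once per manga title.
flag_loop <- character(length(manga_titles))
for (i in seq_along(manga_titles)) {
  flag_loop[i] <- if (any(anime_titles == manga_titles[i])) "Yes" else "No"
}

# Vectorized version: one membership test over the whole vector.
flag_vec <- ifelse(manga_titles %in% anime_titles, "Yes", "No")

print(data.frame(title = manga_titles, flag_loop, flag_vec))
```

Either way the match is on exact title text, so an adaption released under a different title would be flagged "No".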

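For the second dataset, bestsellingmanga, I inner joined it with myanimelist_manga after renaming the Manga.series column to match myanimelist_manga’s title column. This gave me back bestsellingmanga with all the extra information from myanimelist_manga. Because an inner join only keeps titles that match exactly in both datasets, 99 of the original 187 rows remain after the join and the type filter.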

colnames(bestsellingmanga)[1]="title"
bestsellingmanga <- inner_join(bestsellingmanga, myanimelist_manga, by = "title")
#get rid of non manga rows if type != "manga" for dups
bestsellingmanga <- bestsellingmanga[bestsellingmanga$type == 'manga', ]
head(bestsellingmanga)

##         title                     Author.s.  Publisher Demographic
## 1   One Piece                  Eiichiro Oda   Shueisha      Shōnen
## 2    Golgo 13 Takao Saito, Saito Production Shogakukan      Seinen
## 3 Dragon Ball                Akira Toriyama   Shueisha      Shōnen
## 4    Doraemon               Fujiko F. Fujio Shogakukan    Children
## 6      Naruto             Masashi Kishimoto   Shueisha      Shōnen
## 8   Slam Dunk                Takehiko Inoue   Shueisha      Shōnen
##   No..of.collected.volumes   Serialized Approximate.sales.in.million.s.
## 1                      104 1997–present                           516.6
## 2                      207 1968–present                           300.0
## 3                       42    1984–1995                           260.0
## 4                       45    1969–1996                           250.0
## 6                       72    1999–2014                           250.0
## 8                       31    1990–1996                           170.0
##   Average.sales.per.volume.in.million.s. manga_id  type score scored_by
## 1                                   4.97       13 manga  9.22    355375
## 2                                   1.45     1298 manga  7.85       854
## 3                                   6.19       42 manga  8.41     92616
## 4                                   4.71     1032 manga  8.44      6919
## 6                                   3.47       11 manga  8.07    264788
## 8                                   5.48       51 manga  9.08     70877
##                 status volumes chapters start_date   end_date members favorites
## 1 currently_publishing      NA       NA 1997-07-22       <NA>  579557    111462
## 2 currently_publishing      NA       NA 1968-11-29       <NA>    5715        86
## 3             finished      42      520 1984-11-20 1995-05-23  151685     13965
## 4             finished      45      821 1969-12-01 1996-01-01   13837       873
## 6             finished      72      700 1999-09-21 2014-11-10  402677     43311
## 8             finished      31      276 1990-09-18 1996-06-04  157962     14970
##    sfw                                                              genres
## 1 True                                  ['Action', 'Adventure', 'Fantasy']
## 2 True        ['Action', 'Adventure', 'Award Winning', 'Drama', 'Mystery']
## 3 True                         ['Action', 'Adventure', 'Comedy', 'Sci-Fi']
## 4 True ['Adventure', 'Award Winning', 'Comedy', 'Sci-Fi', 'Slice of Life']
## 6 True                                  ['Action', 'Adventure', 'Fantasy']
## 8 True                                         ['Award Winning', 'Sports']
##                            themes demographics serializations
## 1                              []      Shounen        Shounen
## 2    ['Adult Cast', 'Historical']       Seinen            Big
## 3 ['Martial Arts', 'Super Power']      Shounen        Shounen
## 4   ['Anthropomorphic', 'School']         Kids           <NA>
## 6                ['Martial Arts']      Shounen        Shounen
## 8       ['School', 'Team Sports']      Shounen        Shounen

nrow(bestsellingmanga)

## [1] 99
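Since the join keeps 99 of 187 rows, it can help to see which titles failed to match and which titles appear more than once on the MyAnimeList side. The sketch below uses toy data frames (the third best-seller row and the duplicate Naruto entry are made up for illustration) and base R only.

```r
# Toy stand-ins; "Hypothetical Title" and its sales value are invented.
best <- data.frame(
  title = c("One Piece", "Naruto", "Hypothetical Title"),
  sales = c(516.6, 250.0, 1.0)
)
mal <- data.frame(
  title = c("One Piece", "Naruto", "Naruto", "Berserk"),
  type  = c("manga", "manga", "light_novel", "manga")
)

# Best sellers with no exact title match: these are dropped by an inner join.
unmatched <- setdiff(best$title, mal$title)

# Titles that occur more than once in mal: these multiply rows in the join.
dup_titles <- names(which(table(mal$title) > 1))

# merge() with default arguments behaves like an inner join on the key.
joined <- merge(best, mal, by = "title")
joined_manga <- joined[joined$type == "manga", ]

print(unmatched)
print(dup_titles)
print(joined_manga)
```

Unmatched titles are candidates for the same kind of manual fix used for "One Punch Man".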

bestsellingmanga$anime_adaption <- 'No'
bestsellingmanga$anime_adaption[bestsellingmanga$title %in% myanimelist_anime$title] <- "Yes"
head(bestsellingmanga)

##         title                     Author.s.  Publisher Demographic
## 1   One Piece                  Eiichiro Oda   Shueisha      Shōnen
## 2    Golgo 13 Takao Saito, Saito Production Shogakukan      Seinen
## 3 Dragon Ball                Akira Toriyama   Shueisha      Shōnen
## 4    Doraemon               Fujiko F. Fujio Shogakukan    Children
## 6      Naruto             Masashi Kishimoto   Shueisha      Shōnen
## 8   Slam Dunk                Takehiko Inoue   Shueisha      Shōnen
##   No..of.collected.volumes   Serialized Approximate.sales.in.million.s.
## 1                      104 1997–present                           516.6
## 2                      207 1968–present                           300.0
## 3                       42    1984–1995                           260.0
## 4                       45    1969–1996                           250.0
## 6                       72    1999–2014                           250.0
## 8                       31    1990–1996                           170.0
##   Average.sales.per.volume.in.million.s. manga_id  type score scored_by
## 1                                   4.97       13 manga  9.22    355375
## 2                                   1.45     1298 manga  7.85       854
## 3                                   6.19       42 manga  8.41     92616
## 4                                   4.71     1032 manga  8.44      6919
## 6                                   3.47       11 manga  8.07    264788
## 8                                   5.48       51 manga  9.08     70877
##                 status volumes chapters start_date   end_date members favorites
## 1 currently_publishing      NA       NA 1997-07-22       <NA>  579557    111462
## 2 currently_publishing      NA       NA 1968-11-29       <NA>    5715        86
## 3             finished      42      520 1984-11-20 1995-05-23  151685     13965
## 4             finished      45      821 1969-12-01 1996-01-01   13837       873
## 6             finished      72      700 1999-09-21 2014-11-10  402677     43311
## 8             finished      31      276 1990-09-18 1996-06-04  157962     14970
##    sfw                                                              genres
## 1 True                                  ['Action', 'Adventure', 'Fantasy']
## 2 True        ['Action', 'Adventure', 'Award Winning', 'Drama', 'Mystery']
## 3 True                         ['Action', 'Adventure', 'Comedy', 'Sci-Fi']
## 4 True ['Adventure', 'Award Winning', 'Comedy', 'Sci-Fi', 'Slice of Life']
## 6 True                                  ['Action', 'Adventure', 'Fantasy']
## 8 True                                         ['Award Winning', 'Sports']
##                            themes demographics serializations anime_adaption
## 1                              []      Shounen        Shounen            Yes
## 2    ['Adult Cast', 'Historical']       Seinen            Big            Yes
## 3 ['Martial Arts', 'Super Power']      Shounen        Shounen            Yes
## 4   ['Anthropomorphic', 'School']         Kids           <NA>            Yes
## 6                ['Martial Arts']      Shounen        Shounen            Yes
## 8       ['School', 'Team Sports']      Shounen        Shounen            Yes

Two columns in myanimelist_manga, genres and themes, store their data as a single string formatted like a list (for example, "['Action', 'Adventure', 'Fantasy']"). In order to explore these two, I decided to turn them into long datasets (one row per manga per genre/theme). I was not going from wide to long like in our previous classwork, so I could not use pivot_longer. Instead, I wrote a for loop over the rows that splits each genres/themes string into a proper vector of strings. A nested for loop then adds that manga’s title and each genre/theme (one at a time) to a new dataset: manga_genres or manga_themes.

manga_genres <- data.frame(matrix(ncol=2,nrow=0))
colnames(manga_genres) <- c('title','genre')
manga_genres

## [1] title genre
## <0 rows> (or 0-length row.names)

for (i in 1:nrow(bestsellingmanga)){
  genre_manga <- str_extract_all(bestsellingmanga$genres[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in genre_manga){
    for(n in m){
      manga_genres[nrow(manga_genres) + 1, ] = c(bestsellingmanga$title[i], n)
    }
  }
}
head(manga_genres)

##       title         genre
## 1 One Piece        Action
## 2 One Piece     Adventure
## 3 One Piece       Fantasy
## 4  Golgo 13        Action
## 5  Golgo 13     Adventure
## 6  Golgo 13 Award Winning

nrow(manga_genres)

## [1] 295

manga_themes <- data.frame(matrix(ncol=2,nrow=0))
colnames(manga_themes) <- c('title','theme')
manga_themes

## [1] title theme
## <0 rows> (or 0-length row.names)

for (i in 1:nrow(bestsellingmanga)){
  theme_manga <- str_extract_all(bestsellingmanga$themes[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in theme_manga){
    for(n in m){
      manga_themes[nrow(manga_themes) + 1, ] = c(bestsellingmanga$title[i], n)
    }
  }
}
head(manga_themes)

##         title           theme
## 1    Golgo 13      Adult Cast
## 2    Golgo 13      Historical
## 3 Dragon Ball    Martial Arts
## 4 Dragon Ball     Super Power
## 5    Doraemon Anthropomorphic
## 6    Doraemon          School

nrow(manga_themes)

## [1] 126
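The same long tables can be built without growing a data frame one row at a time. This is a sketch in base R on toy rows, assuming every tag is wrapped in single quotes as in the output above; the pattern and the object names toy, parts and long_genres are illustrative, not the original code.

```r
toy <- data.frame(
  title  = c("One Piece", "Slam Dunk", "No Genre Example"),
  genres = c("['Action', 'Adventure', 'Fantasy']",
             "['Award Winning', 'Sports']",
             "[]")
)

# Pull out everything that sits between a pair of single quotes.
# perl = TRUE is needed for the lookbehind/lookahead syntax.
parts <- regmatches(toy$genres,
                    gregexpr("(?<=')[^',]+(?=')", toy$genres, perl = TRUE))

# Repeat each title once per extracted genre; rows with "[]" contribute nothing.
long_genres <- data.frame(
  title = rep(toy$title, lengths(parts)),
  genre = unlist(parts)
)

print(long_genres)
```

Because the pattern keys on the quotes rather than on word shapes, multi-word tags such as "Award Winning" or "Slice of Life" come out whole.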

Data Analysis

Below, I plotted some bar graphs based on the datasets I cleaned up and created.

#tally best sellers with and without an anime adaption
#count() keeps the count column numeric; filling rows with c('Yes', n) would turn it into text
animelist <- bestsellingmanga %>%
  count(anime_adaption, name = "count")

ggplot(data=animelist, aes(x=anime_adaption, y=count)) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

The first graph shows how many of the manga in the bestsellingmanga dataset have an anime adaption. An adaption could help get more eyes on a manga, so I thought it might be a good variable to look at. As shown, 82 of the 99 best-selling manga (about 83%) have some anime adaption. With such a high percentage of them having one, I think it is a clear indicator of success.
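The share quoted above can be computed directly from the flag column. The sketch below rebuilds a stand-in vector from the two counts reported in the text (82 "Yes", 17 "No") rather than from the real data frame.

```r
# Stand-in for bestsellingmanga$anime_adaption, built from the reported counts.
adaption <- c(rep("Yes", 82), rep("No", 17))

# The mean of a logical vector is the proportion of TRUE values.
share_yes <- mean(adaption == "Yes")

print(table(adaption))
print(round(100 * share_yes, 1))

# The same expression on the full table, e.g.
#   mean(myanimelist$anime_adaption == "Yes")
# would give a baseline share to compare the best sellers against.
```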

manga_genres$count <- 1
top_genres <- 
  manga_genres %>% group_by(genre) %>% 
  summarise(count=sum(count),
            .groups = 'drop')
top_genres <- top_genres[order(top_genres$count, decreasing = TRUE), ]

ggplot(data=top_genres, aes(x=genre, y=count)) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

manga_themes$count <- 1
top_themes <- 
  manga_themes %>% group_by(theme) %>% 
  summarise(count=sum(count),
            .groups = 'drop')
top_themes <- top_themes[order(top_themes$count, decreasing = TRUE), ]


ggplot(data=top_themes, aes(x=theme, y=count)) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
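One detail about the plots: ggplot2 places a character x variable in alphabetical order, so sorting the rows of top_genres or top_themes does not by itself change the order of the bars. A minimal sketch of one way to get bars sorted by count is to turn the label into a factor whose levels follow the counts. The toy counts below are made up.

```r
# Toy tally; the counts are illustrative only.
top_toy <- data.frame(
  genre = c("Action", "Comedy", "Drama"),
  count = c(3, 5, 2)
)

# Order the factor levels from the largest count to the smallest.
level_order <- top_toy$genre[order(top_toy$count, decreasing = TRUE)]
top_toy$genre <- factor(top_toy$genre, levels = level_order)

print(levels(top_toy$genre))

# A discrete axis follows the factor levels, so
#   ggplot(top_toy, aes(x = genre, y = count)) + geom_bar(stat = "identity")
# would draw Comedy first, then Action, then Drama.
```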

The two graphs above are bar graphs that show how many times different genres and themes appeared in the bestsellingmanga dataset. There are clear topics that pop up a lot.
For genres, the top 5 are: Award Winning, Comedy, Action, Drama, Romance.
For themes, the top 5 are: School, Team Sports, Historical, Delinquents, Psychological.

For my last analysis, I took random samples from the big myanimelist dataset and computed the probability that the randomly chosen manga share genres and themes with the top five from bestsellingmanga. The top genres and themes of bestsellingmanga can be an indicator of success, as many of the best sellers seem to have some in common.

Below are my code and probability results for three sample sizes: 10, 50 and 100. To calculate this, I first created a sample of the size I wanted. Then, I created two long datasets for the sample’s genres and themes like I did for bestsellingmanga. I used these to create two more datasets that hold the tally of which genres and themes showed up in the sample and how many times. Then, to find the probability that the sample contained the same genres and themes as bestsellingmanga, I looped through the sample tally dataset and kept a counter of how many times the top 5 genres/themes (from bestsellingmanga) popped up. Finally, I divided that counter by the total number of genre/theme entries in the sample.

set.seed(49568)
samp <- myanimelist %>%
  sample_n(10)
samp

##    manga_id                    title     type score scored_by   status volumes
## 1     61503        Hanazono no Kioku    manga  7.41      1094 finished       1
## 2     56513 Fudatsuki no Kyouko-chan    manga  7.05      6942 finished       7
## 3      1820    Hoshigari Love Dollar    manga  6.91       837 finished       3
## 4    103209                  Ajuutan    manga    NA        13 finished       8
## 5    108492     Busu ni Hanataba wo.    manga  7.24       626 finished      12
## 6    145530             Tonda Couple    manga    NA         4 finished      15
## 7      8286               Mizugokoro one_shot    NA        89 finished      NA
## 8    157476     A Handsome Swordsman   manhwa    NA        19 finished      NA
## 9     22455                    Toxic    manga  6.90       636 finished       3
## 10    81843              Money♥Honey one_shot    NA        71 finished      NA
##    chapters start_date   end_date members favorites   sfw
## 1         7 2013-03-07 2013-07-05    2121         8 False
## 2        37 2013-08-12 2016-06-11   16100        69  True
## 3         9 2003-01-01       <NA>    1751         1  True
## 4        81 2016-12-11 2020-01-26     197         0  True
## 5        74 2016-04-04 2022-09-02    2672        20  True
## 6        NA 1978-02-22 1981-02-25      38         0  True
## 7         1 2002-06-20 2002-06-20     170         0 False
## 8       100 2020-12-09 2022-06-28      91         0  True
## 9        15 2010-06-14 2011-01-01    1821        11  True
## 10        1 2007-04-11 2007-04-11     173         0 False
##                                      genres                     themes
## 1         ['Boys Love', 'Drama', 'Erotica']                         []
## 2     ['Comedy', 'Romance', 'Supernatural']      ['School', 'Vampire']
## 3                                        []                         []
## 4  ['Action', 'Drama', 'Fantasy', 'Horror']                         []
## 5                     ['Comedy', 'Romance']                 ['School']
## 6              ['Award Winning', 'Romance'] ['Love Polygon', 'School']
## 7                  ['Boys Love', 'Erotica']                         []
## 8                     ['Action', 'Fantasy']           ['Martial Arts']
## 9                                ['Action']               ['Military']
## 10                               ['Hentai']                         []
##    demographics serializations anime_adaption
## 1          <NA>       Magazine             No
## 2       Shounen         Gessan             No
## 3        Shoujo            Sho             No
## 4          <NA>            Ura             No
## 5        Seinen          Young            Yes
## 6       Shounen        Shounen             No
## 7          <NA>           <NA>             No
## 8          <NA>          Kakao             No
## 9        Shoujo          Comic             No
## 10         <NA>          Comic             No

samp_genres <- data.frame(matrix(ncol=2,nrow=0))
colnames(samp_genres) <- c('title','genre')

for (i in 1:nrow(samp)){
  t <- str_extract_all(samp$genres[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in t){
    for(n in m){
      samp_genres[nrow(samp_genres) + 1, ] = c(samp$title[i], n)
    }
  }
}
samp_genres <- na.omit(samp_genres)

samp_genres$count <- 1
samp_genres_stats10 <- samp_genres %>% group_by(genre) %>%
  summarise(count=sum(count),
            .groups = 'drop')
samp_genres_stats10 <- samp_genres_stats10[order(samp_genres_stats10$count, decreasing = TRUE), ]
samp_genres_stats10

## # A tibble: 11 × 2
##    genre         count
##    <chr>         <dbl>
##  1 Action            3
##  2 Romance           3
##  3 Boys Love         2
##  4 Comedy            2
##  5 Drama             2
##  6 Erotica           2
##  7 Fantasy           2
##  8 Award Winning     1
##  9 Hentai            1
## 10 Horror            1
## 11 Supernatural      1

# Add up the counts of sample genres that are among the top five best-selling genres
t <- 0
for(x in 1:nrow(samp_genres_stats10)){
  if(samp_genres_stats10$genre[x] %in% head(top_genres$genre, 5)){
    t <- t + samp_genres_stats10$count[x]
  }
}

# Same steps as for genres, applied to the themes column
samp_themes <- data.frame(matrix(ncol=2,nrow=0))
colnames(samp_themes) <- c('title','theme')

for (i in 1:nrow(samp)){
  t <- str_extract_all(samp$themes[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in t){
    for(n in m){
      samp_themes[nrow(samp_themes) + 1, ] = c(samp$title[i], n)
    }
  }
}
samp_themes <- na.omit(samp_themes)

samp_themes$count <- 1
samp_themes_stats10 <- samp_themes %>% group_by(theme) %>%
  summarise(count=sum(count),
            .groups = 'drop')
samp_themes_stats10 <- samp_themes_stats10[order(samp_themes_stats10$count, decreasing = TRUE), ]
samp_themes_stats10

## # A tibble: 5 × 2
##   theme        count
##   <chr>        <dbl>
## 1 School           3
## 2 Love Polygon     1
## 3 Martial Arts     1
## 4 Military         1
## 5 Vampire          1

t <- 0 
for(x in 1:nrow(samp_themes_stats10)){
  if(samp_themes_stats10$theme[x] %in% head(top_themes$theme, 5)){
    t <- t + samp_themes_stats10$count[x]
  }
}

Above is the code for sample size 10.
The results show that 15% of the sample’s genre tags fall within the top five genres of bestsellingmanga.
The results show that 42.86% of the sample’s theme tags fall within the top five themes of bestsellingmanga.
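The chunk that turns the running total t into a percentage is not shown. The theme figure for this sample equals 3/7, which is consistent with dividing the matched tag count by the total tag count, so here is a minimal sketch under that assumption. The function name match_pct and the top5_themes vector are made up for illustration; only "School" is inferred from the reported result, and the other entries are placeholders.

```r
# Share (in percent) of tags in a sample that fall in a reference "top" set.
# `stats` needs a label column and a `count` column, like the tibbles above.
match_pct <- function(stats, label_col, top_values) {
  hits <- stats$count[stats[[label_col]] %in% top_values]
  100 * sum(hits) / sum(stats$count)
}

# Theme counts copied from the sample-of-10 output above.
themes10 <- data.frame(
  theme = c("School", "Love Polygon", "Martial Arts", "Military", "Vampire"),
  count = c(3, 1, 1, 1, 1)
)

# HYPOTHETICAL top-five list: the real one comes from top_themes, which is not shown here.
top5_themes <- c("School", "Placeholder A", "Placeholder B",
                 "Placeholder C", "Placeholder D")

match_pct(themes10, "theme", top5_themes)  # 3 / 7 * 100, about 42.86
```

The same call would work for genres by passing the genre table, "genre", and head(top_genres$genre, 5).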

# Draw a reproducible random sample of 50 manga
set.seed(493954)
samp <- myanimelist %>%
  sample_n(50)

samp_genres <- data.frame(matrix(ncol=2,nrow=0))
colnames(samp_genres) <- c('title','genre')

for (i in 1:nrow(samp)){
  t <- str_extract_all(samp$genres[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in t){
    for(n in m){
      samp_genres[nrow(samp_genres) + 1, ] = c(samp$title[i], n)
    }
  }
}
samp_genres <- na.omit(samp_genres)

samp_genres$count <- 1
samp_genres_stats50 <- samp_genres %>% group_by(genre) %>%
  summarise(count=sum(count),
            .groups = 'drop')
samp_genres_stats50 <- samp_genres_stats50[order(samp_genres_stats50$count, decreasing = TRUE), ]

t <- 0 
for(x in 1:nrow(samp_genres_stats50)){
  if(samp_genres_stats50$genre[x] %in% head(top_genres$genre, 5)){
    t <- t + samp_genres_stats50$count[x]
  }
}

samp_themes <- data.frame(matrix(ncol=2,nrow=0))
colnames(samp_themes) <- c('title','theme')

for (i in 1:nrow(samp)){
  t <- str_extract_all(samp$themes[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in t){
    for(n in m){
      samp_themes[nrow(samp_themes) + 1, ] = c(samp$title[i], n)
    }
  }
}
samp_themes <- na.omit(samp_themes)

samp_themes$count <- 1
samp_themes_stats50 <- samp_themes %>% group_by(theme) %>%
  summarise(count=sum(count),
            .groups = 'drop')
samp_themes_stats50 <- samp_themes_stats50[order(samp_themes_stats50$count, decreasing = TRUE), ]

t <- 0 
for(x in 1:nrow(samp_themes_stats50)){
  if(samp_themes_stats50$theme[x] %in% head(top_themes$theme, 5)){
    t <- t + samp_themes_stats50$count[x]
  }
}

Above is the code for sample size 50.
The results show that 15% of the sample’s genre tags fall within the top five genres of bestsellingmanga.
The results show that 68.18% of the sample’s theme tags fall within the top five themes of bestsellingmanga.

# Draw a reproducible random sample of 100 manga
set.seed(493024)
samp <- myanimelist %>%
  sample_n(100)

samp_genres <- data.frame(matrix(ncol=2,nrow=0))
colnames(samp_genres) <- c('title','genre')

for (i in 1:nrow(samp)){
  t <- str_extract_all(samp$genres[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in t){
    for(n in m){
      samp_genres[nrow(samp_genres) + 1, ] = c(samp$title[i], n)
    }
  }
}
samp_genres <- na.omit(samp_genres)

samp_genres$count <- 1
samp_genres_stats100 <- samp_genres %>% group_by(genre) %>%
  summarise(count=sum(count),
            .groups = 'drop')
samp_genres_stats100 <- samp_genres_stats100[order(samp_genres_stats100$count, decreasing = TRUE), ]

t <- 0 
for(x in 1:nrow(samp_genres_stats100)){
  if(samp_genres_stats100$genre[x] %in% head(top_genres$genre, 5)){
    t <- t + samp_genres_stats100$count[x]
  }
}

samp_themes <- data.frame(matrix(ncol=2,nrow=0))
colnames(samp_themes) <- c('title','theme')

for (i in 1:nrow(samp)){
  t <- str_extract_all(samp$themes[i],"[A-Za-z]+(-[A-Za-z]+)?( [A-Za-z]+ [A-Za-z]+)?( [A-Za-z]+)?")
  for(m in t){
    for(n in m){
      samp_themes[nrow(samp_themes) + 1, ] = c(samp$title[i], n)
    }
  }
}
samp_themes <- na.omit(samp_themes)

samp_themes$count <- 1
samp_themes_stats100 <- samp_themes %>% group_by(theme) %>%
  summarise(count=sum(count),
            .groups = 'drop')
samp_themes_stats100 <- samp_themes_stats100[order(samp_themes_stats100$count, decreasing = TRUE), ]

t <- 0 
for(x in 1:nrow(samp_themes_stats100)){
  if(samp_themes_stats100$theme[x] %in% head(top_themes$theme, 5)){
    t <- t + samp_themes_stats100$count[x]
  }
}

Above is the code for sample size 100.
The results show that 13.66% of the sample’s genre tags fall within the top five genres of bestsellingmanga.
The results show that 47.17% of the sample’s theme tags fall within the top five themes of bestsellingmanga.

With the results from the three sample sizes, we can see that a larger sample does not bring the match with the top five in bestsellingmanga any higher: the genre share stays between about 13.7% and 15%, while the theme share goes from 42.86% up to 68.18% and back down to 47.17%. None of these samples reaches past 70%, meaning there are a lot of genres/themes that manga can fit into and the odds of a given manga falling under one of the most popular ones are low.
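The six reported percentages can be put side by side to see how they move as the sample grows. This is only a sketch: results is a name introduced here, and the values are copied from the text (rounded to two decimals) rather than recomputed. Each sample size was drawn once with its own seed, so differences between rows mix the effect of sample size with random variation.

```r
# Reported match percentages, one row per sample size.
results <- data.frame(
  n      = c(10, 50, 100),
  genres = c(15.00, 15.00, 13.66),
  themes = c(42.86, 68.18, 47.17)
)

# Change in match percentage from one sample size to the next.
# A steady decline would show up as all-negative differences.
diff(results$genres)
diff(results$themes)

# TRUE if no sample exceeds 70% on either measure.
all(results$genres < 70 & results$themes < 70)
```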

Conclusion

The three variables I decided to look at (whether there is an anime adaptation, its genres, and its themes) were all shown to have an effect on the success of a new manga to varying degrees.

Anime adaptations were the strongest factor: the best-selling manga split 17:82 (without/with an adaptation), which works out to a ~82.8% chance that a best-selling manga had an anime. This is probably because people who watch a lot of anime will see a new show/movie and, if they like it, will check out the manga behind it.
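The percentage follows directly from the 17:82 split. As a quick check (the variable names are chosen here for illustration):

```r
with_anime    <- 82  # best-selling manga with an anime adaptation
without_anime <- 17  # best-selling manga without one

# Share of best-selling manga that have an adaptation, in percent.
share_with <- 100 * with_anime / (with_anime + without_anime)
round(share_with, 2)  # 82.83
```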

Genres & Themes had clear subject topics that a lot of the best-selling manga fall under. This is unlikely to be a coincidence: there are certain genres that the public or the manga-reading community gravitates towards more. So if your new manga happens to fall under those topics, it could have more eyes/readers on it because it is about something popular.

Lessons learned

For my one feature that we did not talk about in class, I created my presentation as an RMarkdown presentation. I did not know that was something you could do until I saw it as an example. It was super easy to use, especially since I could just transfer my code directly from my RMarkdown document. My big takeaway/next step would be to figure out how to change font size or crop words/code, because some of my slides had lines go off the page.

For my clean up and analysis, I used a lot of for loops that slowed my computer down, especially in my failed attempt at incorporating myanimelist_anime, where I tried to for loop twice through the 64,833 rows of myanimelist_manga. My takeaway from this is to learn how to do the calculations I did without for loops. For example, I realized afterwards that for the sample size percentage calculations, I could have done sum(str_detect(samp$genres, genre)) for each genre (or theme) in head(top_genres$genre, 5) and added it up.
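A loop-free version of the percentage calculation could look like the sketch below. It uses base R so it runs without extra packages; str_extract_all() from stringr would do the same extraction. The genres_col and top5 values are toy stand-ins for samp$genres and head(top_genres$genre, 5), not real data, and the sketch assumes every tag is wrapped in single quotes as in the printed sample.

```r
# HYPOTHETICAL inputs standing in for samp$genres and the top five genres.
genres_col <- c("['Action', 'Drama']", "['Comedy', 'Romance']",
                "[]", "['Action', 'Sci-Fi']")
top5 <- c("Action", "Comedy", "Fantasy", "Adventure", "Drama")

# Pull every quoted tag out of every row in one vectorised call,
# then strip the surrounding quotes.
tags <- unlist(regmatches(genres_col, gregexpr("'[^']+'", genres_col)))
tags <- gsub("'", "", tags, fixed = TRUE)

# Tag counts, largest first (stands in for the group_by/summarise step).
tag_counts <- sort(table(tags), decreasing = TRUE)

# Percentage of all tags in the sample that fall in the top five.
pct <- 100 * mean(tags %in% top5)
pct
```

Because the extraction works on the whole column at once, no data frame has to grow row by row inside a loop, and growing it that way is usually the slow part on tens of thousands of rows.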

Next steps in general for this project would be diving more into testing samples using probability and testing more variables that might also be big success factors (e.g. start_date, anime_length).