PANELS & PAGES

A Visual Analysis of the Global Comic Book Industry

Isaac Djabate — April 2026


Exploring 10,000 comics across 21 countries

This report explores a dataset of 10,000 comic books sourced from Kaggle, spanning 2000 to 2026 and 21 countries. It analyzes the industry through a series of questions on genre, geography, production over time, ratings, and publication status, each addressed in a section below.

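# Packages used throughout the report
library(tidyverse)      # dplyr, tidyr, stringr, ggplot2
library(plotly)         # ggplotly()
library(sf)             # geom_sf() support
library(rnaturalearth)  # ne_countries()
library(gganimate)      # transition_reveal(), animate(), anim_save()
library(waffle)         # waffle()
library(ggalluvial)     # geom_stratum(), geom_alluvium()
# theme_comic() is a custom ggplot2 theme that must be defined before the
# plots below; its definition is not included in this section.

# Load the dataset
comics <- read.csv("comic_books_10000_dataset.csv")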




comics <- comics %>%
  # Split Genre into two separate columns (primary and secondary)
  separate(Genre, into = c("Primary.Genre", "Secondary.Genre"),
           sep = " / ", extra = "merge", fill = "right") %>%
  # Convert numeric columns from character to numeric
  mutate(Release.Year = as.numeric(Release.Year),
         Page.Count = as.numeric(Page.Count),
         Rating..out.of.10. = as.numeric(Rating..out.of.10.),
         Volume.Count = as.numeric(Volume.Count)) %>%
  # Simplify the Awards column into Winner, Nominee, or None
  mutate(Award.Status = case_when(
    str_detect(Awards, "Winner") ~ "Winner",
    str_detect(Awards, "Nominee") ~ "Nominee",
    TRUE ~ "None")) %>%
  # Group release years into eras (2000s, 2010s, 2020s)
  mutate(Era = case_when(
    Release.Year < 2010 ~ "2000s",
    Release.Year < 2020 ~ "2010s",
    TRUE ~ "2020s")) %>%
  # Order age ratings from least to most mature
  mutate(Age.Rating = factor(Age.Rating,
                             levels = c("All Ages", "Teen+", "Young Adult", "Mature", "Mature 17+"))) %>%
  # Drop rows with a missing rating, page count, or release year
  filter(!is.na(Rating..out.of.10.), !is.na(Page.Count), !is.na(Release.Year))
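
As a quick sanity check on the era bucketing above, the same case_when() rules can be applied to a handful of boundary years. This is a minimal sketch on made-up values; the `toy` data frame is hypothetical and not part of the dataset.

```r
library(dplyr)

# Hypothetical boundary years (plus one missing value), not taken from the dataset
toy <- data.frame(Release.Year = c(2000, 2009, 2010, 2019, 2020, NA))

toy_eras <- toy %>%
  mutate(Era = case_when(
    Release.Year < 2010 ~ "2000s",
    Release.Year < 2020 ~ "2010s",
    TRUE ~ "2020s"   # catch-all branch, also reached when the year is NA
  ))

print(toy_eras)
```

The missing year falls through to the catch-all branch and is labelled "2020s". The cleaning pipeline avoids this in practice because rows with a missing release year are filtered out in its last step.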

Global Genre Distribution

# Count comics per primary genre and keep the top 10
genre_counts <- comics %>%
  count(Primary.Genre) %>%
  slice_max(n, n = 10)

# Horizontal bar chart
bar_plot <- ggplot(genre_counts, aes(x = n, y = reorder(Primary.Genre, n),
                         text = paste("Genre:", Primary.Genre,
                                      "<br>Count:", n))) +
  geom_col(fill = "#2E5FA3")  +  #blue bars 
  labs(title = "Top 10 Comic Book Genres",
       x = "Number of Comics",
       y = "Genre") +
  scale_x_continuous(expand = expansion(mult= c(0,0.1))) +  # Make space for labels
  theme_comic() 

ggplotly(bar_plot, tooltip = "text") %>%
  layout(
    paper_bgcolor = "white",
    plot_bgcolor = "white"
  )

The Superhero and Action genres account for the most comics. The Golden and Silver Ages of comics played a massive role in their rise. However, which countries are responsible for producing these comics?



Geographic Distribution of Comic Production

# Load the world map as an sf object (via rnaturalearth)
world <- ne_countries(scale = "medium", returnclass = "sf")


# Count comics per country and recode names to match the map's name_en field
country_counts <- comics %>%
  count(Country.of.Origin) %>%
  mutate(Country.of.Origin = recode(Country.of.Origin,
  "USA" = "United States of America",
  "UK" = "United Kingdom",
  "South Korea" = "South Korea"))

# Join comics onto the map by left join
world_comics <- world %>%
  left_join(country_counts, by = c("name_en" = "Country.of.Origin"))

ggplot(world_comics) +
  geom_sf(aes(fill = n), color = "white", linewidth = 0.1) +
  scale_fill_gradient(low = "#B8D4F5", high = "#1B2A4A",
                      na.value = "#E2E8F0",
                      trans = "log10",
                      name = "Number of Comics") +
  labs(title = "Global Comic Book Production by Country",
       subtitle = "Countries shaded by number of comics produced") +
  theme_comic()
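
Because the map join relies on country names matching exactly, any name that fails to match is silently left as NA and drawn in the grey fallback color. One way to check for this, sketched here on small made-up tables (`map_names` and `counts` are hypothetical stand-ins for `world` and `country_counts`), is an anti_join():

```r
library(dplyr)

# Hypothetical stand-ins for the map table and the per-country counts
map_names <- data.frame(name_en = c("Japan", "United States of America", "France"))
counts <- data.frame(
  Country.of.Origin = c("Japan", "USA", "France/Iran"),
  n = c(120, 100, 3)
)

# Rows in `counts` whose name has no match on the map
unmatched <- counts %>%
  anti_join(map_names, by = c("Country.of.Origin" = "name_en"))

print(unmatched)
```

Any rows returned here would need a recode() entry, or a deliberate decision to leave them off the map, as with combined entries.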

Japan leads with the most comics, followed closely by the USA. Comic production is significant across Asia, while most of the rest of the world produces little. Given the global popularity of Manga, the industry appears to have shifted heavily toward it. However, when exactly did that shift occur?



Comic Production Over Time

# Count comics per year, keeping the era label
year_counts <- comics %>%
  count(Release.Year, Era)
year_counts$Era <- factor(year_counts$Era, levels = c("2000s", "2010s", "2020s"))

# Area chart showing comic production over time, filled by era
area_plot <- ggplot(year_counts, aes(x = Release.Year, y = n , fill = Era)) +
  geom_area(alpha = 0.8, stat = "identity") +
  scale_fill_manual(values = c("2000s" = "#89CFF0",
                               "2010s" = "#2E5FA3",
                               "2020s" = "#7C5CBF"
                               ),
                    drop = FALSE)+ # Unique Color per era
  scale_x_continuous(breaks = seq(2000, 2026, by = 5)) + # Year axis break
  labs(title = "Comic Production Over Time", 
       subtitle = "Number of comics published per year by era",
       x = "Year",
       y = "Number of Comics") +
  theme_comic() + 
  transition_reveal(Release.Year)

animate(area_plot, duration = 12, fps = 60, renderer = gifski_renderer())

anim_save("area_chart.gif")

Production rose steadily from the 2000s into the 2010s, grew significantly during the COVID-19 pandemic, and peaked in 2023. With such growth, which demographic invested more time in comics?
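
Claims about a peak year and year-over-year growth can be read directly off a yearly count table rather than from the chart alone. The following is a rough sketch on invented numbers; `toy_counts` is hypothetical and only mimics the shape of `year_counts`.

```r
library(dplyr)

# Hypothetical yearly counts, not taken from the dataset
toy_counts <- data.frame(
  Release.Year = 2019:2024,
  n = c(300, 340, 410, 450, 480, 430)
)

# Year with the highest count
peak <- toy_counts %>% slice_max(n, n = 1)

# Change in count compared with the previous year
growth <- toy_counts %>% mutate(Change = n - lag(n))

print(peak)
print(growth)
```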



Rating Distribution by Age Rating

# Violin Plot showing rating distribution by age rating
ggplot(comics, aes(x = Age.Rating, y = Rating..out.of.10., fill = Age.Rating)) +
  geom_violin(alpha = 0.7) +  # Violin showing distribution shape
  geom_boxplot(width = 0.1, alpha = 0.5) +  # Boxplot showing median and quartiles
  scale_fill_manual(values = c("All Ages" = "#B8D4F5",
                               "Teen+" = "#5B8DD9",
                               "Young Adult" = "#2E5FA3",
                               "Mature" = "#7C5CBF",
                               "Mature 17+" = "#1B2A4A")) +  # Manual palette, one color per age rating
  labs(title = "Comic Rating Distribution by Age Rating",
       x = "Age Rating",
       y = "Rating (out of 10)") +
  theme_comic()

Ratings do not differ noticeably across age ratings, which suggests that a comic’s age rating has little bearing on its quality. “Teen+” comics have the most consistent ratings.
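
The comparison above can be backed with numbers by summarizing the center and spread of ratings per group. This is a minimal sketch on invented values; `toy` and its `Rating` column are hypothetical, and in the report the same summary would be run on `comics` with `Rating..out.of.10.`.

```r
library(dplyr)

# Hypothetical ratings, not taken from the dataset
toy <- data.frame(
  Age.Rating = rep(c("All Ages", "Teen+", "Mature"), each = 4),
  Rating = c(7.0, 8.0, 8.5, 9.0,
             7.9, 8.0, 8.1, 8.2,
             6.5, 8.0, 8.5, 9.5)
)

rating_summary <- toy %>%
  group_by(Age.Rating) %>%
  summarize(Median = median(Rating),   # center of each group
            Spread.SD = sd(Rating),    # smaller value = more consistent ratings
            .groups = "drop")

print(rating_summary)
```

Similar medians across groups would support the first claim, and the group with the smallest standard deviation is the most consistent one.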



Genre Production by Country

# Defining the top countries and genre to avoid data errors 
top_countries <- c("Japan", "USA", "South Korea", "UK", "Canada", "Australia")
top_genres <- c("Superhero", "Action", "Horror", "Romance", "Fantasy", "Sci-Fi", "Comedy")

# filter and count comics per country and genre combination
heatmap_data <- comics %>%
  filter(Country.of.Origin %in% top_countries,
         Primary.Genre %in% top_genres) %>%
  count(Country.of.Origin, Primary.Genre)

# Heatmap shows genre production by country 
heatmap_plot <- ggplot(heatmap_data, aes(x = Country.of.Origin, y = Primary.Genre, fill = n,
                                         text = paste("Country:", Country.of.Origin,
                                     "<br>Genre:", Primary.Genre,
                                     "<br>Count:", n))) +
  geom_tile(color = "white")+ # White border between tiles
  geom_text(aes(label = n), color = "white", size = 3)+ # Count labels on tiles
  scale_fill_gradient(low = "#B8D4F5",
                      high = "#1B2A4A",
                      name = "Number of Comics") + # Color scale 
  labs(title= "Comic Genre by Country",
       x = "Country of Origin",
       y = "Primary Genre") + 
  theme_comic()

ggplotly(heatmap_plot,tooltip = "text") %>%
  layout(
    paper_bgcolor ="white",
    plot_bgcolor = "white"
  )

The USA’s main comic genre is superheroes, which is unsurprising given how that genre pioneered comics. While the USA has had massive success in one genre, Japan offers a wider variety. The West put all its eggs in one basket, whereas Japan focused on genre diversity, which may have contributed to the rise of Manga worldwide.



Average Rating by Country

# Average rating per country (excluding combined entries),
# keeping only the 10 highest-rated countries
country_ratings <- comics %>%
  filter(!str_detect(Country.of.Origin, "/")) %>% # Remove combined entries like France/Iran
  group_by(Country.of.Origin) %>%
  summarize(Avg.Rating = mean(Rating..out.of.10., na.rm = TRUE)) %>%
  slice_max(Avg.Rating, n = 10)

# Lollipop chart showing average rating per country
lolipop_chart <- ggplot(country_ratings, aes(x = Avg.Rating, y = reorder(Country.of.Origin, Avg.Rating),
                                             text = paste("Country:", Country.of.Origin,
                                                          "<br>Avg Rating:", round(Avg.Rating, 2)))) +
  geom_segment(aes(x = 7.8, xend = Avg.Rating,  # Draw the stick
                   y = reorder(Country.of.Origin, Avg.Rating),
                   yend = reorder(Country.of.Origin, Avg.Rating)),
               color = "#5B8DD9", linewidth = 1) +
  geom_point(color = "#1B2A4A", size = 4)+ # Draw the dot
  scale_x_continuous(limits = c(7.8,8.3),
                     breaks = seq(7.8,8.3, by = 0.1)) + # Custom Axis Range
  labs(title = "Average Comic Rating by Country",
       x = "Average Rating(out of 10)",
       y = "Country")+
  theme_comic()

ggplotly(lolipop_chart, tooltip = "text") %>%
  layout(
    paper_bgcolor = "white",
    plot_bgcolor = "white"
  )

In Japan’s case, high production volume goes hand in hand with high average ratings, while the USA lags in ratings.
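
Since the lollipop chart shows ratings only, relating volume to quality requires both figures side by side. A minimal sketch of such a table, on invented rows (`toy` is hypothetical; in the report the input would be `comics`):

```r
library(dplyr)

# Hypothetical comics, not taken from the dataset
toy <- data.frame(
  Country.of.Origin = c("Japan", "Japan", "Japan", "USA", "USA", "France"),
  Rating = c(8.2, 8.4, 8.0, 7.8, 8.0, 8.5)
)

volume_vs_rating <- toy %>%
  group_by(Country.of.Origin) %>%
  summarize(Comics = n(),                 # production volume
            Avg.Rating = mean(Rating),    # average rating
            .groups = "drop") %>%
  arrange(desc(Comics))

print(volume_vs_rating)
```

With the real data, a correlation between the `Comics` and `Avg.Rating` columns would give a single number for how closely volume and ratings move together across countries.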



Comic Book Color Styles

# Counting Comics per coloring style, sorted by frequency
color_counts <- comics %>%
  count(Theme..Color.Style.) %>%
  arrange(desc(n))

# Convert to a named vector scaled for the waffle chart
# (dividing by 50 so each square represents about 50 comics)
color_vec <- setNames(round(color_counts$n / 50), color_counts$Theme..Color.Style.)

# Waffle chart showing the proportion of each color style
waffle(color_vec, rows = 8,
       title = "Comic Book Color Styles",
       xlab = "1 square = ~50 comics",
       colors = c("#1B2A4A", "#2E5FA3", "#5B8DD9", 
                  "#7C5CBF", "#A78BDB", "#3ABFA8",
                  "#E8C547", "#E8543A"))

The black-and-white colour scheme dominates the medium, which is no surprise given the growth of Manga.
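
One caveat of scaling counts to squares is that rounding can make small categories vanish from the waffle chart. The sketch below illustrates this on invented counts; the style names and numbers in `counts` are hypothetical.

```r
# Hypothetical counts per colour style, not taken from the dataset
counts <- c("Black & White" = 4210, "Full Color" = 3120,
            "Grayscale" = 1240, "Sepia" = 20)

# Same scaling as in the report: one square per ~50 comics
squares <- round(counts / 50)
print(squares)

# Categories that round down to zero squares would not appear in the chart
dropped <- names(squares)[squares == 0]
print(dropped)
```

If a rare style matters to the story, a smaller divisor or a minimum of one square per category would keep it visible.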



Publication Status by Country

# Filtering to top 6 countries and their comic counts per publication status
sankey_data <- comics %>%
  filter(Country.of.Origin %in% c("Japan", "USA", "South Korea", "UK", "Canada", "Australia")) %>%
  count(Country.of.Origin, Status)

# Alluvial (Sankey-style) diagram showing the flow from country to publication status
ggplot(sankey_data, aes(axis1 = Country.of.Origin, axis2= Status, y = n)) +
  geom_stratum() +   # Draw the blocks
  geom_alluvium(aes(fill = Country.of.Origin), alpha = 0.7) + # Draw the flow
  geom_text(stat = "stratum", aes(label= after_stat(stratum))) + # label blocks
  scale_x_discrete(limits = c("Country", "Status")) +# Labeling the axes
  scale_fill_manual(values = c("Japan" = "#1B2A4A",        # Ink Black
                              "USA" = "#2E5FA3",          # Hero Blue
                              "South Korea" = "#7C5CBF",  # Villain Purple
                              "UK" = "#3ABFA8",           # Power Teal
                              "Canada" = "#E8C547",       # Action Gold
                              "Australia" = "#5B8DD9")) +    # Sky Blue
  labs(title= "Comic Publication Status By Country",
       y = "Number of Comics") +
  theme_comic()

The USA has plenty of ongoing comics. However, Japan still has many comics that are either cancelled or on hiatus, which may reflect the harsh working conditions and standards in Manga magazines.



Page Count vs Rating

# Scatterplot  showing relationship between page count and rating 
# colored by age rating with a linear trend line 
set.seed(42)
comics_sample <- comics %>% slice_sample(n = 500)

# Fit a linear trend on the sample and predict it over the plotted x range
lm_fit <- lm(Rating..out.of.10. ~ Page.Count, data = comics_sample)
trend_line <- data.frame(
  Page.Count = seq(0, 1000, length.out = 100)
)
trend_line$Rating..out.of.10. <- predict(lm_fit, newdata = trend_line)

scatter_plot <- ggplot(comics_sample, aes(x = Page.Count, y = Rating..out.of.10. , color = Age.Rating,
                          text = paste("Page Count:", Page.Count,
                                        "<br>Rating:", Rating..out.of.10.,
                                        "<br>Age Rating:", Age.Rating))) + 
  geom_point(alpha = 0.6, size = 2.5) + # Transparent points for overplotting 
   coord_cartesian(xlim = c(0, 1000)) +
  scale_color_manual(values = c("All Ages" = "#3ABFA8",
                               "Teen+" = "#2E5FA3",
                               "Young Adult" = "#E8C547",
                               "Mature" = "#7C5CBF",
                               "Mature 17+" = "#E8543A"),
                     name = "Age Rating") +
  geom_line(data = trend_line, aes(x = Page.Count, y = Rating..out.of.10.), color = "black", linewidth = 1, inherit.aes = FALSE) + # Linear Line 
  labs(title = "Page Count vs Rating by Age Rating",
       x = "Page Count",
       y = "Rating(out of 10)") +
  theme_comic()

ggplotly(scatter_plot, tooltip = "text") %>%
  layout(
    paper_bgcolor = "white",
    plot_bgcolor = "white",
    font = list(family = "nunito")
  )

The graph demonstrates a positive relationship between page count and comic rating: comics with more pages tend to be rated higher. However, the slope is gentle, indicating that page count is not a particularly strong predictor of quality.
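
How gentle the slope is, and how much of the variation in ratings it explains, can be read from the fitted model itself. This is a minimal sketch on invented points (`toy` and its `Rating` column are hypothetical); in the report the same calls would apply to `lm_fit`.

```r
# Hypothetical page counts and ratings, not taken from the dataset
toy <- data.frame(
  Page.Count = c(100, 200, 300, 400, 500),
  Rating = c(7.8, 8.1, 7.9, 8.2, 8.0)
)

fit <- lm(Rating ~ Page.Count, data = toy)

slope <- unname(coef(fit)["Page.Count"])  # rating change per additional page
r_squared <- summary(fit)$r.squared       # share of rating variance explained

print(c(slope = slope, r_squared = r_squared))
```

A positive slope with a low R-squared is what "positive but weak" looks like numerically: the trend exists, but page count accounts for only a small part of the variation in ratings.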



Award Winners vs Nominees

# Calculate average rating per country and award status
dumbbell_data <- comics %>%
  filter(Country.of.Origin %in% c("Japan", "USA", "South Korea", "UK", "Canada", "Australia")) %>%
  group_by(Country.of.Origin, Award.Status) %>%
  summarize(Avg.Rating = mean(Rating..out.of.10., na.rm = TRUE), .groups = "drop")

# Pivot to wide format so Winner and Nominee are separate columns
dumbbell_wide <- dumbbell_data %>%
  filter(Award.Status != "None") %>%
  pivot_wider(names_from = Award.Status, values_from = Avg.Rating)

# Dumbbell chart comparing Winner vs Nominee ratings by country
dumbbell_plot <- ggplot(dumbbell_wide, aes(y = reorder(Country.of.Origin, Winner),
                                           text = paste(
                                             "Country:", Country.of.Origin,
                                             "<br>Winner Avg:", round(Winner,2),
                                             "<br>Nominee Avg:", round(Nominee, 2))
                                           )) +
  geom_segment(aes(x = Nominee, xend = Winner, # Draw line between points
                   yend = reorder(Country.of.Origin, Winner)),
               color = "grey50", linewidth = 1) +
  geom_point(aes(x = Nominee, color = "Nominee"), size = 4) + # Nominee dot
  geom_point(aes(x = Winner, color = "Winner"), size = 4) + # Winner dot
  scale_color_manual(values = c("Winner" = "#1B2A4A", "Nominee" = "#5B8DD9"),
                     name = "Award Status") +
  labs(title = "Average Rating: Awards Winners vs Nominees by Country",
       x = "Average Rating",
       y = "Country")+
  theme_comic()

ggplotly(dumbbell_plot, tooltip ="text") %>%
  layout(
    paper_bgcolor = "white",
    plot_bgcolor = "white"
  )

Winners and nominees have similar ratings, except in Canada.

This analysis shows that comic production has grown significantly, that Japan and the USA dominate the industry, and that neither age rating nor page count significantly affects comic quality.

References

AI Prompts

The following prompts were used for code optimization and problem solving: