Introduction

The Marvel Universe is home to thousands of unique characters, including iconic superheroes and villains, many of whom appear together across comics, films, and other media. These character connections play a key role in storytelling, fan engagement, and marketing. To conduct this analysis, we use the hero-network.csv file from the Marvel Universe Social Network dataset on Kaggle. This dataset represents the social network of Marvel Comics characters from the 1960s to around the early 2000s. It contains 574,467 edges and 224,181 directed ties. The data was collected and uploaded by Mark Newman. By understanding how these characters are linked, we can provide insights for creating product bundles that resonate with fans. We can also identify which characters are central to the universe, which helps us make informed decisions about investing in a storyline or character.

This project uses network analysis to explore the relationships between Marvel characters, focusing on how their connections can inform marketing strategies. By identifying closely connected groups within the Marvel network, we aim to determine which characters are best suited to be promoted or sold together, whether through investment in movies, merchandise, collectables, or other products. This data-driven approach offers a new lens for deciding which characters to group together for commercial purposes in a vast and interconnected fictional universe.

The objective of this project is to find out which Marvel characters offer the highest potential revenue, both as pairings and as individual characters. For pairings, the strongest connections indicate which characters should be advertised together. For individual characters, if a character is highly connected within the Marvel Universe but we have yet to expand their franchise, merchandise, or collectables, we can begin to consider investing in them. This could increase income from toy set collaborations. It could also open up a side of the Marvel Universe that fans have yet to experience, expanding our fanbase while acknowledging our current one. We hope to learn more about the character connections in Marvel and how they could expand Marvel income through movies, merchandise, collectables, and other products.

Code Setup

Packages used: tidyverse, tidygraph, igraph, readr, ggraph, and dplyr

For our code, we used hero-network.csv from the site Kaggle: https://www.kaggle.com/datasets/csucu/marvel-comics-network

The edges in this network represent connections between two Marvel comic characters: if two characters have appeared together in a comic or a storyline, an edge exists between them. Each node in our network is a unique Marvel character. While we don’t know how the data was collected (beyond the fact that it comes from one individual), it has been used by many others because of how large the dataset is.

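# Load the packages listed above (readr and dplyr are attached as part of tidyverse)
library(tidyverse)
library(tidygraph)
library(igraph)
library(ggraph)

hero_network <- read_csv("hero-network.csv")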
## Rows: 574467 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): hero1, hero2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Drop rows with a missing hero1 or hero2 and remove self-loops.
# Keep only the superhero name (the text before the first "/") so plot labels stay short.
df_clean <- hero_network %>%
  filter(!is.na(hero1), !is.na(hero2), hero1 != hero2) %>%
  mutate(
    hero1 = sub("/.*", "", hero1), # Remove everything after the first /
    hero2 = sub("/.*", "", hero2)  # Remove everything after the first /
  )

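# Build an undirected graph; every row of df_clean becomes one edge
hero_graph <- graph_from_data_frame(df_clean, directed = FALSE)
hero_tidygraph <- as_tbl_graph(hero_graph)

# Rank characters by degree and keep the 20 most connected for plotting
deg <- degree(hero_graph)
top_nodes_names <- names(sort(deg, decreasing = TRUE))[1:min(20, length(deg))] # top 20
subgraph_top20 <- induced_subgraph(hero_graph, vids = top_nodes_names)

# Degree is recomputed here, so it counts only edges among the top 20
subgraph_tidy <- as_tbl_graph(subgraph_top20) %>%
  mutate(degree_centrality = centrality_degree())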
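
One detail of the setup above is worth a quick check. graph_from_data_frame() keeps every row as its own edge, so a pair of heroes listed in several rows becomes several parallel edges, and degree() counts each of them. The sketch below uses a hypothetical four-row edge list to show the difference between counting co-appearances and counting unique partners. The names toy_edges, toy_multi, toy_simple, and toy_weighted are illustrative assumptions and are not part of our analysis; the results reported in this document were computed on the unsimplified graph.

```r
library(igraph)

# Hypothetical edge list: THOR and LOKI are listed together three times
toy_edges <- data.frame(
  hero1 = c("THOR", "THOR", "THOR", "THOR"),
  hero2 = c("LOKI", "LOKI", "LOKI", "ODIN")
)

# Every row becomes an edge, so THOR-LOKI is stored as three parallel edges
toy_multi <- graph_from_data_frame(toy_edges, directed = FALSE)
degree(toy_multi)["THOR"]   # counts all four rows

# Collapsing parallel edges makes degree count unique partners instead
toy_simple <- igraph::simplify(toy_multi)
degree(toy_simple)["THOR"]  # counts LOKI and ODIN once each

# Alternative: keep the repeat count as an edge weight rather than discarding it
E(toy_multi)$weight <- 1
toy_weighted <- igraph::simplify(toy_multi, edge.attr.comb = list(weight = "sum"))
E(toy_weighted)$weight      # one weight per unique pair
```

Which version is appropriate depends on the question: the multigraph degree rewards characters who appear with the same partners many times, while the simplified degree rewards characters with many distinct partners.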

Visual

For the network visual, we decided to include only the 20 most connected heroes because of how vast the dataset is.

set.seed(40)
ggraph(subgraph_tidy, layout = "fr") + 
  geom_edge_link(alpha = 0.25, # lines/edges
                 color = "grey70", 
                 width = 0.5) +
  geom_node_point(aes(fill = degree_centrality), # nodes
                  shape = 21, # circle so I can fill
                  color = "black", 
                  stroke = 0.5, 
                  size = 8) +
  scale_fill_gradient(low = "lightblue", 
                      high = "dodgerblue4") + # hero colors
  geom_node_text(aes(label = ifelse(rank(-degree_centrality) <= 20, name, NA)), # label the top 20 (here, every node)
                 repel = TRUE,
                 size = 3.5,
                 bg.colour = "white",
                 bg.r = 0.1,
                 fontface = "bold") +
  theme_void(base_size = 12) +
  labs(title = "Top 20 Marvel Heroes by Degree Centrality",
       subtitle = "Node color intensity indicates degree within the subgraph. Top 20 labeled.", 
       fill = "Degree (in subgraph)") +
  theme(
    plot.title = element_text(hjust = 0.5, 
                              face = "bold", 
                              size = 16),
    plot.subtitle = element_text(hjust = 0.5, 
                                 size = 10)
  )

Network Analysis Code

For our analysis, we decided to compute the measures on the full network. We first tried using only the top 20 heroes, but that gave us results we couldn’t compare or use in our work.

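Before computing the measures, it is worth reviewing what each one captures. Degree centrality counts the number of edges attached to a node. Because every row of the file is kept as its own edge, any pair of characters that appears in more than one row contributes to degree once per row. Characters with high degree are the central figures overall.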

Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes. Characters with high betweenness act as bridges that connect otherwise separate groups. For example, if Wolverine appears often with both the X-Men and the Avengers, he would have high betweenness and would be a good choice because he appeals to both groups of fans.

Closeness centrality measures how short the paths are from a node to every other node it can reach; it is the inverse of the average shortest-path distance. Characters with high closeness are core characters that can reach many others in few steps.
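
To make the three definitions concrete, here is a minimal sketch on a hypothetical five-character network: two tight teams (X1 and X2, A1 and A2) that share a single member called BRIDGE. All names in it (toy_edges, toy_graph, toy_scores, and the character labels) are made up for illustration and do not come from the dataset.

```r
library(igraph)

# Hypothetical toy network: two triangles that meet at one shared character
toy_edges <- data.frame(
  hero1 = c("X1", "X1",     "X2",     "A1", "A1",     "A2"),
  hero2 = c("X2", "BRIDGE", "BRIDGE", "A2", "BRIDGE", "BRIDGE")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Compute the same three measures used in this analysis
toy_scores <- data.frame(
  hero        = V(toy_graph)$name,
  degree      = degree(toy_graph),
  betweenness = betweenness(toy_graph, normalized = TRUE),
  closeness   = closeness(toy_graph, normalized = TRUE)
)

# Sort so the strongest bridge is listed first
toy_scores[order(-toy_scores$betweenness), ]
```

In this toy case BRIDGE leads on all three measures: it has the most edges, every shortest path between the two teams passes through it, and it is one step from everyone. In a real network the three rankings can disagree, which is why we report them separately.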

# Top betweenness (computed on the full network, normalized to the 0-1 range)
betweenness_cent <- betweenness(hero_graph, normalized = TRUE)
top_betweenness <- sort(betweenness_cent, decreasing = TRUE) %>% head(10)

# Top closeness (computed on the full network, despite the "_subgraph" suffix)
closeness_cent_subgraph <- closeness(hero_graph, normalized = TRUE)
top_closeness_subgraph <- sort(closeness_cent_subgraph, decreasing = TRUE) %>% head(10)

# Top degree (raw edge counts on the full network, despite the "_subgraph" suffix)
top_degree_subgraph <- sort(degree(hero_graph), decreasing = TRUE) %>% head(10)

Results from Network Analysis

## [1] "Top 10 Characters by Betweenness Centrality (within hero-network):"
##      SPIDER-MAN CAPTAIN AMERICA        IRON MAN       WOLVERINE     DR. STRANGE 
##      0.12972827      0.09383915      0.05194811      0.05154794      0.04108419 
##            THOR           HAVOK            HULK           THING       DAREDEVIL 
##      0.03745689      0.03691251      0.03609159      0.03306679      0.03286158
## [1] "Top 10 Characters by Closeness Centrality (within hero-network):"
##     LUDLUM, ROSS           ORWELL   ASHER, MICHAEL            FAGIN 
##                1                1                1                1 
##          HOFFMAN           OSWALD      PANTHER CUB         STERLING 
##                1                1                1                1 
##       MISS THING AMAZO-MAXI-WOMAN 
##                1                1
## [1] "Top 10 Characters by Degree Centrality (within hero-network):"
## CAPTAIN AMERICA      SPIDER-MAN        IRON MAN            THOR           THING 
##           16259           13717           11817           11427           10681 
##       WOLVERINE     HUMAN TORCH   SCARLET WITCH   MR. FANTASTIC          VISION 
##           10353           10237            9911            9775            9696
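
The closeness scores of exactly 1 above are what we would expect if those characters sit in small groups that are disconnected from the main network, since every character they can reach is one step away. The sketch below illustrates that effect on a hypothetical graph and shows one possible remedy: computing closeness only within the largest connected component. It assumes a recent igraph (1.3.0 or later), where closeness() considers only reachable nodes. The names toy_edges, toy_graph, giant, and closeness_giant are illustrative, and this is not the procedure used to produce the results above.

```r
library(igraph)

# Hypothetical network: a five-character chain plus an isolated pair (P, Q)
toy_edges <- data.frame(
  hero1 = c("A", "B", "C", "D", "P"),
  hero2 = c("B", "C", "D", "E", "Q")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# On the full graph, the isolated pair can outrank the chain's true center
closeness_all <- closeness(toy_graph, normalized = TRUE)
sort(closeness_all, decreasing = TRUE)

# Keep only the largest connected component before computing closeness
comp     <- components(toy_graph)
giant_id <- which.max(comp$csize)
giant    <- induced_subgraph(toy_graph, vids = which(comp$membership == giant_id))

closeness_giant     <- closeness(giant, normalized = TRUE)
top_closeness_giant <- sort(closeness_giant, decreasing = TRUE)
top_closeness_giant
```

Applied to hero_graph, the same steps would rank only characters in the main component, which should make the closeness results comparable with the betweenness and degree lists.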

Conclusion

In summary, using betweenness, closeness, and degree centrality, our network analysis identified the top ten characters on each measure that could drive Marvel income through movies, merchandise, collectables, and other products. Degree and betweenness were both informative for finding potential top sellers. Closeness was not: every character in its top ten has a score of exactly 1, which most likely reflects characters in small, disconnected groups where every other reachable character is one step away. Because those characters do not overlap with the heroes who rank highly on the other two measures, we left closeness out of our final conclusion.

With this analysis, we have learned that three heroes stand out in the network for their connections within the Marvel Universe. First, Spider-Man has the highest betweenness and the second-highest degree centrality, which makes him both very well connected and a connector between other heroes. Second, Captain America has the highest degree and the second-highest betweenness centrality, which makes him highly connected throughout the Marvel Universe but slightly less of a connector. Third, Iron Man ranks third in both betweenness and degree centrality, which makes him a significant hero to consider when marketing as well.

One limitation is that the size of the dataset led us to visualize only the top 20 heroes and to report only the top ten for each measure, which we then narrowed to the three heroes that ranked consistently across our analysis. With more time, we would like to explore the influence of heroes within their own franchises and examine the dataset in greater depth. Overall, we have learned that Spider-Man, Captain America, and Iron Man are the top three heroes Marvel should consider when it needs characters with high marketing potential.
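
The comparison across measures can be laid out explicitly. The sketch below takes the top-ten betweenness and degree values exactly as printed in the results, keeps the heroes that appear in both lists, and averages their two ranks. The mean-rank summary and the names betweenness_top, degree_top, and rank_table are our own illustrative assumptions; this is one simple way to combine the lists, not a standard or definitive scoring method.

```r
# Top-ten values copied from the results section
betweenness_top <- c(
  "SPIDER-MAN" = 0.12972827, "CAPTAIN AMERICA" = 0.09383915,
  "IRON MAN" = 0.05194811, "WOLVERINE" = 0.05154794,
  "DR. STRANGE" = 0.04108419, "THOR" = 0.03745689,
  "HAVOK" = 0.03691251, "HULK" = 0.03609159,
  "THING" = 0.03306679, "DAREDEVIL" = 0.03286158
)
degree_top <- c(
  "CAPTAIN AMERICA" = 16259, "SPIDER-MAN" = 13717,
  "IRON MAN" = 11817, "THOR" = 11427, "THING" = 10681,
  "WOLVERINE" = 10353, "HUMAN TORCH" = 10237,
  "SCARLET WITCH" = 9911, "MR. FANTASTIC" = 9775, "VISION" = 9696
)

# Keep only heroes that appear in both top-ten lists
shared <- intersect(names(betweenness_top), names(degree_top))

# Both vectors are already sorted, so position in the vector is the rank
rank_table <- data.frame(
  hero             = shared,
  betweenness_rank = match(shared, names(betweenness_top)),
  degree_rank      = match(shared, names(degree_top))
)
rank_table$mean_rank <- (rank_table$betweenness_rank + rank_table$degree_rank) / 2

# Lower mean rank means more consistently near the top
rank_table <- rank_table[order(rank_table$mean_rank), ]
rank_table
```

Six heroes appear in both lists, and Spider-Man, Captain America, and Iron Man hold the three best mean ranks, which matches the three heroes highlighted above.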