1 Narrative Summary

This project explores character social relationships within the Marvel Universe using network analysis techniques. The data set, which contains thousands of characters that have appeared in comic books over the years, was used to build an undirected network of the 100 Marvel characters with the most co-appearances with other Marvel characters. The goal of this project was to find central characters in the network and to find communities that could indicate potential superhero teams or affiliations.

Since the data set was so large, I limited the network visualization to the top 100 Marvel characters with the most appearances to get a clearer picture of the key characters. I also calculated two centrality measures, degree and betweenness, to gauge how important a character is both to the plot line and to the network structure. The top 3 heroes for both measures were Spider-Man, Iron Man, and Captain America. I also applied community detection to see how many different communities there are within the network, which could correspond to already established hero or villain teams.

2 Setup Code

The dataset used was from Kaggle (https://www.kaggle.com/datasets/csanhueza/the-marvel-universe-social-network), specifically the hero-network.csv file, which was collected by César Sanhueza. Each vertex represents a Marvel character, and each edge represents a co-appearance of two characters in a comic book issue.
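To make the edge and vertex definitions concrete, here is a minimal sketch on a made-up edge list. The data frame `toy_edges` and the character names A, B, and C are hypothetical and only mimic the two-column layout of hero-network.csv; the sketch also shows how repeated co-appearances collapse into a single edge once the graph is simplified.

```r
library(igraph)

# Hypothetical edge list in the same two-column layout as hero-network.csv
toy_edges <- data.frame(
  hero1 = c("A", "A", "B", "A"),
  hero2 = c("B", "C", "C", "B")  # the pair A-B is listed twice
)

# One undirected edge per row, so the repeated A-B pair gives two parallel edges
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)
ecount(toy_graph)   # 4

# Collapse parallel edges and drop self-loops
toy_simple <- igraph::simplify(toy_graph, remove.multiple = TRUE, remove.loops = TRUE)
ecount(toy_simple)  # 3 distinct pairs
vcount(toy_simple)  # 3 characters
```

After simplification, an edge only says that two characters appeared together at least once, not how often.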

knitr::opts_chunk$set(echo = TRUE)
# install.packages("igraph")  # run once, before loading, if igraph is not installed
library(igraph)
library(tidyverse)
getwd()
## [1] "C:/Users/jayso/OneDrive/Documents/UWB/Network Analysis"
# the working directory must contain hero-network.csv
setwd("C:/Users/jayso/OneDrive/Documents/UWB/Network Analysis")

3 Network Visualizations

#reading in the dataset
hero_df <- read.csv("hero-network.csv", header = TRUE, as.is = TRUE)
head(hero_df)
##                  hero1                hero2
## 1        LITTLE, ABNER       PRINCESS ZANDA
## 2        LITTLE, ABNER BLACK PANTHER/T'CHAL
## 3 BLACK PANTHER/T'CHAL       PRINCESS ZANDA
## 4        LITTLE, ABNER       PRINCESS ZANDA
## 5        LITTLE, ABNER BLACK PANTHER/T'CHAL
## 6 BLACK PANTHER/T'CHAL       PRINCESS ZANDA
#create full graph object (undirected)
hero_graph <- graph_from_data_frame(hero_df, directed = FALSE)


# Use gather() to make the data long to count character appearances
hero_long <- hero_df %>%
  gather(position,hero, "hero1", "hero2")

# Count how often each character appears in the edge list and keep the top 100
top_heroes <- hero_long %>%
  group_by(hero) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  slice_max(count, n = 100) #only keeping the top 100


#filtering the edge list to pairs where both characters are in the top 100
hero_df_small <- hero_df %>%
  filter(hero1 %in% top_heroes$hero & hero2 %in% top_heroes$hero)

#create graph object with top 100 characters
hero_graph_small <- graph_from_data_frame(hero_df_small, directed = FALSE)
#simplifying graphs by removing duplicate edges and self-loops
hero_graph_small <- igraph::simplify(hero_graph_small, remove.multiple = TRUE, remove.loops = TRUE)
hero_graph <- igraph::simplify(hero_graph, remove.multiple = TRUE, remove.loops = TRUE)

#plotting top 100 
plot(hero_graph_small,
     vertex.label.cex = 1,
     vertex.size = 7,
     vertex.color = "tomato",
     edge.color = "gray70",
     main = "Top 100 Marvel Characters Network")

# Count appearances for the top 25
heroes_25 <- hero_long %>%
  group_by(hero) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  slice_max(count, n = 25) #only keeping the top 25

4 Network Analysis

The network visualization showcases the Marvel characters with the most appearances in the comic books. Some of them are lesser-known characters, such as Patriot or Miss America, who do not necessarily appear in films or have their own merchandise the way Spider-Man does.

Degree centrality counts how many direct connections a Marvel character has to other characters. Characters with high degree centrality tend to be central to the Marvel Universe and appear alongside many other characters, whether those characters are popular or not. The top 3 characters by degree centrality are Captain America, Spider-Man, and Iron Man. These characters can help introduce lesser-known characters.
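As a small illustration of what degree measures, the sketch below uses a made-up network in which a character named "Hub" co-appears with everyone else. The edge list and names are hypothetical and are not taken from the Marvel data.

```r
library(igraph)

# Hypothetical co-appearance pairs: Hub appears with A, B, and C; A also appears with B
toy_edges <- data.frame(
  from = c("Hub", "Hub", "Hub", "A"),
  to   = c("A",   "B",   "C",   "B")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Degree = number of distinct neighbours in a simple undirected graph
toy_degree <- degree(toy_graph)
sort(toy_degree, decreasing = TRUE)  # Hub = 3, A = 2, B = 2, C = 1
```

Hub has the highest degree because it is directly linked to all three other characters.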

Betweenness centrality shows which characters act as bridges between other characters and between different parts of the network, based on how many shortest paths pass through them. The top 3 characters by betweenness centrality are Spider-Man, Captain America, and Iron Man. These characters can help connect otherwise separate parts of the social network.
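The difference between betweenness and degree is easiest to see on a toy example. The sketch below assumes a hypothetical network of two triangles (A, B, C and D, E, F) connected only through a character named "Bridge"; none of these names come from the dataset.

```r
library(igraph)

# Two hypothetical teams (A-B-C and D-E-F) linked only through "Bridge"
toy_edges <- data.frame(
  from = c("A", "B", "A", "C",      "Bridge", "D", "E", "D"),
  to   = c("B", "C", "C", "Bridge", "D",      "E", "F", "F")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Unnormalized betweenness: number of shortest paths passing through each vertex
toy_between <- betweenness(toy_graph, directed = FALSE, normalized = FALSE)
toy_between[c("Bridge", "C", "D")]     # Bridge = 9, C = 8, D = 8

# Bridge has fewer direct connections than C or D
degree(toy_graph)[c("Bridge", "C", "D")]  # Bridge = 2, C = 3, D = 3
```

Bridge has only two direct connections, yet every shortest path between the two teams runs through it, so it scores highest on betweenness.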

Community detection was used to find group structures within the network, using the Fast Greedy and Louvain algorithms. On the full network, Fast Greedy found 68 communities (modularity 0.358) and Louvain found 24 communities (modularity 0.423). Several of these communities could correspond to hero or villain teams and alliances.
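To show what these algorithms return, here is a minimal sketch on a hypothetical network of two triangles joined by a single edge. The vertex names are made up, and `set.seed(1)` is only there because Louvain can give different results between runs.

```r
library(igraph)

# Two hypothetical tightly knit groups joined by one C-D edge
toy_edges <- data.frame(
  from = c("A", "B", "A", "C", "D", "E", "D"),
  to   = c("B", "C", "C", "D", "E", "F", "F")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

# Fast Greedy: merges communities step by step to maximize modularity
toy_fast <- cluster_fast_greedy(toy_graph)
length(toy_fast)       # 2 communities
membership(toy_fast)   # A, B, C in one group; D, E, F in the other
modularity(toy_fast)   # 5/14, about 0.357

# Louvain: randomized, so the seed is fixed for repeatability
set.seed(1)
toy_louvain <- cluster_louvain(toy_graph)
length(toy_louvain)
modularity(toy_louvain)
```

Higher modularity means more edges fall inside communities than would be expected by chance, which is why it is a reasonable way to compare the two algorithms on the same graph.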

#Centrality measures for both the top 100 and the full dataset
#(betweenness is left unnormalized for the top 100 and normalized for the full graph)
hero_degree <- degree(hero_graph_small, mode = "total")
hero_between <- betweenness(hero_graph_small, directed = FALSE, normalized = FALSE)
hero_between_big <- betweenness(hero_graph, directed = FALSE, normalized = TRUE)
hero_degree_big <- degree(hero_graph, mode = "total")

#tidy data frame of centrality measures for the full dataset
hero_dat <- data.frame(
  hero = names(hero_degree_big),
  hero_degree_big = hero_degree_big,
  hero_between_big = hero_between_big)

#top 10 heroes by degree
hero_dat %>%
  slice_max(hero_degree_big, n = 10) %>%
  ggplot(aes(x = reorder(hero, hero_degree_big), y = hero_degree_big)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Heroes by Degree Centrality",
    x = "Hero",
    y = "Degree"
  ) +
  theme_minimal()

#top 10 heroes by betweenness
hero_dat %>%
  slice_max(hero_between_big, n = 10) %>%
  ggplot(aes(x = reorder(hero, hero_between_big), y = hero_between_big)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Heroes by Betweenness Centrality",
    x = "Hero",
    y = "Betweenness"
  ) +
  theme_minimal()

# community detection
fast_big<- cluster_fast_greedy(hero_graph)
modularity(fast_big)
## [1] 0.3583415
length(fast_big)
## [1] 68
sizes(fast_big)
## Community sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
##  266 1368 1716 2368   21    8   10   68   52    9    6    6    5    3    9    8 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
##    6   13   10   17    8    7  118    6    5   40    8    8    7    9    5   17 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
##    7    9    4    8    5    5    8    4    7    5    6   31    5    5    4   32 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
##    5    5    4    4    4    5    5    3    3    3    4    3    4    5    3    3 
##   65   66   67   68 
##    3    3    3    2
multi_big <- cluster_louvain(hero_graph)
modularity(multi_big)
## [1] 0.4227405
length(multi_big)
## [1] 24
sizes(multi_big)
## Community sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1196 1109  252  281 1364  480  362  530  456  105   34    9   31   11   23    9 
##   17   18   19   20   21   22   23   24 
##  118   21    8    8    7    5    5    2

5 Conclusion

This network analysis of Marvel Universe characters showed how many tightly knit communities there are within the network, as well as which characters appeared the most with other characters. The centrality measures helped identify which characters are more popular and which ones act as bridges to characters in other parts of the network. So, if Marvel wanted to introduce lesser-known characters organically, ideally those characters would appear alongside Spider-Man, Captain America, or Iron Man.

Since the dataset was so large, I had to reduce the network to the top 100 characters for my computer to be able to render the visualization. With a more powerful computer and more data cleaning, the whole dataset could be visualized as well. The dataset also opens up other possibilities, such as making the visualization interactive. There are other datasets that could be used too, but because this one was already so computationally intensive, a full analysis of all of them would require a more powerful computer. Beyond identifying central characters, these measures can be used to explore storytelling through the structure of character social networks. In conclusion, this project combined data visualization with making sense of a fictional social network.