In this project, I used network analysis to explore the structure of the Twitch streamer network. I worked with a dataset from the Stanford Large Network Dataset Collection (SNAP), in which an edge is a mutual follower relationship between two Twitch users. I randomly sampled 20,000 edges to keep the graph readable, then extracted the largest connected component for analysis.
I applied centrality measures (degree and betweenness) and community detection to understand which streamers are the most connected and influential, and how they form communities. The network was visualized with igraph's Fruchterman-Reingold layout, and the results show the presence of hub-like users and 90 distinct communities.
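To make the difference between the two centrality measures concrete, here is a minimal sketch on a toy graph rather than the Twitch data. The graph, the vertex numbering, and names such as `toy_graph` are assumptions chosen purely for illustration: two tightly knit groups are joined through a single bridge vertex.

```r
library(igraph)

# Two 4-node cliques (vertices 1-4 and 5-8); an illustrative toy graph
toy_graph <- disjoint_union(make_full_graph(4), make_full_graph(4))

# Add vertex 9 as a bridge between vertex 4 and vertex 5
toy_graph <- add_vertices(toy_graph, 1)
toy_graph <- add_edges(toy_graph, c(4, 9, 9, 5))

toy_degree <- degree(toy_graph)
toy_betweenness <- betweenness(toy_graph)

# Vertex 9 has only two neighbours but lies on every shortest path
# between the two cliques
print(data.frame(vertex = 1:9, degree = toy_degree, betweenness = toy_betweenness))
```

In this toy case the bridge vertex has the lowest degree but the highest betweenness, which is why the two rankings can disagree on a real network as well.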
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:igraph':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
edges_data <- read.csv("large_twitch_edges.csv")
features_data <- read.csv("large_twitch_features.csv")
set.seed(123)
sample_edges <- edges_data[sample(nrow(edges_data), min(20000, nrow(edges_data))), ]
# Create graph from sampled edge list
twitch_graph <- graph_from_data_frame(sample_edges, directed = FALSE)
This dataset comes from the Stanford Large Network Dataset Collection (https://snap.stanford.edu/data/). Each vertex is a Twitch user, and each edge in the “edges” data is a mutual follower relationship between two users. The dataset was collected in 2018 and is widely used in network analysis research. Note that the graph built above contains only the 20,000 sampled edges and the vertices they touch, not the full network.
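Sampling edges uniformly at random changes a network's local structure: a triangle survives only if all three of its edges are drawn, so clustering tends to collapse in a small sample. The sketch below illustrates this on a synthetic small-world graph, not the Twitch data. The graph size, the 5% sampling fraction, and names such as `full_graph` and `sampled_graph` are assumptions made for illustration.

```r
library(igraph)

set.seed(123)

# Synthetic small-world graph used as a stand-in (illustrative, not the Twitch data)
full_graph <- sample_smallworld(dim = 1, size = 2000, nei = 5, p = 0.05)

# Draw 5% of the edges uniformly at random, mirroring the edge-sampling step
edge_list <- as.data.frame(as_edgelist(full_graph))
keep_rows <- sample(nrow(edge_list), size = round(0.05 * nrow(edge_list)))
sampled_graph <- graph_from_data_frame(edge_list[keep_rows, ], directed = FALSE)

# Compare global clustering and the number of connected components
t_full <- transitivity(full_graph)
t_sample <- transitivity(sampled_graph)
cat("Clustering (full):   ", round(t_full, 3), "\n")
cat("Clustering (sampled):", round(t_sample, 3), "\n")
cat("Components (full):   ", components(full_graph)$no, "\n")
cat("Components (sampled):", components(sampled_graph)$no, "\n")
```

A comparison like this suggests that clustering and path-length figures computed on an edge sample are best read as properties of the sample, not of the original network.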
# Compute connected components once, then keep the vertices of the largest one
comp <- components(twitch_graph)
largest_component <- comp$membership == which.max(comp$csize)
largest_subgraph <- induced_subgraph(twitch_graph, vids = V(twitch_graph)[largest_component])
degree_vals <- degree(largest_subgraph)
plot(largest_subgraph,
vertex.size = log1p(degree_vals) * 2,
vertex.color = "skyblue",
vertex.label = NA,
edge.width = 0.2,
edge.color = "gray70",
layout = layout_with_fr(largest_subgraph),
main = "Twitch Network Visualization")
communities <- cluster_fast_greedy(largest_subgraph)
plot(largest_subgraph,
vertex.color = membership(communities),
vertex.size = 3,
vertex.label = NA,
layout = layout_with_fr(largest_subgraph),
edge.color = "lightgray",
main = "Community Detection (Fast Greedy)")
degree_c <- degree(largest_subgraph)
betweenness_c <- betweenness(largest_subgraph)
cat("Top 5 by Degree Centrality:\n")
## Top 5 by Degree Centrality:
print(sort(degree_c, decreasing = TRUE)[1:5])
## 61862 32338 71050 125642 110345
## 107 102 94 74 65
cat("Top 5 by Betweenness Centrality:\n")
## Top 5 by Betweenness Centrality:
print(sort(betweenness_c, decreasing = TRUE)[1:5])
## 155127 52703 40604 119034 81147
## 7606043 6728911 5180275 5170467 4821334
cat("Network Properties Summary:\n")
## Network Properties Summary:
cat("Nodes:", vcount(largest_subgraph), "\n")
## Nodes: 9135
cat("Edges:", ecount(largest_subgraph), "\n")
## Edges: 9302
cat("Average Degree:", round(mean(degree(largest_subgraph)), 2), "\n")
## Average Degree: 2.04
cat("Diameter:", diameter(largest_subgraph), "\n")
## Diameter: 51
cat("Avg Path Length:", round(mean_distance(largest_subgraph), 2), "\n")
## Avg Path Length: 15.25
cat("Clustering Coefficient:", round(transitivity(largest_subgraph), 3), "\n")
## Clustering Coefficient: 0
cat("Communities Detected:", length(communities), "\n")
## Communities Detected: 90
hist(degree(largest_subgraph), breaks = 50, col = "lightblue",
main = "Histogram of Node Degrees",
xlab = "Degree")
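This project highlights how network analysis can reveal patterns in how Twitch streamers are connected and how they group into communities. The degree distribution of the sampled network is strongly right-skewed: the average degree is 2.04, while the most connected streamer has 107 connections. This pattern is consistent with a scale-free structure, in which a few hubs hold many connections and most nodes hold only a few, although a power-law fit was not formally tested here. These central users are likely more visible and influential, whether due to their popularity, consistent streaming, or collaborations. Users with high betweenness centrality lie on many shortest paths and can therefore act as bridges between otherwise separate communities; notably, the top five streamers by betweenness are not the same as the top five by degree. Because the graph was built from a random sample of 20,000 edges, it is very sparse (clustering coefficient 0, diameter 51), so these figures describe the sample rather than the full Twitch network.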
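One rough way to probe the scale-free description is to fit a power law to the degree sequence with igraph's `fit_power_law()`. The sketch below runs on a synthetic preferential-attachment graph, not the Twitch data, so the graph size, the seed, and names such as `demo_graph` are assumptions for illustration only.

```r
library(igraph)

set.seed(123)

# Synthetic preferential-attachment graph as a stand-in (not the Twitch data)
demo_graph <- sample_pa(5000, m = 2, directed = FALSE)
demo_degree <- degree(demo_graph)

# Fit a power law to the tail of the degree sequence above an estimated xmin
fit <- fit_power_law(demo_degree[demo_degree > 0])

cat("Estimated exponent (alpha):", round(fit$alpha, 2), "\n")
cat("Estimated lower cutoff (xmin):", fit$xmin, "\n")
cat("Kolmogorov-Smirnov statistic:", round(fit$KS.stat, 3), "\n")
```

Passing `degree(largest_subgraph)` instead would give the corresponding estimates for the sampled Twitch graph. A smaller Kolmogorov-Smirnov statistic indicates a closer fit, but a fit on a sparse edge sample would still be suggestive rather than conclusive.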
One of the main takeaways is that social connections on Twitch may significantly affect a streamer’s growth and visibility. Being part of a community or collaborating with highly connected users may help streamers reach new audiences and strengthen their position within the network. For new or aspiring streamers, this suggests that growing a channel is not just about producing content but also about building meaningful relationships. For researchers and platform developers, understanding these network patterns can help improve how content is recommended, how new creators are supported, and how digital communities are shaped over time. This project shows that even with simple methods and tools, it is possible to uncover useful insights into how influence and discovery work in online social spaces like Twitch.