In this project, nodes (vertices) represent individual artists, and edges represent collaborations, specifically two artists being featured together on a song. The dataset includes approximately 20,000 primary artists whose songs appeared on the Spotify weekly charts, along with around 136,000 additional artists who were featured on songs with at least one charting artist.
The dataset also captures the frequency and structure of these collaborations, allowing us to generate a large-scale network comprising over 135,000 artists as nodes and more than 300,000 edges representing collaborative links between them.
Data Sources:
- Aggregated weekly Spotify chart data were collected from Kworb.
- Artist and feature data were scraped from the Spotify API.
Temporal Coverage:
- Start Date: September 28, 2013
- End Date: October 9, 2022
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
e <- read.csv("edges.csv", header = TRUE, stringsAsFactors = FALSE)
n <- read.csv("nodes.csv", header = TRUE, stringsAsFactors = FALSE)
head(e)
## id_0 id_1
## 1 76M2Ekj8bG8W7X2nbx2CpF 7sfl4Xt5KmfyDs2T3SVSMK
## 2 0hk4xVujcyOr6USD95wcWb 7Do8se3ZoaVqUt3woqqSrD
## 3 38jpuy3yt3QIxQ8Fn1HTeJ 4csQIMQm6vI2A2SCVDuM2z
## 4 6PvcxssrQ0QaJVaBWHD07l 6UCQYrcJ6wab6gnQ89OJFh
## 5 2R1QrQqWuw3IjoP5dXRFjt 4mk1ScvOUkuQzzCZpT6bc0
## 6 0k70gnDBLPirCltbTzoxuM 5FK3qokBQYxr7ZLkr8GVFn
head(n)
## spotify_id name followers popularity
## 1 48WvrUGoijadXXCsGocwM4 Byklubben 1738 24
## 2 4lDiJcOJ2GLCK6p9q5BgfK Kontra K 1999676 72
## 3 652XIvIBNGg3C0KIGEJWit Maxim 34596 36
## 4 3dXC1YPbnQPsfHPVkm1ipj Christopher Martin 249233 52
## 5 74terC9ol9zMo8rfzhSOiG Jakob Hellman 21193 39
## 6 0FQMb3mVrAKlyU4H5mQOJh Madh 26677 19
## genres
## 1 ['nordic house', 'russelater']
## 2 ['christlicher rap', 'german hip hop']
## 3 []
## 4 ['dancehall', 'lovers rock', 'modern reggae', 'reggae fusion']
## 5 ['classic swedish pop', 'norrbotten indie', 'swedish pop']
## 6 []
## chart_hits
## 1 ['no (3)']
## 2 ['at (44)', 'de (111)', 'lu (22)', 'ch (31)', 'vn (1)']
## 3 ['de (1)']
## 4 ['at (1)', 'de (1)']
## 5 ['se (6)']
## 6 ['it (2)']
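The `genres` and `chart_hits` columns are stored as strings that look like Python lists, so they need parsing before they can be used as data. The following is a minimal base-R sketch, assuming every `chart_hits` entry has the form `'<country code> (<count>)'` as in the rows printed above. The helper name `parse_chart_hits` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: turn "['at (44)', 'de (111)']" into a data frame.
# Assumes each entry looks like '<country code> (<count>)'.
parse_chart_hits <- function(x) {
  # Extract every single-quoted entry, e.g. "'at (44)'"
  entries <- regmatches(x, gregexpr("'[^']*'", x))[[1]]
  entries <- gsub("'", "", entries, fixed = TRUE)
  data.frame(
    country = sub(" \\(.*$", "", entries),                        # text before " ("
    hits = as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", entries)), # number inside "( )"
    stringsAsFactors = FALSE
  )
}

parse_chart_hits("['at (44)', 'de (111)', 'lu (22)', 'ch (31)', 'vn (1)']")
```

An empty list string such as `"[]"` yields a data frame with zero rows, so the helper can be applied to artists without chart entries.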
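n_subset <- n %>%
  rename(id = spotify_id) %>%          # first column holds the vertex ids used to match edges
  distinct(id, .keep_all = TRUE) %>%   # drop duplicate artist ids
  slice(1:1000)                        # keep only the first 1,000 artists for faster results
valid_ids <- n_subset$id
# Keep only edges whose two endpoints are both in the subset
edges_filtered <- e %>%
  filter(id_0 %in% valid_ids & id_1 %in% valid_ids)
g <- graph_from_data_frame(d = edges_filtered, vertices = n_subset, directed = FALSE)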
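The edge filter matters because `graph_from_data_frame()` raises an error when the edge list mentions a vertex that is missing from the `vertices` data frame. The sketch below illustrates the same filtering step on a toy example. The ids and values are made up for illustration and do not come from the dataset.

```r
library(igraph)

# Toy stand-ins for nodes.csv and edges.csv (made-up ids, not real data)
toy_nodes <- data.frame(id = c("a", "b", "c"), popularity = c(10, 50, 90))
toy_edges <- data.frame(id_0 = c("a", "b", "c"), id_1 = c("b", "c", "z"))

# Keep an edge only if both endpoints are present in the node subset;
# the edge c-z is dropped because "z" is not a known node
keep <- toy_edges$id_0 %in% toy_nodes$id & toy_edges$id_1 %in% toy_nodes$id
toy_edges_filtered <- toy_edges[keep, ]

toy_g <- graph_from_data_frame(toy_edges_filtered, vertices = toy_nodes, directed = FALSE)
vcount(toy_g)  # number of vertices
ecount(toy_g)  # number of edges
```

Note that taking the first 1,000 rows selects artists by file order rather than by importance, so many collaborations of the selected artists fall outside the subset and are filtered out.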
plot(g,
     edge.color = "pink",     # g is undirected, so arrow settings have no effect; color the edges instead
     vertex.color = "green",
     vertex.label = NA,
     vertex.size = 7)
The graph shows a tightly connected group of artists at the center of this 1,000-artist subset, which includes some of the biggest names. Most artists are connected to only a few others. A few artists, however, link the rest together through many collaborations.
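A visual reading like this can be checked numerically with the degree distribution and the connected components. The sketch below uses a small toy graph rather than the real `g`, so the vertex names and structure are assumptions chosen for illustration.

```r
library(igraph)

# Toy graph: one hub with three neighbours, one separate pair, one isolated vertex
toy <- graph_from_literal(hub - a, hub - b, hub - c, x - y, lonely)

deg <- degree(toy)
table(deg)               # how many vertices have each degree

comp <- components(toy)
comp$no                  # number of connected components
max(comp$csize)          # size of the largest component
```

Running `table(degree(g))` and `components(g)` on the actual graph would show how many artists have few connections and how large the central group really is.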
V(g)$degree <- degree(g)
V(g)$betweenness <- betweenness(g)
V(g)$pagerank <- page_rank(g)$vector
top_degree <- V(g)[order(-degree)][1:10]
top_degree_df <- data.frame(
  name = V(g)$name[top_degree],
  degree = V(g)$degree[top_degree]
)
print("Top 10 artists by degree:")
## [1] "Top 10 artists by degree:"
print(top_degree_df)
## name degree
## 1 The Him 8
## 2 Cardi B 7
## 3 Lil Baby 7
## 4 Nicki Minaj 7
## 5 Dua Lipa 6
## 6 Epik High 6
## 7 Cheat Codes 6
## 8 Bebe Rexha 6
## 9 Juicy J 6
## 10 Kygo 5
top_betweenness <- V(g)[order(-betweenness)][1:10]
top_betweenness_df <- data.frame(
  name = V(g)$name[top_betweenness],
  betweenness = V(g)$betweenness[top_betweenness]
)
print("Top 10 artists by betweenness:")
## [1] "Top 10 artists by betweenness:"
print(top_betweenness_df)
## name betweenness
## 1 Cardi B 933.2667
## 2 Bebe Rexha 874.6476
## 3 Juicy J 641.8952
## 4 The Him 635.7429
## 5 Kontra K 572.0000
## 6 Nicki Minaj 526.0143
## 7 Logic 395.0000
## 8 Olexesh 390.0000
## 9 Kygo 345.9286
## 10 Cheat Codes 337.6429
top_pagerank <- V(g)[order(-pagerank)][1:10]
top_pagerank_df <- data.frame(
  name = V(g)$name[top_pagerank],
  pagerank = V(g)$pagerank[top_pagerank]
)
print("Top 10 artists by PageRank:")
## [1] "Top 10 artists by PageRank:"
print(top_pagerank_df)
## name pagerank
## 1 Epik High 0.007897324
## 2 The Him 0.007835590
## 3 Gusttavo Lima 0.007436214
## 4 Carlos Rivera 0.007003816
## 5 Nicki Minaj 0.006894586
## 6 Dua Lipa 0.006655025
## 7 Cardi B 0.006384164
## 8 Lil Baby 0.006348447
## 9 Common 0.005889330
## 10 Juicy J 0.005764036
Top degree artist: The Him. Within this subset, The Him has collaborated with the most other artists (degree 8), which makes them the most active collaborator here.
Top betweenness artist: Cardi B. Betweenness counts how often an artist lies on the shortest path between two other artists, so Cardi B connects groups of artists who do not usually work together.
Top PageRank artist: Epik High. PageRank rewards connections to artists who are themselves well connected, so Epik High is influential in the network and linked to other important artists.
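The three measures can rank the same artists differently, which is easiest to see on a small example. The sketch below builds a toy graph of two triangles joined through a single bridge vertex `X`. The graph and vertex names are assumptions made for illustration, not taken from the dataset.

```r
library(igraph)

# Two triangles (A-B-C and D-E-F) connected only through the bridge vertex X
toy <- graph_from_literal(A-B, B-C, A-C, C-X, X-D, D-E, E-F, D-F)

toy_centrality <- data.frame(
  degree = degree(toy),
  betweenness = betweenness(toy),
  pagerank = round(page_rank(toy)$vector, 3)
)
toy_centrality
```

Here `X` has only two neighbours, yet every shortest path between the two triangles passes through it, so it has the highest betweenness. This is the same pattern that lets an artist rank higher on betweenness than on degree.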
louvain_clusters <- cluster_louvain(g)
V(g)$community <- louvain_clusters$membership
community_df <- data.frame(
  name = V(g)$name,
  community = V(g)$community
)
print("Sample of artists and their community:")
## [1] "Sample of artists and their community:"
print(head(community_df, 10))
## name community
## 1 Byklubben 1
## 2 Kontra K 2
## 3 Maxim 2
## 4 Christopher Martin 3
## 5 Jakob Hellman 4
## 6 Madh 5
## 7 Juice 6
## 8 Nehuda 7
## 9 VovaZiLvova 8
## 10 Nata Record 9
We used the Louvain algorithm to find communities in the artist network. Each community is a group of artists who collaborate with each other more often than with artists outside the group. This helps us see how the music industry is organized into smaller groups or genres, and which artists belong to which group.
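As a minimal sketch of what the Louvain step does, the example below runs `cluster_louvain()` on a toy graph made of two fully connected groups of four, joined by a single edge. The graph is an assumption for illustration. A seed is set because the algorithm processes vertices in random order, so results on larger graphs can vary between runs.

```r
library(igraph)

set.seed(42)  # make the community assignment reproducible

# Two 4-cliques (A-D and E-H) joined by the single edge D-E
toy <- graph_from_literal(A-B, A-C, A-D, B-C, B-D, C-D,
                          D-E,
                          E-F, E-G, E-H, F-G, F-H, G-H)

toy_clusters <- cluster_louvain(toy)
membership(toy_clusters)   # community id for each vertex
sizes(toy_clusters)        # number of vertices per community
modularity(toy_clusters)   # quality of the partition
```

On the real graph, `sizes(louvain_clusters)` would show how many artists fall into each community, including any communities that contain only a single artist.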
top_artists <- V(g)[order(-V(g)$degree)][1:10]
top_artists_info <- data.frame(
  name = V(g)$name[top_artists],
  degree = V(g)$degree[top_artists],
  popularity = V(g)$popularity[top_artists],
  followers = V(g)$followers[top_artists]
)
print(top_artists_info)
## name degree popularity followers
## 1 The Him 8 56 107823
## 2 Cardi B 7 80 20361435
## 3 Lil Baby 7 89 11530234
## 4 Nicki Minaj 7 87 26039960
## 5 Dua Lipa 6 88 36163788
## 6 Epik High 6 56 630279
## 7 Cheat Codes 6 71 2087675
## 8 Bebe Rexha 6 81 7531004
## 9 Juicy J 6 73 2891371
## 10 Kygo 5 80 8134874
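This table shows the 10 artists with the most collaborations in the subset, alongside their Spotify popularity and follower counts. These artists are the most active collaborators here, but degree does not simply track audience size: The Him has the highest degree (8) with a popularity of 56 and about 108,000 followers, while Dua Lipa has a degree of 6 with a popularity of 88 and about 36 million followers.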
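One way to examine the relationship between collaboration count and audience size is a rank correlation. The sketch below re-enters the ten rows printed above by hand and computes Spearman correlations. With only ten artists this is an illustration of the method, not a reliable estimate.

```r
# The ten rows from the table above, re-entered by hand
top10 <- data.frame(
  name = c("The Him", "Cardi B", "Lil Baby", "Nicki Minaj", "Dua Lipa",
           "Epik High", "Cheat Codes", "Bebe Rexha", "Juicy J", "Kygo"),
  degree = c(8, 7, 7, 7, 6, 6, 6, 6, 6, 5),
  popularity = c(56, 80, 89, 87, 88, 56, 71, 81, 73, 80),
  followers = c(107823, 20361435, 11530234, 26039960, 36163788,
                630279, 2087675, 7531004, 2891371, 8134874)
)

# Spearman works on ranks, so the skewed follower counts need no transformation
rho_popularity <- cor(top10$degree, top10$popularity, method = "spearman")
rho_followers <- cor(top10$degree, top10$followers, method = "spearman")
c(popularity = rho_popularity, followers = rho_followers)
```

Applying the same call to `V(g)$degree` and `V(g)$popularity` would extend the check to every artist in the graph.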
This analysis highlights the structure and key players in a 1,000-artist sample of the Spotify artist collaboration network. Using community detection and centrality metrics, we gain insight into how interconnected these artists are and which of them play central roles in shaping collaboration trends.