This project uses R to explore and visualize a network of artist collaborations based on Spotify’s Top 200 chart data. The primary goal is to identify clusters of artists who frequently collaborate and to highlight key influencers in the network using graph theory techniques. In the network, each node represents an artist, and an edge between two nodes indicates a shared appearance on a song (e.g., a collaboration or a featured-artist credit).
By simulating this structure and analyzing its properties, I apply
skills learned in previous assignments like the sparrow and teacher
networks to a real-world cultural dataset. The broader goal is to
practice using R and igraph to conduct meaningful social
network analysis and produce professional visualizations.
# Load necessary packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
library(igraph)
library(readr)
# Simulated small sample of Spotify Top 200 collaboration data
# Replace this with parsed Spotify chart data from: https://spotifycharts.com/regional
collab_data <- tibble::tibble(
  song   = c("Song A", "Song A", "Song B", "Song B", "Song C", "Song D", "Song D"),
  artist = c("Artist 1", "Artist 2", "Artist 2", "Artist 3", "Artist 4", "Artist 1", "Artist 3")
)
# Self-join on song to pair up artists who share a track.
# The many-to-many match is intentional, so it is declared explicitly.
# Each pair appears in both directions (A-B and B-A), so the undirected
# graph built from this edge list has two parallel edges per collaboration.
edges <- collab_data %>%
  inner_join(collab_data, by = "song", relationship = "many-to-many") %>%
  filter(artist.x != artist.y) %>%
  select(from = artist.x, to = artist.y) %>%
  distinct()
# Create graph object from edge list
g <- graph_from_data_frame(edges, directed = FALSE)
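Because the self-join lists every collaborating pair in both directions, the graph built above stores each tie twice, and artists with no shared songs (Artist 4 here) never reach the edge list. The following is a minimal sketch of one way to address both points, assuming each pair should count as a single unweighted tie. The names `collab_demo`, `pairs_unique`, and `g_simple` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(igraph)

# Same toy data as above, repeated so the sketch runs on its own
collab_demo <- tibble::tibble(
  song   = c("Song A", "Song A", "Song B", "Song B", "Song C", "Song D", "Song D"),
  artist = c("Artist 1", "Artist 2", "Artist 2", "Artist 3", "Artist 4", "Artist 1", "Artist 3")
)

# Keep each unordered pair once by requiring a fixed alphabetical order
pairs_unique <- collab_demo %>%
  inner_join(collab_demo, by = "song", relationship = "many-to-many") %>%
  filter(artist.x < artist.y) %>%
  select(from = artist.x, to = artist.y) %>%
  distinct()

# Supply the full artist list so solo artists appear as isolated vertices
all_artists <- data.frame(name = unique(collab_demo$artist))
g_simple <- graph_from_data_frame(pairs_unique, directed = FALSE,
                                  vertices = all_artists)

vcount(g_simple)  # 4 vertices, including the isolated Artist 4
ecount(g_simple)  # 3 edges, one per collaborating pair
degree(g_simple)  # degree now equals the number of distinct collaborators
```

Whether to keep isolated vertices is a design choice: they make the node count match the number of charting artists, but they add visual clutter in larger plots.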
Data Summary:
- Vertices (Nodes): Artists
- Edges (Ties): Appearances on the same song
- Source: Simulated data structure based on real Spotify chart CSVs
- Access: Real data can be obtained from spotifycharts.com
plot(g,
     vertex.label.cex = 0.8,
     vertex.size = degree(g) * 10,
     vertex.color = "skyblue",
     edge.color = "grey",
     layout = layout_with_fr,
     main = "Spotify Artist Collaboration Network")
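This plot shows the collaboration structure among the artists in the simulated sample. Node size is proportional to degree (number of connections), and the layout uses the Fruchterman-Reingold force-directed algorithm, chosen for visual clarity. Artist 4 does not appear because the edge list contains only artists with at least one collaboration.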
# Network size
cat("Number of nodes:", vcount(g), "\n")
## Number of nodes: 3
cat("Number of edges:", ecount(g), "\n")
## Number of edges: 6
# Degree centrality
deg <- degree(g)
cat("Degree centrality:\n")
## Degree centrality:
print(deg)
## Artist 1 Artist 2 Artist 3
## 4 4 4
# Community detection
comm <- cluster_louvain(g)
cat("Communities detected:", length(comm), "\n")
## Communities detected: 1
# Plot communities
plot(comm, g, main = "Communities in the Network")
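The three connected artists each have a degree of 4. Because the edge list stores every pair in both directions, each collaboration is counted twice, so a degree of 4 corresponds to two distinct collaborators per artist rather than four. Artist 4, who shares no songs with anyone, is absent from the graph. Only one community was detected, which is expected when three artists are all connected to one another. In larger Spotify datasets, we would expect more distinct communities based on genre, region, or label affiliation.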
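Degree is only one way to rank artists. As a hedged illustration of how other measures could point to key influencers, the sketch below computes betweenness and eigenvector centrality with igraph on a small hypothetical edge list. The artist names and the object names `demo_edges` and `g_demo` are invented for this example and do not come from the chart data.

```r
library(igraph)

# Hypothetical edge list: Artist C links several otherwise separate artists
demo_edges <- data.frame(
  from = c("Artist A", "Artist B", "Artist C", "Artist C"),
  to   = c("Artist B", "Artist C", "Artist D", "Artist E")
)
g_demo <- graph_from_data_frame(demo_edges, directed = FALSE)

# Betweenness: how often an artist lies on shortest paths between others
btw <- betweenness(g_demo)

# Eigenvector centrality: being connected to well-connected artists
eig <- eigen_centrality(g_demo)$vector

print(sort(btw, decreasing = TRUE))
print(round(sort(eig, decreasing = TRUE), 3))
```

In this toy graph Artist C ranks first on both measures because every path between the two ends of the network passes through that node. On real chart data the two rankings can differ, which is often where the interesting cases are.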
This project demonstrates how collaboration networks among music artists can be analyzed and visualized using R. Even with a small dataset, patterns of centrality and clustering are visible. In real-world data, further analysis could uncover genre-specific clusters, reveal regional collaboration trends, and identify key influencers.
Limitations:
- The dataset is small and simulated for demonstration.
- Real Spotify data would require string cleaning and edge filtering.
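The cleaning and filtering mentioned above could look roughly like the sketch below. It assumes a raw chart table with one row per track and all credited artists in a single comma-separated string. The column names `track_name` and `artist_names`, the sample rows, and the `min_shared` threshold are assumptions made for illustration and should be checked against the actual export.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical raw rows: artists packed into one delimited string per track
raw_chart <- tibble::tibble(
  track_name   = c("Song A", "Song B", "Song C"),
  artist_names = c("Artist 1,  Artist 2", "Artist 2 , Artist 3 ", "Artist 4")
)

# String cleaning: one row per (song, artist), with stray whitespace removed
collab_long <- raw_chart %>%
  separate_longer_delim(artist_names, delim = ",") %>%
  mutate(artist = str_squish(artist_names)) %>%
  select(song = track_name, artist) %>%
  distinct()

# Edge filtering: count shared songs per pair and keep pairs above a threshold
min_shared <- 1  # assumed cutoff; raise it to drop one-off collaborations
pair_counts <- collab_long %>%
  inner_join(collab_long, by = "song", relationship = "many-to-many") %>%
  filter(artist.x < artist.y) %>%
  rename(from = artist.x, to = artist.y) %>%
  count(from, to, name = "weight") %>%
  filter(weight >= min_shared)

print(collab_long)
print(pair_counts)
```

Real artist names may themselves contain commas or inconsistent spellings, so a delimiter split like this is a starting point rather than a complete cleaning step.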
Next Steps:
- Scale the analysis using multiple weeks/regions from SpotifyCharts.com
- Add genre metadata for deeper insight
- Publish an interactive version or dashboard for exploration