Project Summary

This project uses R to explore and visualize a network of artist collaborations modeled on Spotify’s Top 200 chart data. The primary goal is to identify clusters of artists who frequently collaborate and to highlight key influencers in the network using graph theory techniques. Each node in the network represents an artist, and an edge between two nodes indicates a shared appearance on a song (e.g., a collaboration or a featured-artist credit).

By simulating this structure and analyzing its properties, I apply skills learned in previous assignments, such as the sparrow and teacher networks, to a dataset shaped like real-world cultural data. The broader goal is to practice using R and igraph to conduct meaningful social network analysis and produce professional visualizations.

Setup and Data Description

# Load necessary packages
library(tidyverse)

library(igraph)

library(readr)

# Simulated small sample of Spotify Top 200 collaboration data
# Replace this with parsed Spotify chart data from: https://spotifycharts.com/regional

collab_data <- tibble::tibble(
  song = c("Song A", "Song A", "Song B", "Song B", "Song C", "Song D", "Song D"),
  artist = c("Artist 1", "Artist 2", "Artist 2", "Artist 3", "Artist 4", "Artist 1", "Artist 3")
)

# Self-join on song to pair up artists who share a track.
# Several artists can share one song, so the many-to-many match is expected.
# Note: each pair is kept in both directions (A-B and B-A).
edges <- collab_data %>%
  inner_join(collab_data, by = "song", relationship = "many-to-many") %>%
  filter(artist.x != artist.y) %>%
  select(from = artist.x, to = artist.y) %>%
  distinct()

# Create graph object from edge list
g <- graph_from_data_frame(edges, directed = FALSE)
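
The self-join above emits every collaboration in both directions and leaves out artists who only appear on solo tracks. As a hedged alternative, the sketch below keeps each unordered pair once, counts shared songs as an edge weight, and passes a vertex table so that isolates stay in the graph. The names `edges_unique`, `vertices`, and `g_unique` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(igraph)

# Same toy data as in the section above
collab_data <- tibble::tibble(
  song = c("Song A", "Song A", "Song B", "Song B", "Song C", "Song D", "Song D"),
  artist = c("Artist 1", "Artist 2", "Artist 2", "Artist 3", "Artist 4", "Artist 1", "Artist 3")
)

# Keep each unordered pair once (artist.x < artist.y) and count shared songs
edges_unique <- collab_data %>%
  inner_join(collab_data, by = "song", relationship = "many-to-many") %>%
  filter(artist.x < artist.y) %>%
  count(from = artist.x, to = artist.y, name = "weight")

# A vertex table keeps artists with no collaborations (isolates) in the graph
vertices <- distinct(collab_data, name = artist)

g_unique <- graph_from_data_frame(edges_unique, directed = FALSE, vertices = vertices)

vcount(g_unique)  # artists, including those without collaborations
ecount(g_unique)  # distinct collaboration pairs
degree(g_unique)  # number of distinct collaborators per artist
```

With this construction, degree counts distinct collaborators, and `E(g_unique)$weight` records how many songs each pair shares.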

Data Summary:
- Vertices (Nodes): Artists
- Edges (Ties): Appearances on the same song
- Source: Simulated data structure based on real Spotify chart CSVs
- Access: Real data can be obtained from spotifycharts.com

Network Visualization

plot(g,
     vertex.label.cex = 0.8,
     vertex.size = degree(g) * 10,
     vertex.color = "skyblue",
     edge.color = "grey",
     layout = layout_with_fr,
     main = "Spotify Artist Collaboration Network")

This plot shows the collaboration structure among a small group of simulated artists. Node size is proportional to degree (number of connections), and the Fruchterman-Reingold force-directed layout was chosen for visual clarity.
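
Fruchterman-Reingold starts from random positions, so the picture can change between runs. A minimal sketch of one way to make the layout repeatable is to fix the random seed and compute the coordinates once. The graph `g_demo` is a stand-in triangle and the seed value is arbitrary.

```r
library(igraph)

# Toy triangle standing in for the collaboration graph built earlier
g_demo <- make_full_graph(3)
V(g_demo)$name <- c("Artist 1", "Artist 2", "Artist 3")

set.seed(42)                      # seed value is arbitrary
coords <- layout_with_fr(g_demo)  # one row of x/y coordinates per vertex

coords
```

Passing `layout = coords` to `plot()` instead of the layout function reuses the same positions across several plots, which makes a degree plot and a community plot directly comparable.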

Network Analysis

# Network size
cat("Number of nodes:", vcount(g), "\n")

## Number of nodes: 3

cat("Number of edges:", ecount(g), "\n")

## Number of edges: 6

# Degree centrality
deg <- degree(g)
cat("Degree centrality:\n")

## Degree centrality:

print(deg)

## Artist 1 Artist 2 Artist 3 
##        4        4        4

# Community detection
comm <- cluster_louvain(g)
cat("Communities detected:", length(comm), "\n")

## Communities detected: 1

# Plot communities
plot(comm, g, main = "Communities in the Network")

Interpretation:

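All three artists in the graph have a degree of 4, but this figure overstates their connectivity. The self-join records every collaboration in both directions (A-B and B-A), so each of the three distinct collaborations is counted twice and every artist really has two distinct collaborators. Artist 4 appears only on a solo track, has no edges, and is therefore missing from the graph entirely. Only one community was detected, which is expected when the three remaining artists form a fully connected triangle. In larger Spotify datasets, we would expect more distinct communities based on genre, region, or label affiliation.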
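
To illustrate what multiple communities and a key influencer might look like, here is a small sketch on a hypothetical graph: two triangles of artists joined by a single bridging collaboration. The artist names A to F and the edge list are invented for illustration and do not come from chart data.

```r
library(igraph)

# Hypothetical artists: two triangles joined by a single bridge tie (C - D)
edges_demo <- data.frame(
  from = c("A", "B", "A", "C", "D", "E", "D"),
  to   = c("B", "C", "C", "D", "E", "F", "F")
)
g_demo <- graph_from_data_frame(edges_demo, directed = FALSE)

set.seed(1)  # Louvain visits nodes in random order; the seed is arbitrary
comm_demo <- cluster_louvain(g_demo)

length(comm_demo)      # number of communities found
membership(comm_demo)  # community label per artist
modularity(comm_demo)  # quality of the partition

# Betweenness flags artists who bridge otherwise separate groups
sort(betweenness(g_demo), decreasing = TRUE)
```

In this toy graph every artist has degree 2 or 3, so degree alone says little. Betweenness separates the two bridging artists from the rest, which is the kind of signal that could point to influencers in real chart data.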

Conclusion

This project demonstrates how collaboration networks among music artists can be analyzed and visualized using R. Even on a small simulated dataset, the full workflow of building an edge list, measuring centrality, and detecting communities runs end to end. With real-world data, further analysis could uncover genre-specific clusters, reveal regional collaboration trends, and identify key influencers.

Limitations:
- The dataset is small and simulated for demonstration.
- Real Spotify data would require string cleaning and edge filtering.
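
As a rough idea of the string cleaning involved, the sketch below assumes a raw chart export with one row per track and a comma-separated artist field. The column names `track_name` and `artist_names`, the separator, and the sample rows are all assumptions, so they would need to be adapted to the actual file.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical raw rows: column names and the "," separator are assumptions
raw_chart <- tibble::tibble(
  track_name   = c("Song A", "Song B", "Song C"),
  artist_names = c("Artist 1,  Artist 2", "Artist 2, Artist 3 ", "Artist 4")
)

# One row per (song, artist), with stray whitespace removed
collab_long <- raw_chart %>%
  separate_longer_delim(artist_names, delim = ",") %>%
  mutate(artist = str_squish(artist_names)) %>%
  select(song = track_name, artist) %>%
  distinct()

collab_long
```

The result has the same `song` and `artist` columns as `collab_data`, so it could feed the same edge-list step. Real artist names that themselves contain commas would need a more careful split.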

Next Steps:
- Scale the analysis using multiple weeks/regions from SpotifyCharts.com
- Add genre metadata for deeper insight
- Publish an interactive version or dashboard for exploration

Spotify Artist Collaboration Network

Anthony Adib
