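Using the Spotify API in R
library(spotifyr)   # R wrapper for the Spotify Web API
library(dplyr)      # data manipulation
library(tidyr)      # reshaping and unnesting nested API results
library(ggplot2)    # plotting
library(purrr)      # map_df() and possibly() for looping over API calls
library(lubridate)  # date parsing
library(tibble)     # tibble data frames
library(readr)      # read_csv() for the saved dataset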
Introduction
In this tutorial I demonstrate how to use the Spotify Web API from R with the help of the spotifyr package. Spotify’s API lets developers programmatically access information about artists, albums, and tracks, including metadata such as release date, duration, and whether a song is marked as explicit.
For this assignment I focus on Beyoncé and her top collaborators. Using the API, I:
pull Beyoncé’s albums and tracks
identify artists who have collaborated with her most often
collect those collaborators’ own catalogues
build a combined dataset and make visualizations about track duration, explicit content, number of artists on a track, and release dates.
This type of workflow would be useful for anyone interested in music analytics, playlist curation, or building recommendation systems.
Setting Up the API
To access Spotify’s API:
Visit the Spotify Developer Dashboard.
Log in with your Spotify account.
Click Create App.
Spotify will give you:
A Client ID
A Client Secret
These act like a username and password for the API, so keep them private and never publish them.
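# Replace the placeholders with your own credentials; never publish real keys
Sys.setenv(SPOTIFY_CLIENT_ID = 'your-client-id')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'your-client-secret')
# get_spotify_access_token() reads the two environment variables above by default
access_token <- get_spotify_access_token()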
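Hard-coding credentials in a script makes them easy to leak. One common alternative, sketched here as an assumption rather than the workflow used in this tutorial, is to store SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET in a ~/.Renviron file, which R reads at startup, and to check that both are set before requesting a token. The helper name check_spotify_credentials() is hypothetical and uses base R only.

```r
# Hypothetical helper (not part of spotifyr): confirm that the Spotify
# credentials are available as environment variables (for example loaded
# from ~/.Renviron) before calling get_spotify_access_token().
check_spotify_credentials <- function(vars = c("SPOTIFY_CLIENT_ID",
                                               "SPOTIFY_CLIENT_SECRET")) {
  values <- Sys.getenv(vars)            # "" is returned for unset variables
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage (requires real credentials):
# check_spotify_credentials()
# access_token <- get_spotify_access_token()
```

Running the check first turns a missing key into a clear error message instead of a less obvious failure at the token request.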
get_artist_albums() sends a GET request to the Spotify Web API for an artist’s albums.
The API returns a list of albums, which is converted into a tibble.
distinct(id) removes duplicate album IDs.
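The album request described above can be sketched as a loop over pages. Spotify returns at most 50 albums per request, so this sketch pages with limit and offset until a short page comes back, then keeps one row per album ID; distinct(id, .keep_all = TRUE) keeps the other columns, whereas distinct(id) alone returns only the id column. The helper name fetch_all_albums() and the include_groups default are assumptions, and fetch is an argument only so the paging logic can be tried offline with a stand-in function.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of spotifyr): page through an artist's albums
# and keep one row per album ID. `fetch` defaults to
# spotifyr::get_artist_albums(); pass a stand-in to test without the API.
fetch_all_albums <- function(artist_id,
                             fetch = spotifyr::get_artist_albums,
                             include_groups = c("album", "single"),  # assumption
                             page_size = 50) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch(artist_id, include_groups = include_groups,
                  limit = page_size, offset = offset)
    n <- NROW(page)                      # 0 for NULL or an empty result
    if (n == 0) break
    pages[[length(pages) + 1]] <- as_tibble(page)
    if (n < page_size) break             # a short page is the last one
    offset <- offset + page_size         # request the next page
  }
  if (length(pages) == 0) return(tibble(id = character()))
  bind_rows(pages) %>%
    distinct(id, .keep_all = TRUE)       # drop repeated album IDs
}
```

With credentials set, a call such as fetch_all_albums(artist_id) would use the real endpoint, where artist_id is the ID shown in the artist’s Spotify URL.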
API Call Demonstration
Getting Tracks from Each Album
map_df() loops through every album ID.
get_album_tracks() retrieves the tracks from each album.
possibly() catches errors, so one failed request does not stop the loop.
The results are combined into one data frame.
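The loop described above might look like the following sketch. The helper name get_tracks_for_albums() is hypothetical, and limit = 50 (the API’s per-request maximum) is an assumption; albums with more than 50 tracks would need extra pages. The album ID is added to each track so the tracks can later be joined to album metadata.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper (not part of spotifyr): fetch the tracks of every album,
# skipping albums whose request fails. `fetch` defaults to
# spotifyr::get_album_tracks(); pass a stand-in to test without the API.
get_tracks_for_albums <- function(album_ids,
                                  fetch = spotifyr::get_album_tracks) {
  # possibly() returns NULL instead of raising an error when a request fails
  safe_fetch <- possibly(fetch, otherwise = NULL)
  map_df(album_ids, function(current_id) {
    tracks <- safe_fetch(current_id, limit = 50)  # assumed per-request maximum
    if (NROW(tracks) == 0) return(NULL)           # failed request or empty album
    as_tibble(tracks) %>%
      mutate(album_id = current_id)               # record the source album
  })
}
```

map_df() still works, but purrr 1.0 marks it as superseded in favour of map() followed by list_rbind().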
Identifying Beyoncé’s Top Collaborators
This gives the top 10 artists who have collaborated with Beyoncé most often.
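One way to rank collaborators, sketched here as an assumption rather than the exact code used, is to unnest the artists list-column that get_album_tracks() returns for each track, drop the main artist, and count how often every other artist appears. The helper name top_collaborators() is hypothetical; the result column is called n_collabs to match the saved dataset.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: count how often each other artist is credited on the
# main artist's tracks. Assumes `tracks` has a track `id` column and an
# `artists` list-column of data frames with `id` and `name` columns.
top_collaborators <- function(tracks, main_artist_id, n = 10) {
  tracks %>%
    select(track_id = id, artists) %>%
    unnest(artists, names_sep = "_") %>%       # one row per credited artist
    filter(artists_id != main_artist_id) %>%   # drop the main artist
    count(artists_id, artists_name, name = "n_collabs", sort = TRUE) %>%
    slice_head(n = n)                          # keep the top n
}
```

Because the same song can appear on several releases (for example a deluxe edition), counts from this approach may be inflated unless duplicate tracks are removed first.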
Albums and Tracks for Collaborators
Combining Beyoncé and Collaborator Tracks
This step joins Beyoncé’s tracks and each collaborator’s tracks with their album information, records the main artist for every track, and binds all tracks into a single table.
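A minimal sketch of this step, assuming the raw tracks carry an album_id column and an artists list-column, and the albums table has id, name and release_date columns. The helper name build_track_table() is hypothetical; its output columns follow the saved CSV shown below.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: turn one artist's raw tracks and albums into the
# columns used in the saved dataset.
build_track_table <- function(tracks, albums, artist_id, artist_name) {
  tracks %>%
    mutate(num_artists = map_int(artists, NROW)) %>%  # credited artists per track
    select(track_id = id, track_name = name, album_id,
           duration_ms, explicit, num_artists) %>%
    left_join(
      albums %>% select(album_id = id, album_name = name,
                        album_release_date = release_date),
      by = "album_id"
    ) %>%
    mutate(main_artist_id = artist_id, main_artist_name = artist_name)
}

# Usage sketch: one table per artist, then stack them.
# all_tracks <- bind_rows(beyonce_table, collaborator_table_1, ...)
```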
Loading Saved Data
This data was generated by my API script and contains:
Track names
Artist names
Album metadata
Duration
Explicit label
Number of artists
Collaboration count
bey_data <- read_csv("beyonce_collab_data.csv")
bey_data
# A tibble: 1,548 × 11
track_id track_name album_id album_name album_release_date duration_ms
<chr> <chr> <chr> <chr> <date> <dbl>
1 6K2Iut8nSQigP8… AMERIICAN… 6BzxX6z… COWBOY CA… 2024-03-29 325037
2 7eEr7lgWYudwEK… BLACKBIIRD 6BzxX6z… COWBOY CA… 2024-03-29 131949
3 6XXxKsu3RJeN3Z… 16 CARRIA… 6BzxX6z… COWBOY CA… 2024-03-29 227250
4 4dsdSwSdBWjlsV… PROTECTOR 6BzxX6z… COWBOY CA… 2024-03-29 184399
5 486YUqcmF9IWPk… MY ROSE 6BzxX6z… COWBOY CA… 2024-03-29 53442
6 2dv2DYn0V0j0Wx… SMOKE HOU… 6BzxX6z… COWBOY CA… 2024-03-29 50721
7 7wLShogStyDeZv… TEXAS HOL… 6BzxX6z… COWBOY CA… 2024-03-29 233456
8 6Y4rniIxibegzs… BODYGUARD 6BzxX6z… COWBOY CA… 2024-03-29 240254
9 1fp4DIjhHyptQW… DOLLY P 6BzxX6z… COWBOY CA… 2024-03-29 22623
10 2PmMh2t7jAtN6c… JOLENE 6BzxX6z… COWBOY CA… 2024-03-29 189638
# ℹ 1,538 more rows
# ℹ 5 more variables: explicit <lgl>, main_artist_id <chr>,
# main_artist_name <chr>, n_collabs <dbl>, num_artists <dbl>
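read_csv() guesses column types, and the guesses happen to be right here. A sketch of a more explicit round trip, using the eleven columns printed above, pins the types so explicit stays logical and album_release_date stays a Date. The names track_col_types and save_and_reload() are hypothetical.

```r
library(readr)

# Column types taken from the printed tibble above.
track_col_types <- cols(
  track_id = col_character(),
  track_name = col_character(),
  album_id = col_character(),
  album_name = col_character(),
  album_release_date = col_date(),   # ISO 8601 dates such as 2024-03-29
  duration_ms = col_double(),
  explicit = col_logical(),
  main_artist_id = col_character(),
  main_artist_name = col_character(),
  n_collabs = col_double(),
  num_artists = col_double()
)

# Hypothetical helper: save the combined table and read it back with fixed types.
save_and_reload <- function(data, path) {
  write_csv(data, path)
  read_csv(path, col_types = track_col_types)
}
```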
Analyzing the Spotify Data
Converts duration from milliseconds to minutes
Creates explicit/clean labels
Parses dates
Extracts release year
plot_df <- bey_data %>%
  mutate(
    duration_min = duration_ms / 60000,                       # ms -> minutes
    explicit_label = if_else(explicit, "Explicit", "Clean"),  # readable labels
    album_release_date = lubridate::ymd(album_release_date, quiet = TRUE),
    release_year = lubridate::year(album_release_date)        # e.g. 2024
  )
Song Duration by Explicit Content
This shows differences in track length between explicit and clean songs.
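The plotting code is not shown here, so the following is only one possible sketch: a boxplot of duration_min for each explicit_label, assuming a data frame shaped like plot_df. The function name plot_duration_by_explicit() and the chart type are assumptions.

```r
library(ggplot2)

# Sketch: boxplot of track length for explicit vs clean tracks.
# Assumes columns `explicit_label` and `duration_min`, as in plot_df.
plot_duration_by_explicit <- function(df) {
  ggplot(df, aes(x = explicit_label, y = duration_min)) +
    geom_boxplot() +
    labs(x = NULL, y = "Track duration (minutes)",
         title = "Song Duration by Explicit Content")
}

# Usage: plot_duration_by_explicit(plot_df)
```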
Percentage of Explicit Tracks by Artist
This plot shows which artists in the dataset have the highest proportion of explicit tracks.
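A sketch of how such a chart could be built, assuming the percentage is computed per main_artist_name as the mean of the logical explicit column times 100. Function names and the horizontal bar layout are assumptions, not the original plotting code.

```r
library(dplyr)
library(ggplot2)

# Share of explicit tracks per main artist, as a percentage.
explicit_share_by_artist <- function(df) {
  df %>%
    group_by(main_artist_name) %>%
    summarise(pct_explicit = 100 * mean(explicit, na.rm = TRUE),
              n_tracks = n(), .groups = "drop") %>%
    arrange(desc(pct_explicit))
}

# Horizontal bars, ordered by share.
plot_explicit_share <- function(share) {
  ggplot(share, aes(x = pct_explicit,
                    y = reorder(main_artist_name, pct_explicit))) +
    geom_col() +
    labs(x = "Explicit tracks (%)", y = NULL,
         title = "Percentage of Explicit Tracks by Artist")
}

# Usage: plot_explicit_share(explicit_share_by_artist(plot_df))
```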
Average Track Duration vs. Number of Artists
This highlights how the number of credited artists on a track relates to average song length.
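One way to produce this comparison, sketched under the assumption that it averages duration_min within each value of num_artists; the function names and bar chart are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Average track length (minutes) for each number of credited artists.
avg_duration_by_num_artists <- function(df) {
  df %>%
    group_by(num_artists) %>%
    summarise(avg_duration_min = mean(duration_min, na.rm = TRUE),
              n_tracks = n(), .groups = "drop")
}

plot_avg_duration <- function(avg) {
  ggplot(avg, aes(x = factor(num_artists), y = avg_duration_min)) +
    geom_col() +
    labs(x = "Number of artists on the track",
         y = "Average duration (minutes)",
         title = "Average Track Duration vs. Number of Artists")
}

# Usage: plot_avg_duration(avg_duration_by_num_artists(plot_df))
```

Reporting n_tracks alongside the average is useful because groups with many artists tend to contain few tracks, which makes their averages noisy.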
Trend in Explicit Content Over Time
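A possible sketch of this trend, assuming it is the percentage of explicit tracks per release_year; tracks whose release date could not be parsed (release_year is NA) are dropped. Function names and the line chart are assumptions.

```r
library(dplyr)
library(ggplot2)

# Percentage of explicit tracks per release year.
explicit_trend_by_year <- function(df) {
  df %>%
    filter(!is.na(release_year)) %>%
    group_by(release_year) %>%
    summarise(pct_explicit = 100 * mean(explicit, na.rm = TRUE),
              n_tracks = n(), .groups = "drop")
}

plot_explicit_trend <- function(trend) {
  ggplot(trend, aes(x = release_year, y = pct_explicit)) +
    geom_line() +
    geom_point() +
    labs(x = "Release year", y = "Explicit tracks (%)",
         title = "Trend in Explicit Content Over Time")
}

# Usage: plot_explicit_trend(explicit_trend_by_year(plot_df))
```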
Conclusion
Using the Spotify Web API, I built a pipeline that:
collects data on Beyoncé’s tracks and her top collaborators
merges data from multiple API endpoints into one dataset
engineers features like number of artists per track, explicit labels, and release year
saves a reusable CSV file for analysis