Using the Spotify API in R

Author

Oma Ugwu

library(spotifyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(lubridate)
library(tibble)
library(readr)  # first chunk

Introduction

In this tutorial I demonstrate how to use the Spotify Web API from R with the help of the spotifyr package. Spotify’s API lets developers programmatically access information about artists, albums, and tracks, including metadata such as release date, duration, and whether a song is marked as explicit.

For this assignment I focus on Beyoncé and her top collaborators. Using the API, I:

  • pull Beyoncé’s albums and tracks

  • identify artists who have collaborated with her most often

  • collect those collaborators’ own catalogues

  • build a combined dataset and make visualizations about track duration, explicit content, number of artists on a track, and release dates.

This type of workflow would be useful for anyone interested in music analytics, playlist curation, or building recommendation systems.

Setting Up the API

To access Spotify’s API:

  1. Visit the Spotify Developer Dashboard.

  2. Log in with your Spotify account.

  3. Click Create App.

  4. Spotify will give you:

    • A Client ID

    • A Client Secret

    These act like your username and password for the API.

    # Placeholders: substitute your own credentials and never publish real ones
    Sys.setenv(SPOTIFY_CLIENT_ID = 'your-client-id')
    Sys.setenv(SPOTIFY_CLIENT_SECRET = 'your-client-secret')
    access_token <- get_spotify_access_token()

  • get_artist_albums() sends a GET request to Spotify.

  • The API returns a list of albums, which is converted into a tibble.

  • distinct(id) removes duplicates.
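The album step described above can be sketched roughly as follows. This is a minimal illustration, not the original script: fetch_artist_albums() is a hypothetical helper name, the choice of include_groups = "album" and limit = 50 is an assumption, and .keep_all = TRUE is added so that the other album columns survive de-duplication. The mock rows at the end are made up so the de-duplication step can be tried without credentials.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not from the original script): request one page of
# albums for an artist and keep a single row per album ID.
# Calling it requires valid Spotify credentials in the environment.
fetch_artist_albums <- function(artist_id) {
  spotifyr::get_artist_albums(
    id = artist_id,
    include_groups = "album",  # assumption: full albums only
    limit = 50                 # assumption: one page of up to 50 albums
  ) %>%
    as_tibble() %>%
    distinct(id, .keep_all = TRUE)
}

# Offline illustration of the de-duplication step, using made-up rows
mock_albums <- tibble(
  id   = c("al1", "al1", "al2"),
  name = c("Album One", "Album One", "Album Two")
)
unique_albums <- distinct(mock_albums, id, .keep_all = TRUE)
```

Without .keep_all = TRUE, distinct(id) would return only the id column, which is enough if the IDs are all that the next step needs.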

API Call Demonstration

Getting Tracks from Each Album

  • map_df() loops through every album ID.

  • get_album_tracks() retrieves all tracks from an album.

  • Errors are caught using possibly().

  • Data is combined into one dataframe.
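A sketch of how these pieces might fit together is shown below. The function name collect_album_tracks() and the mock fetcher are hypothetical and exist only for illustration. The fetcher is passed in as an argument so the loop can be tried offline; with real credentials it could be something like function(id) spotifyr::get_album_tracks(id, limit = 50).

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical helper: call `fetch_tracks` once per album ID, skip any album
# whose request fails, and row-bind the results into one data frame.
collect_album_tracks <- function(album_ids, fetch_tracks) {
  # possibly() returns NULL instead of raising an error when a call fails
  safe_fetch <- possibly(fetch_tracks, otherwise = NULL)

  map_df(album_ids, function(aid) {
    tracks <- safe_fetch(aid)
    if (is.null(tracks) || nrow(tracks) == 0) {
      return(NULL)  # NULL results are dropped when rows are bound
    }
    mutate(as_tibble(tracks), album_id = aid)
  })
}

# Mock fetcher standing in for the API call; "bad" simulates a failed request
mock_fetch <- function(album_id) {
  if (album_id == "bad") stop("simulated API failure")
  tibble(
    id   = paste0(album_id, "_t", 1:2),
    name = c("Track 1", "Track 2")
  )
}

demo_tracks <- collect_album_tracks(c("a1", "bad", "a2"), mock_fetch)
```

Here the failing album is silently skipped, so demo_tracks holds only the tracks from the two albums that succeeded.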

Identifying Beyoncé’s Top Collaborators

This gives me the 10 artists who have collaborated with Beyoncé most often.
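One way to compute such a ranking is sketched below. It assumes each track row carries an artists list-column of small data frames with id and name columns; the helper name top_collaborators(), the IDs, and the rows are all made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up tracks; each `artists` entry lists everyone credited on the track
demo_tracks <- tibble(
  track_id = c("t1", "t2", "t3"),
  artists = list(
    tibble(id = c("bey", "a"),      name = c("Beyoncé", "Artist A")),
    tibble(id = c("bey", "a", "b"), name = c("Beyoncé", "Artist A", "Artist B")),
    tibble(id = "bey",              name = "Beyoncé")
  )
)

# Hypothetical helper: count how many tracks each other artist appears on
top_collaborators <- function(tracks, main_artist_id, top_n = 10) {
  tracks %>%
    select(track_id, artists) %>%
    unnest(artists, names_sep = "_") %>%       # one row per track-artist pair
    filter(artists_id != main_artist_id) %>%   # drop the main artist herself
    count(artists_id, artists_name, name = "n_collabs", sort = TRUE) %>%
    slice_head(n = top_n)
}

collabs <- top_collaborators(demo_tracks, main_artist_id = "bey")
```

Note that slice_head() cuts ties arbitrarily at the boundary; slice_max() with with_ties = TRUE would be an alternative if tied artists should all be kept.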

Albums and Tracks for Collaborators

Combining Beyoncé and Collaborator Tracks

Here I join Beyoncé’s tracks and the collaborators’ tracks with their album information, identify the main artist for each track, and bind everything into a single table.
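The combining step might look something like the sketch below. The helper label_tracks() is hypothetical, and the tables are tiny made-up stand-ins; the column names follow those that appear in the saved CSV (album_id, album_name, album_release_date, main_artist_id, main_artist_name).

```r
library(dplyr)
library(tibble)

# Made-up stand-ins for the track and album tables
bey_tracks <- tibble(
  track_id   = c("t1", "t2"),
  track_name = c("Song 1", "Song 2"),
  album_id   = c("al1", "al1")
)
bey_albums <- tibble(
  album_id           = "al1",
  album_name         = "Album 1",
  album_release_date = "2016-04-23"
)
collab_tracks <- tibble(track_id = "t9", track_name = "Song 9", album_id = "al7")
collab_albums <- tibble(
  album_id           = "al7",
  album_name         = "Album 7",
  album_release_date = "2011-08-08"
)

# Hypothetical helper: attach album details and record whose catalogue
# each track came from
label_tracks <- function(tracks, albums, artist_id, artist_name) {
  tracks %>%
    left_join(albums, by = "album_id") %>%
    mutate(main_artist_id = artist_id, main_artist_name = artist_name)
}

all_tracks <- bind_rows(
  label_tracks(bey_tracks, bey_albums, "bey", "Beyoncé"),
  label_tracks(collab_tracks, collab_albums, "a", "Artist A")
)
```

A left join keeps every track even if its album lookup failed, which leaves NA album fields rather than silently dropping rows.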

Loading Saved Data

  • This data was generated by my API script and contains:

    • Track names

    • Artist names

    • Album metadata

    • Duration

    • Explicit label

    • Number of artists

    • Collaboration count

bey_data <- read_csv("beyonce_collab_data.csv")
bey_data
# A tibble: 1,548 × 11
   track_id        track_name album_id album_name album_release_date duration_ms
   <chr>           <chr>      <chr>    <chr>      <date>                   <dbl>
 1 6K2Iut8nSQigP8… AMERIICAN… 6BzxX6z… COWBOY CA… 2024-03-29              325037
 2 7eEr7lgWYudwEK… BLACKBIIRD 6BzxX6z… COWBOY CA… 2024-03-29              131949
 3 6XXxKsu3RJeN3Z… 16 CARRIA… 6BzxX6z… COWBOY CA… 2024-03-29              227250
 4 4dsdSwSdBWjlsV… PROTECTOR  6BzxX6z… COWBOY CA… 2024-03-29              184399
 5 486YUqcmF9IWPk… MY ROSE    6BzxX6z… COWBOY CA… 2024-03-29               53442
 6 2dv2DYn0V0j0Wx… SMOKE HOU… 6BzxX6z… COWBOY CA… 2024-03-29               50721
 7 7wLShogStyDeZv… TEXAS HOL… 6BzxX6z… COWBOY CA… 2024-03-29              233456
 8 6Y4rniIxibegzs… BODYGUARD  6BzxX6z… COWBOY CA… 2024-03-29              240254
 9 1fp4DIjhHyptQW… DOLLY P    6BzxX6z… COWBOY CA… 2024-03-29               22623
10 2PmMh2t7jAtN6c… JOLENE     6BzxX6z… COWBOY CA… 2024-03-29              189638
# ℹ 1,538 more rows
# ℹ 5 more variables: explicit <lgl>, main_artist_id <chr>,
#   main_artist_name <chr>, n_collabs <dbl>, num_artists <dbl>

Analyzing the Spotify Data

  • Converts duration from milliseconds to minutes

  • Creates explicit/clean labels

  • Parses dates

  • Extracts release year

plot_df <- bey_data %>%
  mutate(
    duration_min   = duration_ms / 60000,
    explicit_label = if_else(explicit, "Explicit", "Clean"),
    album_release_date = lubridate::ymd(album_release_date, quiet = TRUE),
    release_year  = lubridate::year(album_release_date)
  )

Song Duration by Explicit Content

This shows how track length differs between explicit and clean songs.
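A comparison like this could be drawn with a box plot, as in the sketch below. The chart type is an assumption, since the original plotting code is not shown, and demo_df is a made-up stand-in for plot_df with the same duration_min and explicit_label columns.

```r
library(ggplot2)
library(tibble)

# Made-up rows standing in for plot_df
demo_df <- tibble(
  duration_min   = c(3.1, 4.2, 2.8, 3.9, 5.0, 3.3),
  explicit_label = c("Clean", "Explicit", "Clean", "Explicit", "Explicit", "Clean")
)

# One box per label, showing the spread of track lengths in minutes
duration_plot <- ggplot(
  demo_df,
  aes(x = explicit_label, y = duration_min, fill = explicit_label)
) +
  geom_boxplot(show.legend = FALSE) +
  labs(
    x = NULL,
    y = "Duration (minutes)",
    title = "Song duration by explicit content"
  )
```

Replacing demo_df with plot_df would apply the same plot to the full dataset.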

Percentage of Explicit Tracks by Artist

This plot shows which artists in the dataset have the highest proportion of explicit tracks.
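The proportion behind such a plot can be computed by averaging the logical explicit column within each artist, as sketched below. The data are made up and the horizontal bar chart is an assumed presentation rather than the original figure.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Made-up rows standing in for plot_df
demo_df <- tibble(
  main_artist_name = c("Artist A", "Artist A",
                       "Artist B", "Artist B", "Artist B", "Artist B"),
  explicit         = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# mean() of a logical vector is the share of TRUE values
explicit_share <- demo_df %>%
  group_by(main_artist_name) %>%
  summarise(
    n_tracks     = n(),
    pct_explicit = 100 * mean(explicit),
    .groups      = "drop"
  ) %>%
  arrange(desc(pct_explicit))

explicit_plot <- ggplot(
  explicit_share,
  aes(x = reorder(main_artist_name, pct_explicit), y = pct_explicit)
) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Explicit tracks (%)")
```

Keeping n_tracks in the summary makes it easy to spot artists whose percentage rests on only a handful of tracks.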

Average Track Duration vs. Number of Artists

This shows how the number of artists credited on a track relates to average song length.
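A rough sketch of the aggregation behind this comparison is given below, grouping by num_artists and averaging duration_min. The rows are made up and the column chart is an assumption about how the result was displayed.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Made-up rows standing in for plot_df
demo_df <- tibble(
  num_artists  = c(1, 1, 2, 2, 3),
  duration_min = c(3.0, 4.0, 3.5, 4.5, 5.0)
)

# Average track length for each count of credited artists
duration_by_artists <- demo_df %>%
  group_by(num_artists) %>%
  summarise(
    avg_duration_min = mean(duration_min),
    n_tracks         = n(),
    .groups          = "drop"
  )

artists_plot <- ggplot(
  duration_by_artists,
  aes(x = factor(num_artists), y = avg_duration_min)
) +
  geom_col() +
  labs(x = "Number of artists on track", y = "Average duration (minutes)")
```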

Trend in Explicit Content Over Time
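A trend like this could be computed by taking the share of explicit tracks per release year, as in the sketch below. The rows are made up and the line chart is an assumption, since the original figure and its code are not shown; release_year and explicit mirror the columns created for plot_df.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Made-up rows standing in for plot_df
demo_df <- tibble(
  release_year = c(2003, 2003, 2016, 2016, 2016, 2022),
  explicit     = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Percentage of explicit tracks among all tracks released in each year
explicit_by_year <- demo_df %>%
  filter(!is.na(release_year)) %>%
  group_by(release_year) %>%
  summarise(
    n_tracks     = n(),
    pct_explicit = 100 * mean(explicit),
    .groups      = "drop"
  )

trend_plot <- ggplot(explicit_by_year, aes(x = release_year, y = pct_explicit)) +
  geom_line() +
  geom_point() +
  labs(x = "Release year", y = "Explicit tracks (%)")
```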

Conclusion

Using the Spotify Web API, I built a pipeline that:

  • collects data on Beyoncé’s tracks and her top collaborators

  • merges results from multiple API endpoints into one combined dataset

  • engineers features like number of artists per track, explicit labels, and release year

  • saves a reusable CSV file for analysis