This script was created as an experiment to see what the spotifyr package (an R wrapper for the Spotify Web API) can extract from a given Spotify URI. Using a term-document matrix, it builds a data frame of the genres associated with the artists in a user's playlist, together with their frequencies, which can then be plotted with, for example, ggplot2.

Required libraries

The following libraries are needed to run the script:
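The package list itself is not shown here. Judging from the functions the script calls, it needs at least spotifyr, stringr, tm, ggplot2 and rmarkdown. The sketch below assumes that list, reports any packages that are not installed and otherwise attaches them all.

```r
# Packages inferred from the functions used in this script (an assumption,
# since the original package list is not shown):
#   spotifyr  - get_spotify_access_token(), get_playlist(), get_artists()
#   stringr   - str_remove(), str_to_title()
#   tm        - Corpus(), DataframeSource(), tm_map(), TermDocumentMatrix()
#   ggplot2   - ggplot() and the plotting layers
#   rmarkdown - paged_table()
required_pkgs <- c("spotifyr", "stringr", "tm", "ggplot2", "rmarkdown")

# Report missing packages instead of failing on the first library() call
missing_pkgs <- required_pkgs[!vapply(required_pkgs, requireNamespace,
                                      logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```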

For transparency purposes:

Getting the Spotify API ready

Before doing anything with the Spotify API through spotifyr, some values have to be set. All spotifyr functions require a client ID and a client secret bound to a developer account, both of which have to be generated beforehand by registering an app in the Spotify Developer Dashboard.

A new access token is requested every time the script is run.

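# Credentials of an app registered in the Spotify Developer Dashboard.
# Replace the placeholders with your own values and keep the real secret
# out of any shared or published copy of the script.
client_id <- 'YOUR_CLIENT_ID'
client_secret <- 'YOUR_CLIENT_SECRET'
Sys.setenv(SPOTIFY_CLIENT_ID = client_id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = client_secret)
# get_spotify_access_token() reads the two environment variables set above
access_token <- get_spotify_access_token()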
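Rather than writing the client ID and secret into the script, they can be kept in the environment, for example in a ~/.Renviron file. get_spotify_access_token() uses SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET as its defaults, which is what the Sys.setenv() calls above rely on. The helper below, has_spotify_credentials(), is a hypothetical addition that only checks whether both variables are set before a token is requested; it makes no API call.

```r
# Hypothetical helper: TRUE only if both variables that spotifyr reads by
# default are set to a non-empty value (e.g. from ~/.Renviron after restarting R)
has_spotify_credentials <- function() {
  creds <- Sys.getenv(c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"))
  all(nzchar(creds))
}

if (!has_spotify_credentials()) {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET ",
          "before calling get_spotify_access_token()")
}
```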

Getting playlist and extracting genres

Fetching playlist

To fetch the playlist, the get_playlist function is used, after which the playlist data is restructured and reordered.

Note: get_playlist only returns the first 100 tracks of a playlist; any further tracks are dropped. This script does not currently work around that limit.
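One possible way around the 100-track limit is to request the playlist in pages. spotifyr also provides get_playlist_tracks(), which takes limit (at most 100) and offset arguments. The sketch below assumes that interface and keeps the paging logic separate from the API: fetch_all_pages() is a hypothetical helper that calls a page-fetching function with increasing offsets and stops at the first short or empty page. A made-up offline stand-in, fake_fetch(), plays the role of the API so the logic runs without credentials.

```r
# Hypothetical helper: calls fetch_page(offset) for offset = 0, 100, 200, ...
# and binds the pages until one comes back with fewer than page_size rows
fetch_all_pages <- function(fetch_page, page_size = 100) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch_page(offset)
    if (is.null(page) || NROW(page) == 0) break
    pages[[length(pages) + 1]] <- page
    if (NROW(page) < page_size) break
    offset <- offset + page_size
  }
  do.call(rbind, pages)
}

# Offline stand-in for the API: a fake playlist of 250 tracks served in pages
fake_tracks <- data.frame(track.id = sprintf("id%03d", 1:250),
                          stringsAsFactors = FALSE)
fake_fetch <- function(offset, limit = 100) {
  idx <- seq_len(limit) + offset
  idx <- idx[idx <= nrow(fake_tracks)]
  fake_tracks[idx, , drop = FALSE]
}

all_tracks <- fetch_all_pages(fake_fetch)
nrow(all_tracks)
```

With spotifyr, the page function could be `function(offset) get_playlist_tracks(input_pl, limit = 100, offset = offset)`, assuming it returns the same track.* columns as pl$tracks$items so that the column selection below still applies.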

input_uri <- "spotify:playlist:64CXllz4upFWdouyixlZfd"
input_pl <- str_remove(as.character(input_uri), "spotify:playlist:")

pl <- get_playlist(input_pl)
pl_list <- pl$tracks$items[, c('track.artists', 'track.id', 'track.name', 'track.album.name')]

# Extracting artist names and IDs from the nested artist data frames
unlisted_artists <- pl_list$track.artists

artist_names <- sapply(unlisted_artists, '[[', "name")
artist_id <- sapply(unlisted_artists, '[[', "id")

pl_list$artist.names <- artist_names
pl_list$artist.id <- artist_id

# Re-ordering and renaming columns
pl_list <-
  pl_list[, c('artist.names',
              'track.name',
              'track.album.name',
              'track.id',
              'artist.id')]

colnames(pl_list) <-
  c('artist', 'track', 'album', 'track_id', 'artist_id')

The resulting data frame looks like this:

paged_table(pl_list)
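Because sapply() simplifies its result only when it can, the shape of the artist columns depends on the playlist: a plain character vector if every track has a single artist, a list as soon as the number of artists per track differs (and even a matrix if every track had the same number of several artists). The offline sketch below uses two hypothetical artist data frames standing in for pl$tracks$items$track.artists. It shows the list case and one way to build a single display string per track with vapply() and paste(); artist_label is a hypothetical name, and the IDs would still be kept as they are for get_artists.

```r
# Hypothetical stand-in for track.artists: one small data frame of artists
# per track (the second track credits two artists)
track_artists <- list(
  data.frame(name = "Artist A", id = "a1", stringsAsFactors = FALSE),
  data.frame(name = c("Artist B", "Artist C"), id = c("b1", "c1"),
             stringsAsFactors = FALSE)
)

# Results of different lengths cannot be simplified, so sapply() returns a list
artist_names <- sapply(track_artists, `[[`, "name")
is.list(artist_names)  # TRUE

# One display string per track, e.g. "Artist B, Artist C"
artist_label <- vapply(track_artists,
                       function(a) paste(a$name, collapse = ", "),
                       character(1))
artist_label
```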

Fetching genres

Artists on Spotify can have one or more genres associated with them. Genres are therefore not bound to individual tracks, but to the artists performing them. This means that if an artist has three genres associated with them, every track by that artist on the Spotify platform is attributed those three genres.
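To make the artist-to-track relationship concrete, here is a small offline illustration with made-up artists, tracks and genres. Each track takes on its artist's genres, so a genre is counted once for every track by an artist carrying it. The match() lookup is only for illustration; the script itself obtains the genres by passing the artist IDs to get_artists.

```r
# Hypothetical toy data: genres belong to artists, not to tracks
artists <- data.frame(artist_id = c("a1", "a2"), stringsAsFactors = FALSE)
artists$genres <- list(c("uk indie", "indie rock", "modern rock"), "dance pop")

tracks <- data.frame(track = c("Song A", "Song B", "Song C"),
                     artist_id = c("a1", "a1", "a2"),
                     stringsAsFactors = FALSE)

# Every track inherits the full genre list of its artist
track_genres <- artists$genres[match(tracks$artist_id, artists$artist_id)]

# Flattening gives one entry per track-genre pair, ready to be counted
toy_genres <- unlist(track_genres, use.names = FALSE)
table(toy_genres)
```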

The genres are fetched with the get_artists function and flattened from their nested lists with the unlist function. Because get_artists accepts at most 50 artist IDs per call, the IDs are requested in batches of 50. Note that a playlist of 100 tracks can contain more than 100 artist IDs, since a track may credit several artists.

artist_id_list <- unlist(artist_id)
# get_artists accepts at most 50 IDs per call, so split the IDs into consecutive chunks of up to 50
id_chunks <- split(artist_id_list, ceiling(seq_along(artist_id_list) / 50))
total_artist_list <- do.call(rbind, lapply(id_chunks, get_artists))
genre_list <- unlist(total_artist_list$genres)
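Fixed ranges such as [1:50] and [51:100] assume exactly 100 artist IDs. With fewer IDs the ranges are padded with NA values, and with more (possible because a track may credit several artists) the IDs beyond position 100 are never requested. The offline sketch below, using made-up IDs, shows both effects and a split()-based alternative that yields chunks of at most 50 IDs for any list length; chunk_ids() is a hypothetical helper name.

```r
# Hypothetical IDs standing in for artist_id_list
few_ids  <- sprintf("artist%02d", 1:30)
many_ids <- sprintf("artist%03d", 1:120)  # e.g. 100 tracks with featured artists

# Fixed ranges: positions past the end become NA, and IDs beyond 100 are skipped
sum(is.na(few_ids[51:100]))                  # 50
length(c(many_ids[1:50], many_ids[51:100]))  # 100, so 20 IDs are dropped

# Consecutive chunks of at most `size` IDs, for any number of IDs
chunk_ids <- function(ids, size = 50) {
  split(ids, ceiling(seq_along(ids) / size))
}
lengths(chunk_ids(few_ids))   # a single chunk of 30
lengths(chunk_ids(many_ids))  # chunks of 50, 50 and 20
```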

Transforming genres into displayable data frame

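The genres are counted and then transformed from the form in which they are fetched (lowercase, with words separated by spaces, for example uk indie rock) into a more readable format (for example, UK Indie Rock). Before counting, the spaces are temporarily replaced by underscores (uk_indie_rock), so that each multi-word genre is treated as a single term instead of being split into separate words.

To achieve this, the tm (Text Mining) package is used, which provides methods for corpus handling[1] and the creation of term-document matrices[2].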

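# Replaces every occurrence of `pattern` with "_", so that multi-word genres
# (e.g. "uk indie rock") stay together as one term in the term-document matrix
underScore <- function(x, pattern) {
  gsub(pattern = pattern, replacement = "_", x = x)
}

genre_list_df <- data.frame(doc_id = seq_along(genre_list), text = genre_list, stringsAsFactors = FALSE)
genre_corpus <- Corpus(DataframeSource(genre_list_df))
# content_transformer() lets tm_map apply a plain character function while keeping
# valid text documents; the extra argument " " is passed on as `pattern`
genre_tm_map <- tm_map(genre_corpus, content_transformer(underScore), " ")
genre_tdm <- TermDocumentMatrix(genre_tm_map)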
genre_tdm_matrix <- as.matrix(genre_tdm)
genre_tdm_sorted <- sort(rowSums(genre_tdm_matrix), decreasing = TRUE)

final_genres <- data.frame(word = names(genre_tdm_sorted), freq = genre_tdm_sorted)
final_genres$perc <- round(final_genres$freq/sum(final_genres$freq)*100, digits = 1)

# Turn the underscores back into spaces and capitalise every word
final_genres$word <- gsub("_", " ", final_genres$word)
final_genres$word <- str_to_title(final_genres$word)

# str_to_title() turns "uk" and "us" into "Uk" and "Us"; restore the abbreviations,
# matching whole words only so that words merely starting with "Us" are left alone
final_genres$word <- gsub("\\bUk\\b", "UK", final_genres$word)
final_genres$word <- gsub("\\bUs\\b", "US", final_genres$word)

1) In linguistics, a corpus is a large, structured set of texts.

2) A term-document matrix records how often each term occurs in each document; in this case, every fetched genre is treated as one document.

The resulting list of genres looks like this:

paged_table(final_genres)
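As a cross-check of the term-document-matrix step, the same counts and labels can be produced in base R. This is a swapped-in technique, not the method used in the script: table() counts each complete genre string, so no underscore substitution is needed, and a Perl-style regular expression takes the place of str_to_title(). The genre vector is made up. The counts should agree with the term-document matrix as long as each genre ends up as a single term there; note that TermDocumentMatrix by default drops terms shorter than three characters.

```r
# Hypothetical sample of fetched genres (lowercase, space-separated)
genres <- c("uk indie rock", "indie rock", "uk indie rock", "dance pop", "us hip hop")

# Count each complete genre string and sort by frequency
freq <- sort(table(genres), decreasing = TRUE)
genre_freq <- data.frame(word = names(freq), freq = as.vector(freq),
                         stringsAsFactors = FALSE)
genre_freq$perc <- round(genre_freq$freq / sum(genre_freq$freq) * 100, digits = 1)

# Capitalise the first letter of every word, then restore UK/US as whole words
genre_freq$word <- gsub("\\b([a-z])", "\\U\\1", genre_freq$word, perl = TRUE)
genre_freq$word <- gsub("\\b(Uk|Us)\\b", "\\U\\1", genre_freq$word, perl = TRUE)
genre_freq
```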

Plotting genres

Below is the resulting plot, displaying every genre that occurs in the playlist and the number of times (frequency) a track with that genre can be found in the playlist.

ggplot(final_genres, aes(
  x = reorder(word, freq),
  y = freq,
  fill = freq
)) +
  scale_fill_gradient(name = "Frequency",
                      low = "darkgreen",
                      high = "steelblue") +
  geom_col(width = .8) +
  geom_text(aes(label = freq),
            nudge_y = 2.5,
            size = 3.5) +
  labs(title = "Genres in playlist",
       x = "Genre",
       y = "Frequency of genre within playlist") +
  coord_flip()