Required libraries

The required libraries for this script are:

tidyverse
spotifyr
psych
stats
data.table
dplyr
ggplot2
SnowballC
tm
wordcloud
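
A minimal sketch of loading these packages, assuming they are meant to be attached at the top of the script. The names `pkgs`, `is_installed` and `missing_pkgs` are illustrative and do not appear in the original script. The sketch reports missing packages up front instead of stopping at the first failed library call.

```r
# Packages listed above (stats ships with base R).
pkgs <- c("tidyverse", "spotifyr", "psych", "stats", "data.table",
          "dplyr", "ggplot2", "SnowballC", "tm", "wordcloud")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach everything once all packages are available.
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```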

Getting Spotify API ready

client_id <- 'your_client_id' # placeholder: use the client ID of your own Spotify app
client_secret <- 'your_client_secret' # placeholder: keep the real secret out of published code
Sys.setenv(SPOTIFY_CLIENT_ID = client_id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = client_secret)
access_token <- get_spotify_access_token()
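
By default, get_spotify_access_token() reads its credentials from the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables, so they can be set outside the script (for example in ~/.Renviron) rather than in the code. The sketch below checks that both are set before a token is requested. The helper name `spotify_credentials_set` is hypothetical and not part of spotifyr.

```r
# Hypothetical helper: TRUE only when both variables are non-empty.
spotify_credentials_set <- function() {
  vars <- Sys.getenv(c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"))
  all(nzchar(vars))
}

if (!spotify_credentials_set()) {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET, e.g. in ~/.Renviron")
}
```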

Getting Current Rotation playlist data

After fetching the playlist, the track data is stored in the tr_items variable.

pl <- get_playlist('4sVkLBTSB5dwjBG5GWoSTw')

tr <- pl$tracks

tr_items <- tr$items

Removing unwanted information from tr_items variable

pl_list <- tr_items[, c('track.artists', 'track.id', 'track.name', 'track.album.name')]

Removing nested data frame with artist information

The artist names and IDs are extracted from the nested track.artists data frames and added as columns to pl_list. The track.artists column itself is dropped in the re-ordering step.

i <- pl_list$track.artists

artist_names <- sapply(i, '[[', "name")
artist_id <- sapply(i, '[[', "id")

artist_names <- artist_names[-53] # Removing element 53, which is NULL (the index is specific to this playlist)
artist_id <- artist_id[-53]

pl_list$artist.names <- artist_names
pl_list$artist.id <- artist_id
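
The hard-coded index 53 only fits this particular playlist. A minimal sketch of a more general approach on toy data follows, assuming that track.artists holds one data frame per track, with NULL where artist data is missing. All names here (`track_artists`, `tracks`, `has_artist`, `kept`) are illustrative. Rows and artist entries are filtered together so that they stay aligned.

```r
# Toy stand-in for pl_list$track.artists: one data frame per track,
# NULL where artist data is missing.
track_artists <- list(
  data.frame(id = "a1", name = "Artist One", stringsAsFactors = FALSE),
  NULL,
  data.frame(id = c("a2", "a3"), name = c("Artist Two", "Artist Three"),
             stringsAsFactors = FALSE)
)
tracks <- data.frame(track.name = c("Song A", "Local file", "Song B"),
                     stringsAsFactors = FALSE)

# Flag tracks that carry artist data, then filter rows and artist
# entries with the same logical vector.
has_artist <- !vapply(track_artists, is.null, logical(1))
tracks <- tracks[has_artist, , drop = FALSE]
kept <- track_artists[has_artist]

# List columns keep every artist for tracks with several collaborators.
tracks$artist.names <- lapply(kept, `[[`, "name")
tracks$artist.id <- lapply(kept, `[[`, "id")
```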

Re-ordering and renaming columns

pl_list <- pl_list[, c('artist.names', 'track.name', 'track.album.name', 'track.id', 'artist.id')]

colnames(pl_list) <- c('artist', 'track', 'album', 'track_id', 'artist_id')

Viewing finished table

pl_list

Getting genres

The get_artists function from spotifyr returns a data frame of artists and their attributes. The genres column of this data frame is flattened into a vector called genre_list.

artist_id_list <- unlist(artist_id)
# get_artists accepts at most 50 artist IDs per request, so the IDs are fetched in two batches
total_artist_list <- rbind(get_artists(artist_id_list[1:50]),
                           get_artists(artist_id_list[51:55]))
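
The index ranges 1:50 and 51:55 only fit a playlist with exactly 55 artist IDs. A minimal sketch of batching an ID vector of any length follows. `fetch_in_batches` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for get_artists so that the example runs without API access.

```r
# Hypothetical helper: split ids into batches of at most `size` and
# row-bind the results of calling `fetch` on each batch.
fetch_in_batches <- function(ids, fetch, size = 50) {
  batches <- split(ids, ceiling(seq_along(ids) / size))
  do.call(rbind, lapply(batches, fetch))
}

# Stand-in for get_artists() so the sketch runs offline.
fake_fetch <- function(ids) data.frame(id = ids, stringsAsFactors = FALSE)

ids <- sprintf("artist_%03d", 1:120)
result <- fetch_in_batches(ids, fake_fetch)
nrow(result)
```

With valid credentials, a call such as fetch_in_batches(artist_id_list, get_artists) could replace the two hard-coded requests, assuming the data frames returned for each batch can be row-bound as they are in the original code.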

genre_list <- unlist(total_artist_list$genres)

Transforming the list of genres

Before the genres can be visualised, the list has to be transformed into a term-document matrix, which holds the frequency of each genre. Genres fetched from the Spotify API through spotifyr contain spaces (e.g. modern alternative rock), and the term-document matrix would count each word as a separate term, so spaces are first replaced with underscores ('_').

docs <- Corpus(VectorSource(genre_list)) # Setting list of genres as a Corpus

SpaceToUnderscore <- content_transformer(function(x, pattern) gsub(pattern, "_", x))

docs <- tm_map(docs, SpaceToUnderscore, " ")

## Warning in tm_map.SimpleCorpus(docs, SpaceToUnderscore, " "):
## transformation drops documents

g <- TermDocumentMatrix(docs)
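
The effect of the underscore replacement can be illustrated in base R on a toy vector. This is a simplified sketch that splits on single spaces only and does not reproduce tm's actual tokeniser. The names `genres` and `joined` are illustrative.

```r
genres <- c("modern alternative rock", "indie pop", "indie pop")

# Splitting on spaces breaks multi-word genres into separate words.
unlist(strsplit(genres, " ", fixed = TRUE))

# After the replacement, each genre survives as a single token.
joined <- gsub(" ", "_", genres, fixed = TRUE)
unlist(strsplit(joined, " ", fixed = TRUE))
```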

Setting up final data frame with genres

Using the generated term-document matrix, a new data frame is created with every specific genre in ‘underscore form’ (e.g. modern_alternative_rock) along with the frequency of each genre.

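m <- as.matrix(g) # Dense matrix: one row per genre, one column per document
v <- sort(rowSums(m), decreasing = TRUE) # Total count per genre, most frequent first
d <- data.frame(word = names(v), freq = v)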
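
As a cross-check, a frequency table of the same shape can be built with base R's table() on toy data. This is a sketch of an alternative route, not the method used above. tm applies its own tokenisation defaults, so its counts may not match exactly. The names `genres`, `counts` and `d_check` are illustrative.

```r
genres <- c("indie_pop", "modern_rock", "indie_pop",
            "art_pop", "indie_pop", "modern_rock")

# Count each genre and order from most to least frequent.
counts <- sort(table(genres), decreasing = TRUE)

# Same column layout as the data frame d: genre name and frequency.
d_check <- data.frame(word = names(counts),
                      freq = as.vector(counts),
                      stringsAsFactors = FALSE)
d_check
```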

Plots

Using the variables generated before, various plots can be created.

Bar plot of genres

ggplot(d, aes(x=reorder(word, freq), y=freq, fill = freq)) + 
  geom_bar(stat = "identity", width = .8) +
  labs(title = "Genres in playlist", x = "Genre", y = "Frequency") + # labels follow the aesthetics, coord_flip() only swaps where they are drawn
  coord_flip()

Jesse’s Spotify Genre Exploration