Dive Deeper into your Favorite Music!

Ever wanted to learn about the songs you listen to on a daily basis? Wonder if there are similarities that create your unique taste? The spotifyr package allows you to pull all sorts of data about songs, artists, or playlists!

In this guide, you will learn how to install the package, set up your account, and pull data on your favorite artists or from your own account.

Install spotifyr package

Installing the development version of spotifyr requires the devtools package. Once devtools is loaded, use install_github to install spotifyr from GitHub, as shown below.

library(devtools)
devtools::install_github('charlie86/spotifyr')

Account Setup

First, go to https://developer.spotify.com/dashboard/login and create a developer account that will allow you to access Spotify’s API. Next, navigate to your dashboard and create an app. This will generate a client ID and secret. Store these as environment variables in R as shown here:

library(spotifyr)
# Replace the placeholders with the client ID and secret from your dashboard
Sys.setenv(SPOTIFY_CLIENT_ID = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

access_token <- get_spotify_access_token()

At this point, we can begin pulling information about publicly available songs, artists, playlists, and more!
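
If the token request fails instead, a common culprit is a credential that never made it into the environment. The sketch below is a minimal check using only base R. Note that missing_spotify_vars is a hypothetical helper written for this guide, not part of spotifyr.

```r
# Hypothetical helper (not part of spotifyr): returns the names of any
# credential environment variables that are unset or empty.
missing_spotify_vars <- function(vars = c("SPOTIFY_CLIENT_ID",
                                          "SPOTIFY_CLIENT_SECRET")) {
  values <- unname(Sys.getenv(vars, unset = ""))
  vars[!nzchar(values)]
}

# Report anything that still needs to be set with Sys.setenv()
missing <- missing_spotify_vars()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```

An empty result only means both variables are set. It does not confirm that the values themselves are valid.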

Using Artist Data to Compare Styles

get_artist_audio_features is one of the first functions to explore in spotifyr. It retrieves the audio features (danceability, energy, valence, tempo, and more) for the tracks in a given artist’s discography. For this example, I will use it to compare two of my favorite artists, Jon Bellion and Khalid.

First, to pull the data for each artist:

bellion <- get_artist_audio_features('jon bellion')
khalid <- get_artist_audio_features('khalid')

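The resulting data sets are then combined into a single data frame and summarized by artist.

library(dplyr)
library(kableExtra)

# Stack the two artists' tracks into one data frame
combined <- bind_rows(bellion, khalid)

# Average each audio feature by artist and render the result as a table
combined %>% 
  group_by("Artist"=artist_name) %>% 
  summarize("Danceability" = round(mean(danceability),3),
            "Energy"= round(mean(energy),3),
            "Valence" = round(mean(valence),3),
            "Tempo" = round(mean(tempo),3)) %>% 
  kbl() %>% kable_styling()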

Artist       Danceability  Energy  Valence  Tempo
Jon Bellion  0.600         0.575   0.495    116.360
Khalid       0.617         0.491   0.371    107.623

Unsurprisingly, Jon Bellion rates noticeably higher than Khalid in Energy, Valence (a measure of the happiness of a song’s sound), and Tempo, as Khalid has a notoriously chill vibe. I was surprised that Khalid edges him out in Danceability (0.617 to 0.600), as some Bellion songs have insane rhythm (check out Guillotine if you don’t believe me).

Since I was curious about danceability between the two artists, I created histograms showing the danceability of each artist’s songs.

library(ggplot2)

# One histogram panel per artist
combined %>% 
  ggplot(aes(x=danceability)) + geom_histogram(bins=20) + facet_wrap(~ artist_name) + 
  xlab("Danceability") + ylab("Number of Songs")

The histograms show that although Bellion has plenty of songs between 0.6 and 0.7, his most common ratings fall between 0.5 and 0.6. Khalid’s most common ratings are above 0.7, which leads to his higher average score. Bellion also has a fairly large number of songs scoring below 0.4, which drags down his average.
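
To check a reading like this against counts rather than bar heights, the songs can also be binned by hand. The following is a sketch in base R using 0.1-wide bins to match the ranges discussed here. Both danceability_bins and the toy data frame are invented for illustration, with toy standing in for the real combined data.

```r
# Toy stand-in for `combined`; these values are made up for illustration
toy <- data.frame(
  artist_name  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  danceability = c(0.35, 0.55, 0.58, 0.65, 0.62, 0.71, 0.74, 0.78)
)

# Hypothetical helper: count songs per artist in each danceability bin
danceability_bins <- function(df, breaks = seq(0, 1, by = 0.1)) {
  bins <- cut(df$danceability, breaks = breaks, include.lowest = TRUE)
  table(df$artist_name, bins)
}

counts <- danceability_bins(toy)
print(counts)
```

With the real data, danceability_bins(combined) would return one row per artist and one column per bin, which makes it easy to see where each artist’s songs pile up.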

Learn More about Yourself (and your Music)

Now that we have examined some of our favorite artists, we will look at our own profiles and playlists. To do so, you must set up your Spotify profile to link to R.

First, within your Spotify developer account, navigate to “Edit Settings” and under “Redirect URIs”, enter http://localhost:1410/. You can then enter this same URI in R to connect your account, which allows spotifyr to authenticate you.

After doing so, you can begin to pull your own data from Spotify. I’m going to look at some of my own playlists:

my_playlists <- get_my_playlists()

From these, I will choose the ID of one playlist and take a closer look. This playlist is a collection of all my current favorite songs.

playlist <- get_playlist_tracks("3ZOsElieRC28V0lF3VEJMh")
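
The ID above was copied by hand from my_playlists. As an alternative, here is a small sketch that looks an ID up by playlist name. It assumes the data frame has name and id columns (run names(my_playlists) to confirm). The helper playlist_id_by_name and the toy data are hypothetical and not part of spotifyr.

```r
# Toy stand-in for `my_playlists`; names and IDs are made up
toy_playlists <- data.frame(
  name = c("Favorites", "Workout"),
  id   = c("id_favorites", "id_workout"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the ID of the playlist with the given name,
# stopping if the name is missing or ambiguous
playlist_id_by_name <- function(playlists, playlist_name) {
  matches <- playlists$id[playlists$name == playlist_name]
  if (length(matches) != 1) {
    stop("Expected exactly one playlist named '", playlist_name,
         "', found ", length(matches))
  }
  matches
}

favorites_id <- playlist_id_by_name(toy_playlists, "Favorites")
```

Under that assumption, the result of playlist_id_by_name(my_playlists, "...") could be passed straight to get_playlist_tracks.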

First, let’s take a look at how popular my favorite songs are, based on the popularity metric, which rates songs on a scale from 0 to 100.

playlist %>% 
  ggplot(aes(x=track.popularity)) + geom_histogram() + 
  scale_y_continuous(breaks=1:10) + xlab("Popularity") + ylab("Number of Songs")

Most of my songs tend to be reasonably popular, but definitely not top-40 hits. The low end is populated by a few smaller artists, such as Rhetorik and Travis Mendes.

Finally, let’s take a look at which albums have the most representation on my primary playlist.

playlist %>% 
  group_by("Album Title" = track.album.name) %>% 
  summarize(Appearances=n()) %>% 
  arrange(desc(Appearances)) %>% 
  slice_max(Appearances, n=6) %>% 
  kbl() %>% kable_styling()

Album Title                                        Appearances
No Pressure                                        8
Music To Be Murdered By - Side B (Deluxe Edition)  6
The Definition                                     6
Glory Sound Prep                                   5
The Marshall Mathers LP2 (Deluxe)                  5
The Human Condition                                4
The Search                                         4

The table has seven rows rather than six because slice_max keeps ties by default. Unsurprisingly, there are lots of appearances from Logic, Eminem, and Jon Bellion on this list.

Hopefully this guide gives you a taste of how to use the spotifyr package to learn lots of cool information about your own music tastes! Enjoy!