Analyzing my Spotify in R with the spotifyr package

All the packages we need are:

library(spotifyr)
library(tidyverse)
library(knitr)

Before we start

Let R know who you are! Just like with the official client, you need to log in before you start working. This takes a few minutes longer than logging in to the Spotify client as usual, but don’t worry, I guarantee it is easy.

You should apply for a CLIENT_ID and CLIENT_SECRET from HERE. We aren’t supposed to do anything commercial, right? A non-commercial application is easier. After that, you get your CLIENT_ID and CLIENT_SECRET. As easy as a walk in the park, isn’t it? Then the analysis begins!

Let’s set up the authorization stuff

The authorization stuff comes down to two parameters, the CLIENT_ID and CLIENT_SECRET I mentioned. In the spotifyr package they are called SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET. There are many ways to set them, but I highly recommend saving them in a .R file and sourcing that file from your other scripts. Here I name the file setToken.R, and its content looks like:

Sys.setenv(SPOTIFY_CLIENT_ID = 'xxxxxxxxxxxxx')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'xxxxxxxxxxxxx')

Whenever we need to set these two parameters, we just run source('Scripts/setToken.R') (here, Scripts is the directory where I keep my scripts).

So let’s try to authorize R to access Spotify!
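
Before calling any spotifyr function, it can help to confirm that both variables are actually visible to the R session. The following is a minimal sketch in base R; credentials_are_set() is a hypothetical helper, not part of spotifyr, and the 'xxxxxxxxxxxxx' values are placeholders.

```r
# Hypothetical helper (not part of spotifyr): TRUE only when both
# environment variables are set to non-empty strings.
credentials_are_set <- function() {
  id <- Sys.getenv("SPOTIFY_CLIENT_ID")
  secret <- Sys.getenv("SPOTIFY_CLIENT_SECRET")
  nzchar(id) && nzchar(secret)
}

# Placeholder values, exactly as in setToken.R.
Sys.setenv(SPOTIFY_CLIENT_ID = "xxxxxxxxxxxxx")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "xxxxxxxxxxxxx")

if (!credentials_are_set()) {
  stop("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET first.")
}
```

Sys.getenv() returns an empty string for a variable that was never set, which is why nzchar() is enough for the check.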

get_spotify_authorization_code()

But I guess something went wrong for you, some error like an invalid CLIENT_ID! What happened? Let’s check what Charlie86, the author of spotifyr, says:

For certain functions and applications, you’ll need to log in as a Spotify user. To do this, your Spotify Developer application needs to have a callback url. You can set this to whatever you want that will work with your application, but a good default option is http://localhost:1410/ (see image below). For more information on authorization, visit the official Spotify Developer Guide.

With that little issue solved, it’s time to get all my loved tracks!

What’s my favorite?

There is no doubt that the recommendation algorithm is the most valuable thing about Spotify, and this is why we can’t leave the platform, even though Apple Music offers a more competitive price (in China, where Spotify doesn’t provide service, it costs less than 1 dollar per month for students). So how can we find all our favorite tracks and albums?

Easy! Just run get_my_saved_tracks. At the time of writing, this function is not yet available in spotifyr, but I have opened a pull request on GitHub. Until it is merged, installing spotifyr from my own repo with devtools::install_github('womeimingzi11/spotifyr') will work. Let’s keep moving. Spotify sets a limit: you can get up to 50 items per request, and the default is 20. This applies not only to loved tracks but to almost every function in spotifyr. How do we get more? With offset: as the name says, it skips that many items, so the next request returns the ones that follow. Here is the code.

all_my_fav_tracks <-
  # This is somewhat tough to read, but I LOVE PIPELINES!
  # FIRST we send a get_my_saved_tracks request with include_meta_info = TRUE, which returns the total number of saved tracks. We then request 50 items at a time, so the number of loops is the total number of tracks divided by 50.
  # Not all of us are lucky, so if the total can't be divided exactly, ceiling() adds one more loop for the remainder.
  ceiling(get_my_saved_tracks(include_meta_info = TRUE)[['total']] / 50) %>%
  # Generate a sequence from 1 to the number of loops.
  seq_len() %>%
  # Please remember that the offset starts from zero, so we subtract 1 from the loop index. Each loop skips 50 more items, because every earlier request has already fetched 50.
  # Every loop returns a data.frame with up to 50 rows. Once the loop finishes, we have a list of data.frames, and reduce(rbind) merges them into one.
  map(function(x) {
    get_my_saved_tracks(limit = 50, offset = (x - 1) * 50)
  }) %>% reduce(rbind) %>%
  # To save time, we can store the data as an rds file. This is not required, but it lets us recover the data if we make a mistake later.
  write_rds('raw_all_my_fav_tracks.rds')
# Let's check the structure of our tracks.
glimpse(all_my_fav_tracks)

## Observations: 1,353
## Variables: 30
## $ added_at                           <chr> "2019-10-31T03:42:08Z", "201…
## $ track.artists                      <list> [<data.frame[1 x 6]>, <data…
## $ track.available_markets            <list> [<"AD", "AE", "AR", "AT", "…
## $ track.disc_number                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ track.duration_ms                  <int> 158899, 214240, 219200, 2392…
## $ track.explicit                     <lgl> FALSE, FALSE, FALSE, FALSE, …
## $ track.href                         <chr> "https://api.spotify.com/v1/…
## $ track.id                           <chr> "0mjAU3yKR1QnXnHtjGJqTM", "6…
## $ track.is_local                     <lgl> FALSE, FALSE, FALSE, FALSE, …
## $ track.name                         <chr> "Rescue Me", "Baby", "Shake …
## $ track.popularity                   <int> 88, 77, 67, 57, 94, 64, 70, …
## $ track.preview_url                  <chr> "https://p.scdn.co/mp3-previ…
## $ track.track_number                 <int> 1, 1, 12, 1, 1, 1, 3, 1, 5, …
## $ track.type                         <chr> "track", "track", "track", "…
## $ track.uri                          <chr> "spotify:track:0mjAU3yKR1QnX…
## $ track.album.album_type             <chr> "single", "album", "album", …
## $ track.album.artists                <list> [<data.frame[1 x 6]>, <data…
## $ track.album.available_markets      <list> [<"AD", "AE", "AR", "AT", "…
## $ track.album.href                   <chr> "https://api.spotify.com/v1/…
## $ track.album.id                     <chr> "57BhVBJbVPfIbwLANO5fSe", "3…
## $ track.album.images                 <list> [<data.frame[3 x 3]>, <data…
## $ track.album.name                   <chr> "Rescue Me", "My World 2.0",…
## $ track.album.release_date           <chr> "2019-05-17", "2010-01-01", …
## $ track.album.release_date_precision <chr> "day", "day", "day", "day", …
## $ track.album.total_tracks           <int> 1, 10, 26, 1, 1, 1, 11, 1, 1…
## $ track.album.type                   <chr> "album", "album", "album", "…
## $ track.album.uri                    <chr> "spotify:album:57BhVBJbVPfIb…
## $ track.album.external_urls.spotify  <chr> "https://open.spotify.com/al…
## $ track.external_ids.isrc            <chr> "USUM71907507", "USUM7091926…
## $ track.external_urls.spotify        <chr> "https://open.spotify.com/tr…
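
The paging logic can be tried without touching the Spotify API. The sketch below swaps get_my_saved_tracks() for a hypothetical stand-in, fetch_page(), that serves slices of a made-up data frame of 123 tracks; only the limit/offset arithmetic mirrors the real request.

```r
library(purrr)

# Made-up library of 123 tracks, standing in for the saved tracks on Spotify.
fake_library <- data.frame(track.name = paste("Track", 1:123),
                           stringsAsFactors = FALSE)

# Hypothetical stand-in for get_my_saved_tracks(): returns at most `limit`
# rows, skipping the first `offset` rows (offset is zero-based).
fetch_page <- function(limit = 20, offset = 0) {
  first <- offset + 1
  last <- min(offset + limit, nrow(fake_library))
  if (first > last) return(fake_library[0, , drop = FALSE])
  fake_library[first:last, , drop = FALSE]
}

page_size <- 50
n_pages <- ceiling(nrow(fake_library) / page_size)  # 123 / 50 rounds up to 3

all_tracks <- seq_len(n_pages) %>%
  map(function(page) {
    fetch_page(limit = page_size, offset = (page - 1) * page_size)
  }) %>%
  reduce(rbind)

nrow(all_tracks)  # 123: two full pages of 50 plus a last page of 23
```

The last page is shorter than the others, which is exactly the case ceiling() is there to cover.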

Which music is my first love?

Before further exploration, here is a question: what was the first song you loved? I bet you can’t remember! Luckily, Spotify remembers everything. First, we use the lubridate package to handle the dates: it converts dates stored as character strings into proper date-time objects, and most importantly, it is really easy to use. Spotify records the time we loved a song as a year-month-day hour:minute:second string (for example 2019-10-31T03:42:08Z), and lubridate provides exactly the function we need, ymd_hms, to convert it. For more help, try ?lubridate anytime.

library(lubridate)
all_my_fav_tracks %>%
  mutate(added_at = ymd_hms(added_at)) %>%
  arrange(added_at) %>%
  head(1) %>%
  select(track.name, added_at) %>%
  kable()

track.name	added_at
Eyes Open	2016-12-01 06:15:37
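
For readers without lubridate, the same conversion can be sketched in base R. The tracks and timestamps below are made up for illustration (only the format matches what Spotify returns), and as.POSIXct() with an explicit format string takes the place of ymd_hms().

```r
# Made-up sample in the same shape as the saved-tracks data.
tracks <- data.frame(
  track.name = c("Song A", "Song B", "Song C"),
  added_at = c("2019-10-31T03:42:08Z",
               "2016-12-01T06:15:37Z",
               "2018-05-20T12:00:00Z"),
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 strings as UTC date-times.
tracks$added_at <- as.POSIXct(tracks$added_at,
                              format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Sort from oldest to newest and keep the first row.
first_love <- tracks[order(tracks$added_at), ][1, ]
first_love$track.name  # "Song B"
```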

Okay, let’s get back to our journey. First, let’s find out which artist is my favorite singer. I guess it’s Taylor Swift! I started enjoying her music when I was a university freshman, and almost a decade has passed since. Either way, let’s verify my hypothesis. First, let’s get all the artists of my loved songs. Remember, track.artists is stored as a list, which is easy to understand: not every song is made by only one artist, is it? So we need to extract data frames from a list stored inside a data frame, and two reduce operations are necessary here.

artist_from_fav_tracks <-
  all_my_fav_tracks %>%
  select(track.artists) %>%
  reduce(rbind) %>%
  reduce(rbind) %>%
  # I don't think we need the URLs in further analyses, so only id (the unique identifier of an artist) and name are selected here.
  select(id, name)
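
To see why the artists have to be unpacked, here is a toy version of the same structure, built by hand. The track and artist names are invented; the point is only that track.artists holds one small data frame per track, and stacking them yields one row per artist credit.

```r
# Two invented tracks; the second one has two artists.
tracks <- data.frame(track.name = c("Solo Song", "Duet Song"),
                     stringsAsFactors = FALSE)
tracks$track.artists <- list(
  data.frame(id = "a1", name = "Artist One", stringsAsFactors = FALSE),
  data.frame(id = c("a1", "a2"), name = c("Artist One", "Artist Two"),
             stringsAsFactors = FALSE)
)

# Stack the per-track data frames into one long data frame.
artists <- do.call(rbind, tracks$track.artists)
artists
```

Here do.call(rbind, ...) plays the role of the second reduce(rbind) in the pipeline: two tracks go in, three artist rows come out.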

After the operation above, I have all the artists from all my loved songs. Time to count how often each one appears! Although the artists’ names are well known, we use the id to identify each artist here. WHY? To avoid mixing up different artists who share a name. Then we join n, the number of tracks, back to the artist data.frame and drop the duplicated rows. After that, we remove the id column, because it means nothing to human eyes, at least in my opinion.

track_num_artist <-
  artist_from_fav_tracks %>%
  count(id, sort = TRUE) %>%
  left_join(artist_from_fav_tracks, ., by = 'id') %>%
  unique() %>%
  select(-id) %>%
  top_n(20, n)

track_num_artist  %>%
  kable()

name	n
OneRepublic	18
Taylor Swift	166
Camila Cabello	25
Sia	15
Charlie Puth	22
S.H.E	52
The Chainsmokers	32
Ed Sheeran	29
Kesha	34
Maroon 5	55
Avril Lavigne	60
James Blunt	21
Daya	15
Kelly Clarkson	16
Kygo	16
Ellie Goulding	46
The Weeknd	29
Britney Spears	36
Katy Perry	20
Shakira	23
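
As a possible simplification, counting by id and name together gives the same kind of table in one step, provided each id maps to a single name. This is a sketch on invented data, not the code that produced the table above.

```r
library(dplyr)

# Invented artist credits, one row per (track, artist) pair.
artists <- data.frame(
  id = c("a1", "a2", "a1", "a3", "a1", "a2"),
  name = c("Artist One", "Artist Two", "Artist One",
           "Artist Three", "Artist One", "Artist Two"),
  stringsAsFactors = FALSE
)

track_counts <- artists %>%
  count(id, name, sort = TRUE) %>%  # one row per artist, most frequent first
  select(-id)                       # drop the id once counting is done

track_counts
```

Unlike the join-based version, this one comes out already sorted by n.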

Bingo! Taylor is definitely my favorite artist. Avril and Maroon 5 follow, exactly in line with my prediction! However, tables are weak and graphics are stronger, so let’s see what a column plot looks like. BTW, the ggplot2 package has already been loaded as part of the tidyverse bundle, so we don’t need to call library(ggplot2) again.

# For numerical variables, cutting them into groups is sometimes a good way to simplify a problem. Here we go one step further and fill the columns with different colors to represent the different frequency groups.
track_num_artist %>%
  mutate(
    freq = case_when(
      n >= 100 ~ '100 tracks or more',
      between(n, 50, 99) ~ '50~99 tracks',
      between(n, 20, 49) ~ '20~49 tracks',
      TRUE ~ 'Less than 20 tracks'
    )
  ) %>%
  # To avoid messing up the order of the frequency groups, I always suggest converting categorical variables to factors, whose levels carry a built-in order.
  mutate(freq = factor(
    freq,
    levels = c(
      '100 tracks or more',
      '50~99 tracks',
      '20~49 tracks',
      'Less than 20 tracks'
    )
  )) %>%
  ggplot(mapping = aes(
    x = reorder(name, -n),
    y = n,
    fill = freq
  )) +
  geom_col() +
  labs(fill = NULL, title = 'Who is My Favorite Artist', caption = 'data from spotify via spotifyr') +
  xlab('Artist') +
  ylab('Tracks Number') +
  theme_classic() +
  theme(axis.text.x = element_text(angle = -60),
        axis.title = element_text(face = 'bold'),
        plot.title = element_text(hjust = 0.5, face = 'bold', size = 15),
        plot.caption = element_text(hjust = 1,face = 'bold.italic'))
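
The same grouping can also be sketched with base R's cut(), which bins a numeric vector and returns a factor in one call. The break points below are an assumption based on the groups used in the plot, and the counts are a few made-up values chosen to hit the group boundaries.

```r
n <- c(15, 20, 49, 50, 99, 100, 166)

# Intervals are (lower, upper], so with whole numbers the groups are
# 0-19, 20-49, 50-99 and 100 or more.
freq <- cut(
  n,
  breaks = c(-Inf, 19, 49, 99, Inf),
  labels = c("Less than 20 tracks", "20~49 tracks",
             "50~99 tracks", "100 tracks or more")
)

data.frame(n, freq)
```

cut() orders the levels from the lowest group to the highest; factor(freq, levels = rev(levels(freq))) would reverse that if the largest group should come first in the legend.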

There is no doubt that I could explore more; for instance, we could look at how my taste changes over time. But in this case, I consider everything as a whole.

Do you agree that what you listen to reveals how you feel? Spotify has defined an index for this: valence. It is considered a measure of musical positivity. Here is the definition.

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

It looks cool, but is it really meaningful for everyone, or at least for me? I will find out. What about my top 20 favorite artists? Although it’s ONLY 20 artists (actually it’s 22, because some artists have the same number of tracks I loved), it still takes minutes to request the data from Spotify’s servers. I highly recommend saving the object as an RDS file, so you don’t request the data every time! In this plot, the emotional quadrants are consistent with Sentify.

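# Only hit the Spotify API when no cached copy exists yet.
if (!file.exists('audio_features.rds')) {
  track_num_artist$name %>%
    # Request the audio features of each artist's tracks.
    map(function(x) {
      get_artist_audio_features(x)
    }) %>%
    reduce(rbind) %>%
    # Keep only the tracks that are also among my loved tracks.
    inner_join(all_my_fav_tracks,
               by = c('track_id' = 'track.id')) %>%
    write_rds('audio_features.rds')
}

audio_features <- read_rds('audio_features.rds')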
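
The check-then-read pattern above can be wrapped in a small helper so it is reusable for any slow request. This is a sketch in base R: cached_rds() is a hypothetical function, and the slow Spotify request is replaced by a cheap dummy computation with invented values.

```r
# Hypothetical helper: run `compute` only when `path` does not exist yet,
# save its result, and read the saved copy on every call.
cached_rds <- function(path, compute) {
  if (!file.exists(path)) {
    saveRDS(compute(), path)
  }
  readRDS(path)
}

cache_file <- tempfile(fileext = ".rds")
calls <- 0

slow_request <- function() {
  calls <<- calls + 1  # count how often the "server" is actually hit
  data.frame(track_id = c("t1", "t2"), valence = c(0.8, 0.3))
}

first <- cached_rds(cache_file, slow_request)
second <- cached_rds(cache_file, slow_request)
calls  # 1: the second call was served from the file
```

Deleting the file is then the only step needed to force a fresh request.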

ggplot(data = audio_features, aes(x = valence, y = energy, color = artist_name)) +
  geom_jitter() +
  geom_vline(xintercept = 0.5) +
  geom_hline(yintercept = 0.5) +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) +
  annotate('text', 0.25 / 2, 0.95, label = "Turbulent/Angry", fontface = "bold") +
  annotate('text', 1.75 / 2, 0.95, label = "Happy/Joyful", fontface = "bold") +
  annotate('text', 1.75 / 2, 0.05, label = "Chill/Peaceful", fontface = "bold") +
  annotate('text', 0.25 / 2, 0.05, label = "Sad/Depressing", fontface = "bold")

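The four labels in the plot amount to a simple rule on valence and energy, split at 0.5. A minimal sketch of that rule follows; mood_quadrant() is a hypothetical helper, the input values are invented, and counting exactly 0.5 as "high" is an assumption.

```r
# Hypothetical helper: name the quadrant each track falls into.
mood_quadrant <- function(valence, energy) {
  ifelse(
    energy >= 0.5,
    ifelse(valence >= 0.5, "Happy/Joyful", "Turbulent/Angry"),
    ifelse(valence >= 0.5, "Chill/Peaceful", "Sad/Depressing")
  )
}

# Invented example values, one per quadrant.
moods <- mood_quadrant(valence = c(0.9, 0.1, 0.8, 0.2),
                       energy  = c(0.8, 0.9, 0.2, 0.1))
moods
```

Applied to the valence and energy columns of audio_features, it would give one label per track.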

Han Chen

2019-10-31
