What is the SpotifyR Package & What Can I Do With It?
SpotifyR is an R Wrapper Used for Pulling Information from Spotify’s Web API.
You’re Able to Pull Song & Playlist Information for a Given Spotify User, Retrieve Albums & Their Popularity, Retrieve Audio Features from Tracks, & So Much More :)
By Following This Guide, You Will Be Able to Download the Package, Set Up Your Account & Pull Data on Your Favorite Artist(s) or From Your Own Account.
Set-Up Directions
Helpful Packages
| Package | Summary |
|---|---|
| tidyverse | the tidyverse collection of packages |
| devtools | tools for package development, including installing packages from GitHub |
| spotifyr | R wrapper for Spotify’s Web API |
| kableExtra | helps build common complex tables & manipulate table styles |
| DT | create, filter & sort datatables |
| dplyr | grammar of data manipulation |
| ggplot2 | system for declaratively creating graphics |
Installing SpotifyR Package
There’s An Extra Step Involved After Downloading the “DevTools” Package:
devtools::install_github('charlie86/spotifyr')
Then, You’re Able to Add SpotifyR To Your Library :)
library(spotifyr)
Account Set-Up
- Go To https://developer.spotify.com/dashboard/login
- Create a Dev Account Which Will Allow You to Access Spotify’s API
- Navigate to Your Dashboard
- Create an App Which Will Generate a Client ID & Secret
- Transfer These Into R
Sys.setenv(SPOTIFY_CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
access_token <- get_spotify_access_token()
Authenticate Account
- Within Your Spotify Developer Account, Navigate to “Edit Settings”
- Under “Redirect URI’s”, Enter http://localhost:1410/
- Click “Add” & “Save”
Now You Can Begin to Pull Your Own Data from Spotify :)
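If the Token Request Fails, One Thing Worth Checking is Whether Both Credentials Were Actually Set. The Sketch Below Uses Only Base R; `spotify_credentials_set()` is a Hypothetical Helper Name & Not Part of SpotifyR.

```r
# Hypothetical helper (not part of spotifyr): TRUE only when both
# environment variables from the set-up step are non-empty.
spotify_credentials_set <- function() {
  vars <- Sys.getenv(c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"))
  all(nzchar(vars))
}

if (!spotify_credentials_set()) {
  message("Set SPOTIFY_CLIENT_ID & SPOTIFY_CLIENT_SECRET before requesting a token.")
}
```

Saving the Two Values in a `.Renviron` File Instead of Typing Them Into a Script is One Way to Keep the Secret Out of Shared Code.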
Demonstrations of Different Features
Top Tracks & Artists
“get_my_top_artists_or_tracks()” Allows You To Get Your Top Artists & Tracks Depending on the Criteria You Include (Type, Time Range & Limit).
I Saved My Friend’s Top 5 Recent & Long-Term Tracks & Artists:
short_term_tracks <- get_my_top_artists_or_tracks(type = 'tracks',
                                                  time_range = 'short_term',
                                                  limit = 5)
long_term_tracks <- get_my_top_artists_or_tracks(type = 'tracks',
                                                 time_range = 'long_term',
                                                 limit = 5)
short_term_artists <- get_my_top_artists_or_tracks(type = 'artists',
                                                   time_range = 'short_term',
                                                   limit = 5)
long_term_artists <- get_my_top_artists_or_tracks(type = 'artists',
                                                  time_range = 'long_term',
                                                  limit = 5)
Retrieving Playlists
“get_my_playlists()” Fetches All Your Playlists. If You Want Details About the Tracks Within a Playlist, You Can Use “get_playlist_tracks()” to Obtain the Tracks & Their Features.
Within the Parentheses, Input the Playlist’s URI (Example Below): spotify:playlist:XXXXXXXXXXXXXXXXXXXX
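The Calls Below Pass a Bare String Rather Than the Full “spotify:playlist:…” URI. Assuming the Function Expects Only the ID Portion (An Assumption Worth Checking Against the Package Documentation), a Small Base R Helper Can Strip the Prefix. `playlist_id_from_uri()` is a Hypothetical Name, Not a SpotifyR Function.

```r
# Hypothetical helper: strip the "spotify:playlist:" prefix from a URI.
# Strings without the prefix are returned unchanged.
playlist_id_from_uri <- function(uri) {
  sub("^spotify:playlist:", "", uri)
}

playlist_id_from_uri("spotify:playlist:XXXXXXXXXXXXXXXXXXXX")
```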
I Saved My Friend’s Playlists With the Most Tracks (Those She Spent the Most Time Creating):
all_playlists <- get_my_playlists()
indie <- get_playlist_tracks("XXXXXXXXXXXXXXXXXXXXX")
it_is_2021 <- get_playlist_tracks("XXXXXXXXXXXXXXXXXXXXX")
yeethaw <- get_playlist_tracks("XXXXXXXXXXXXXXXXXXXXX")
just_good_music <- get_playlist_tracks("XXXXXXXXXXXXXXXXXXXXX")
Saving Audio Features
You Are Able to Retrieve Features of Tracks, Such as Their Tempo, Danceability, Valence & More :)
An Artist’s Audio Features
“get_artist_audio_features()” Retrieves Features of an Artist’s Songs.
After Determining My Friend’s Top Artists, I Saved Each Artist’s Audio Features.
In Order to Compare & Analyze the Artists & Their Music, I Combined the Individual Artists’ Information Into Additional Data Frames:
All Seven Were Combined Into “combine_all”.
Quinn XCII, LANY & Chelsea Cutler Were On Both Her Long- & Short-Term Lists, So I Combined Their Information Into “top_three_artists”.
Then, I Also Combined The Short-Term & The Long-Term Artists Into Two Data Frames, “st_artists” & “lt_artists”, Respectively.
quinn_xcii <- get_artist_audio_features('Quinn XCII')
lany <- get_artist_audio_features('LANY')
chelsea_cutler <- get_artist_audio_features('Chelsea Cutler')
t_swift <- get_artist_audio_features('Taylor Swift')
blackbear <- get_artist_audio_features('blackbear')
jeremy_zucker <- get_artist_audio_features('Jeremy Zucker')
band_camino <- get_artist_audio_features('The Band CAMINO')
combine_all <- rbind(quinn_xcii, lany, chelsea_cutler, t_swift,
                     blackbear, jeremy_zucker, band_camino)
top_three_artists <- rbind(quinn_xcii, lany, chelsea_cutler)
st_artists <- rbind(jeremy_zucker, band_camino, quinn_xcii, lany, chelsea_cutler)
lt_artists <- rbind(t_swift, blackbear, quinn_xcii, lany, chelsea_cutler)
A Playlist’s Tracks’ Audio Features
“get_playlist_audio_features()” Retrieves the Features of the Songs Compiled On a Playlist.
Within the Parentheses, You Have To Include the Playlist URI. This is the Same URI Used to Get the Playlist’s Tracks.
indie_features <- get_playlist_audio_features(playlist_uris = "XXXXXXXXXXXXXXXXXXXXX")
it_is_2021_features <- get_playlist_audio_features(playlist_uris = "XXXXXXXXXXXXXXXXXXXXX")
yeethaw_features <- get_playlist_audio_features(playlist_uris = "XXXXXXXXXXXXXXXXXXXXX")
jgm_features <- get_playlist_audio_features(playlist_uris = "XXXXXXXXXXXXXXXXXXXXX")
all_features <- rbind(indie_features, it_is_2021_features, yeethaw_features, jgm_features)
Summary Statistics Table
Grouped By Artist
By Summarizing the Means of Different Features of the Artists’ Songs, We Are Able to Get a More Holistic View of the Artists to Compare Them With One Another.
combine_all %>%
  group_by("Artist" = artist_name) %>%
  summarize("Danceability" = round(mean(danceability), 2),
            "Energy" = round(mean(energy), 2),
            "Valence" = round(mean(valence), 2),
            "Tempo" = round(mean(tempo), 2),
            "Loudness" = round(mean(loudness), 2),
            "Acousticness" = round(mean(acousticness), 2)) %>%
  kbl() %>%
  kable_styling()
| Artist | Danceability | Energy | Valence | Tempo | Loudness | Acousticness |
|---|---|---|---|---|---|---|
| blackbear | 0.70 | 0.57 | 0.53 | 115.58 | -7.07 | 0.19 |
| Chelsea Cutler | 0.63 | 0.40 | 0.37 | 106.94 | -9.43 | 0.39 |
| Jeremy Zucker | 0.61 | 0.33 | 0.46 | 113.29 | -11.65 | 0.62 |
| LANY | 0.62 | 0.55 | 0.41 | 118.71 | -8.68 | 0.30 |
| Quinn XCII | 0.72 | 0.54 | 0.53 | 115.04 | -7.36 | 0.27 |
| Taylor Swift | 0.60 | 0.57 | 0.43 | 118.63 | -7.82 | 0.32 |
| The Band CAMINO | 0.49 | 0.91 | 0.63 | 119.75 | -3.51 | 0.01 |
Danceability Describes How Suitable a Track is for Dancing Based on a Combination of Musical Elements Including Tempo, Rhythm Stability, Beat Strength & Overall Regularity. Interestingly Though, The Band CAMINO Has the Lowest Danceability, but the Highest Energy, Valence & Tempo.
These Artists Also Vary Quite A Bit In Their Acousticness & Loudness.
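To Make the Group-Then-Average Step Concrete Without Needing Spotify Access, Here is a Minimal Sketch in Base R. The Data Frame `toy_tracks` & Its Values Are Invented For Illustration & Are Not Real Spotify Output; `aggregate()` is Used Here as a Dependency-Free Stand-In for the `group_by()` & `summarize()` Pipeline Above.

```r
# Toy data (invented values, not real Spotify output).
toy_tracks <- data.frame(
  artist_name  = c("Artist A", "Artist A", "Artist B", "Artist B"),
  danceability = c(0.60, 0.80, 0.40, 0.50),
  energy       = c(0.50, 0.70, 0.90, 0.80)
)

# Mean of each feature per artist, rounded to two decimals.
toy_summary <- aggregate(
  cbind(danceability, energy) ~ artist_name,
  data = toy_tracks,
  FUN = function(x) round(mean(x), 2)
)
toy_summary
```

Each Row of `toy_summary` Corresponds to One Artist, Just Like Each Row of the Table Above.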
Grouped by Playlist
Taking a Deeper Look Into Her Four Most Prized Playlists & The Songs They Are Made Up Of, I Am Better Able to Compare the Vibes of the Playlists.
all_features %>%
  group_by("Playlist" = playlist_name) %>%
  summarize("Danceability" = round(mean(danceability), 2),
            "Energy" = round(mean(energy), 2),
            "Valence" = round(mean(valence), 2),
            "Tempo" = round(mean(tempo), 2),
            "Loudness" = round(mean(loudness), 2),
            "Acousticness" = round(mean(acousticness), 2)) %>%
  kbl() %>%
  kable_styling()
| Playlist | Danceability | Energy | Valence | Tempo | Loudness | Acousticness |
|---|---|---|---|---|---|---|
| indie pop❕ | 0.63 | 0.55 | 0.44 | 116.01 | -7.90 | 0.30 |
| its 2021 b | 0.60 | 0.63 | 0.48 | 122.56 | -6.89 | 0.26 |
| just good music | 0.60 | 0.60 | 0.48 | 120.63 | -6.93 | 0.30 |
| yee(t)haw🤠 | 0.56 | 0.74 | 0.56 | 127.29 | -5.18 | 0.20 |
On Average, The Playlists Have Relatively Similar Features, So I Will Take a Look at the Range of Features of the Songs to Gain a Clearer Picture of the Different Vibes.
Visuals
Deeper Dive From Summary Tables
While the Tables Show a Holistic Overview of the Data, Creating Visuals Will Emphasize the Differences Between Features of the Tracks.
Energy Levels of Most Recently Listened to Artists’ Songs
There Was Quite a Range of The Artists’ Average Energy Levels, So I Wanted to Look at How the Intensity & Activity Compared Among The Artists She Has Listened to the Most Recently.
st_artists %>%
  ggplot(aes(x = energy)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ artist_name) +
  xlab("Energy") +
  ylab("Number of Songs")
The Energy Levels of Chelsea Cutler & Jeremy Zucker’s Songs Are Much Lower Than Those of LANY & Quinn XCII, Whose Energy Levels Are Comparable. The Band CAMINO, Though, Hands Down Has the Highest Energy.
ggplot(st_artists, aes(energy, fill = artist_name)) +
  geom_histogram(alpha = 0.5, position = 'identity', bins = 30) +
  xlab("Energy") +
  ylab("Number of Songs")
Each Individual Artist’s Energy Levels Are Not As Easily Distinguishable On The Graph Above. However, The Darker Colors Show Where Many of the Artists Overlap, Possibly Indicating What Energy Levels She Vibes With the Most, Which Looks To Be Right in the Middle (Around 0.50).
Tempos of Most Recently Listened to Artists’ Songs
Similar to The Energy Level Comparison Above, Now I’m Analyzing The Tempo of These Artists’ Songs.
st_artists %>%
  ggplot(aes(x = tempo)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ artist_name) +
  xlab("Tempo") +
  ylab("Number of Songs")
Similar to Their Energy Levels, Quinn XCII & Chelsea Cutler Appear to Have Comparable Tempos As Well. LANY, Though, Has More Tracks With a Faster Tempo.
Based on Her Most Listened to Artists, It Appears that My Friend Most Enjoys Music with a Moderate Tempo (Around 100-125).
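One Way to Back Up a Visual Read Like This With a Number is to Compute the Share of Tracks Whose Tempo Falls Inside the Range. The Sketch Below is Base R Only; `toy_tempo` Holds Invented Values & `share_in_range()` is a Hypothetical Helper Name.

```r
# Toy tempos in beats per minute (invented values).
toy_tempo <- c(92, 101, 110, 118, 124, 131, 150, 104)

# Proportion of tracks with a tempo between `low` and `high`, inclusive.
share_in_range <- function(tempo, low = 100, high = 125) {
  mean(tempo >= low & tempo <= high)
}

share_in_range(toy_tempo)
```

With Real Data, The Same Helper Could Be Applied to a Tempo Column Such As `st_artists$tempo`.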
Tempo of Most Prized Playlists
Do Those Tempo Preferences Transfer to her Playlists?
all_features %>%
  ggplot(aes(x = tempo)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ playlist_name) +
  xlab("Tempo") +
  ylab("Number of Songs")
Her “Indie Pop”, “It’s 2021 B” & “Just Good Music” Playlists All Have a Similar Shape, With Their Peaks Around the 100 Mark. This Aligns With My Above Conclusion, That She Prefers Music With a Tempo Around 100-125.
As For her “Yeethaw” Playlist, It Does Have a Spike Around 100, But The Higher Peak is Around 150. Thus, This Playlist Has More Songs With a Fast Tempo.
Preferred Key
top_three_artists[c(4, 5, 10, 11, 14, 52, 92), c("key_name", "mode_name")]
## key_name mode_name
## 4 A major
## 5 C# minor
## 10 F major
## 11 E major
## 14 G# major
## 52 C# minor
## 92 B major
Several of Her Most Listened to Tracks (Short- & Long-Term) Are Sung By Her Top Three Most Listened to Artists. Therefore, I Found the Keys of Those Songs (Above) to Compare Them to Her Most Listened to Artists’ Songs (Depicted Below).
st_artists %>%
  ggplot(aes(x = key_name, fill = artist_name)) +
  geom_bar() +
  facet_wrap(~ mode_name) +
  xlab("Key") +
  ylab("Count")
It Looks Like Those Artists Are Spread Across Many Keys, Though More in Major Than Minor Keys. The Two Most Popular Keys Are C & D Major. It’s Interesting That Her Favorite Songs Are Not in the Two Most Popular Keys of Her Most Listened to Artists. But As We Will See Later, She Listens To a Wide Variety of Music.
To Look at Those Charts A Little Differently, The Charts Below Show What Percentage The Artists Make Up of Each Key.
Cross-Examining the Artists & Songs She Listens to Most, It’s Interesting That Her Top Songs That Are Sung by Quinn XCII Are Not in the Keys That Are Most Unique to Him (Songs 4, 5, 10, 11, 14 & 52).
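In the Chart Code That Follows, `position = "fill"` Rescales Every Bar to the Same Height So That Each Artist’s Segment Shows a Share Rather Than a Count. The Base R Sketch Below Reproduces That Calculation on Invented Toy Data (`toy_keys` is Not Real Spotify Output) & Ignores the Major/Minor Facets For Simplicity.

```r
# Toy data (invented): one row per track.
toy_keys <- data.frame(
  key_name    = c("C", "C", "C", "D", "D"),
  artist_name = c("Artist A", "Artist A", "Artist B", "Artist A", "Artist B")
)

# Count tracks per artist & key, then convert each key's column to shares.
counts <- table(toy_keys$artist_name, toy_keys$key_name)
shares <- prop.table(counts, margin = 2)
round(shares, 2)
```

Each Column of `shares` Sums to 1, Which is Why Every Bar in a Filled Chart Reaches the Top of the Axis.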
st_artists %>%
  ggplot(aes(x = key_name, fill = artist_name)) +
  geom_bar(position = "fill") +
  facet_wrap(~ mode_name) +
  xlab("Key") +
  ylab("Percentage")
Popularity Comparisons
Another Feature That We Can Analyze is a Track’s Popularity. Looking at Her Playlists Once Again, I Wonder Which One Has the Most Hits / Popular Songs?
ggplot(all_features, aes(track.popularity)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ playlist_name) +
  xlab("Track Popularity") +
  ylab("Number of Songs")
The Peak of “Indie Pop” Sits a Little Lower Than Those of the Other Three, Whose Spikes Are Approximately Between 50 & 75. So, The Songs She Includes Are Relatively Popular, Although Every Playlist Has At Least a Few Songs With Low Popularity.
ggplot(all_features, aes(track.popularity, fill = playlist_name)) +
  geom_histogram(alpha = 0.5, position = 'identity', bins = 30) +
  xlab("Track Popularity") +
  ylab("Number of Songs")
This Graph Concurs With My Conclusions From the Faceted Graphs, As The Darker Regions Are Between 40 & 75, Meaning That is Where There is A Lot of Overlap or a Lot of Songs in that Region.
Energy vs. Valence of Artists’ Songs (Short-Term)
Lastly, I Wanted to See How Dispersed Her Most Listened to Artists’ Songs Are & How They Would be Categorized Based on their Valence & Energy.
st_artists %>%
  ggplot(aes(x = valence, y = energy, color = artist_name)) +
  geom_jitter() +
  geom_vline(xintercept = 0.5) +
  geom_hline(yintercept = 0.5) +
  scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) +
  annotate('text', 0.25 / 2, 0.95, label = "Turbulent/Angry", fontface = "bold") +
  annotate('text', 1.75 / 2, 0.95, label = "Happy/Joyful", fontface = "bold") +
  annotate('text', 1.75 / 2, 0.05, label = "Chill/Peaceful", fontface = "bold") +
  annotate('text', 0.25 / 2, 0.05, label = "Sad/Depressing", fontface = "bold")
As Mentioned Before, She Listens To Just About Everything. Looking at the Far Corners, She Listens to A Few Songs With Very Low Valence & Low Energy (Sad/Depressing), Which is the Most Extreme. Opposite of That Though, There Are Some Songs With High Energy & Relatively High Valence (Happy/Joyful). The Majority of the Tracks, Though, Are Clustered Around the Middle. Thus, She Likes Songs With Average Valence & Energy - Nothing Too Extreme :)
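The Four Quadrants Can Also Be Assigned in Code, Which Makes It Possible to Count How Many Songs Land in Each One. This is a Minimal Base R Sketch: `classify_mood()` is a Hypothetical Helper, The Labels & 0.5 Cutoff Come From the Chart Above, & Treating Points Exactly On a Dividing Line As Belonging to the Upper or Right Quadrant is My Own Assumption.

```r
# Hypothetical helper: label each track by its valence/energy quadrant.
# Values exactly at the cutoff are treated as "high" (an assumption).
classify_mood <- function(valence, energy, cutoff = 0.5) {
  ifelse(energy >= cutoff,
         ifelse(valence >= cutoff, "Happy/Joyful", "Turbulent/Angry"),
         ifelse(valence >= cutoff, "Chill/Peaceful", "Sad/Depressing"))
}

# Toy inputs (invented), one per quadrant.
classify_mood(valence = c(0.9, 0.1, 0.8, 0.2),
              energy  = c(0.9, 0.9, 0.1, 0.1))
```

With Real Data, Something Like `table(classify_mood(st_artists$valence, st_artists$energy))` Would Give the Number of Songs Per Quadrant.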