The data comes from Spotify via the spotifyr package. Charlie Thompson, Josiah Parry, Donal Phipps, and Tom Wolff wrote this package to make it easier to retrieve either your own data or general metadata about songs from Spotify’s API. Check out the spotifyr package website to see how you can collect your own data!
There are 12 audio features for each track: acousticness, liveness, speechiness, instrumentalness, energy, loudness, danceability, valence (positiveness), duration, tempo, key, and mode.
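In the CSV loaded below, each of these features is a numeric column. The following is a minimal base-R sketch that assumes the column names of the TidyTuesday `spotify_songs.csv` file, where duration is stored in milliseconds as `duration_ms`. `audio_features` and `feature_ranges()` are hypothetical helpers for illustration, not part of spotifyr.

```r
# The 12 audio features, assuming the column names of the TidyTuesday
# spotify_songs.csv (duration is stored in milliseconds as duration_ms)
audio_features <- c("acousticness", "liveness", "speechiness",
                    "instrumentalness", "energy", "loudness",
                    "danceability", "valence", "duration_ms",
                    "tempo", "key", "mode")

# Hypothetical helper: min and max of each audio feature in a data frame
feature_ranges <- function(df, features = audio_features) {
  missing_cols <- setdiff(features, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # vapply gives a 2 x k matrix (one column per feature); transpose it
  out <- t(vapply(df[features], range, numeric(2), na.rm = TRUE))
  colnames(out) <- c("min", "max")
  out
}
```

Once the data is loaded, `feature_ranges(spotify_data)` would list one row per feature, which makes it easy to see which features lie on a 0 to 1 scale and which (such as loudness, tempo, and duration) do not.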
options(warn = -1)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(DT)
# Load the TidyTuesday Spotify songs data set (dplyr and readr come with tidyverse)
spotify_data <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
## Rows: 32833 Columns: 23
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (10): track_id, track_name, track_artist, track_album_id, track_album_na...
## dbl (13): track_popularity, danceability, energy, key, loudness, mode, speec...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Find the 10 most popular songs on Spotify
# (keeps only the first row for each album, so at most one song per album)
spotify_data %>%
  select(track_album_id, track_name, track_popularity, danceability, loudness, energy, tempo) %>%
  filter(!duplicated(track_album_id)) %>%
  mutate(pop_rank = min_rank(desc(track_popularity))) %>%
  arrange(pop_rank) %>%
  slice_min(pop_rank, n = 10)
## # A tibble: 10 x 8
## track_album_id track_name track_popularity danceability loudness energy tempo
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0UywfDKYlyiu1~ Dance Mon~ 100 0.824 -6.4 0.588 98.0
## 2 6HJDrXs0hpeba~ ROXANNE 99 0.621 -5.62 0.601 117.
## 3 7mKevNHhVnZER~ Tusa 98 0.803 -3.28 0.715 101.
## 4 3nR9B40hYLKLc~ Memories 98 0.764 -7.21 0.32 91.0
## 5 2ZfHkwHuoAZrl~ Blinding ~ 98 0.513 -4.08 0.796 171.
## 6 4g1ZRSobMefqF~ Circles 98 0.695 -3.50 0.762 120.
## 7 52u4anZbHd6UI~ The Box 98 0.896 -6.69 0.586 117.
## 8 4i3rAwPw7Ln2Y~ everythin~ 97 0.704 -14.5 0.225 120.
## 9 0ix3XtPV1LwmZ~ Don't Sta~ 97 0.794 -4.52 0.793 124.
## 10 1Czfd5tEby3Db~ Falling 97 0.784 -8.76 0.43 127.
## # ... with 1 more variable: pop_rank <int>
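From the above result, we can see that all of the top 10 most popular songs on Spotify score fairly high on danceability (between 0.51 and 0.90, with nine of the ten above 0.6) and have moderate-to-fast tempos, from about 91 to 171 BPM.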
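The ranking step uses `min_rank(desc(track_popularity))`, which gives tied values the same, lowest rank and then skips ahead: the five songs tied at 98 all get rank 3, and the next value gets rank 8. The sketch below mirrors the pipeline's logic in base R on made-up data. `top_ranked()` is a hypothetical helper, and `rank(-x, ties.method = "min")` stands in for `min_rank(desc(x))`.

```r
# Hypothetical helper mirroring the dplyr pipeline in base R:
# drop repeated keys, rank by score (ties share the lowest rank),
# and keep every row whose rank is within the top n
top_ranked <- function(df, key, score, n = 10) {
  deduped <- df[!duplicated(df[[key]]), , drop = FALSE]
  # Equivalent of dplyr::min_rank(desc(score)): 1 = highest score
  deduped$pop_rank <- rank(-deduped[[score]], ties.method = "min",
                           na.last = "keep")
  deduped <- deduped[order(deduped$pop_rank), , drop = FALSE]
  deduped[!is.na(deduped$pop_rank) & deduped$pop_rank <= n, , drop = FALSE]
}

# Made-up example: album "a" appears twice, and only its first row is kept
songs <- data.frame(
  album = c("a", "a", "b", "c", "d", "e"),
  popularity = c(100, 95, 98, 98, 97, 90)
)
top_ranked(songs, key = "album", score = "popularity", n = 3)
```

Here the two albums tied at 98 both receive rank 2 and the album at 97 would get rank 4, so asking for the top 3 returns albums a, b, and c. On the real data the call would be `top_ranked(spotify_data, key = "track_album_id", score = "track_popularity")`.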
# Plot danceability vs. track_popularity, with per-genre density curves and smoothers
ggplot(data = spotify_data) +
  geom_point(aes(x = danceability, y = track_popularity),
             alpha = .10) +
  # rescale each density curve so its peak is 100, matching the popularity axis
  geom_density(aes(x = danceability, y = after_stat(scaled * 100),
                   color = playlist_genre)) +
  geom_smooth(aes(x = danceability, y = track_popularity,
                  color = playlist_genre)) +
  scale_y_continuous(name = "Track Popularity") +
  scale_x_continuous(name = "Danceability", labels = scales::percent) +
  ggtitle("Spotify data",
          subtitle = "Danceability vs. Track Popularity over genre")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
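`geom_density()` computes a kernel density estimate whose values depend on the range of x. For danceability, which lies between 0 and 1, the curves typically peak above 1. Track popularity, by contrast, runs from 0 to 100, so drawing both against one y axis needs some rescaling. The base-R sketch below shows one option: rescale each curve so its peak sits at 100, which is what ggplot2's computed `scaled` variable (density divided by its maximum) multiplied by 100 gives. `scaled_density()` is a hypothetical helper, and the input values are made up.

```r
# Hypothetical helper: kernel density estimate rescaled so that its peak
# sits at `top` (e.g. 100, the upper end of track_popularity)
scaled_density <- function(x, top = 100) {
  d <- stats::density(x, na.rm = TRUE)
  data.frame(x = d$x, y = d$y / max(d$y) * top)
}

# Made-up danceability values on the 0-1 scale
dance <- c(0.45, 0.52, 0.61, 0.64, 0.70, 0.72, 0.75, 0.81, 0.88)
curve_df <- scaled_density(dance)
max(curve_df$y)  # 100
```

In ggplot2 the equivalent is to map `y = after_stat(scaled * 100)` inside `geom_density()`. Because the curves then only show shape, their height should not be read as a popularity value.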
# Plot tempo (BPM) vs. track_popularity, with per-genre density curves and smoothers
ggplot(data = spotify_data) +
  geom_point(aes(x = tempo, y = track_popularity),
             alpha = .10) +
  # rescale each density curve so its peak is 100, matching the popularity axis
  geom_density(aes(x = tempo, y = after_stat(scaled * 100),
                   color = playlist_genre)) +
  geom_smooth(aes(x = tempo, y = track_popularity,
                  color = playlist_genre)) +
  scale_y_continuous(name = "Track Popularity") +
  # tempo is in beats per minute, so use plain numeric labels, not percentages
  scale_x_continuous(name = "Tempo (BPM)") +
  ggtitle("Spotify data",
          subtitle = "Tempo vs. Track Popularity over genre")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
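Tempo is reported in beats per minute (BPM), not as a 0 to 1 fraction. One way to look at it numerically is to bin tempo into BPM bands and compare the average popularity within each band. The base-R sketch below does this. The band edges and the made-up data are assumptions for illustration, and `popularity_by_tempo()` is a hypothetical helper.

```r
# Hypothetical helper: mean popularity per tempo band (BPM)
popularity_by_tempo <- function(tempo, popularity,
                                breaks = c(0, 90, 110, 130, 150, Inf)) {
  # right = FALSE gives bands like [90, 110): 90 BPM falls in the second band
  band <- cut(tempo, breaks = breaks, right = FALSE)
  # bands with no songs come back as NA
  tapply(popularity, band, mean, na.rm = TRUE)
}

# Made-up tempos (BPM) and popularity scores (0-100)
tempo <- c(85, 98, 104, 117, 120, 124, 127, 171)
popularity <- c(40, 70, 60, 80, 75, 65, 90, 55)
popularity_by_tempo(tempo, popularity)
```

On the real data this would be `popularity_by_tempo(spotify_data$tempo, spotify_data$track_popularity)`. Band means are easier to compare than a cloud of 32,833 points, but they hide the spread within each band.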
From the above graphs we can conclude that most of the songs with high popularity have danceability above 0.5 (50%). Tempo is measured in BPM rather than as a fraction, so a percentage threshold does not apply to it. The density curves and scatter plots suggest that, across all genres, songs in this Spotify dataset tend toward higher danceability and tempo.
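To put a number on the visual impression, one can take the songs at or above a popularity cutoff and compute, per genre, the share whose danceability exceeds 0.5. The base-R sketch below does this. The cutoff of 80, the helper name `share_danceable()`, and the made-up rows are all assumptions. The column names match `spotify_data`.

```r
# Hypothetical helper: among songs at or above a popularity cutoff,
# the share whose danceability is above a threshold, computed per genre
share_danceable <- function(df, cutoff = 80, threshold = 0.5) {
  popular <- df[which(df$track_popularity >= cutoff), , drop = FALSE]
  # the mean of a logical vector is the proportion of TRUE values
  tapply(popular$danceability > threshold, popular$playlist_genre, mean)
}

# Made-up rows with the same column names as spotify_data
toy <- data.frame(
  playlist_genre   = c("pop", "pop", "pop", "rap", "rap", "rock"),
  track_popularity = c(95, 85, 40, 90, 82, 88),
  danceability     = c(0.80, 0.45, 0.90, 0.75, 0.66, 0.40)
)
share_danceable(toy)
```

With these made-up rows the shares are 0.5 for pop, 1 for rap, and 0 for rock. Calling `share_danceable(spotify_data)` would give the corresponding figures for the real data.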