Overview

While I was browsing through Kaggle, I came across a dataset that interested me called “Spotify Stats for 2023.” As someone who loves music, I thought it would be interesting to compare statistics compiled by Spotify and see if any overlapped with my music preferences.

# load packages
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)

# read data
spotify <- read.csv("/Users/victorzheng/Documents/NYU/R/spotify_2023.csv")

# show column names
print(names(spotify))
##  [1] "track_name"           "artist.s._name"       "artist_count"        
##  [4] "released_year"        "released_month"       "released_day"        
##  [7] "in_spotify_playlists" "in_spotify_charts"    "streams"             
## [10] "in_apple_playlists"   "in_apple_charts"      "in_deezer_playlists" 
## [13] "in_deezer_charts"     "in_shazam_charts"     "bpm"                 
## [16] "key"                  "mode"                 "dance"               
## [19] "valence_."            "energy"               "acousticness_."      
## [22] "instrumentalness_."   "liveness_."           "wordiness"

Interesting Points

Below is a summary of the data as well as a few superlatives of interest.

# summarize data
summary(spotify)
##   track_name        artist.s._name      artist_count   released_year 
##  Length:953         Length:953         Min.   :1.000   Min.   :1930  
##  Class :character   Class :character   1st Qu.:1.000   1st Qu.:2020  
##  Mode  :character   Mode  :character   Median :1.000   Median :2022  
##                                        Mean   :1.556   Mean   :2018  
##                                        3rd Qu.:2.000   3rd Qu.:2022  
##                                        Max.   :8.000   Max.   :2023  
##  released_month    released_day   in_spotify_playlists in_spotify_charts
##  Min.   : 1.000   Min.   : 1.00   Min.   :   31        Min.   :  0.00   
##  1st Qu.: 3.000   1st Qu.: 6.00   1st Qu.:  875        1st Qu.:  0.00   
##  Median : 6.000   Median :13.00   Median : 2224        Median :  3.00   
##  Mean   : 6.034   Mean   :13.93   Mean   : 5200        Mean   : 12.01   
##  3rd Qu.: 9.000   3rd Qu.:22.00   3rd Qu.: 5542        3rd Qu.: 16.00   
##  Max.   :12.000   Max.   :31.00   Max.   :52898        Max.   :147.00   
##     streams          in_apple_playlists in_apple_charts  in_deezer_playlists
##  Min.   :2.762e+03   Min.   :  0.00     Min.   :  0.00   Length:953         
##  1st Qu.:1.414e+08   1st Qu.: 13.00     1st Qu.:  7.00   Class :character   
##  Median :2.902e+08   Median : 34.00     Median : 38.00   Mode  :character   
##  Mean   :5.136e+08   Mean   : 67.81     Mean   : 51.91                      
##  3rd Qu.:6.738e+08   3rd Qu.: 88.00     3rd Qu.: 87.00                      
##  Max.   :3.704e+09   Max.   :672.00     Max.   :275.00                      
##  in_deezer_charts in_shazam_charts        bpm            key           
##  Min.   : 0.000   Length:953         Min.   : 65.0   Length:953        
##  1st Qu.: 0.000   Class :character   1st Qu.:100.0   Class :character  
##  Median : 0.000   Mode  :character   Median :121.0   Mode  :character  
##  Mean   : 2.666                      Mean   :122.5                     
##  3rd Qu.: 2.000                      3rd Qu.:140.0                     
##  Max.   :58.000                      Max.   :206.0                     
##      mode               dance         valence_.         energy     
##  Length:953         Min.   :23.00   Min.   : 4.00   Min.   : 9.00  
##  Class :character   1st Qu.:57.00   1st Qu.:32.00   1st Qu.:53.00  
##  Mode  :character   Median :69.00   Median :51.00   Median :66.00  
##                     Mean   :66.97   Mean   :51.43   Mean   :64.28  
##                     3rd Qu.:78.00   3rd Qu.:70.00   3rd Qu.:77.00  
##                     Max.   :96.00   Max.   :97.00   Max.   :97.00  
##  acousticness_.  instrumentalness_.   liveness_.      wordiness    
##  Min.   : 0.00   Min.   : 0.000     Min.   : 3.00   Min.   : 2.00  
##  1st Qu.: 6.00   1st Qu.: 0.000     1st Qu.:10.00   1st Qu.: 4.00  
##  Median :18.00   Median : 0.000     Median :12.00   Median : 6.00  
##  Mean   :27.06   Mean   : 1.581     Mean   :18.21   Mean   :10.13  
##  3rd Qu.:43.00   3rd Qu.: 0.000     3rd Qu.:24.00   3rd Qu.:11.00  
##  Max.   :97.00   Max.   :91.000     Max.   :97.00   Max.   :64.00
# Track added to the most Spotify playlists in 2023
in_most_playlists <- spotify$track_name[spotify$in_spotify_playlists == max(spotify$in_spotify_playlists)]
print(in_most_playlists)
## [1] "Get Lucky - Radio Edit"
# Track with the most Spotify streams in 2023
most_streams <- spotify$track_name[spotify$streams == max(spotify$streams)]
print(most_streams)
## [1] "Blinding Lights"
# Track with the highest danceability in Spotify's Top Songs of 2023
highest_danceability <- spotify$track_name[spotify$dance == max(spotify$dance)]
print(highest_danceability)
## [1] "Peru"
# Track with the highest beats per minute (bpm) in Spotify's Top Songs of 2023
highest_bpm <- spotify$track_name[spotify$bpm == max(spotify$bpm)]
print(highest_bpm)
## [1] "We Don't Talk About Bruno" "Lover"
# Track with the lowest beats per minute (bpm) in Spotify's Top Songs of 2023
lowest_bpm <- spotify$track_name[spotify$bpm == min(spotify$bpm)]
print(lowest_bpm)
## [1] "Love Language"     "Happier Than Ever"
# Oldest track in Spotify's Top Songs of 2023
oldest_song <- spotify$track_name[spotify$released_year == min(spotify$released_year)]
print(oldest_song)
## [1] "Agudo"

Is Song Danceability a Driving Factor?

Danceability is calculated using a combination of musical elements including the strength of the beat, stability of rhythm, tempo, and overall regularity. Have you ever wondered whether there is any correlation between a song’s danceability rating and the number of times it gets streamed? The scatter plot below suggests there is not.

ggplot(data = spotify, aes(x = dance, y = streams)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 100, by = 10))

While song danceability may determine whether or not your music gets played in a lively environment, it is not the be-all and end-all.
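
A scatter plot is easy to misread, so it can help to put a number on the relationship. The sketch below is not part of the original analysis: `stream_correlation()` is a hypothetical helper, the Spearman rank correlation is my own choice (it is less sensitive to a handful of mega-hits than Pearson), and the toy rows are made up purely to show the call.

```r
# Hypothetical helper (an assumption, not from the original post):
# rank correlation between one column and the stream count,
# skipping rows where either value is missing.
stream_correlation <- function(data, column, method = "spearman") {
  x <- as.numeric(data[[column]])
  y <- as.numeric(data[["streams"]])
  cor(x, y, use = "complete.obs", method = method)
}

# Made-up toy rows, only to demonstrate the call; NOT values from the dataset.
toy_dance <- data.frame(
  dance   = c(40, 55, 60, 75, 90),
  streams = c(5e8, 1e8, 9e8, 2e8, 4e8)
)
dance_rho <- stream_correlation(toy_dance, "dance")
print(dance_rho)

# With the real data loaded as in the post:
# stream_correlation(spotify, "dance")
```

A value close to 0 would back up the reading of the plot, while values near 1 or -1 would contradict it.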

Is Beats Per Minute (BPM) a Factor to Consider?

Comparing a song’s BPM with the number of times it gets streamed yields similar results: there is no clear relationship. However, it does appear that most listeners prefer songs within a certain BPM range. The data shows a preference for songs between 90 BPM and 120 BPM. If you are an aspiring artist hoping to break into the industry, I’d recommend writing a song within that range ;)

ggplot(data = spotify, aes(x = bpm, y = streams)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 220, by = 20))
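
To check the 90 to 120 BPM claim without eyeballing the plot, one option is to bin the tracks into BPM bands and look at each band's track count and median streams. This is a rough sketch under my own assumptions: `bpm_band_summary()` is a hypothetical helper, the 30 BPM band width is arbitrary, and the toy rows are invented for illustration.

```r
# Hypothetical helper (an assumption, not from the original post):
# number of tracks and median streams per BPM band.
bpm_band_summary <- function(data, breaks = seq(60, 210, by = 30)) {
  # cut() assigns each track to a band such as (90,120]
  band <- cut(data$bpm, breaks = breaks, include.lowest = TRUE)
  data.frame(
    n_tracks       = as.vector(table(band)),
    median_streams = as.vector(tapply(data$streams, band, median, na.rm = TRUE)),
    row.names      = levels(band)
  )
}

# Made-up toy rows, only to demonstrate the call; NOT values from the dataset.
toy_bpm <- data.frame(
  bpm     = c(70, 95, 100, 118, 125, 170),
  streams = c(1e8, 3e8, 5e8, 7e8, 2e8, 4e8)
)
bpm_bands <- bpm_band_summary(toy_bpm)
print(bpm_bands)

# With the real data loaded as in the post:
# bpm_band_summary(spotify)
```

The default breaks cover the 65 to 206 BPM range reported in the summary above. A band that simply contains more tracks tells you what gets released and charted; the median streams column is the closer proxy for what listeners actually play more.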

Major or Minor … What Difference Does It Make?

Songs written in major keys are generally happier songs. Conversely, songs written in minor keys are more melancholy or serious. Does the key a song is written in affect the number of streams? The answer is a very candid “it depends.” While songs written in major keys generated more streams than songs written in minor keys, there is no strong correlation between a song’s mode (major or minor) and the number of streams it generates. If I HAD to pick one, I would most likely write a song in a major key based on this data alone. But to be clear, these are only statistics for one year.

ggplot(spotify, aes(x = mode, y = streams)) +
  geom_boxplot() +
  labs(x = "Major/Minor Key", y = "Streams", title = "Major/Minor Keys vs. # of Streams") 
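
"More streams" can mean more streams in total or more streams per song, and the two can disagree when one group simply has more tracks. The sketch below reports both. It is my own addition rather than the original analysis: `mode_summary()` is a hypothetical helper and the toy rows are invented so that the two measures point in opposite directions.

```r
# Hypothetical helper (an assumption, not from the original post):
# track count, total streams and median streams for each mode.
mode_summary <- function(data) {
  mode <- factor(data$mode)
  data.frame(
    n_tracks       = as.vector(table(mode)),
    total_streams  = as.vector(tapply(data$streams, mode, sum, na.rm = TRUE)),
    median_streams = as.vector(tapply(data$streams, mode, median, na.rm = TRUE)),
    row.names      = levels(mode)
  )
}

# Made-up toy rows, only to demonstrate the call; NOT values from the dataset.
toy_mode <- data.frame(
  mode    = c("Major", "Major", "Major", "Major", "Minor", "Minor"),
  streams = c(1e8, 2e8, 3e8, 4e8, 3e8, 5e8)
)
mode_stats <- mode_summary(toy_mode)
print(mode_stats)

# With the real data loaded as in the post:
# mode_summary(spotify)
```

In the toy data the major-key group wins on total streams only because it has more songs, while the typical minor-key song does better. Running the helper on the real data would show which of the two readings the box plot supports.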

Song Wordiness - Is It Even a Factor?

To use more or fewer words in a song - that is the question of the century. Well, lucky for you, the data is showing a strong preference towards songs that use fewer words. Some of the tracks with the most streams had a wordiness rating below 20 (these were graded on a scale of 1 to 100, with 100 being incredibly wordy).

ggplot(data = spotify, aes(x = wordiness, y = streams)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 100, by = 10))
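
One caveat worth checking: the summary above puts the third quartile of wordiness at 11, so most tracks in the dataset sit below 20 whether or not they are heavily streamed. A fairer test compares the share of low-wordiness songs among the top-streamed tracks with the share across all tracks. The sketch below is my own addition: `wordiness_share()` is a hypothetical helper, the cutoffs of 50 tracks and a rating of 20 are assumptions, and the toy rows are invented.

```r
# Hypothetical helper (an assumption, not from the original post):
# share of tracks below a wordiness threshold, among the top_n most
# streamed tracks and among all tracks.
wordiness_share <- function(data, top_n = 50, threshold = 20) {
  ranked <- data[order(data$streams, decreasing = TRUE), ]
  top <- head(ranked, top_n)
  c(
    top_tracks = mean(top$wordiness < threshold, na.rm = TRUE),
    all_tracks = mean(data$wordiness < threshold, na.rm = TRUE)
  )
}

# Made-up toy rows, only to demonstrate the call; NOT values from the dataset.
toy_words <- data.frame(
  wordiness = c(5, 30, 10, 50),
  streams   = c(9e8, 8e8, 7e8, 1e8)
)
word_shares <- wordiness_share(toy_words, top_n = 3)
print(word_shares)

# With the real data loaded as in the post:
# wordiness_share(spotify)
```

If the two shares come out about the same on the real data, low wordiness is just typical of the whole list; a clearly higher share among the top tracks would support a preference for less wordy songs.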

Does this make sense?

Wholeheartedly, yes.

Now, is there a time and place for songs that are wordy? Of course. However, you must keep in mind that you are a storyteller without thousands of pages to work with. You must be able to convey your message in short, simple, but memorable phrases. This helps the audience remember your lyrics more easily and leaves room for your music to shine through!