Top 50 Streamed Spotify Songs’ Chart Appearances vs BPM 2023

Author

Kenny Nguyen

Introduction

The topic I’m focusing on for this project is the top streamed songs on Spotify in 2023. My dataset contains 953 songs, sourced from Kaggle and compiled from multiple sources. Along with the track names, the dataset includes artist names, the number of artists on each track, the release year, month, and day, the number of Spotify playlists the song was added to, the number of times the song appeared on the Spotify charts, the total number of Spotify streams, and data from other platforms like Apple Music, Deezer, and Shazam. For Apple Music and Deezer, the dataset records how many playlists the song was added to and how many times it appeared on that platform’s charts; for Shazam, it records chart appearances only.

The dataset also includes musical features like BPM (beats per minute), key, and mode (major or minor). A major scale usually sounds upbeat or happy, while a minor scale often feels more emotional or sad. Additional variables describe the track’s danceability, valence (musical positivity), energy level, acousticness (presence of acoustic sounds), instrumentalness (amount of instrumental content), liveness (live performance elements), and speechiness (spoken word content). The dataset was fairly clean, but I needed to remove the “%” symbol from some variable names for consistency and strip non-numeric characters from the stream counts.

I chose this topic because music has a powerful influence on my life. It can shift my mood and shape the course of my day. I listen to music constantly: when I wake up, drive, shower, study, or work. Every Thursday night at midnight, I look forward to new music drops, and Spotify creates a playlist of newly released songs it thinks I’ll enjoy. I listen to them all, picking out the ones I love. I’ve also been to countless concerts, and there’s no experience like singing your favorite lyrics with a crowd and feeling the energy of everyone around you. According to my Spotify Wrapped, I listened to music for 89,571 minutes in 2023, which truly reflects how much music means to me.

One helpful article I found was How Does Spotify Work: A Comprehensive Guide by Mogul. It explains how Spotify, launched in 2008, revolutionized music access. Before streaming, people had to purchase physical media or download songs. Spotify eliminated that need by offering access to a library of songs through streaming. It allows users to create playlists and listen to a wide range of music. While the free version includes ads and limited skips, premium users get ad-free listening. A common misconception is that streams are equivalent to album sales. In fact, it takes 1,500 streams to equal one album sale.
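To make the conversion concrete, here is a minimal sketch in R. The 1,500-to-1 ratio is the figure quoted above; the function name `album_equivalents` and the example stream count are made up for illustration.

```r
# Convert a stream count into album-equivalent units,
# assuming the ratio of 1,500 streams to 1 album sale quoted above.
album_equivalents <- function(streams, streams_per_album = 1500) {
  streams / streams_per_album
}

# Hypothetical example: a song with 3 million streams
album_equivalents(3e6)  # 2000
```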

One concept that I am focusing on in my project is beats per minute (BPM), which refers to the tempo, or how fast a song is going. To gain more background knowledge, I did some research and found an article titled Everything to Know About the Song BPM to Make Music by Anton Berner. Berner explains how the BPM of a song helps set its mood, and how, when you tap your foot to a song, you’re actually tapping to its BPM. Different genres of music have typical BPM ranges: for example, house music usually ranges from 115 to 130 BPM, while hip hop generally falls between 60 and 100 BPM. A fun fact that Berner shares is that the BPM of a song you’re listening to can actually affect your heart rate. Tempo also influences how catchy a song is, and the catchier the song, the more likely people are to listen to it.
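As a rough illustration of what a BPM value means in practice, the sketch below converts a tempo into the time between taps and checks it against the two genre ranges mentioned above. The function names `seconds_per_beat` and `tempo_range` are hypothetical, and the ranges are only the two quoted from Berner’s article, so this is not a genre classifier.

```r
# Seconds between beats at a given tempo (60 seconds / beats per minute)
seconds_per_beat <- function(bpm) 60 / bpm

# Check a tempo against the two genre ranges quoted above.
# Real genres overlap, so treat the label as a rough guide only.
tempo_range <- function(bpm) {
  if (bpm >= 115 && bpm <= 130) {
    "typical house range"
  } else if (bpm >= 60 && bpm <= 100) {
    "typical hip hop range"
  } else {
    "outside both ranges"
  }
}

seconds_per_beat(120)  # 0.5 seconds between taps
tempo_range(120)       # "typical house range"
tempo_range(174)       # "outside both ranges"
```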

My project also focuses on Spotify’s chart. Spotify’s chart is calculated based on the popularity of a song among its users. It tracks songs that are actively being played at a given moment. The engagement a song receives also plays a role in its chart activity—for example, the number of saves, likes, and shares it gets. There are some limitations in place when Spotify determines chart rankings. One of these is the “30-Second Rule,” which states that a user must listen to a song for at least 30 seconds for it to be counted as a stream. A higher number of streams increases a song’s chances of climbing up the Spotify chart.
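The 30-second rule can be sketched with a few lines of R. The play durations below are invented for illustration, and this only shows the counting rule itself, not how Spotify actually computes its charts.

```r
# Hypothetical play durations for one song, in seconds
play_seconds <- c(12, 45, 30, 29, 200, 5)

# Under the 30-second rule, only plays of at least 30 seconds count as streams
counted <- play_seconds >= 30

sum(counted)  # 3 of the 6 plays count as streams
```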

library(tidyverse) # Loading library for dplyr commands
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer) # Loading library for color 
library(GGally) # Loading library for linear regression
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
library(plotly) # Loading library for interactivity

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
setwd("~/Desktop/Data 110")
songs <- read_csv("spotify-2023.csv")
Rows: 953 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): track_name, artist(s)_name, streams, key, mode
dbl (17): artist_count, released_year, released_month, released_day, in_spot...
num  (2): in_deezer_playlists, in_shazam_charts

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(songs)
# A tibble: 6 × 24
  track_name          `artist(s)_name` artist_count released_year released_month
  <chr>               <chr>                   <dbl>         <dbl>          <dbl>
1 Seven (feat. Latto… Latto, Jung Kook            2          2023              7
2 LALA                Myke Towers                 1          2023              3
3 vampire             Olivia Rodrigo              1          2023              6
4 Cruel Summer        Taylor Swift                1          2019              8
5 WHERE SHE GOES      Bad Bunny                   1          2023              5
6 Sprinter            Dave, Central C…            2          2023              6
# ℹ 19 more variables: released_day <dbl>, in_spotify_playlists <dbl>,
#   in_spotify_charts <dbl>, streams <chr>, in_apple_playlists <dbl>,
#   in_apple_charts <dbl>, in_deezer_playlists <dbl>, in_deezer_charts <dbl>,
#   in_shazam_charts <dbl>, bpm <dbl>, key <chr>, mode <chr>,
#   `danceability_%` <dbl>, `valence_%` <dbl>, `energy_%` <dbl>,
#   `acousticness_%` <dbl>, `instrumentalness_%` <dbl>, `liveness_%` <dbl>,
#   `speechiness_%` <dbl>

Data Cleaning

names(songs) <- gsub("[%]", "", names(songs)) # Removes the % symbol from the column names
songs$streams <- as.numeric(gsub("[^0-9]", "", songs$streams)) # Removes all non-numeric characters from streams, then converts to numeric
top_50 <- songs |> # Creating a new data set with the 50 most streamed songs
  arrange(desc(streams)) |> # Sorting streams from highest to lowest
  slice(1:50) # Keeping only the 50 songs with the most streams
era <- top_50 |>
  mutate(music_era = case_when(
    released_year <= 1999 ~ "90's and older", # Songs released in 1999 or earlier
    released_year <= 2009 ~ "2000's", # Songs released from 2000 to 2009
    released_year <= 2019 ~ "2010's", # Songs released from 2010 to 2019
    released_year >= 2020 ~ "2020's" # Songs released in 2020 or later
  ))
head(era)
# A tibble: 6 × 25
  track_name          `artist(s)_name` artist_count released_year released_month
  <chr>               <chr>                   <dbl>         <dbl>          <dbl>
1 Love Grows (Where … Edison Lighthou…            1          1970              1
2 Blinding Lights     The Weeknd                  1          2019             11
3 Shape of You        Ed Sheeran                  1          2017              1
4 Someone You Loved   Lewis Capaldi               1          2018             11
5 Dance Monkey        Tones and I                 1          2019              5
6 Sunflower - Spider… Post Malone, Sw…            2          2018             10
# ℹ 20 more variables: released_day <dbl>, in_spotify_playlists <dbl>,
#   in_spotify_charts <dbl>, streams <dbl>, in_apple_playlists <dbl>,
#   in_apple_charts <dbl>, in_deezer_playlists <dbl>, in_deezer_charts <dbl>,
#   in_shazam_charts <dbl>, bpm <dbl>, key <chr>, mode <chr>,
#   danceability_ <dbl>, valence_ <dbl>, energy_ <dbl>, acousticness_ <dbl>,
#   instrumentalness_ <dbl>, liveness_ <dbl>, speechiness_ <dbl>,
#   music_era <chr>
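
One caveat about stripping non-numeric characters: if a `streams` cell holds text that happens to contain digits, rather than a count with stray separators, the digits get fused into a number that has nothing to do with streams. The sketch below shows a possible check, assuming made-up raw values (the vector `raw_streams` is hypothetical, not taken from the dataset). It flags entries that are not purely digits and leaves them as `NA` instead of converting them.

```r
# Hypothetical raw values: two clean counts and one malformed text entry
raw_streams <- c("141381703", "133716286", "BPM110KeyAModeMajor53")

# Stripping non-digits silently turns the malformed entry into a number
stripped <- as.numeric(gsub("[^0-9]", "", raw_streams))

# Alternative: flag anything that is not purely digits, convert only clean values
is_clean <- grepl("^[0-9]+$", raw_streams)
streams_num <- rep(NA_real_, length(raw_streams))
streams_num[is_clean] <- as.numeric(raw_streams[is_clean])

raw_streams[!is_clean]  # entries worth inspecting by hand
```

Running a check like this before `arrange(desc(streams))` would show whether any row, such as the 1970 release at the top of `head(era)`, owes its rank to a malformed value rather than to real streams.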

Statistical Analysis

cor(era$in_spotify_charts, era$bpm) # Finding the correlation between Spotify chart appearances and BPM
[1] 0.3019694
fit1 <- lm(in_spotify_charts ~ bpm, data = era) # Creating a model to predict Spotify chart appearances based on bpm
  
summary(fit1) # Getting a summary of the regression model (p-value, R-squared, coefficients, etc.)

Call:
lm(formula = in_spotify_charts ~ bpm, data = era)

Residuals:
    Min      1Q  Median      3Q     Max 
-34.088 -18.928  -6.076  14.277  89.081 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -8.6042    16.1246  -0.534   0.5961  
bpm           0.2846     0.1297   2.195   0.0331 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 27.2 on 48 degrees of freedom
Multiple R-squared:  0.09119,   Adjusted R-squared:  0.07225 
F-statistic: 4.816 on 1 and 48 DF,  p-value: 0.03307

The correlation between a song’s BPM (beats per minute) and its presence on Spotify charts is 0.302, indicating a weak-to-moderate positive relationship. This suggests that, among these 50 songs, those with a higher BPM tend to appear more frequently on Spotify charts. The regression model is represented by the equation:

in_spotify_charts = -8.60 + 0.285(BPM)

This means that for each additional beat per minute, the model predicts an increase of about 0.285 chart appearances. The p-value for the BPM coefficient is 0.0331, which is statistically significant at the 0.05 level, suggesting that the relationship between BPM and chart presence in this sample is unlikely to be due to chance alone.
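
As a worked example, the sketch below plugs two tempos into the fitted equation. The coefficients are copied from the `summary(fit1)` output; the helper name `predict_charts` is hypothetical. With the fitted model object available, `predict(fit1, newdata = data.frame(bpm = c(100, 174)))` should give the same values up to rounding.

```r
# Coefficients copied from the summary(fit1) output above
intercept <- -8.6042
slope     <- 0.2846

# Predicted chart appearances at a given tempo
predict_charts <- function(bpm) intercept + slope * bpm

predict_charts(100)                        # about 19.9
predict_charts(174)                        # about 40.9
predict_charts(110) - predict_charts(100)  # about 2.8 more appearances per 10 BPM
```

Because the intercept is negative, the line predicts fewer than zero appearances for tempos below roughly 30 BPM, so the equation is only sensible inside the range of tempos actually observed.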

The adjusted R-squared value is 0.07225, meaning that about 7.2% of the variation in chart appearances can be explained by BPM. While BPM is associated with a song’s chart success, the majority of the variation is likely due to other factors, such as the artist’s popularity, song promotion, or release strategy.
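
The R-squared values in the summary can be reproduced from the correlation alone, which is a useful sanity check for a one-predictor model. The sketch below uses the correlation printed by `cor()` and takes n = 50 from the 48 residual degrees of freedom plus the two estimated coefficients.

```r
r <- 0.3019694  # correlation reported by cor() above
n <- 50         # songs in the top-50 subset
p <- 1          # one predictor (bpm)

# With a single predictor, multiple R-squared is the squared correlation
r_squared <- r^2

# Adjusted R-squared penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

round(r_squared, 5)      # 0.09119
round(adj_r_squared, 5)  # 0.07225
```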

Visualization

plot <- ggplot(era, aes(x = bpm, y = in_spotify_charts, color = music_era)) +  # Create a plot with bpm on the x-axis, chart count on the y-axis, colored by music era
  geom_point(size = 4, alpha = 0.7, shape = 21, stroke = 1.5) +  # Use large, semi-transparent points (alpha of 0.7) with an outlined shape
  scale_color_manual(
    values = c(
      "90's and older" = "#9c0c1b", # Deep red for 90s and older
      "2000's" = "#4188f2", # Light blue for 2000s
      "2010's" = "#f5f51b", # Neon yellow for 2010s
      "2020's" = "#6cf522" # Light green for 2020s
    )
  ) +
  labs(
    title = "Top 50 Streamed Spotify Songs' Chart Appearances vs BPM 2023", # Chart title
    x = "BPM (Tempo)", # Label for x-axis
    y = "Number of Times on Spotify Charts", # Label for y-axis
    color = "Decade Released", # Legend title for color
    caption = "Source: Multiple Data Sources") + # Caption for data source
  theme_minimal(base_size = 14) + # Use minimal theme with larger base font
  theme(
    plot.title = element_text(size = 14, face = "bold"), # Changed size and boldness for plot title
    axis.title = element_text(size = 14)) # Changed size for axis titles
ggplotly(plot)  # Make the plot interactive (passing the plot explicitly instead of relying on the last plot created)

My visualization focuses on the top 50 most streamed songs on Spotify in 2023. I chose to analyze two variables: the number of times a song appeared on the Spotify charts and its BPM (tempo). To add a third dimension to the visualization, I colored the outline of each data point according to the decade the song was released.

One interesting observation is that the song with the most Spotify chart appearances had a BPM of 174, which is quite fast. I also noticed that songs with fewer chart appearances tend to cluster at lower BPM values. This pattern led me to wonder whether there’s a “sweet spot” for BPM—an optimal tempo range that boosts a song’s success on the charts—perhaps known only to a select group of producers or industry experts.

Before finalizing this visualization, I tested correlations between several pairs of quantitative variables and also looked at how they varied across categorical variables. Many of those relationships turned out to be weak. At first, I wanted to explore the relationship between BPM and total stream count, but the correlation was too low to justify deeper analysis. Instead, I chose to focus on BPM and the number of Spotify chart appearances, which showed the strongest correlation among the pairs I tested.
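
One way to screen many variable pairs at once is to compute a correlation matrix and read off a single column. The sketch below does this on a small invented data frame (`toy` is hypothetical and its values are not from the dataset); with the real data, the same two lines could be applied to the numeric columns of `era`.

```r
# Hypothetical toy data standing in for a few numeric columns of `era`
toy <- data.frame(
  in_spotify_charts = c(10, 25, 40, 5, 60),
  bpm               = c(90, 110, 140, 80, 170),
  streams           = c(3.1, 2.4, 2.9, 2.2, 2.6),
  danceability      = c(70, 55, 60, 80, 50)
)

# Correlation of every other column with chart appearances
cors <- cor(toy)[, "in_spotify_charts"]
cors <- cors[names(cors) != "in_spotify_charts"]

# Rank by absolute strength, strongest first
sort(abs(cors), decreasing = TRUE)
```

Ranking by absolute value keeps strong negative relationships from being overlooked. Picking the strongest of many tested pairs also makes a significant p-value easier to find by chance, which is worth keeping in mind when interpreting the final model.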

Bibliography

Berner, Anton. “Everything to Know about the Song BPM to Make Music – Soundtrap Blog.” Soundtrap Blog, 15 Mar. 2025, blog.soundtrap.com/everything-about-song-bpm. Accessed 18 Apr. 2025.

“Charts.” Spotify, support.spotify.com/us/artists/article/charts/.

“Frequently Asked Questions (FAQ).” CHART DATA, 1 Sept. 2017, chartdata.org/faq/.

“How Does Spotify Work: A Comprehensive Guide.” Usemogul.com, 2024, www.usemogul.com/post/how-does-spotify-work-a-comprehensive-guide.

Spotify. Spotify to Continue Its Service in Uruguay, 12 Dec. 2023, newsroom.spotify.com/2023-12-12/spotify-to-continue-its-service-in-uruguay/. Accessed 15 Apr. 2025.