The topic I’m focusing on for this project is the top streamed songs on Spotify in 2023. My dataset contains 953 songs, sourced from Kaggle and compiled from multiple sources. Along with the track names, the dataset includes artist names, the number of artists on each track, the release year, month, and day, the number of Spotify playlists the song was added to, the number of times the song appeared on the Spotify charts, the total number of Spotify streams, and data from other platforms like Apple Music, Deezer, and Shazam. These platforms contributed information such as the number of playlists the song was added to and how many times it appeared on their charts.
The dataset also includes musical features like BPM (beats per minute), key, and mode (major or minor). A major scale usually sounds upbeat or happy, while a minor scale often feels more emotional or sad. Additional variables describe the track’s danceability, valence (musical positivity), energy level, acousticness (presence of acoustic sounds), instrumentalness (amount of instrumental content), liveness (live performance elements), and speechiness (spoken word content). The dataset was fairly clean, but I needed to remove the “%” symbol from some variable names for consistency and strip non-numeric characters from the stream counts.
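To illustrate what those two cleaning steps do, here is a minimal sketch on made-up values. The column names and stream strings below are illustrative assumptions, not rows taken from the dataset. It also shows a caveat: stripping every non-digit character turns a malformed entry into a number instead of flagging it, so a stricter check may be worth considering.

```r
# Illustrative column names (assumed to resemble the raw headers)
raw_names <- c("track_name", "danceability_%", "energy_%")
clean_names <- gsub("[%]", "", raw_names)  # drop only the percent sign
print(clean_names)

# Made-up stream strings: one clean, one with separators, one malformed
raw_streams <- c("141381703", "1,234,567", "BPM110")

# Same approach as the report: remove everything that is not a digit
clean_streams <- as.numeric(gsub("[^0-9]", "", raw_streams))
print(clean_streams)  # the malformed entry silently becomes 110

# Stricter alternative: keep only entries that are purely digits, else NA
is_digits <- grepl("^[0-9]+$", raw_streams)
strict_streams <- rep(NA_real_, length(raw_streams))
strict_streams[is_digits] <- as.numeric(raw_streams[is_digits])
print(strict_streams)
```

The stricter version trades convenience for safety: it would also reject values that only contain thousands separators, so the two approaches could be combined depending on what the raw column actually contains.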
I chose this topic because music has a powerful influence on my life. It can shift my mood and shape the course of my day. I listen to music constantly: when I wake up, drive, shower, study, or work. Every Thursday night at midnight, I look forward to new music drops, and Spotify creates a playlist of newly released songs it thinks I’ll enjoy. I listen to them all, picking out the ones I love. I’ve also been to countless concerts, and there’s no other experience like singing your favorite lyrics with a crowd and feeling the energy of others. According to my Spotify Wrapped, I listened to music for 89,571 minutes in 2023, which truly reflects how much music means to me.
One helpful article I found was “How Does Spotify Work: A Comprehensive Guide” by Mogul. It explains how Spotify, launched in 2008, revolutionized music access. Before streaming, people had to purchase physical media or download songs. Spotify eliminated that need by offering access to a library of songs through streaming. It allows users to create playlists and listen to a wide range of music. While the free version includes ads and limited skips, premium users get ad-free listening. A common misconception is that streams are equivalent to album sales; in fact, it takes 1,500 streams to equal one album sale.
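As a quick illustration of that conversion, the sketch below turns a stream count into album-equivalent units using the 1,500-to-1 rate from the article. The helper name `album_equivalents` and the 3 million stream figure are made up for the example.

```r
# Conversion rate stated in the article summarized above
streams_per_album <- 1500

# Hypothetical helper: convert a stream count to album-equivalent units
album_equivalents <- function(streams, rate = streams_per_album) {
  streams / rate
}

# Made-up example: 3 million streams
album_equivalents(3e6)  # 2000 album-equivalent units
```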
One concept that I am focusing on in my project is beats per minute (BPM), which refers to the tempo, or how fast a song is going. To gain more background knowledge, I did some research and found an article titled “Everything to Know About the Song BPM to Make Music” by Anton Berner. Berner explains how the BPM of a song shapes its mood, and how, when you tap your foot to a song, you’re actually tapping to its BPM. Different genres of music have typical BPM ranges. For example, house music typically ranges from 115 to 130 BPM, while hip hop music generally falls between 60 and 100 BPM. A fun fact that Berner shared is that, as you’re listening to a song, the BPM can actually impact your heart rate. Tempo also influences how catchy a song is, and the catchier the song, the more likely people are to listen to it.
My project also focuses on Spotify’s chart. Spotify’s chart is calculated based on the popularity of a song among its users. It tracks songs that are actively being played at a given moment. The engagement a song receives also plays a role in its chart activity—for example, the number of saves, likes, and shares it gets. There are some limitations in place when Spotify determines chart rankings. One of these is the “30-Second Rule,” which states that a user must listen to a song for at least 30 seconds for it to be counted as a stream. A higher number of streams increases a song’s chances of climbing up the Spotify chart.
library(tidyverse) # Loading library for dplyr commands
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer) # Loading library for color
library(GGally) # Loading library for linear regression
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
library(plotly) # Loading library for interactivity
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 953 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): track_name, artist(s)_name, streams, key, mode
dbl (17): artist_count, released_year, released_month, released_day, in_spot...
num (2): in_deezer_playlists, in_shazam_charts
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(songs) <- gsub("[%]", "", names(songs)) # Remove % symbols from column names
songs$streams <- as.numeric(gsub("[^0-9]", "", songs$streams)) # Remove all non-numeric characters from streams
top_50 <- songs |> # Creating a new data set with the 50 most streamed songs
  arrange(desc(streams)) |> # Sorting by streams from highest to lowest
  slice(1:50) # Keeping only the 50 songs with the most streams
era <- top_50 |>
  mutate(
    music_era = case_when(
      released_year <= 1999 ~ "90's and older", # Songs released in 1999 or earlier
      released_year <= 2009 ~ "2000's",         # Songs released from 2000 to 2009
      released_year <= 2019 ~ "2010's",         # Songs released from 2010 to 2019
      released_year >= 2020 ~ "2020's"          # Songs released in 2020 or later
    )
  )
head(era)
cor(era$in_spotify_charts, era$bpm) # Finding the correlation between Spotify chart appearances and BPM
[1] 0.3019694
fit1 <- lm(in_spotify_charts ~ bpm, data = era) # Creating a model to predict chart appearances based on BPM
summary(fit1) # Getting a summary of the regression model (p-value, R-squared, coefficients, etc.)
Call:
lm(formula = in_spotify_charts ~ bpm, data = era)

Residuals:
    Min      1Q  Median      3Q     Max
-34.088 -18.928  -6.076  14.277  89.081

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -8.6042    16.1246  -0.534   0.5961
bpm           0.2846     0.1297   2.195   0.0331 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 27.2 on 48 degrees of freedom
Multiple R-squared:  0.09119,  Adjusted R-squared:  0.07225
F-statistic: 4.816 on 1 and 48 DF,  p-value: 0.03307
Among the 50 most streamed songs, the correlation between a song’s BPM (beats per minute) and its number of Spotify chart appearances is 0.302, indicating a weak-to-moderate positive relationship. This suggests that, within this group, songs with a higher BPM tend to appear more frequently on Spotify charts. The regression model is represented by the equation:
in_spotify_charts = -8.60 + 0.285(BPM)
This means that for each additional beat per minute, the model predicts an increase of about 0.285 chart appearances. The p-value for the BPM variable is 0.0331, which is statistically significant at the 0.05 level, suggesting that BPM is a meaningful, though modest, predictor of chart presence in this sample.
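As a worked example of reading the fitted line, the sketch below plugs a few tempos into the equation using the coefficients from the regression summary. The helper name `predict_charts` and the chosen tempos are illustrative; with the fitted model in memory, `predict(fit1, newdata = data.frame(bpm = c(100, 120, 174)))` should give essentially the same values (small differences come from the rounded coefficients).

```r
# Coefficients copied (rounded) from the regression summary above
intercept <- -8.6042
slope <- 0.2846

# Hypothetical helper: predicted chart appearances for a given tempo
predict_charts <- function(bpm) {
  intercept + slope * bpm
}

# Predictions at three illustrative tempos
predicted <- predict_charts(c(100, 120, 174))
print(round(predicted, 2))  # 19.86 25.55 40.92

# Tempo at which the fitted line crosses zero chart appearances
zero_bpm <- -intercept / slope
print(round(zero_bpm, 1))  # about 30.2
```

A 20 BPM difference corresponds to roughly 5.7 more predicted chart appearances. The line dips below zero for tempos under about 30 BPM, which is a reminder that the intercept is not meaningful on its own and that the model should only be read within the range of tempos actually observed.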
The Adjusted R-Squared value is 0.07225, meaning that about 7.2% of the variation in chart appearances can be explained by BPM. While BPM is associated with a song’s chart success, the majority of the variation is likely due to other factors, such as the artist’s popularity, song promotion, or release strategy.
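The R-squared figures in the regression output can be checked against the correlation reported earlier. In a simple regression with one predictor, the multiple R-squared equals the squared correlation, and the adjusted value applies a small penalty for sample size. The sketch below reproduces both numbers, assuming n = 50 songs and one predictor, which matches the 48 residual degrees of freedom in the summary.

```r
r <- 0.3019694  # correlation between chart appearances and BPM, from above
n <- 50         # songs in the top-50 subset
p <- 1          # number of predictors (BPM only)

# Multiple R-squared is the squared correlation in simple regression
r_squared <- r^2

# Adjusted R-squared penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(r_squared, 5))      # 0.09119
print(round(adj_r_squared, 5))  # 0.07225
```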
Visualization
plot <- ggplot(era, aes(x = bpm, y = in_spotify_charts, color = music_era)) + # Created a plot with BPM on x-axis, chart count on y-axis, colored by music era
  geom_point(size = 4, alpha = 0.7, shape = 21, stroke = 1.5) + # Use large, semi-transparent points (alpha of 0.7) with an outlined shape
  scale_color_manual(
    values = c(
      "90's and older" = "#9c0c1b", # Assigned a deep red for 90s and older
      "2000's" = "#4188f2",         # Assigned a light blue for 2000s
      "2010's" = "#f5f51b",         # Assigned a neon yellow for 2010s
      "2020's" = "#6cf522"          # Assigned a light green for 2020s
    )
  ) +
  labs(
    title = "Top 50 Streamed Spotify Songs' Chart Appearances vs BPM 2023", # Chart title
    x = "BPM (Tempo)", # Label for x-axis
    y = "Number of Times on Spotify Charts", # Label for y-axis
    color = "Decade Released", # Legend title for color
    caption = "Source: Multiple Data Sources" # Caption for data source
  ) +
  theme_minimal(base_size = 14) + # Use minimal theme with larger base font
  theme(
    plot.title = element_text(size = 14, face = "bold"), # Changed size and boldness for plot title
    axis.title = element_text(size = 14) # Changed size for axis titles
  )
ggplotly(plot) # Made the plot interactive
My visualization focuses on the top 50 most streamed songs on Spotify in 2023. I chose to analyze two variables: the number of times a song appeared on the Spotify charts and its BPM (tempo). To add a third dimension to the visualization, I colored the outline of each data point according to the decade the song was released.
One interesting observation is that the song with the most Spotify chart appearances had a BPM of 174, which is quite fast. I also noticed that songs with fewer chart appearances tend to cluster at lower BPM values. This pattern led me to wonder whether there’s a “sweet spot” for BPM—an optimal tempo range that boosts a song’s success on the charts—perhaps known only to a select group of producers or industry experts.
Before finalizing this visualization, I tested various correlations between different combinations of categorical and quantitative variables. Many of those relationships turned out to be weak. At first, I wanted to explore the relationship between BPM and total stream count, but the correlation was too low to justify deeper analysis. Instead, I chose to focus on BPM and the number of Spotify chart appearances, which showed the strongest correlation among the pairs I tested.
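The kind of screening described above can be sketched as a loop over the numeric columns. This is only an illustration of the idea on a small made-up data frame, not the exact steps used in the project; the `toy` values and the choice of columns are assumptions.

```r
# Toy data standing in for the real dataset (all values are made up)
toy <- data.frame(
  in_spotify_charts = c(2, 5, 9, 14, 20, 26),
  bpm               = c(90, 95, 110, 120, 140, 170),
  energy            = c(70, 40, 65, 50, 60, 55),
  mode              = c("Major", "Minor", "Major", "Minor", "Major", "Major")
)

target <- "in_spotify_charts"

# Keep numeric columns only, excluding the target itself
numeric_cols <- names(toy)[sapply(toy, is.numeric)]
predictors <- setdiff(numeric_cols, target)

# Correlation of each candidate predictor with the target
cors <- sapply(predictors, function(col) cor(toy[[col]], toy[[target]]))

# Rank by absolute strength, strongest first
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

One caution with this approach: when many pairs are tested and the strongest one is kept, its p-value tends to look better than it would if that pair had been chosen in advance, so the significance of the selected relationship is best treated as exploratory.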
Bibliography
Berner, Anton. “Everything to Know about the Song BPM to Make Music – Soundtrap Blog.” Soundtrap Blog, 15 Mar. 2025, blog.soundtrap.com/everything-about-song-bpm. Accessed 18 Apr. 2025.
“How Does Spotify Work: A Comprehensive Guide.” Usemogul.com, 2024, www.usemogul.com/post/how-does-spotify-work-a-comprehensive-guide.
Spotify. “Spotify to Continue Its Service in Uruguay.” 12 Dec. 2023, newsroom.spotify.com/2023-12-12/spotify-to-continue-its-service-in-uruguay/. Accessed 15 Apr. 2025.