SPOTIFY
Spotify is a popular music streaming platform used by millions of people daily (About Spotify, 2023). After launching in 2008, the company changed how music is distributed, listened to, and enjoyed by the general public. As the “world’s most popular audio streaming subscription service,” Spotify reports 489 million users (About Spotify, 2023). However, this success would be impossible without the wide range of genres, songs, and artists on the platform itself. Spotify’s catalog holds an enormous number of songs, each with its own qualities that attract listeners and help sustain Spotify’s loyal, returning customer base.
WHAT ARE WE LOOKING AT?
Using tempo, duration in milliseconds, and danceability, how has music changed over the last 8 decades? With the rise of platforms such as TikTok, this investigation asks whether songs have gotten shorter or otherwise changed as artists cater their music to short-form video services. According to Business of Apps, TikTok usage has skyrocketed in the past five years: since 2018, the application’s user base has grown by 1872.94% (Iqbal, 2023). Many companies, such as Instagram, YouTube, and Snapchat, followed suit by building similar short-form video features into their own applications. This investigation therefore also considers how such platforms may influence music marketing and creation, and how current popular music differs from what came before.
HYPOTHESES
In the present investigation, we test 5 main hypotheses:
1. The average tempo of songs has increased over the decades, as songs become quicker with the rise of social media.
2. The average danceability of music has increased from the 1990’s onward, following an increase in internet dancing trends and dance-centered music on social media platforms such as TikTok.
3. The average length (duration in milliseconds) of songs will show a general decline by the 2020’s.
4. The average duration of songs will show the largest decrease in the Pop genre.
5. The average song duration will decrease overall in every genre.
THE PRESENT DATASET
The following dataset was taken from a Tidy Tuesday dataset. This dataset can be found at the link: here
The dataset contains songs on Spotify released from 1950 to 2020, along with a variety of variables of interest. It is ‘sample’ data: the 5 most popular genres were picked, and 20 playlists were then selected from each genre to curate the present list of songs.
To better suit this analysis, the original .CSV file was opened in Excel and 2 columns were added. The “Year” column was created with Excel’s =YEAR() function, which extracts the year from each release date. A second column records the decade each year belongs to (e.g., 1980’s, 1990’s), also computed with an Excel function, so that song averages can be grouped by decade in the visualizations below.
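The same two columns could also be derived directly in R instead of Excel. The following is a minimal sketch, not the method used for this report: `decade_label()` is a hypothetical helper, and it assumes release dates are strings that begin with a four-digit year (for example "1987-06-01" or just "1987").

```r
# Hypothetical helper (not part of the original analysis): turn a release-date
# string into a decade label such as "1980s".
# Assumption: every date string starts with a four-digit year.
decade_label <- function(release_date) {
  year <- as.integer(substr(release_date, 1, 4))  # first four characters = year
  # Round the year down to the start of its decade; keep missing years missing
  ifelse(is.na(year), NA_character_, paste0(floor(year / 10) * 10, "s"))
}

# Made-up example dates, for illustration only
release_dates <- c("1957-03-01", "1987", "2019-12-20", "2020-01-17")
decade_label(release_dates)
```

Doing this step in code keeps the whole pipeline reproducible and avoids spreadsheet error values such as '#VALUE!' ending up in the decade column.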
First, we load all the packages needed for this project: tidyverse, dplyr, knitr, and stringr.
library(tidyverse)
library(dplyr)
library(knitr)
library(stringr)
options(scipen = 100000)
Next, we load in the necessary Spotify dataset to base our analysis on:
library(readr)
spotify <- read_csv("~/Downloads/spotify_songs_upd.csv")
Next, we subset the columns necessary for the present analysis:
spotify_songs <- spotify[c('tempo', 'danceability', 'year', 'playlist_genre',
                           'track_popularity', 'duration_ms', 'decade')]
Average Tempo Over Time
Our first visualization looks at the average tempo over the 8 decades our dataset covers. To build it, we use the filter function to clean the data by removing rows whose decade is a spreadsheet error value ('#VALUE!') or falls outside the dataset’s range ('1900s'). We then group the songs by decade and take the average tempo for each decade with the mean() function.
This information is then plotted with the 1950’s bar highlighted, since that decade has the highest average tempo.
tempo_over_time <- spotify_songs %>%
  # Drop spreadsheet error values and out-of-range decades before averaging
  filter(decade != '#VALUE!', decade != '1900s') %>%
  group_by(decade) %>%
  summarise(avg_tempo = mean(tempo, na.rm = TRUE))
ggplot(tempo_over_time, aes(x = as.factor(decade), y = avg_tempo)) +
  # Highlight the 1950s bar in a different color
  geom_col(aes(fill = decade == '1950s'), show.legend = FALSE) +
  scale_fill_manual(values = c("FALSE" = "magenta", "TRUE" = "purple")) +
  labs(title = 'Average Tempo of Songs Over Decades',
       x = 'Decade',
       y = 'Average Tempo') +
  # group = 1 lets the smoother span all decades; on a discrete x-axis
  # each decade would otherwise be treated as its own one-point group
  geom_smooth(aes(group = 1)) +
  theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
As the plot shows, our first hypothesis is not supported. The average tempo peaks in the 1950’s and then plateaus, with little movement and no distinct trend afterward. In other words, the fastest songs in the dataset were released in the 1950’s, and the average tempo has stayed at roughly 120 bpm in every decade since. Music appears to have been noticeably faster in the 1950’s than at any point between then and now.
Danceability
Our next variable of interest is the average danceability of songs over the decades investigated. Our hypothesis stated that, following the rise of social media, the average ‘danceability’ rating of songs would increase from the 1990’s onward as music entered the digital world.
danceability <- spotify_songs %>%
  # Drop spreadsheet error values and out-of-range decades before averaging
  filter(decade != '#VALUE!', decade != '1900s') %>%
  group_by(decade) %>%
  summarise(avg_danceability = mean(danceability, na.rm = TRUE))
danceability$decade <- as.factor(danceability$decade)
danceability %>%
  ggplot(aes(x = decade, y = avg_danceability)) +
  geom_col(fill = 'hotpink') +
  labs(title = 'Average Danceability of Songs Over Time',
       x = 'Decade',
       y = 'Average Danceability') +
  # group = 1 lets the smoother span all decades on the discrete x-axis
  geom_smooth(aes(group = 1)) +
  theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
In this visualization, average danceability drops from the 1950’s to the 1960’s, indicating that songs became less ‘danceable’ in the 1960’s. From the 1960’s to the 1990’s, danceability generally increases, followed by a slight dip in the 2000’s and a general increase since then. This does not necessarily disprove our hypothesis, as danceability scores do rise somewhat from the 1990’s into the ‘Digital Age’ of the 2000’s and beyond. Visually, however, that increase is minuscule compared to the increase from the 1960’s to the 1990’s, so it is questionable whether the hypothesis is fully supported. The visualization does reveal the decrease in danceability in the 1960’s, likely due to the rise of rock and R&B influences, and shows the highest danceability scores in the 1990’s and 2020’s, likely as a result of fast-paced ‘Pop’ music.
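Because the visual trend is hard to judge, one way to check it would be a rank correlation between release year and danceability on the song-level data from 1990 onward. The sketch below is an illustration only and was not run for this report: the `songs` data frame is randomly generated toy data, not the Spotify dataset, and choosing Spearman’s rho is an assumption about what counts as a suitable test.

```r
# Toy stand-in for the song-level data (values are made up for illustration).
set.seed(42)
songs <- data.frame(year = sample(1990:2020, 200, replace = TRUE))
songs$danceability <- pmin(pmax(
  0.55 + 0.002 * (songs$year - 1990) + rnorm(200, sd = 0.1),
  0), 1)  # clamp to the 0-1 range used by the danceability score

# Spearman rank correlation: does danceability tend to rise with release year?
# exact = FALSE avoids warnings caused by tied years.
trend <- cor.test(songs$year, songs$danceability,
                  method = "spearman", exact = FALSE)
trend$estimate  # rho: sign and strength of the monotonic trend
trend$p.value   # small values suggest the trend is unlikely to be chance alone
```

With the real data, `songs` would be replaced by `spotify_songs` filtered to `year >= 1990`.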
Danceability by Genre
The next visualization, built in Tableau, splits the average danceability ratings by genre to show the differences between genres.
Average Duration Over Decades
Our next variable of interest is the average duration of songs over the decades our dataset covers. Our hypothesis predicts that, across the past 8 decades, songs are getting shorter and shorter. We expect this because of the rising popularity of social media and short video platforms, which pushes artists to fit their music into shorter spans of time in order to reach peak popularity online.
spotify_songs %>%
  # Drop out-of-range decades and spreadsheet error values before averaging
  filter(decade != '1900s', decade != '#VALUE!') %>%
  group_by(decade) %>%
  summarize(avg_length = mean(duration_ms, na.rm = TRUE)) %>%
  ggplot(aes(decade, avg_length)) +
  labs(title = "Average Duration of Songs by Decade", x = "Decade", y = "Average Length (ms)") +
  geom_col(fill = "pink") +
  # group = 1 lets both trend lines span all decades on the discrete x-axis
  geom_smooth(aes(group = 1)) +
  geom_smooth(aes(group = 1), method = lm, se = FALSE, color = "pink") +
  theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
As we can see, the shortest songs of any decade were actually those from the 1950’s, again possibly pointing to the influence of bop and rock music as they rose in popularity.
In terms of our hypothesis, song durations peak in the 1980’s and then follow the predicted pattern, with a general decline in average song duration from the 1980’s to the 2020’s. This is consistent with our hypothesis, as songs seem to be getting shorter and shorter with the rise of social media and short-form video content.
Next, I thought it could be impactful to look at the average durations of songs over time by their release date specifically, rather than their decade of release. The following visualizations are interactive Tableau graphs that allow you to highlight trends and see specific data points, as well as containing some conclusions in the text boxes placed above the visualizations. Feel free to click between the two graphs to see the average duration by years overall, and click on the second visualization in order to see the trend of average duration of songs from the 1990’s to the 2020’s specifically.
In both of these visualizations, average song duration follows a roughly bell-shaped curve over time. Based on the Spotify dataset, duration reaches a low point in the most recent years, down from its peak in the 1980’s.
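For readers without access to the Tableau graphs, the year-level averages behind them could be computed along the following lines. This is a sketch on made-up rows, not the report’s actual data; converting milliseconds to minutes is an added convenience for readability, not something done in the original visualizations.

```r
# Made-up rows standing in for spotify_songs (illustration only).
songs <- data.frame(
  year        = c(1985, 1985, 2005, 2019, 2019),
  duration_ms = c(260000, 280000, 240000, 190000, 210000)
)

# Average duration per release year, sorted by year
avg_by_year <- aggregate(duration_ms ~ year, data = songs, FUN = mean)

# Convert milliseconds to minutes (60,000 ms per minute) for easier reading
avg_by_year$avg_minutes <- avg_by_year$duration_ms / 60000
avg_by_year
```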
Genres by Decade
The final visualization splits duration by genre, to see whether music duration changed differently over the decades across categories of music. This Tableau visualization is also interactive: you can highlight the bars to see specific data points, and a dropdown lets you change which decade is highlighted. The 2020’s decade is selected by default to show that it is the lowest for nearly every genre, indicating that the overall decrease in song duration occurred across music as a whole, not only in pop music.
The text boxes above the visualization summarize its conclusions further.
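The genre comparison can also be sketched in R. The example below uses made-up rows in place of `spotify_songs`, and it measures the change from the 1990’s to the 2020’s; that baseline decade is an assumption chosen for illustration, since the Tableau view lets the reader pick any decade.

```r
library(dplyr)

# Made-up rows standing in for spotify_songs (illustration only).
songs <- data.frame(
  playlist_genre = c("pop", "pop", "pop", "pop", "rap", "rap", "rap", "rap"),
  decade         = c("1990s", "1990s", "2020s", "2020s",
                     "1990s", "1990s", "2020s", "2020s"),
  duration_ms    = c(250000, 240000, 200000, 210000,
                     280000, 270000, 180000, 190000)
)

duration_change <- songs %>%
  # Average duration for each genre within each decade
  group_by(playlist_genre, decade) %>%
  summarise(avg_ms = mean(duration_ms, na.rm = TRUE), .groups = "drop") %>%
  # Change in average duration from the 1990s to the 2020s, per genre
  group_by(playlist_genre) %>%
  summarise(change_ms = avg_ms[decade == "2020s"] - avg_ms[decade == "1990s"]) %>%
  arrange(change_ms)  # most negative first = largest decrease

duration_change
```

Sorting by `change_ms` puts the genre with the largest decrease in the first row, which is the comparison the 4th hypothesis depends on.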
CONCLUSIONS
The analysis of the Spotify dataset reveals interesting trends in music over the past eight decades. The average tempo and danceability of songs have shown notable variations, potentially reflecting changes in listener preferences and industry trends.
Our conclusions for each hypothesis are as follows:
1. The average tempo did not increase over time. Instead, it showed a sharp peak in the 1950’s and a plateau since then, indicating that songs were fastest in the 1950’s and have stayed at roughly the same tempo of ~120 bpm in the decades since.
2. The average danceability of music did show a small increase from the 2000’s to the 2020’s, but it is hard to tell whether the increase is statistically significant. Songs appear to be becoming more danceable, although songs from the 1990’s were seemingly the ‘most danceable’.
3. The average length (duration in milliseconds) of songs did show a general decline over time, supporting our 3rd hypothesis. Songs seemingly get shorter and shorter, with larger decreases following the rise of social media and short video platforms.
4. The average duration of songs did not show its largest decrease in the pop genre, as our hypothesis predicted, but rather in the rap genre. Other genres, including pop, also showed a general decrease in duration, indicating that songs are getting shorter overall.
5. Our 5th hypothesis was supported: all genres showed a decrease in average duration overall, with some nuances explained.
Overall, 3 of our 5 main hypotheses were supported. Although the other 2 were not, there were still valuable takeaways about how music has changed. In particular, the tempo of songs showed an interesting peak in the 1950’s. In addition, the largest decrease in duration among the genres was in rap, offering insight into how the genre has changed to fit shorter song formats, possibly to be easier to share and/or stream on social media platforms.
The influence of social media and digital platforms seems evident, though further detailed study would be required to establish direct causal relationships. This study provides a foundation for understanding how digital platforms might be shaping musical trends.
LIMITATIONS
These visualizations, while extensive, have certain limitations that are very important to recognize in our generalization of music as a whole.
First, it is important to recognize that the utilized dataset is confined to songs only available on Spotify, which may not represent all music trends globally or songs that may not be on this platform. This may also inadvertently exclude other cultures or ways of spreading music that are not typically online.
Additionally, this analysis is limited to the variables provided in the dataset, and there may be other factors influencing music trends that are neither captured nor investigated in the present study. These could include an influx of new ways of listening to music, an influx of genres or subgenres, and/or an increase in the accessibility of music.
Although we provide a hypothesis for why we believed songs were getting shorter following the impact of what I labeled the ‘Digital Age’, the impact of social media platforms like TikTok is inferred but not directly measured. This could be an area for further research.
In addition, the decade of the 2020’s is not complete (we are currently only in 2023), and the dataset was last updated in 2020. A single year therefore stands in for the entire decade of the 2020’s, which is unlikely to be representative of a full decade’s worth of songs.
Finally, the variable ‘danceability’ itself is very vague and subjective. This numerical value reflects the opinions of the listeners and of the dataset’s authors, which limits its ability to explain patterns in music in a generalizable way.
Overall, we learned a lot about how music seems to change over time, and can hopefully utilize this information to predict how changes in music can occur in the future, possibly equipping those within the industry to better curate their music for optimal popularity.
CITATIONS
About Spotify. Spotify. (2023, February 2). Retrieved March 3, 2023, from https://newsroom.spotify.com/company-info/
Iqbal, M. (2023, May 3). TikTok revenue and usage statistics (2023). Business of Apps. www.businessofapps.com/data/tik-tok-statistics/#:~:text=TikTok%20reached%201.6%20billion%20users,by%20the%20end%20of%202023.