By Group S (Anzelle, Binali, Jia Yee, Sandy)
Listening to music has never been easier with audio-streaming services available on the go. The advent of services like Spotify has helped to facilitate the spread of music to people all over the world. Its ubiquitous presence in our lives means that studying data obtained from the platform may give us insights into listening trends and how song properties have evolved over time.
We will investigate the following in our project:
1. How does song popularity vary across genres and is danceability a key
indicator of how popular a song is?
2. How has track duration changed in terms of tracks produced and track
popularity?
We hope our visualisations will reveal, to both readers and ourselves, patterns in music over time and the audio features that people subconsciously find appealing.
We obtained this data from the TidyTuesday project; it was collected
from Spotify, an audio-streaming platform, by Kaylin Pavlik with the
help of the “spotifyr” package. Pavlik selected 6 genres and 24
sub-genres and used them to find 471 playlists (20 for each sub-genre
would give 480, so some playlists may have been dropped during
cleaning). The data provides information about tracks and the playlists
they were found in, such as track popularity score, track duration and
playlist genre. During cleaning, we removed 6 rows that contained
missing or anomalous values, a negligible number compared to the
original dimensions of 32833 rows and 23 columns. Because some month
and day values are missing in track_album_release_date, we isolate the
year and use it in preference to other time values.
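The year isolation described above can be sketched as follows. This is a minimal Python illustration (the original analysis appears to have been done in R); the helper name `release_year` and the sample dates are hypothetical, and it assumes release dates arrive as 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings.

```python
import re
from typing import Optional

# Assumed formats: 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'.
_DATE_PATTERN = re.compile(r"(\d{4})(?:-\d{2}){0,2}")


def release_year(release_date: Optional[str]) -> Optional[int]:
    """Return the year of a possibly partial release date, or None if unusable."""
    if not release_date:
        return None
    match = _DATE_PATTERN.fullmatch(release_date.strip())
    return int(match.group(1)) if match else None


# Illustrative values only, not rows from the dataset.
samples = ["2019-06-14", "2012-05", "1978", None, "unknown"]
years = [release_year(s) for s in samples]
print(years)
```

Keeping only the year means a row with a missing month or day still contributes to year-level and decade-level comparisons.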
Before computing the summary statistics, we removed duplicate rows that resulted from the same song being added to multiple playlists. We want a rough picture of how danceable and popular the songs in our data are, and of how many songs come from each decade. The results show that the songs have fairly high danceability scores (median 0.67), comparatively low popularity scores (median 42, with a maximum of 100), and that an overwhelming majority (19898 of 28351, about 70%) were released in the last decade.
##    track_id          track_popularity   danceability           Decade
##  Length:28351        Min.   :  0.00     Min.   :0.0771   1951 - 1960:    7
##  Class :character    1st Qu.: 21.00     1st Qu.:0.5610   1961 - 1970:  191
##  Mode  :character    Median : 42.00     Median :0.6700   1971 - 1980:  797
##                      Mean   : 39.34     Mean   :0.6534   1981 - 1990: 1170
##                      3rd Qu.: 58.00     3rd Qu.:0.7600   1991 - 2000: 2153
##                      Max.   :100.00     Max.   :0.9830   2001 - 2010: 4135
##                                                          2011 - 2020:19898
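The de-duplication and decade grouping behind this summary can be sketched as below. This is a hedged Python illustration rather than the code actually used; the function names and sample rows are hypothetical, and it assumes that duplicates share a track_id and that decades run from years ending in 1 to years ending in 0, as the labels in the summary suggest.

```python
from collections import Counter


def decade_label(year: int) -> str:
    """Label a year with its decade, e.g. 2011..2020 -> '2011 - 2020'."""
    start = (year - 1) // 10 * 10 + 1
    return f"{start} - {start + 9}"


def drop_duplicate_tracks(rows):
    """Keep the first row seen for each track_id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["track_id"] not in seen:
            seen.add(row["track_id"])
            unique.append(row)
    return unique


# Illustrative rows only: track 'a' appears in two playlists.
rows = [
    {"track_id": "a", "playlist": "p1", "year": 2019},
    {"track_id": "a", "playlist": "p2", "year": 2019},
    {"track_id": "b", "playlist": "p1", "year": 2010},
    {"track_id": "c", "playlist": "p3", "year": 1985},
]
unique_rows = drop_duplicate_tracks(rows)
decade_counts = Counter(decade_label(r["year"]) for r in unique_rows)
print(decade_counts)
```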
Every song has its own set of audio features, determined by Spotify's algorithm, that makes it distinguishable. In the dataset, songs are classified into six main genres and further divided into sub-genres. For this question, we require genre, popularity, and two audio features (danceability and energy). We decided to explore this question because of the rising popularity of TikTok and the increased prevalence, in recent years, of video clips featuring people dancing to catchy music. Hence, we hypothesise that more danceable songs are likely to gain more traction.
The first visualisation is a box plot layered on top of a scatterplot,
with the Y-axis representing the genre of songs, the X-axis representing
song popularity, and the colour gradient of the dots representing
danceability. A box plot was used because it makes the popularities of
the different genres easier to compare: it highlights the median,
interquartile range and spread. Labelled median values for
track_popularity were added to help readers easily identify the
popularity of each genre. The colour of the data points was used to
accentuate the spread of danceability across popularity for each genre.
From the plot, there seems to be no clear correlation between
danceability and popularity; popularity appears instead to be related
to genre.
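The quantities a box plot highlights for each genre can be computed along these lines. This is a minimal Python sketch with made-up scores; the column name `playlist_genre` is an assumption about the dataset, and the "inclusive" quartile method is one of several conventions, so the values need not match the hinges drawn by any particular plotting library.

```python
import statistics
from collections import defaultdict


def popularity_summary_by_genre(rows):
    """Map each genre to the quartiles and IQR of its popularity scores."""
    by_genre = defaultdict(list)
    for row in rows:
        by_genre[row["playlist_genre"]].append(row["track_popularity"])
    summary = {}
    for genre, scores in by_genre.items():
        # n=4 returns the three cut points: Q1, median, Q3.
        q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        summary[genre] = {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}
    return summary


# Illustrative scores only, not values from the dataset.
rows = (
    [{"playlist_genre": "pop", "track_popularity": p} for p in (10, 20, 30, 40, 50)]
    + [{"playlist_genre": "edm", "track_popularity": p} for p in (5, 15, 25, 35, 45)]
)
summary = popularity_summary_by_genre(rows)
print(summary["pop"]["median"], summary["edm"]["median"])
```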
Following these findings, we turn to our second visualisation to see whether other factors are at play. We investigated each audio feature and found that plotting danceability against energy reveals distinct clusters for the genres. For the second visualisation, the Y-axis and X-axis represent the mean danceability and mean energy within each sub-genre respectively. Colour denotes genre, and the size of each point represents the mean popularity within the sub-genre. Since both danceability and energy are continuous variables, a scatterplot is the most suitable choice, as it allows each sub-genre to be plotted as a distinct data point. We also added ellipses to highlight the clusters so that they are more obvious to the reader.
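The per-sub-genre aggregation that feeds this scatterplot can be sketched as follows. This is an illustrative Python version under assumptions: the column names `playlist_genre` and `playlist_subgenre` are assumed, the function name is hypothetical, and the sample values are invented.

```python
import statistics
from collections import defaultdict


def subgenre_means(rows):
    """Return one record per sub-genre with mean danceability, energy and popularity."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["playlist_genre"], row["playlist_subgenre"])].append(row)
    result = []
    for (genre, subgenre), members in sorted(groups.items()):
        result.append({
            "genre": genre,          # would drive point colour
            "subgenre": subgenre,    # one point per sub-genre
            "mean_danceability": statistics.fmean([r["danceability"] for r in members]),
            "mean_energy": statistics.fmean([r["energy"] for r in members]),
            # would drive point size
            "mean_popularity": statistics.fmean([r["track_popularity"] for r in members]),
        })
    return result


# Illustrative rows only.
rows = [
    {"playlist_genre": "rap", "playlist_subgenre": "trap",
     "danceability": 0.5, "energy": 0.25, "track_popularity": 40},
    {"playlist_genre": "rap", "playlist_subgenre": "trap",
     "danceability": 0.75, "energy": 0.75, "track_popularity": 60},
    {"playlist_genre": "rock", "playlist_subgenre": "hard rock",
     "danceability": 0.5, "energy": 1.0, "track_popularity": 30},
]
points = subgenre_means(rows)
print(points)
```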
From the first plot, it can be seen that the most popular genre within our data is Pop, and the least popular is EDM. Danceability seems to be higher than average for Latin and Rap, and lower than average for Rock. Pop is the most popular genre even though its songs have lower danceability than Latin and Rap, which come in second and third place respectively. A possible reason is that Pop is commonly used as an umbrella term for the songs that produce the most hits, so it may encompass a larger range of music with varying audio features. The widespread appeal and versatility of Pop also mean that it can be played in many different environments and settings, which could explain its higher popularity scores.
For the second visualisation, we hope that when another factor (energy) is brought into the picture, readers will be able to identify how different combinations of danceability and energy levels contribute to a genre’s popularity. The most popular genres all have medium levels of energy and medium to high levels of danceability. This suggests that people enjoy songs of moderate intensity more than the mellower or more energetic ones located at the ends of the energy spectrum. Findings from a 2018 study on how the intensity of songs affects emotional experience showed that music of medium intensity evoked the strongest emotional arousal. This trend can be seen in the plot, where the top 4 most popular genres all have generally medium energy levels.
A limitation we observed is that the assignment of song genres is based on the playlist that the song was added to, and possibly not on what the track artist originally intended. However, it is worth noting that the songs were taken from “Every Noise”, a visualisation of Spotify’s entire genre space which is maintained by a “genre taxonomist”, making the labels somewhat reliable. Even though both plots show a clear departure from our initial hypothesis that danceable songs are more popular, the second plot still shows some overlaps between genres (for example, around “new jack swing”) that cannot be explained by two audio features alone. While musical taste will always be a topic of debate because it is subjective to the individual, insights from data can still help to paint a general picture of listening trends.
Being able to record music used to be a miracle, a rarity. However, with advancements in sound engineering, we have transitioned from vinyl records to compact discs (CDs) to audio software, and longer songs can now be produced. But is that what has happened? Are the songs being produced getting longer? Are longer songs even popular? We aim to find out how song duration has changed over the decades by looking at track duration, track popularity, and album release date (treated as synonymous with track release date).
For the first visualisation, we have a multiple probability density
plot of track duration, where each curve represents the distribution of
song duration in one decade. This plot was chosen because it allows
readers to compare track duration distributions across the decades at a
glance, as well as identify the modal duration of each decade easily.
The shape of the density curves, together with the labelled mode
values, helps readers see the trend in track duration. Note that for
identical songs released in different markets (essentially the same
track but with a different track_id), only one song is kept for the
density plot, as we require only one duration value per song.
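The labelled mode of a density curve can be located by evaluating a kernel density estimate on a grid and taking the highest point. The sketch below is a plain-Python illustration, not the plotting code used for the figure: it assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth, which may differ from the bandwidth a plotting library picks by default, and the durations are invented.

```python
import math
import statistics


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def kde_mode(values, grid_points=512):
    """Approximate the mode as the grid point where the estimated density is highest."""
    # Assumed bandwidth rule: 1.06 * sd * n^(-1/5); needs at least two distinct values.
    bandwidth = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)
    density = gaussian_kde(values, bandwidth)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    return max(grid, key=density)


# Illustrative durations in minutes for one decade, including one long outlier.
durations = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 5.8]
mode_minutes = kde_mode(durations)
print(round(mode_minutes, 2))
```

Running this once per decade would give one mode per curve, which is the kind of value that can then be compared across decades.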
For the second visualisation, we chose a 2D histogram as both variables under consideration (track popularity and track duration) are continuous. We divided the data by year and stacked the plots on top of each other to make comparisons between years easier. Moreover, we felt this plot type best lets readers see how the observations are spread out through the years, since some years had more observations than others. We also added a median line to show readers how song duration has changed over the years. As there were many outliers in each year, the data was skewed, and hence we chose to display the median rather than the mean. The outliers were not removed, as they could be different kinds of songs (film scores, musicals, etc.) or simply the artist’s choice, and hence needed to be considered.
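The two ingredients of this plot, bin counts per year and a per-year median duration, can be sketched as below. This is a hedged Python illustration with invented tracks; the bin widths (0.5 minutes and 10 popularity points) and the field names are assumptions, not the settings used for the figure.

```python
import statistics
from collections import Counter, defaultdict


def bin_counts_by_year(tracks, duration_width=0.5, popularity_width=10):
    """Count tracks per (duration bin, popularity bin) cell, separately for each year."""
    counts = defaultdict(Counter)
    for t in tracks:
        cell = (int(t["duration_min"] // duration_width),
                int(t["popularity"] // popularity_width))
        counts[t["year"]][cell] += 1
    return counts


def median_duration_by_year(tracks):
    """Median track duration per year; the median is less sensitive to outliers than the mean."""
    durations = defaultdict(list)
    for t in tracks:
        durations[t["year"]].append(t["duration_min"])
    return {year: statistics.median(values) for year, values in durations.items()}


# Illustrative tracks only; the 9.8-minute track plays the role of an outlier.
tracks = [
    {"year": 2018, "duration_min": 3.4, "popularity": 62},
    {"year": 2018, "duration_min": 3.6, "popularity": 65},
    {"year": 2018, "duration_min": 9.8, "popularity": 20},
    {"year": 2019, "duration_min": 3.1, "popularity": 70},
    {"year": 2019, "duration_min": 3.2, "popularity": 55},
    {"year": 2019, "duration_min": 3.3, "popularity": 48},
]
counts = bin_counts_by_year(tracks)
medians = median_duration_by_year(tracks)
mean_2018 = statistics.fmean(t["duration_min"] for t in tracks if t["year"] == 2018)
print(medians, round(mean_2018, 2))
```

In the 2018 sample the single long track pulls the mean well above the median, which is the reason given above for drawing a median line.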
From the first visualisation, we hope the reader will see that the distribution of track duration has gradually shifted to the left, and that the mode duration values have also decreased. The plot reveals a decreasing trend in track duration over the specified period, though the distribution of track duration remains largely normal in shape. One reason behind the observed trend could be that streaming services pay on a pay-per-play basis, which gives artists an incentive to produce shorter songs so that they earn more. Through our second visualisation, we aim to discover whether shorter songs are indeed more popular, which would further encourage artists to produce shorter tracks.
From the second visualisation, readers can observe that median track duration has decreased consistently over the years, except from 2011 to 2012. The plot also reveals that song popularity varies a lot at every track length, so we cannot observe a clear pattern between track duration and track popularity. That said, song popularity appears to peak around the 3.5-minute mark and to taper slightly on either side of it. We believe the main motivation for artists to reduce song duration is revenue. On Spotify, artists are paid according to the number of streams, and a song must be played for longer than 30 seconds (0.5 minutes) to count as a stream. So if a shorter song is played over and over, or if an album is packed with more songs of shorter duration rather than fewer longer songs, the album generates more streams and therefore more revenue. Thus, it could be more profitable to release shorter songs.
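The revenue argument can be made concrete with a small calculation. The Python sketch below assumes a simplified model that is not taken from Spotify's documentation beyond the 30-second rule stated above: a listener plays one track back to back for a fixed session, and every play lasting longer than 30 seconds counts as one stream. The function name and session lengths are hypothetical.

```python
def streams_in_session(session_minutes, track_minutes, min_stream_minutes=0.5):
    """Count plays of one track, repeated back to back, that qualify as streams."""
    if track_minutes <= min_stream_minutes:
        # Even a full play is not longer than the threshold, so nothing counts.
        return 0
    full_plays = int(session_minutes // track_minutes)
    leftover = session_minutes - full_plays * track_minutes
    # A final partial play still counts if it runs past the threshold.
    partial = 1 if leftover > min_stream_minutes else 0
    return full_plays + partial


# One hour of continuous listening under the assumed model.
short_track_streams = streams_in_session(60, 3.0)
long_track_streams = streams_in_session(60, 4.0)
print(short_track_streams, long_track_streams)
```

Under these assumptions, an hour of 3-minute songs yields 20 streams while an hour of 4-minute songs yields 15, so the shorter track earns a third more for the same listening time if the payout per stream is equal.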
Turning briefly to limitations and evaluation: for the first
visualisation, entries from decades before the 1990s were excluded as
they constituted a small percentage of the data and were deemed
insufficient for creating a density curve that generalises song
duration for that decade. Had more data been available for the decades
before the 1990s, we would have been more confident in plotting a
multiple density plot for the past 7 decades. For the second
visualisation, we only included data from the year 2010 onwards because
we wanted to avoid huge discrepancies between the number of
observations in each year. However, as seen on the graph, the number of
data points still varies considerably from year to year. Moreover, as
most data points are concentrated in one region, it is difficult to see
individual data points. We attempted to resolve this issue by using a
colour gradient between two contrasting colours for count (number of
songs).
Data Source:
https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-01-21/readme.md
Other References:
https://www.ipr.edu/blogs/audio-production/transformation-sound-recording-technology/
https://fortune.com/2019/01/17/shorter-songs-spotify/
https://www.vox.com/2014/8/18/6003271/why-are-songs-3-minutes-long
https://www.planetarygroup.com/do-artists-get-paid-every-time-song-played-spotify/
https://www.hypebot.com/hypebot/2021/11/how-spotify-royalties-actually-work.html