Exploring music across genres and time

By Group S (Anzelle, Binali, Jia Yee, Sandy)

Introduction

Listening to music has never been easier, with audio-streaming services available on the go. Services like Spotify have helped spread music to people all over the world, and their ubiquity means that data from the platform may give us insight into listening trends and into how song properties have evolved over time.

We will investigate the following in our project:
1. How does song popularity vary across genres, and is danceability a key indicator of how popular a song is?
2. How has track duration changed in terms of tracks produced and track popularity?

We hope our visualisations will reveal, to readers and ourselves alike, patterns in music over time and audio features that people subconsciously find appealing.

Data Description

Our data comes from the TidyTuesday project. It was collected from Spotify, an audio-streaming platform, by Kaylin Pavlik using the “spotifyr” package. Pavlik selected 6 genres and 24 sub-genres and used them to find 471 playlists (20 per sub-genre; the total falls short of 480, possibly because of cleaning). The data describes various tracks and the playlists they appear in, with variables such as track popularity score, track duration and playlist genre. During cleaning, we removed 6 rows containing missing or anomalous values, a negligible number compared with the original 32,833 rows and 23 columns. Because some values in track_album_release_date lack a month and day, we extracted the year and used it as our primary time variable.
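
The report does not include its cleaning code, so the following is only a minimal Python sketch of the two cleaning steps described above. It assumes each row is a dictionary keyed by the dataset's column names. The helpers `release_year` and `clean_rows` are hypothetical, and treating a missing field or an unparseable date as "anomalous" is our assumption; the criteria the group actually used are not stated.

```python
from typing import Optional


def release_year(date_str: Optional[str]) -> Optional[int]:
    """Return the year from a date like 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD', else None."""
    if not date_str:
        return None
    head = date_str.strip()[:4]
    # Only the first four characters are used, so missing month/day values do not matter.
    return int(head) if len(head) == 4 and head.isdigit() else None


def clean_rows(rows, required=("track_id", "track_popularity", "track_album_release_date")):
    """Drop rows with missing required fields or unparseable dates, and add a 'year' field.

    Assumption: the required columns and this definition of 'anomalous' are illustrative only.
    """
    kept = []
    for row in rows:
        if any(row.get(col) in (None, "") for col in required):
            continue  # missing value in a required column
        year = release_year(row["track_album_release_date"])
        if year is None:
            continue  # release date could not be parsed
        kept.append({**row, "year": year})
    return kept
```

For example, `release_year("2019-06-14")`, `release_year("2019-06")` and `release_year("2019")` all give 2019, which is why the year is the one time value available for every track.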

Before computing the summary statistics, we removed duplicate rows that arose because the same song had been added to several playlists. We wanted a rough sense of how danceable and popular the songs in our data are, and of how many songs come from each decade. The results show that the songs have high danceability scores (median 0.67 on a 0-to-1 scale), relatively low popularity scores (median 42 out of 100), and that the overwhelming majority (19,898 of 28,351, about 70%) were released in the most recent decade, 2011 to 2020.
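
The decade labels in the summary table below (for example "1951 - 1960") suggest bins that run from a year ending in 1 to the next year ending in 0. A minimal Python sketch of that binning, together with keeping one row per track_id, might look like this. `decade_label`, `unique_tracks` and `decade_counts` are hypothetical helpers, and the binning rule is inferred from the labels rather than taken from the group's code.

```python
from collections import Counter


def decade_label(year: int) -> str:
    """Map a year to a label such as '1951 - 1960' (years ending in 1 through 0)."""
    start = (year - 1) // 10 * 10 + 1
    return f"{start} - {start + 9}"


def unique_tracks(rows):
    """Keep the first row for each track_id, dropping copies from other playlists."""
    seen, unique = set(), []
    for row in rows:
        if row["track_id"] not in seen:
            seen.add(row["track_id"])
            unique.append(row)
    return unique


def decade_counts(rows):
    """Count unique tracks per decade label (rows are assumed to carry a 'year' field)."""
    return Counter(decade_label(row["year"]) for row in unique_tracks(rows))
```

Under this rule, 1960 falls in "1951 - 1960" and 2020 in "2011 - 2020", matching the first and last labels in the table.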

##    track_id         track_popularity  danceability            Decade     
##  Length:28351       Min.   :  0.00   Min.   :0.0771   1951 - 1960:    7  
##  Class :character   1st Qu.: 21.00   1st Qu.:0.5610   1961 - 1970:  191  
##  Mode  :character   Median : 42.00   Median :0.6700   1971 - 1980:  797  
##                     Mean   : 39.34   Mean   :0.6534   1981 - 1990: 1170  
##                     3rd Qu.: 58.00   3rd Qu.:0.7600   1991 - 2000: 2153  
##                     Max.   :100.00   Max.   :0.9830   2001 - 2010: 4135  
##                                                       2011 - 2020:19898

How has track duration changed in terms of tracks produced and track popularity?

Introduction

Recording music was once a marvel and a rarity. With advances in sound engineering, we have moved from vinyl records to compact discs (CDs) to audio software, and it has become possible to produce longer songs. But are songs actually getting longer? And are longer songs even popular? In this project, we examine how song duration has changed over the decades by looking at track duration, track popularity and album release date (which we treat as the track's release date).

Methodology

For the first visualisation, we use a multiple density plot of track duration, in which each curve shows the distribution of song duration for one decade. We chose this plot because it lets readers compare duration distributions across decades at a glance and easily identify the modal (most common) duration. The shapes of the density curves, together with the labelled modal values, make any trend in track duration easy to see. Note that where identical songs were released in different markets (essentially the same track with a different track_id), only one copy is kept for the density plot, since each song should contribute a single duration value.
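
As a rough illustration of the density plot's preparation steps, the Python sketch below keeps one copy of each song released in several markets and estimates the modal duration of each decade as the peak of a Gaussian kernel density estimate. Matching copies by lower-cased track name and artist is our assumption, since the report does not say how duplicates were identified. The normal-reference bandwidth (1.06 × standard deviation × n^(-1/5)) is only one common default; plotting libraries may choose a different bandwidth, which would shift the mode slightly. The `decade` and `duration_ms` fields and all function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def dedupe_across_markets(rows):
    """Keep the first row for each (track name, artist) pair, ignoring track_id."""
    seen, unique = set(), []
    for row in rows:
        key = (row["track_name"].casefold(), row["track_artist"].casefold())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def kde_mode(values, grid_step=0.01):
    """Return the grid point where a Gaussian KDE of `values` is highest."""
    n = len(values)
    sd = statistics.stdev(values)  # requires at least two values
    if sd == 0:
        return values[0]  # all values identical, so that value is the mode
    h = 1.06 * sd * n ** (-1 / 5)  # normal-reference rule-of-thumb bandwidth
    lo, hi = min(values), max(values)
    best_x, best_density = lo, -1.0
    for i in range(int((hi - lo) / grid_step) + 1):
        x = lo + i * grid_step
        # Unnormalised density: constant factors do not change where the peak is.
        density = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def modes_by_decade(rows):
    """Modal duration in minutes per decade, assuming 'decade' and 'duration_ms' fields."""
    by_decade = defaultdict(list)
    for row in dedupe_across_markets(rows):
        by_decade[row["decade"]].append(row["duration_ms"] / 60000)  # ms -> minutes
    return {d: kde_mode(v) for d, v in by_decade.items() if len(v) > 1}
```

The grid search only covers the observed range, which is enough because the peak of a sum of Gaussian kernels always lies between the smallest and largest data values.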

For the second visualisation, we chose a 2D histogram because both variables under consideration (track popularity and track duration) are continuous. We divided the data by year and stacked the yearly plots on top of each other to make comparisons between years easier. This plot type also lets readers see how the observations are spread out within each year, which matters because some years have more observations than others. Finally, we added a median line to show how song duration has changed over the years. Because every year has many outliers, the duration data is skewed, so we display the median, which is less sensitive to outliers than the mean. We did not remove the outliers, as they may be different kinds of songs (film scores, musicals, etc.) or simply reflect the artist's choice, and hence needed to be considered.
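
The Python sketch below illustrates the two quantities behind this plot: the per-year median duration and the bin counts that a 2D histogram colours. It assumes each row already has `year`, `duration_min` (duration in minutes) and `track_popularity` fields. The bin widths of 0.5 minutes and 10 popularity points are hypothetical, since the report does not state the bins it used, and the helper names are illustrative.

```python
import math
import statistics
from collections import Counter, defaultdict


def yearly_medians(rows):
    """Median duration (minutes) per year; the median resists long outliers better than the mean."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row["duration_min"])
    return {year: statistics.median(vals) for year, vals in sorted(by_year.items())}


def hist2d_counts(rows, dur_width=0.5, pop_width=10):
    """Count songs per (year, duration bin, popularity bin); counts would drive the fill colour."""
    counts = Counter()
    for row in rows:
        dur_bin = math.floor(row["duration_min"] / dur_width) * dur_width  # lower edge of bin
        pop_bin = math.floor(row["track_popularity"] / pop_width) * pop_width
        counts[(row["year"], dur_bin, pop_bin)] += 1
    return counts
```

As a small example of why the median was preferred: for durations of 3.0, 3.2, 3.4, 3.5 and 20.0 minutes, the median is 3.4 minutes while the mean is 6.62 minutes, so a single long track pulls the mean far from the typical song.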

Visualisations

Discussion

From the first visualisation, readers should be able to see that the track duration distribution has gradually shifted to the left and that the modal duration has decreased. The plot reveals a decreasing trend in track duration over the period, although each decade's distribution remains roughly normal in shape. One reason for this trend could be that streaming services pay artists per play, which gives artists an incentive to produce shorter songs. Our second visualisation lets us examine whether shorter songs are indeed more popular, and might therefore encourage artists to produce shorter tracks.

From the second visualisation, readers can observe that median track duration decreased consistently over the years, except between 2011 and 2012. The plot also shows that song popularity varies widely at every track length, so there is no clear relationship between track duration and track popularity. That said, popularity peaks around the 3.5-minute mark and tapers slightly as duration moves away from it in either direction. We believe the main motivation for artists to shorten their songs is revenue. On Spotify, artists are paid according to the number of streams, and a song must be played for longer than 30 seconds (0.5 minutes) to count as a stream. So if a shorter song is played repeatedly, or if an album contains more, shorter songs rather than fewer, longer ones, the same listening time yields more streams and therefore more revenue. Thus, shorter songs can be more profitable.
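
To make the stream-count argument concrete, here is a small Python calculation for a hypothetical listener who plays one song on repeat for a fixed session. It counts any play longer than 30 seconds as one stream, as described above, and assumes revenue is proportional to the number of streams; per-stream payout rates are not modelled, and the session length is arbitrary. `streams_in_session` is a hypothetical helper.

```python
MIN_STREAM_SECONDS = 30  # a play must last longer than this to count as a stream


def streams_in_session(session_minutes: float, song_minutes: float) -> int:
    """Streams earned by playing one song back to back for a fixed listening session."""
    session_s = session_minutes * 60
    song_s = song_minutes * 60
    if song_s <= MIN_STREAM_SECONDS:
        return 0  # even a complete play never passes the 30-second threshold
    full_plays = int(session_s // song_s)
    leftover_s = session_s - full_plays * song_s
    # A final, interrupted play still counts if it ran for more than 30 seconds.
    partial = 1 if leftover_s > MIN_STREAM_SECONDS else 0
    return full_plays + partial
```

For a 60-minute session, 3-minute songs yield 20 streams and 5-minute songs only 12. A 3.5-minute song yields 17, because the final 30-second fragment does not pass the threshold.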

Turning briefly to limitations: for the first visualisation, entries from decades before the 1990s were excluded because they made up a small percentage of the data and were judged insufficient for a density curve that generalises song duration for those decades. Had more data been available for the decades before the 1990s, we would have been more confident plotting a multiple density plot covering the past seven decades. For the second visualisation, we only included data from 2010 onwards, to avoid large discrepancies in the number of observations per year. However, as the graph shows, the number of data points still varies considerably between years. Moreover, because most data points are concentrated in one region, it is difficult to distinguish individual data points. We tried to address this by using a colour gradient between two contrasting colours for count (the number of songs).

Reference

Data Source:
https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-01-21/readme.md

Other References:
https://www.ipr.edu/blogs/audio-production/transformation-sound-recording-technology/

https://fortune.com/2019/01/17/shorter-songs-spotify/

https://www.vox.com/2014/8/18/6003271/why-are-songs-3-minutes-long

https://www.dailymail.co.uk/sciencetech/article-9085211/Pop-songs-shorter-decade-faltering-attention-spans.html

https://www.hypebot.com/hypebot/2021/01/short-attention-spans-are-dramatically-altering-songwriting-heres-how.html

https://www.theverge.com/2019/5/28/18642978/music-streaming-spotify-song-length-distribution-production-switched-on-pop-vergecast-interview

https://www.planetarygroup.com/do-artists-get-paid-every-time-song-played-spotify/

https://www.hypebot.com/hypebot/2021/11/how-spotify-royalties-actually-work.html

https://www.kaylinpavlik.com/classifying-songs-genres/