Danceability in Time and Tonality

Introduction

We aim to explore how the danceability of popular songs has evolved over time using quantitative data analysis. By leveraging music datasets (e.g., Spotify’s music database), we will investigate trends in danceability scores across genres and decades and in relation to broader cultural shifts.

For this research, we will use R and RStudio to analyze the danceability of songs across different decades by querying data from Spotify’s API. First, we will import the .csv file and retrieve track metadata and audio features, including danceability scores, from curated playlists that span various genres and time periods. We will extract release dates to categorize tracks by decade, then use packages like dplyr to calculate and visualize average danceability trends over time. Additionally, we will examine which playlists these songs appear in to identify patterns in how danceable tracks are grouped, such as workout, party, or chill playlists, and whether certain playlist types consistently feature higher danceability scores. This approach will allow us to uncover both temporal and contextual shifts in musical rhythm and movement.

To ensure meaningful analysis, the first step in this research involves tidying and standardizing the data. This includes cleaning inconsistencies in release dates, converting them into a uniform format, and categorizing songs by decade. Duplicate entries and missing values in key variables like danceability scores must be addressed to maintain data integrity. Standardizing playlist names and genres also helps in grouping and comparing across categories. Once the dataset is clean, visualizations become powerful tools: line graphs will be especially useful to show trends in average danceability over time, while boxplots can highlight the distribution and variability of danceability within each decade. Heatmaps may also reveal correlations between playlist types and danceability scores, offering a layered view of how musical movement has evolved both temporally and contextually.
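As a rough illustration of these cleaning steps, the sketch below removes duplicate tracks, drops rows with missing danceability scores, and derives year and decade labels from the release date. It is a minimal sketch that assumes dplyr is loaded and uses the column names defined in the data dictionary below.

#### Minimal cleaning sketch (assumes dplyr is loaded; column names follow the data dictionary)
spotify_songs_clean <- spotify_songs %>%
  distinct(track_id, .keep_all = TRUE) %>%   # drop duplicate tracks
  filter(!is.na(danceability)) %>%           # drop rows with missing danceability scores
  mutate(
    year   = as.numeric(substr(track_album_release_date, 1, 4)),  # release year
    decade = floor(year / 10) * 10                                # e.g. 1987 becomes 1980
  )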

Packages Required

We load the spotifyr package to access Spotify data through its API. The tidyverse, dplyr, and lubridate packages are also loaded, as they provide a comprehensive set of tools for efficient data manipulation, visualization, and analysis.
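The corresponding setup code simply attaches each package:

library(spotifyr)   # access to the Spotify Web API
library(tidyverse)  # dplyr, ggplot2, readr, and related packages
library(dplyr)      # data manipulation (also attached by the tidyverse)
library(lubridate)  # working with dates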

Data Preparation

Our data comes directly from Spotify via the spotifyr package.
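A minimal import sketch is shown below. The spotifyr authentication call requires Spotify API credentials set as environment variables, and the file name spotify_songs.csv is an assumption for illustration.

#### Authenticate with the Spotify API (requires SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET)
access_token <- spotifyr::get_spotify_access_token()

#### Import the exported track data; the file name "spotify_songs.csv" is an assumption
spotify_songs <- readr::read_csv("spotify_songs.csv")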

Data Dictionary

variable class description
track_id character Song unique ID
track_name character Song Name
track_artist character Song Artist
track_popularity double Song Popularity (0-100) where higher is better
track_album_id character Album unique ID
track_album_name character Song album name
track_album_release_date character Date when album released
playlist_name character Name of playlist
playlist_id character Playlist ID
playlist_genre character Playlist genre
playlist_subgenre character Playlist subgenre
danceability double Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
energy double Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
key double The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
loudness double The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
mode double Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
speechiness double Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
acousticness double A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
instrumentalness double Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
liveness double Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
valence double A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
tempo double The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
duration_ms double Duration of song in milliseconds

Next, we will transform and tidy our data using the code below.

#### Preventing numbers from printing in scientific notation
options(scipen = 999)

#### Treating loudness values above zero dB as missing, since valid values fall between -60 and 0 dB
spotify_songs$loudness[spotify_songs$loudness > 0] <- NA

#### Turning mode into major or minor labels
spotify_songs$mode <- ifelse(spotify_songs$mode == 1, "major", "minor")

#### Turning key from numeric to pitch names and combining it with mode into a key signature
spotify_songs <- spotify_songs %>%
  mutate(
    key = case_when(
      key == 0 ~ "C",     key == 1 ~ "C♯/D♭",  key == 2 ~ "D",       key == 3 ~ "D♯/E♭",
      key == 4 ~ "E",     key == 5 ~ "F",      key == 6 ~ "F♯/G♭",   key == 7 ~ "G",
      key == 8 ~ "G♯/A♭", key == 9 ~ "A",      key == 10 ~ "A♯/B♭",  key == 11 ~ "B",
      TRUE ~ as.character(key)    # keep -1 (no key detected) as "-1"
    ),
    key_signature = paste(key, mode)   # mode was already converted to "major"/"minor" above
  )

#### Using the release date column to create a year column and then a decade column
spotify_songs$year <- as.numeric(substr(spotify_songs$track_album_release_date, 1, 4))
spotify_songs$decade <- floor(spotify_songs$year / 10) * 10

Now that our data is tidy, we can create a new data frame containing only the columns we plan to use:

spotify_songs_project <- spotify_songs %>%
  select(track_name, track_artist, playlist_genre, danceability, decade, key_signature)

Below are the first 10 rows of our tidied data:

head(spotify_songs_project,10)
## # A tibble: 10 × 6
##    track_name      track_artist playlist_genre danceability decade key_signature
##    <chr>           <chr>        <chr>                 <dbl>  <dbl> <chr>        
##  1 I Don't Care (… Ed Sheeran   pop                   0.748   2010 F♯/G♭ minor  
##  2 Memories - Dil… Maroon 5     pop                   0.726   2010 B minor      
##  3 All the Time -… Zara Larsson pop                   0.675   2010 C♯/D♭ minor  
##  4 Call You Mine … The Chainsm… pop                   0.718   2010 G minor      
##  5 Someone You Lo… Lewis Capal… pop                   0.65    2010 C♯/D♭ minor  
##  6 Beautiful Peop… Ed Sheeran   pop                   0.675   2010 G♯/A♭ minor  
##  7 Never Really O… Katy Perry   pop                   0.449   2010 F minor      
##  8 Post Malone (f… Sam Feldt    pop                   0.542   2010 E minor      
##  9 Tough Love - T… Avicii       pop                   0.594   2010 G♯/A♭ minor  
## 10 If I Can't Hav… Shawn Mendes pop                   0.642   2010 D minor

Some interesting tidbits about our data: only six genres are represented, and most of the data set consists of songs from the 2010s, which may skew our results. The genre and decade counts below illustrate this.
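These counts can be reproduced with simple frequency tables (a minimal sketch, assuming the spotify_songs_project data frame created above):

#### Number of songs per playlist genre and per decade
table(spotify_songs_project$playlist_genre)
table(spotify_songs_project$decade)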

## 
##   edm latin   pop   r&b   rap  rock 
##  6043  5155  5507  5431  5746  4951
## 
##  1950  1960  1970  1980  1990  2000  2010  2020 
##     3   172   966  1306  2310  4077 23214   785

Proposed Exploratory Data Analysis

For our analysis, we have already created two new variables, decade and key_signature. The decade variable chunks up our original track_album_release_date column so that we have fewer unique values and can track danceability across eras, not just individual years. We also created a new data frame from our original that includes only the columns we are interested in, which keeps our information tidy and succinct.

We plan to use line graphs, box plots, and heatmaps to explore these questions.
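For example, the trend in average danceability by decade could be drawn as a line graph. This is a minimal sketch that assumes ggplot2 is attached via the tidyverse and uses the spotify_songs_project data frame created above.

#### Average danceability per decade, drawn as a line graph
spotify_songs_project %>%
  group_by(decade) %>%
  summarise(mean_danceability = mean(danceability, na.rm = TRUE)) %>%
  ggplot(aes(x = decade, y = mean_danceability)) +
  geom_line() +
  geom_point() +
  labs(title = "Average danceability by decade",
       x = "Decade",
       y = "Mean danceability (0-1)")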

As far as machine learning goes, we are considering a binary classification model to predict whether a song in a certain key and from a certain genre will have high danceability.
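One way to frame this is a logistic regression on a binary target. The sketch below labels a track as highly danceable when its score exceeds 0.7; that cutoff is an assumption for illustration rather than a final modeling choice.

#### Binary target: is the track highly danceable? (the 0.7 cutoff is an assumed threshold)
model_data <- spotify_songs_project %>%
  mutate(high_danceability = as.integer(danceability > 0.7))

#### Logistic regression with key signature and playlist genre as predictors
danceability_model <- glm(high_danceability ~ key_signature + playlist_genre,
                          data = model_data,
                          family = binomial())

summary(danceability_model)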

Formatting and Other Requirements