This project compares audio track features of the top 6 artists of the 1990s and analyzes them to see whether any factors may contribute to their success. The six artists were selected based on the sales of their most successful album in the 90s. For comparison purposes, I selected 3 female and 3 male artists/groups. The artists up for analysis are: Alanis Morissette, Celine Dion, Whitney Houston, Metallica, Michael Jackson, and Nirvana. The audio features I chose to analyze are valence, energy, key, and loudness. I hypothesize that there will be a notable difference between the male and female artists for all the features.
library(tidyverse)
library(spotifyr)
library(ggthemes)
library(ggridges)
I created a developer account with Spotify in order to access their web API. Below are the credentials needed to authenticate and acquire the variables and data used for this project.
Sys.setenv(SPOTIFY_CLIENT_ID = 'ae11c8fe44c243b7aacca7944f9fd433')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'b4c9959e02384297a84e8e2f04612908')
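Since a client secret written into a shared script is visible to every reader, one alternative is to set the two variables outside the script (for example in `~/.Renviron`) and only check that they are present. The following is a minimal sketch under that assumption; `credentials_present()` is a hypothetical helper, not a spotifyr function.

```r
# Hypothetical helper (not part of spotifyr): TRUE only when both
# environment variables used above are set to non-empty values.
credentials_present <- function() {
  id     <- Sys.getenv("SPOTIFY_CLIENT_ID")
  secret <- Sys.getenv("SPOTIFY_CLIENT_SECRET")
  nzchar(id) && nzchar(secret)
}

if (!credentials_present()) {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET before calling the API.")
}
```

With this check in place, the script itself never needs to contain the secret.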
After acquiring a Spotify Client ID, I gathered the data for each artist individually, then combined all of the data into a single data frame.
AlanisMorissette <- get_artist_audio_features('Alanis Morissette')
CelineDion <- get_artist_audio_features('Celine Dion')
WhitneyHouston <- get_artist_audio_features('Whitney Houston')
Metallica <- get_artist_audio_features('Metallica')
MichaelJackson <- get_artist_audio_features('Michael Jackson')
Nirvana <- get_artist_audio_features('Nirvana')
# Stack the six per-artist data frames into one
data <- rbind(AlanisMorissette, CelineDion, WhitneyHouston,
              Metallica, MichaelJackson, Nirvana)
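The fetch-then-combine step can also be written as a loop over artist names. This is only a sketch: `combine_artists()` and `fake_fetch()` are hypothetical names, and the stand-in fetcher exists so the example runs without a Spotify connection.

```r
# Hypothetical helper: fetch one data frame per artist and stack them.
# `fetch` is assumed to return data frames with identical columns,
# which the rbind() step above also requires.
combine_artists <- function(artists, fetch) {
  frames <- lapply(artists, fetch)
  do.call(rbind, frames)
}

# Stand-in fetcher with made-up values, two rows per artist.
fake_fetch <- function(name) {
  data.frame(artist_name = name, valence = c(0.2, 0.8))
}

demo <- combine_artists(c("Nirvana", "Metallica"), fake_fetch)
nrow(demo)
```

With real data, passing `get_artist_audio_features` as `fetch` together with the six artist names should produce the same combined data frame as above.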
In the graph below, the average valence is shown for each artist. Valence is most similar to positivity levels in a sentiment analysis, except that Spotify uses its own algorithm to determine it, which considers more than textual content. Valence is defined by Spotify as, “a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).” In the chart below, we see that Michael Jackson tops the chart by a wide margin. This makes sense, given that Jackson has created a plethora of sing-along anthems for his listeners. It’s interesting to see how the male and female artists are mixed throughout the ranking. Gender seems to have no influence on valence.
data %>%
group_by(artist_name) %>%
summarise(pos = mean(valence)) %>%
ggplot(aes(pos, reorder(artist_name, pos))) + geom_col() + theme_economist()
In this chart, the average energy levels are shown, with the artists ranked from most to least energetic. Spotify defines energy as, “a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity.” Here we can see the all-male group Metallica top the chart with the most energy, followed by the other two male artists, Nirvana and Michael Jackson. This is quite interesting, as Metallica had the lowest average valence in their music. The three male artists scored notably higher on energy than the female artists.
data %>%
group_by(artist_name) %>%
summarise(Energy = mean(energy)) %>%
ggplot(aes(Energy, reorder(artist_name, Energy))) + geom_col() + theme_economist()
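To compare the male and female artists directly rather than reading the difference off the ranking, each track can be labeled with its artist's group and the group means compared. This is a minimal sketch in base R: the group labels come from the introduction, while the `toy` data frame uses made-up numbers so that it runs without the Spotify data.

```r
# Group labels taken from the introduction.
groups <- c(
  "Alanis Morissette" = "female", "Celine Dion" = "female",
  "Whitney Houston" = "female", "Metallica" = "male",
  "Michael Jackson" = "male", "Nirvana" = "male"
)

# Made-up values, only so the sketch is runnable on its own.
toy <- data.frame(
  artist_name = c("Celine Dion", "Celine Dion", "Metallica", "Metallica"),
  energy      = c(0.40, 0.50, 0.90, 0.80)
)
toy$group <- unname(groups[as.character(toy$artist_name)])

# Mean energy per group, named by group label.
group_means <- tapply(toy$energy, toy$group, mean)
group_means
```

Replacing `toy` with the real `data` frame would give one mean per group for any of the four features.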
This chart explores the average key the songs were made in. The feature key is defined by Spotify as “the key the track is in.” It uses standard Pitch Class Notation. In this visualization, all artists seem to average their key around the numbers 4 through 6. In Pitch Class Notation, this is E to F#, or F-sharp. It is quite interesting that, of all the features so far, average key is the one that is consistent across all artists.
data %>%
group_by(artist_name) %>%
summarise(key = mean(key)) %>%
ggplot(aes(key, reorder(artist_name, key))) + geom_col() + theme_economist()
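Because key is a category rather than a quantity, a mean of about 5 does not necessarily correspond to a key an artist uses often. One complementary view is the most frequent key. The sketch below maps pitch class numbers to note names (0 = C, 1 = C#, and so on); `most_common_key()` is a hypothetical helper.

```r
# Pitch class numbers 0-11 map to note names starting at C.
pitch_names <- c("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

# Hypothetical helper: the most frequent key in a vector of pitch classes,
# returned as a note name. Ties resolve to the lowest pitch class.
# Values outside 0-11 (such as -1) are ignored.
most_common_key <- function(keys) {
  counts <- table(factor(keys, levels = 0:11))
  pitch_names[which.max(counts)]
}

most_common_key(c(4, 4, 9, 0))
```

Applied per artist, for example with `tapply(data$key, data$artist_name, most_common_key)`, this would show whether the artists really favor the same keys.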
The graph below shows the average loudness per artist. According to Spotify, loudness is “the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.” The all-male group Nirvana takes the top spot, with an average loudness of approximately -6.5 dB. Nirvana is followed by the other two male artists. This is quite interesting, as it follows a similar pattern to the energy feature. The margins between consecutive artists in the ranking are quite minor, though.
data %>%
group_by(artist_name) %>%
summarise(loudness = mean(loudness)) %>%
ggplot(aes(loudness, reorder(artist_name, loudness))) + geom_col() + theme_economist()
Next, I used ggridges to create visualizations that display the distribution curves of the data. While the graphs above looked strictly at mean values, these visuals allow us to see a more accurate distribution of the given data.
This valence distribution chart shows some fascinating aspects of the collection of music from these artists. Michael Jackson’s distribution curve appears to be negatively skewed (left skew). Contextually, this indicates that most of his music is pretty positive, as it is closer to 1. He does have a decent amount of less positive music, which can be seen in the distribution. Most other artists have symmetrical distributions that center close to zero (Nirvana, Alanis Morissette), or are positively skewed (right skew) (Celine Dion, Metallica, Whitney Houston).
data %>%
ggplot(aes(valence, artist_name)) + geom_density_ridges()
## Picking joint bandwidth of 0.0554
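The skew read off the ridges can also be checked numerically. A minimal sketch, assuming a simple moment-based skewness is sufficient here; `sample_skewness()` is a hypothetical helper and the example values are made up.

```r
# Moment-based sample skewness: negative for a longer left tail,
# positive for a longer right tail, zero for a symmetric sample.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

# Made-up values with a long left tail, so the result is negative.
sample_skewness(c(0.1, 0.7, 0.8, 0.9, 0.9))
```

On the real data, `tapply(data$valence, data$artist_name, sample_skewness)` would give one skewness value per artist to compare against the curves.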
This energy distribution chart highlights the extremely skewed distributions of Nirvana’s and Metallica’s music, both of which are negatively skewed. It’s fascinating to see how their music is predominantly dedicated to high energy. Knowing that these are rock/metal groups, it does make sense contextually, yet it is still interesting to see the difference between those two and the other artists.
data %>%
ggplot(aes(energy, artist_name)) + geom_density_ridges()
## Picking joint bandwidth of 0.0492
The key distribution visualization tells a similar story to the earlier chart of average keys. All of the artists seem to explore many different keys in their music, which makes for an interesting distribution spread.
data %>%
ggplot(aes(key, artist_name)) + geom_density_ridges()
## Picking joint bandwidth of 0.898
In this loudness distribution chart, it’s apparent that most of the music falls within a fairly narrow range. I believe most of this is due to technical standards for volume levels, which optimize the music for listening. All of these curves seem pretty similar, but I am curious to see why there are dips at specific points, like in Michael Jackson’s curve.
data %>%
ggplot(aes(loudness, artist_name)) + geom_density_ridges()
## Picking joint bandwidth of 0.762
In conclusion, it was fascinating to use the Spotify API to gather information on the top artists of the 90s. My hypothesis of a male vs. female difference proved to be false for positivity; however, there were some similarities within genders for some of the features, like energy and loudness. This may be a result of the genres of the artists, as Nirvana and Metallica classify themselves as rock and metal. However, more research and data are needed to support this correlation.