Introduction

Let’s take another look at the queen of pop music today, Ariana Grande. As noted in a previous report, she has shown courage in the face of trauma, tragedy and tabloid sniping to become the voice of a mass movement towards infectious optimism. In that report, I analyzed five of her albums and found support for the hypothesis that the AFINN values for her album lyrics are more extreme (-5, -4, -3, 3, 4, 5) in later albums than in earlier ones.

To build on this conclusion, I expanded my hypothesis with the help of the Spotify API. For this project, I hypothesize that the frequent words in Ariana Grande’s later albums directly correlate with overall valence as measured by Spotify, among other measurements. I again relied on lyrical data from the following albums:

  1. Yours Truly (2013)
  2. My Everything (2014)
  3. Dangerous Woman (2016)
  4. Sweetener (2018)
  5. Thank U, Next (2019)

For each album, I present my code and findings in sections: Sentiment, Word Frequency, Word Count and Density. At the end of the report, you will find a comprehensive analysis of both valence and danceability.

Process

I began compiling data for my analysis by loading the following packages:

library(tidytext)
library(tidyverse)
library(genius)
library(wordcloud2)

I then compiled the lyric data from the genius package per album:

Ariana_Albums <- tribble(
  ~ artist, ~ title,
  "Ariana Grande", "Yours Truly",
  "Ariana Grande", "My Everything",
  "Ariana Grande", "Dangerous Woman",
  "Ariana Grande", "Sweetener",
  "Ariana Grande", "Thank U, Next")

Ariana_song_lyrics <- Ariana_Albums %>% 
  add_genius(artist, title, type = "album")

To load and inspect data from the Spotify API as a developer, I used the Client ID and Client Secret from my Spotify developer account. With these, I was able to request audio feature data, such as valence and danceability, directly from Spotify.

## Error in get_spotify_access_token(): could not find function "get_spotify_access_token"
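
The error above suggests that the spotifyr package was not attached when the token was requested. A minimal sketch of the setup follows, assuming the credentials are stored in the SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables, which spotifyr reads by default. The helper spotify_credentials_set() is hypothetical and exists only for this illustration, and the placeholder values are not real credentials.

```r
# Sketch only: store the developer credentials in environment variables.
# The values below are placeholders, so the lines are left commented out.
# Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
# Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")

# Hypothetical helper: TRUE only when both variables are non-empty.
spotify_credentials_set <- function() {
  nzchar(Sys.getenv("SPOTIFY_CLIENT_ID")) &&
    nzchar(Sys.getenv("SPOTIFY_CLIENT_SECRET"))
}

# Request a token only when the package is installed and credentials exist.
if (requireNamespace("spotifyr", quietly = TRUE) && spotify_credentials_set()) {
  access_token <- spotifyr::get_spotify_access_token()
} else {
  message("spotifyr is not installed or credentials are not set; skipping.")
}
```

Calling the function through the spotifyr:: prefix, or running library(spotifyr) first, avoids the "could not find function" error.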

Yours Truly

Yours Truly is Ariana Grande’s debut studio album. It was released on September 3, 2013, by Republic Records. Incorporating R&B as its main genre, Yours Truly was influenced by the music of Whitney Houston, Amy Winehouse, Christina Aguilera, Mariah Carey and more.

Sentiment

I decided to reuse the bing sentiment analysis from my previous report. The bing lexicon classifies words as either positive or negative. I completed this process for each of the five albums.

library(genius)
yourstruly <- genius_album(artist = "Ariana Grande", album = "Yours Truly")

yourstruly %>% 
  unnest_tokens(word, lyric) %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("bing")) -> yourstruly_words

yourstruly_words %>% 
  count(word, sentiment, sort = TRUE) %>% 
  arrange(desc(n)) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill = sentiment)) +
  geom_col() + coord_flip() +
  ggtitle("Bing Analysis - Yours Truly") +
  xlab("Word") +
  ylab("Number of Instances")

As you can see from the visualization, the word that is both used the most in Yours Truly and carries positive sentiment is “love.” In terms of instances used, negative words appeared more often than positive ones.
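
Comparing positive and negative words can mean two different things: counting distinct words or counting instances. The sketch below shows both tallies in base R. The tokens and the four-row lexicon are made up for illustration; they are not the album's lyrics or the real bing lexicon.

```r
# Made-up tokens and a tiny hand-made lexicon (illustration only).
tokens <- c("love", "love", "love", "sweet", "hard", "bad", "pain")
lexicon <- data.frame(
  word = c("love", "sweet", "hard", "bad", "pain"),
  sentiment = c("positive", "positive", "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Count each token, then attach its sentiment label.
# merge() acts as an inner join, so unlabelled words would be dropped.
counts <- as.data.frame(table(word = tokens), responseName = "n",
                        stringsAsFactors = FALSE)
scored <- merge(counts, lexicon, by = "word")

# Tally 1: how many distinct words fall under each sentiment.
distinct_by_sentiment <- table(scored$sentiment)

# Tally 2: how many instances fall under each sentiment.
instances_by_sentiment <- tapply(scored$n, scored$sentiment, sum)
```

In this toy data, negative words are more numerous as distinct words while positive words account for more instances, so it helps to state which tally a comparison refers to.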

Word Frequency

As in my previous report, I used wordclouds to present text data in a simple and clear format. On top of being easy and quick to understand, wordclouds show the frequency of words in a document by varying the size of each word in the visualization. I completed the following coding process for each album:

devtools::install_github("gaospecial/wordcloud2")
library(wordcloud2)

Ariana_song_lyrics %>% 
  unnest_tokens(word, lyric) %>% 
  anti_join(stop_words) %>% 
  filter(title %in% "Yours Truly") %>% 
  filter(!word %in% "ooh") %>% 
  count(word, sort = TRUE) -> ArianaWordsAlbum1

ArianaWordsAlbum1 %>%
  wordcloud2(shape = 'circle')

While wordclouds convey word frequency by making more frequent words larger in the visualization, I thought it would be beneficial to also include a table of the most frequent words and the number of times each is used in the album. After loading the knitr package, I completed the following coding process for each album:

library(knitr)
ArianaWordsAlbum1 %>% 
  head(10) %>% 
  kable()

word n
love 161
baby 111
feel 49
heart 44
hands 42
hard 42
boy 41
wanna 34
lovin 27
te 25

In the Yours Truly album, the most frequent words include “love,” “baby,” “feel,” “heart,” “hands” and more.

Word Count

After looking at word frequency, I thought it would be interesting to include a word count per album, measured here as the number of distinct words left after stop words are removed. Because this report relies heavily on frequency, it is useful to see how many different words those frequencies are spread across. Word count and frequency may be directly related.

ArianaWordsAlbum1 %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   462

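Yours Truly has 462 distinct words once stop words are removed. Because the frequency table has one row per word, count() returns the number of different words rather than the total number of words sung. I completed this process for each album.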
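
The difference between the number of distinct words and the total number of word instances can be sketched in base R. The data frame below copies only the first three rows of the Yours Truly frequency table, so the totals are for this excerpt and not for the whole album.

```r
# First three rows of the Yours Truly frequency table shown earlier.
word_counts <- data.frame(
  word = c("love", "baby", "feel"),
  n = c(161L, 111L, 49L),
  stringsAsFactors = FALSE
)

# Number of rows: one per distinct word. This is what count() reports
# when it is called on a table that already has one row per word.
distinct_words <- nrow(word_counts)

# Sum of n: the total number of times these words are used.
total_instances <- sum(word_counts$n)
```

On the full table, the same two lines would give the distinct-word figure reported above and the total number of non-stop-word instances in the album.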

Density

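Density is a new term in this report, introduced alongside the Spotify API analysis. In music, density refers to the number of musical elements that exist within a given segment of music. The plots in this section are built from the lyric data rather than from Spotify: each one is a ridgeline density plot of the instance counts for the ten most frequent bing words, split by sentiment. I completed the following code for each album: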

library(ggridges)
library(ggthemes)
library(textdata)

yourstruly_words %>% 
  count(word, sentiment, sort = TRUE) %>% 
  arrange(desc(n)) %>% 
  head(10) %>% 
  ggplot(aes(x = n, y = as.factor(sentiment))) + 
  geom_density_ridges() +
  ggtitle("Density - Yours Truly") +
  xlab("Number of Instances") +
  ylab("Sentiment")
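
To make clear what a ridgeline plot of this kind summarizes, here is a minimal base R sketch that computes one kernel density estimate per sentiment group with stats::density(). The counts are made up for illustration and are not the album's data.

```r
# Made-up instance counts for ten top sentiment words (illustration only).
top_words <- data.frame(
  n = c(40, 12, 10, 9, 30, 28, 8, 7, 6, 5),
  sentiment = c(rep("positive", 4), rep("negative", 6)),
  stringsAsFactors = FALSE
)

# One kernel density estimate of the counts per sentiment group.
# Each ridge in a ridgeline plot draws a curve of this kind.
densities <- lapply(split(top_words$n, top_words$sentiment), density)

# The area under each curve is roughly 1 regardless of group size,
# so a ridge shows how the counts are spread, not how many words there are.
area <- sapply(densities, function(d) sum(d$y) * diff(d$x[1:2]))
```

A tall, narrow ridge means the top words in that group have similar counts; a long, flat ridge means one or two words are used far more than the rest.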

So What?

Yours Truly is the first of Ariana Grande’s albums. While the album is composed of a mixture of positive and negative words, the number of instances of positive words helps characterize the album as predominantly positive. The wordcloud visualization for this album is rather small, which means that there may be fewer words, or less variety of words, present in the album. The density visualization reveals that the album has both positive and negative sentiment; however, Yours Truly can be perceived as more positive overall. The density plot here is quite variable and split between positive and negative sentiment, which is consistent with what I observed when analyzing sentiment.

My Everything

My Everything is a pop-R&B album. It revisits the ’90s retro-R&B style present in Grande’s debut album Yours Truly. The album’s tracks include EDM, hip hop tunes and piano-driven ballads.

Sentiment

This visualization reveals that the word “love,” yet again, is the most used positive word on the album. What is interesting to me is that this album is split fairly equally between positive and negative words, as opposed to the first album, in which there were more instances of negative words. What’s more, the words in the chart are used more often in this album.

Word Frequency

word n
baby 91
love 71
heart 47
yeah 47
bang 41
hands 39
gotta 37
harder 32
bit 29
time 26

This wordcloud is much larger than the previous wordcloud, which leads me to believe that more varied language is used in this album. As displayed in the table, the most frequent words include “baby,” “love,” “heart,” “yeah” and more. “Baby,” “love” and “heart” were also among the most frequent words in Yours Truly.

Word Count

## # A tibble: 1 x 1
##       n
##   <int>
## 1   476

Since the wordcloud of this album is larger than that of Yours Truly, I also interpreted this as the album itself having more distinct words. This finding is accurate, as My Everything has 14 more distinct words than Yours Truly, with 476 in total.

Density

This density visualization is more variable than that of Yours Truly. While the figure still has that spike in negative sentiment, there is more variability in the positive sentiment throughout. There are also more overall instances of positive sentiment in a musical sense.

So What?

The increase in overall word count and word frequency in this second album appears to go hand in hand with the more variable density plot. This album is also more variable and unique in terms of the way the music feels. As previously stated, My Everything includes EDM, hip hop tunes and piano-driven ballads within the different tracks of the album. The fact that, in a musical sense, the inspiration, genre and vibe are spread across the album may influence density and, furthermore, valence.

Dangerous Woman

Dangerous Woman is the third studio album by Grande, released on May 20, 2016. The album features guest appearances from Nicki Minaj, Lil Wayne, Macy Gray and Future. What was interesting to me is that this album was Grande’s first to not hit number-one in the country.

Sentiment

According to the bing analysis, the top words in Dangerous Woman correlate to a more negatively sentimented album, as eight of the ten words pictured in the visualization are negative. The top words in the sentiment analysis include words like “love,” “greedy,” “bad,” “woo” and “dangerous.” What has been consistent so far is the presence and recurring instances of the word “love” in the bing analysis. What is unique to this album is the incorporation of the word “greedy.” This word did not show up in either of the other bing analyses for Yours Truly and My Everything.

Word Frequency

word n
love 110
baby 85
yeah 82
boy 63
focus 55
greedy 55
day 52
somethin 44
gonna 43
touch 38

The wordcloud shows that various words are used heavily across the tracks in the album. There are also even more small words that make up the wordcloud itself. The table reveals that the word “love” is used 110 times here, fewer than in Yours Truly (161) but more than in My Everything (71). Compared with Yours Truly, the counts are spread more evenly across the top ten words, which has allowed me to conclude that the wealth, or overall language, is being spread across different words and is creating more consistency.

Word Count

## # A tibble: 1 x 1
##       n
##   <int>
## 1   442

While Dangerous Woman has fewer distinct words than My Everything and Yours Truly, with only 442 in total, it is evident that the more frequent words have carried the overall tone of the album.

Density

The density visualization reveals only a very shallow curve in negative sentiment. This is fascinating because Dangerous Woman has fewer distinct words overall, used in more instances, which in turn creates less variability. While there are more frequently used words overall, this does not seem to correlate with variability in density. It has actually created more consistency in density over the course of the album.

So What?

Dangerous Woman represents a turning point to me. It is intriguing that, in terms of the data, there is not much variability in this album. Furthermore, this album was the only one not to hit number one in the country. Could this be because of its more repetitive language and the words used in the tracks? Potentially. However, there are several guest appearances in some of Grande’s tracks. Could this relate? What has remained true is the notion that the frequently used words continue to influence the sentiment of the album.

Sweetener

Sweetener is a pop-R&B and trap record that includes elements of house, funk, neo soul and hip hop music in its beats and production. The melodies and harmonies on the album are diverse and include uptempo songs and a variety of downtempo, sentimental ballads.

Sentiment

The bing sentiment analysis reveals that the album is full of predominantly negative words. The upper portion of the chart is primarily negative, with words like “stole,” “bum” and “darkness.” Words like “happy” and “love” still speak to a feeling of positivity, but the album appears to be dominated by negative words. This makes sense, since this album has a very diverse set of uptempo and downtempo songs.

Word Frequency

word n
breathin 32
stole 32
feel 29
light 28
pickin 27
boy 25
speak 25
surprise 25
wake 25
happy 24

This wordcloud is one of the largest. It is clear which words are the most frequent. Words like “breathin,” “stole,” “feel,” “light” and more are of the larger-sized words. This wordcloud is definitely the most varied, as it is comprised of more smaller-sized words than larger. This leads me to believe that the most frequently-used words are not being used as much as in previous albums. This suggestion is consistent, as “breathin” is only used 32 times. So, there are more different words being used in this album overall.

Word Count

## # A tibble: 1 x 1
##       n
##   <int>
## 1   452

The word count for this album is 452 distinct words. This is a medium-sized album: Sweetener has neither the most nor the fewest words of the five.

Density

It makes sense that Sweetener has extreme variability in terms of density. The album is comprised of very upbeat and downbeat songs, which give off a different tone and feel. The album also includes influence from other genres of music besides pop, as previously stated.

So What?

The more frequently-used words in this album have both a positive and negative feel, as the density of Sweetener proves that the album musically has positive and negative essence. The number of instances observed of each of the most frequently-used words is pretty consistent throughout, with instances of between 24 and 32 being present. The album is not as drastically driven or swayed by outliers, or words that heavily carry the overall word frequency. Thus, this album has more overall varied frequency.

Thank U, Next

The final and most recent album in this analysis, Thank U, Next, was released six months after Sweetener. Thank U, Next was created in the midst of Grande’s personal struggles and traumatic experiences, including the death of ex-boyfriend Mac Miller and her separation from then-fiancé Pete Davidson. Thank U, Next is primarily a pop album, incorporating R&B and trap.

Sentiment

Thank U, Next leans negative on the bing sentiment scale. Negative words like “shift,” “ruin,” “pain,” “needy” and “falling,” while used in fewer instances than “love” or “woo,” help characterize the album as such. While this album is primarily negative, the word “love” is both positive and used the most. As this album describes some of Grande’s more personal experiences, like her breakup with Pete Davidson and her coping process after the tragic death of ex-boyfriend Mac Miller, it is understandable that this word would be used, as it has been in all of her other albums.

Word Frequency

word n
love 41
girlfriend 31
imagine 30
forget 27
space 24
bad 19
time 19
smile 17
fake 16
idea 16

What I noticed in the wordcloud for this last album is that there is a lot of variability in the size of the words across the visualization. While words like “love,” “girlfriend,” “imagine,” “forget,” “space” and more are by far the largest in size and most frequent, there are other words, smaller in size, that occur more than once throughout the album. In addition, the overall size of the wordcloud is large, like that of Sweetener.

Word Count

## # A tibble: 1 x 1
##       n
##   <int>
## 1   420

While the wordcloud for the Thank U, Next album is the largest, wordcloud size does not always positively correlate with word count in a given album. This album actually has the lowest word count, with 420 distinct words. This leads me to believe that the words used in the album are repeated more often to develop the tracks and sentiment, and that is what relates to variability.

Density

The density of Thank U, Next maintains the same idea I have presented in the word count analysis. There are numerous words being used, frequently and infrequently, in big and small instances. This correlates with overall variability. The density visualization adheres to the conclusion that - in a musical sense - Thank U, Next has a positive and negative sound and varies across the board.

So What?

In Grande’s Thank U, Next album, I learned that word count does not always correlate with word frequency. However, word frequency correlates with overall variety and variance in positive and negative sentiment, in terms of the bing analysis and density analysis.

Valence

Valence, available to developers through the Spotify API, is a measure from 0.0 to 1.0 that describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, upbeat, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

Initially, I wrote code to list the ten tracks with the highest valence and then graph valence by album. However, the table showed only a few of the albums, with the same tracks repeated, and the graph included two other albums that I chose not to analyze.

library(spotifyr)
library(knitr)
library(textdata)

arianagrande <- get_artist_audio_features('Ariana Grande')

arianagrande %>% 
  arrange(-valence) %>% 
  group_by(album_name) %>% 
  select(track_name, valence) %>% 
  head(10) %>% 
  kable()

album_name track_name valence
Yours Truly You’ll Never Know 0.869
Yours Truly You’ll Never Know 0.869
Yours Truly You’ll Never Know 0.869
Yours Truly The Way 0.862
Yours Truly The Way 0.861
Yours Truly The Way 0.861
Sweetener blazed (feat. Pharrell Williams) 0.855
Sweetener blazed (feat. Pharrell Williams) 0.854
Dangerous Woman Greedy 0.844
Dangerous Woman Greedy 0.844

ggplot(arianagrande, aes(x = valence, y = album_name)) +
  geom_density_ridges()
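
The repeated rows in the table are probably the same track returned once per release of an album; that is an assumption, since the report does not say why they repeat. One way to keep a single row per track, sketched in base R with a few rows copied from the table above:

```r
# A few rows copied from the valence table above.
tracks <- data.frame(
  album_name = c("Yours Truly", "Yours Truly", "Yours Truly", "Sweetener"),
  track_name = c("You'll Never Know", "You'll Never Know", "The Way",
                 "blazed (feat. Pharrell Williams)"),
  valence = c(0.869, 0.869, 0.862, 0.855),
  stringsAsFactors = FALSE
)

# Keep the first row for each album and track pair.
# The data are already sorted by valence, so the highest value is kept.
is_repeat <- duplicated(tracks[c("album_name", "track_name")])
unique_tracks <- tracks[!is_repeat, ]
```

With dplyr, distinct(album_name, track_name, .keep_all = TRUE) would do the same job inside the existing pipeline.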

I then edited my code to show a table of valence per album that I chose to analyze, and only those albums.

library(spotifyr)
library(knitr)
library(textdata)

arianagrande %>% 
  filter(album_name %in% c("Yours Truly", "My Everything", "Dangerous Woman", "Sweetener", "thank u, next")) %>% 
  group_by(album_name) %>% 
  summarise(mean(valence)) %>% 
  arrange(desc(`mean(valence)`)) %>% 
  kable()

album_name mean(valence)
Yours Truly 0.5527692
Sweetener 0.4638833
Dangerous Woman 0.4449765
thank u, next 0.3863750
My Everything 0.3486737

According to the table, Yours Truly has the highest valence. It is interesting that this album sits rather close to a mean valence of 0.5, which is right between musically negative and musically positive. Yours Truly, while it has the highest mean valence across the five albums, has a medium level of valence as a whole. What’s more, the wordcloud for Yours Truly happened to be the smallest, with a few words like “love” and “baby” accounting for the majority of the frequently used words, at 161 and 111 instances respectively. The fact that these two words were used in the most concentrated way is consistent with my hypothesis that the frequent words directly correlate with overall valence. This album also had the second highest word count and a rather steep density display on both the positive and negative side.

It makes sense that Thank U, Next is lower on the valence scale, since the album was written following some tragic, heartbreaking experiences.
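
One way to put a number on the relationship between word frequency and valence is a Pearson correlation across the five albums. The sketch below uses only figures reported in this document: the count of each album's most frequent word and its mean valence. Choosing the top word's count as the frequency measure is an assumption made for illustration, and with only five albums the result is a rough indication, not a test of the hypothesis.

```r
# Count of the most frequent word and mean valence per album,
# both copied from the tables in this report.
albums <- data.frame(
  album = c("Yours Truly", "My Everything", "Dangerous Woman",
            "Sweetener", "thank u, next"),
  top_word_n = c(161, 91, 110, 32, 41),
  mean_valence = c(0.5527692, 0.3486737, 0.4449765, 0.4638833, 0.3863750),
  stringsAsFactors = FALSE
)

# Pearson correlation between the two columns (ranges from -1 to 1).
r <- cor(albums$top_word_n, albums$mean_valence)
```

A value of r near 1 would support a direct relationship, a value near 0 would suggest none, and cor.test() could be used to see how uncertain the estimate is with so few albums.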

Danceability

Danceability is also an interesting measure associated with the Spotify API. Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength and overall regularity. A value of 0.0 is least danceable, and 1.0 is most danceable.

I first coded danceability based on track name and then across all of the albums. It was difficult to analyze the danceability of each track, as I had to look up which track belonged to which album. Furthermore, the first danceability graph showed data for the two albums again that I did not wish to analyze further.

library(spotifyr)
library(knitr)
library(textdata)

arianagrande <- get_artist_audio_features('Ariana Grande')

arianagrande %>% 
  arrange(-danceability) %>% 
  select(track_name, danceability) %>% 
  head(10) %>% 
  kable()

track_name danceability
the light is coming (feat. Nicki Minaj) 0.877
the light is coming (feat. Nicki Minaj) 0.876
Love Me Harder - Kassiano Remix 0.860
successful 0.848
bad idea 0.847
successful 0.847
bad idea 0.847
The Way - Sidney Samson Remix 0.832
R.E.M 0.831
R.E.M 0.831

ggplot(arianagrande, aes(x = danceability, y = album_name)) +
  geom_density_ridges()

I then recoded for danceability to be displayed in a table across the five albums that I was analyzing to get the following results:

library(spotifyr)
library(knitr)
library(textdata)

arianagrande %>% 
  filter(album_name %in% c("Yours Truly", "My Everything", "Dangerous Woman", "Sweetener", "thank u, next")) %>% 
  group_by(album_name) %>% 
  summarise(mean(danceability)) %>% 
  arrange(desc(`mean(danceability)`)) %>% 
  kable()

album_name mean(danceability)
Sweetener 0.6804000
thank u, next 0.6616250
Yours Truly 0.6379487
Dangerous Woman 0.5954082
My Everything 0.5770877

What is interesting to me is that, across all of the albums, Grande’s music is moderately to highly danceable, as the mean danceability values for all five albums were higher than 0.5.

The album with the highest level of danceability is Sweetener, with a danceability of 0.6804. Sweetener had a very large wordcloud visualization, with more of a variety of frequently-used words. Even words that were less frequent were not necessarily the smallest. Sweetener also had musical influences from genres like house, funk, neo soul and hip hop to help create uptempo and downtempo songs.

Conclusion

Overall, my findings support my hypothesis. While sentiment and word count may not directly correlate with valence and danceability, they are important measures that help characterize music albums. Word frequency appears to have an influence on valence and danceability, as previously discussed. Ariana Grande’s music, overall, is unique in tone, genre, sentiment and key, which contributes to aspects like valence and danceability.