R Markdown

Introduction

For my text analysis project, I wanted to analyze how Taylor Swift’s music has changed across her nine albums. I chose to analyze only her original albums: “Taylor Swift”, “Fearless”, “Speak Now”, “Red”, “1989”, “Reputation”, “Lover”, “Folklore”, and “Evermore.” Deluxe editions, their bonus songs, and her re-recorded albums are not included in this analysis. I obtained the dataset from https://www.kaggle.com/deepshah16/song-lyrics-dataset and manually removed any music outside of the nine albums listed above.

Going into this analysis, I expect to find a theme revolving around love and relationships. However, I am curious to see how the albums differ from (or resemble) one another, since she has gone through many phases over the years. Because of this, I predict that each album will be distinct from the others, with its own sentiment and most common words. I also predict that, as we progress through her discography, each album will be less about love than the last, since she has become a strong advocate for female empowerment and storytelling.

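library(tidyverse)   # dplyr, ggplot2, tidyr and friends
library(tidytext)    # unnest_tokens(), stop_words, get_sentiments()
library(wordcloud2)  # interactive word clouds
library(readxl)      # read_excel()

# Load the manually cleaned lyrics spreadsheet.
tswizzle <- read_excel("Desktop/tswizzle.xlsx")
View(tswizzle)

# Split the lyrics into one word per row and drop common stop words.
Cleaned_Taylor <- tswizzle %>%
  unnest_tokens(word, Lyric) %>%
  anti_join(stop_words)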
## Joining, by = "word"

Overview

After importing my dataset and cleaning the data, I first wanted to look at Taylor Swift’s music as a whole to get a general idea of the main themes in her music.

Cleaned_Taylor %>% 
  count(word, sort = TRUE) %>%
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  head(10) %>% 
  ggplot(aes(x = reorder(word, n), y = n, fill=word)) + geom_col() +
  coord_flip() +
  ggtitle("Taylor Swift's Top 10 Most Used Words")

Cleaned_Taylor %>% 
  count(word) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  wordcloud2()
word n
love 236
time 218
wanna 138
pre 133
baby 132
yeah 127
stay 110
gonna 97
ooh 94
night 88
bad 82
shake 81
call 80
ohoh 80
eyes 71
home 67
feel 61
girl 61
break 59
remember 54
Cleaned_Taylor %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Taylor Swift's Sentiment")
## Joining, by = "word"

These visualizations support my hypothesis that Taylor Swift commonly writes about love. “Love” was her most commonly used word, appearing a total of 236 times over the course of her nine albums. Going into the breakdown of each album, I expect “love” to appear in the list of top words for every album, but in decreasing proportions as her discography continues due to changes in her songwriting. For each album, I will first find the 10 most common words, which will help me understand the themes of the songs and what the album is about. Next, I will generate a word cloud to surface more of the popular lyrics and gain a deeper understanding of common themes and words. Lastly, I will create a graph of the overall sentiment of the album. Together, these will be used to compare the albums to one another and gain better insight into her lyrics and themes over the course of her career.
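Since the same three steps are repeated for every album, the top-words step can be wrapped in a small helper. The following is only a sketch: `top_words()` and the toy data are hypothetical names and made-up rows, not part of the original analysis, and the toy tibble simply mimics the `Album` and `word` columns of `Cleaned_Taylor`.

```r
library(dplyr)
library(tibble)

# Toy stand-in for Cleaned_Taylor (one row per word); the rows are made up.
toy_words <- tibble(
  Album = c("A", "A", "A", "A", "B", "B", "B"),
  word  = c("love", "love", "time", "taylor", "time", "time", "home")
)

# Hypothetical helper: the n_top most frequent words on one album,
# skipping credit words such as the artist's own name.
top_words <- function(words, album, n_top = 10, drop = c("taylor", "swift")) {
  words %>%
    filter(Album == album, !word %in% drop) %>%
    count(word, sort = TRUE) %>%
    head(n_top)
}

top_a <- top_words(toy_words, "A")
print(top_a)
```

With the real data, a call such as `top_words(Cleaned_Taylor, "Fearless")` could then be piped into the same `ggplot()` code used in each album section.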

Taylor Swift

Cleaned_Taylor %>% 
  filter(Album %in% "Taylor Swift") %>% 
  count(word, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Taylor Swift Top 10 Words") +
  coord_flip()

Here, we can see that “love” is the fifth most common word on Taylor Swift’s debut album, “Taylor Swift.” Other hopeful, love-related words such as “heart” and “hope” also appear. From this, we can see that Taylor Swift was singing about love and boys from early in her career.

Cleaned_Taylor %>% 
  filter(Album %in% "Taylor Swift") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  wordcloud2()
word n
wanna 21
beautiful 20
should’ve 20
song 18
love 16
hope 13
eyes 12
break 11
heart 11
girl 10
home 10
pre 10
perfectly 9
car 8
drew 8
late 8
wishing 8
baby 7
light 7
night 7
Cleaned_Taylor %>% 
  filter(Album %in% "Taylor Swift") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Taylor Swift Album Sentiment")
## Joining, by = "word"

Taylor Swift’s debut album “Taylor Swift” is very positive overall. This is reflected in the fact that the sentiments of the album in relation to her top 10 most used words were anticipation, joy, positive, surprise, and trust. All of these words have positive connotations associated with them.
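One caveat about the sentiment chunks: `count(word, sentiment, sort = TRUE) %>% head(10)` keeps the ten most frequent word-sentiment pairs, so a word that belongs to several NRC categories can fill several of the ten rows. A complementary view is to total the word uses per category. The sketch below assumes a made-up mini lexicon in the same shape as `get_sentiments("nrc")` (columns `word` and `sentiment`); the words and labels are illustrative only and are not taken from the real lexicon.

```r
library(dplyr)
library(tibble)

# Made-up tokens and a made-up mini lexicon with one row per
# (word, sentiment) pair, mirroring the shape of the NRC lexicon.
toy_words <- tibble(word = c("love", "love", "hope", "break", "car"))
toy_lexicon <- tibble(
  word      = c("love", "love", "hope", "hope", "break"),
  sentiment = c("joy", "positive", "anticipation", "positive", "negative")
)

category_totals <- toy_words %>%
  count(word, name = "uses") %>%            # one row per distinct word
  inner_join(toy_lexicon, by = "word") %>%  # attach every category of each word
  count(sentiment, wt = uses, sort = TRUE) %>%
  mutate(share = n / sum(n))                # share of all matched uses

print(category_totals)
```

Reading the `share` column makes it easier to say how positive an album is, rather than only which categories happen to appear.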

Fearless

Cleaned_Taylor %>% 
  filter(Album %in% "Fearless") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  filter(!word %in% "colbie") %>% 
  filter(!word %in% "caillat") %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Fearless Top 10 Words") +
  coord_flip()

Swift’s second album, Fearless, also commonly uses love and other positive sentiment words. This is shown by the fact that “love”, “loved”, “feeling”, “feel”, and “baby” were all seen in the top 10 words of the album.

Cleaned_Taylor %>% 
  filter(Album %in% "Fearless") %>% 
  count(word, sort = TRUE) %>% 
   filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  filter(!word %in% "colbie") %>% 
  filter(!word %in% "caillat") %>% 
  wordcloud2()
word n
la 26
feel 22
baby 16
love 16
time 15
feeling 13
belong 12
rains 12
loved 10
run 10
waiting 10
wanna 10
fearless 9
fifteen 9
gonna 9
night 9
pre 9
town 9
day 8
easy 8
Cleaned_Taylor %>% 
  filter(Album %in% "Fearless") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  filter(!word %in% "colbie") %>% 
  filter(!word %in% "caillat") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Fearless Sentiment") 
## Joining, by = "word"

Her second album, Fearless, reflects a similar sentiment to her debut album, with some new negative notes. The sentiment categories attached to this album are anger, anticipation, disgust, fear, joy, and positive. Half of these (anticipation, joy, and positive) are positive, while anger, disgust, and fear are negative and were not seen on her previous album.

Speak Now

Cleaned_Taylor %>% 
  filter(Album %in% "Speak Now") %>% 
  count(word, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col()+
  ggtitle("Speak Now Top 10 Words") +
  coord_flip()

Speak Now does not have the word love in the top 10 most commonly used words on the album. Instead, there is a stronger theme of time as reflected by the fact that the words “live”, “grow”, “time” and “remember” are all seen in the top 10. This shows that this album does not fit the stereotype that all Taylor Swift music is about love.

Cleaned_Taylor %>% 
  filter(Album %in% "Speak Now") %>% 
  count(word, sort = TRUE) %>% 
   filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  wordcloud2()
word n
time 27
grow 21
meet 17
mind 17
pre 16
gonna 13
live 13
night 13
remember 13
eyes 12
forever 12
love 12
lights 11
yeah 11
december 10
life 10
someday 10
smile 9
sparks 9
wait 9
Cleaned_Taylor %>% 
  filter(Album %in% "Speak Now") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Speak Now Sentiment") 
## Joining, by = "word"

This album is also overwhelmingly positive. The sentiment categories associated with it are anticipation, joy, positive, surprise, and trust, the same five that described her first album, “Taylor Swift.”

Red

Cleaned_Taylor %>% 
  filter(Album %in% "Red") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Red Top 10 Words") +
  coord_flip()

The album “Red” returns to the theme of love seen in her first two albums. However, it is interesting to note that “love” has yet to be the #1 most common word on any of her albums so far. On this album we once again see the theme of time, with the words “stay” and “time” appearing frequently.

Cleaned_Taylor %>% 
  filter(Album %in% "Red") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  filter(!word %in% "ed") %>% 
  filter(!word %in% "sheeran") %>% 
  filter(!word %in% "gary") %>% 
  wordcloud2()
word n
time 66
stay 32
trouble 32
yeah 26
love 24
starlight 22
youre 22
dancing 18
talk 18
night 17
pre 17
red 16
mad 15
wanna 15
beautiful 13
follow 13
lucky 13
bet 12
home 12
loving 12
Cleaned_Taylor %>% 
  filter(Album %in% "Red") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Red Sentiment") 
## Joining, by = "word"

“Red” is the most negative album so far. The sentiment categories anger, anticipation, disgust, fear, joy, negative, positive, and sadness all appear in relation to this album, which is the largest number of negative categories for any individual Taylor Swift album up to this point.

1989

Cleaned_Taylor %>% 
  filter(Album %in% "1989") %>% 
  count(word, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("1989 Top 10 Words") +
  coord_flip()

The most common word on Taylor Swift’s album 1989 was “shake,” narrowly ahead of “love.” This is not surprising, as the album features the song “Shake It Off,” which is very repetitive and uses the word “shake” so frequently that this song alone put the word at the top of the list. This is also Taylor’s most romantic album so far: the word “love” appears 77 times. The album also features words like “baby”, “stay”, and “girl”, which are frequently associated with themes of love.

Cleaned_Taylor %>% 
  filter(Album %in% "1989") %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2()
word n
shake 78
love 77
woods 39
ohoh 33
stay 33
baby 32
gonna 30
york 30
girl 25
bad 23
hey 22
fake 18
mmm 18
hate 17
blood 16
play 16
break 15
pre 14
finally 13
forever 13
Cleaned_Taylor %>% 
  filter(Album %in% "1989") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("1989 Sentiment")
## Joining, by = "word"

Even though this album revolves around love quite a bit, there is still a good deal of negative sentiment associated with it. The sentiment categories anger, disgust, fear, joy, negative, positive, and sadness were all generated through NRC. Of these seven, only two (joy and positive) are positive.

Reputation

Cleaned_Taylor %>% 
  filter(Album %in% "reputation") %>% 
  count(word, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Reputation Top 10 Words") +
  coord_flip()

Reputation is famously known as Taylor Swift’s most aggressive and negative album of her nine. However, after glancing at the top 10 most used words on the album, it does not appear to be strikingly negative. We see words like “baby” and “time” like we have seen previously. The word “bad” does appear on this list, but I am curious to see what the sentiment of the album looks like.

Cleaned_Taylor %>% 
  filter(Album %in% "reputation") %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2()
word n
call 46
wanna 37
ooh 35
time 34
baby 33
yeah 32
bad 31
pre 27
hands 24
hold 23
car 22
dancing 22
getaway 22
waiting 21
feel 20
ha 20
game 19
whoa 19
love 17
gorgeous 16
Cleaned_Taylor %>% 
  filter(Album %in% "reputation") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col()+
  ggtitle("Reputation Sentiment")
## Joining, by = "word"

Reputation is known as Taylor Swift’s darkest and most different album when compared to the rest of her discography, and the sentiment of the album supports this. The sentiment categories associated with the top 10 most commonly used words on the album are anger, anticipation, disgust, fear, joy, negative, positive, and sadness. Of these eight, only three (anticipation, joy, and positive) are positive, so the majority carry negative connotations.

Lover

Cleaned_Taylor %>% 
  filter(Album %in% "Lover") %>% 
  count(word, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Lover Top 10 Words") +
  coord_flip()

Lover is known as Taylor Swift’s most positive and romantic album of her entire discography. This is reflected by the fact that “love” is the most used word on the album, appearing over 40 times. None of the other top 10 words stands out as overwhelmingly negative.

Cleaned_Taylor %>% 
  filter(Album %in% "Lover") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2()
word n
love 44
wanna 42
daylight 40
ohoh 31
baby 29
ooh 25
pre 25
yeah 25
street 23
walk 19
home 18
bad 17
night 17
cornelia 16
bless 15
darling 15
gonna 15
boy 14
ah 13
hate 12
Cleaned_Taylor %>% 
  filter(Album %in% "Lover") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col()+
  ggtitle("Lover Sentiment") 
## Joining, by = "word"

Despite the fact that Lover is a positive album, there are still negative sentiments attached to it. The categories anger, disgust, fear, negative, and sadness were all returned by get_sentiments('nrc').

Folklore

Cleaned_Taylor %>% 
  filter(Album %in% "folklore") %>% 
  count(word, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Folklore Top 10 Words") +
  coord_flip()

Folklore was a real change of pace for Taylor Swift. Many of the songs revolve around various characters and plot lines rather than only her own life, as in previous albums. The album was also written and released during the height of the COVID-19 pandemic quarantine, reflecting a different time of life for everyone. The word “time” is seen the most throughout the album, appearing 38 times. Other words like “hope”, “heart” and “love” also appear, reflecting a positive sentiment throughout the album.

Cleaned_Taylor %>% 
  filter(Album %in% "folklore") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  wordcloud2()
word n
time 38
love 13
mad 13
call 11
hope 11
woman 11
mine 10
heart 9
pulled 9
signs 9
would’ve 9
sign 8
warning 8
watch 8
film 7
marvelous 7
pre 7
summer 7
times 7
town 7
Cleaned_Taylor %>% 
  filter(Album %in% "folklore") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Folklore Sentiment") 
## Joining, by = "word"

The sentiment words in relation to the top 10 words of the album are anger, anticipation, disgust, fear, joy, negative, positive, and sadness. Most of these words are associated with a negative connotation. This is not surprising to me as the album was written and produced during a relatively negative and dark time.

Evermore

Cleaned_Taylor %>% 
  filter(Album %in% "evermore") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  head(10) %>% 
  ggplot(aes(reorder(word, n), n, fill=word)) + geom_col() +
  ggtitle("Evermore Top 10 Words") +
  coord_flip()

Love is the most commonly seen word on the album “Evermore.” Other words like “died” and “dead” are seen in the top 10 most seen words on the album, reflecting a more morbid sentiment.

Cleaned_Taylor %>% 
  filter(Album %in% "evermore") %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  wordcloud2()
word n
love 17
died 13
eyes 13
hand 13
post 13
stay 13
time 13
yeah 13
alive 12
dead 12
happiness 12
head 11
leave 11
ooh 11
call 10
closure 10
life 10
begging 9
evermore 9
house 9
Cleaned_Taylor %>% 
  filter(Album %in% "evermore") %>% 
  filter(!word %in% "taylor") %>% 
  filter(!word %in% "swift") %>% 
  inner_join(get_sentiments('nrc')) %>% 
  count(word, sentiment, sort = TRUE) %>% 
  head(10) %>% 
  ggplot(aes(word, n, fill = sentiment)) + geom_col() +
  ggtitle("Evermore Sentiment") 
## Joining, by = "word"

The sentiment words associated with the album Evermore are anticipation, joy, positive, and trust. This is the most positive album we have seen since some of her earlier work. This album was also written during the COVID-19 pandemic, but reflects a lighter sentiment than her previous album, “Folklore.”

Overall Sentiment Analysis

I next wanted to create playlists based on the sentiment associated with each song. I chose to cap each list at 15 songs, as most albums tend to be about this length. One playlist therefore contains her 15 most positive songs and the other her 15 most negative.

taylorsentiment <- Cleaned_Taylor%>%
  inner_join(get_sentiments("bing"))%>% 
  count(Album, Title, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
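To make the score concrete: each song’s `sentiment` value is simply its count of Bing-positive words minus its count of Bing-negative words. The sketch below reproduces that arithmetic on made-up rows, and it swaps in `pivot_wider()`, which tidyr now recommends over the superseded `spread()`. The toy titles and labels are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up tokens that have already been labeled positive or negative,
# as they would be after the inner_join() with the Bing lexicon.
toy_labeled <- tibble(
  Title     = c("Song A", "Song A", "Song A", "Song B", "Song B", "Song B"),
  sentiment = c("positive", "positive", "negative",
                "negative", "negative", "negative")
)

toy_scores <- toy_labeled %>%
  count(Title, sentiment) %>%
  # One column per label; songs with no words of a label get 0.
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)

print(toy_scores)
```

Here “Song A” has two positive words and one negative word, so its score is 1, while “Song B” has no positive words at all and relies on the zero fill.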

Positive Sentiment Playlist

taylorsentiment %>% 
  arrange(desc(sentiment)) %>% 
  head(15) %>% 
  knitr::kable()
| Album | Title | negative | positive | sentiment |
|:---|:---|---:|---:|---:|
| 1989 | This Love | 14 | 61 | 47 |
| Lover | ME! | 5 | 37 | 32 |
| Lover | London Boy | 4 | 35 | 31 |
| Red | The Lucky One | 2 | 23 | 21 |
| Taylor Swift | Stay Beautiful | 1 | 21 | 20 |
| reputation | King of My Heart | 5 | 24 | 19 |
| Lover | I Think He Knows | 2 | 19 | 17 |
| reputation | Delicate | 11 | 28 | 17 |
| Red | Sad Beautiful Tragic | 14 | 30 | 16 |
| Speak Now | Enchanted | 4 | 20 | 16 |
| Fearless | Hey Stephen | 2 | 16 | 14 |
| evermore | coney island | 12 | 25 | 13 |
| reputation | End Game | 22 | 33 | 11 |
| Fearless | Breathe | 11 | 21 | 10 |
| Red | Everything Has Changed | 3 | 13 | 10 |

Here, we can see that eight of the nine albums are represented on this positive playlist, with Folklore being the only album not included. There is a three-way tie between Red, Lover, and Reputation for the most songs on the playlist, with three each.

Negative Sentiment Playlist

taylorsentiment %>% 
  arrange(sentiment) %>% 
  head(15) %>% 
  knitr::kable()
| Album | Title | negative | positive | sentiment |
|:---|:---|---:|---:|---:|
| 1989 | Shake It Off | 137 | 0 | -137 |
| Red | I Knew You Were Trouble. | 58 | 1 | -57 |
| Speak Now | The Story of Us | 34 | 5 | -29 |
| evermore | marjorie | 28 | 4 | -24 |
| folklore | mad woman | 28 | 5 | -23 |
| evermore | willow | 28 | 6 | -22 |
| Lover | Miss Americana & The Heartbreak Prince | 35 | 13 | -22 |
| reputation | So It Goes… | 22 | 1 | -21 |
| 1989 | Bad Blood | 28 | 10 | -18 |
| folklore | hoax | 21 | 3 | -18 |
| folklore | illicit affairs | 17 | 1 | -16 |
| Red | Stay Stay Stay | 24 | 9 | -15 |
| reputation | Getaway Car | 18 | 3 | -15 |
| Lover | Afterglow | 20 | 6 | -14 |
| 1989 | Blank Space | 32 | 19 | -13 |

Here, we can see that Swift’s first two albums, Taylor Swift and Fearless, do not appear among the songs with the most negative sentiment, which suggests that her songs have become more negative over time. It should also be noted that “Shake It Off” is listed as her most negative song. While the song does contain negative words, they are used out of spite, and the song is actually extremely positive. Clearly, neither playlist is perfect, since each is based solely on each song’s sentiment score.
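Because the score is a raw difference, it grows with the number of matched words, so long or repetitive songs reach more extreme values. One possible adjustment, offered here only as an assumption and not something the original analysis used, is to divide the net score by the total number of matched words so every song falls between -1 and 1. The counts below are copied from the two playlist tables; the `balance` column is a hypothetical name.

```r
library(dplyr)
library(tibble)

# Counts copied from the positive and negative playlist tables.
scores <- tibble(
  Title    = c("Shake It Off", "This Love", "Blank Space"),
  negative = c(137, 14, 32),
  positive = c(0, 61, 19)
)

scores_norm <- scores %>%
  mutate(
    sentiment = positive - negative,
    # Assumed normalization: net score as a share of all matched words,
    # so the value always lies between -1 and 1.
    balance = sentiment / (positive + negative)
  ) %>%
  arrange(balance)

print(scores_norm)
```

This puts songs of different lengths on one scale, but it does not fix the underlying problem with “Shake It Off”: every matched word in that song is labeled negative, so it still sits at the very bottom.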

Conclusion

My prediction that love would be a common theme across Taylor Swift’s albums proved largely true, although on both Speak Now and Reputation the word “love” did not appear in the list of top 10 words. I had also predicted that “love” would become less prevalent over the course of her albums as she wrote more about women’s empowerment, but I did not find this to be the case. Even when she does write about female empowerment, love is typically still tied into the story line of the song. I also found it interesting that the concept of time came up so frequently, even though listeners rarely mention it. Over the years, Taylor Swift has become a vocal advocate for female empowerment and has tried to distance herself from singing solely about love and breakups, since she has been branded as singing only about such topics. Finally, I thought that because she has gone through numerous phases, each of her albums would be distinctly different from the rest. I did not find this to be true, as themes of love and time, as mentioned earlier, were seen in most albums along with other themes.
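The prediction was about proportions, while the album sections compare raw counts, and albums differ in how many words they contain. A sketch of how the share of “love” per album could be computed is shown below. The toy rows are made up and only mimic the `Album` and `word` columns of `Cleaned_Taylor`; `love_share` is a hypothetical name.

```r
library(dplyr)
library(tibble)

# Made-up tokens for two albums of different sizes.
toy_words <- tibble(
  Album = c(rep("A", 4), rep("B", 5)),
  word  = c("love", "love", "time", "home",
            "love", "time", "time", "car", "night")
)

love_share <- toy_words %>%
  group_by(Album) %>%
  summarise(
    love_n  = sum(word == "love"),  # uses of "love" on the album
    total_n = n(),                  # all remaining words on the album
    share   = love_n / total_n,
    .groups = "drop"
  )

print(love_share)
```

Running the same pipeline on `Cleaned_Taylor` and ordering the albums by release date would show whether the share of “love” actually falls over time.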


Also, in some of my code you can see that I manually filtered out the words “taylor” and “swift,” along with the names of featured artists on an album. This is because the dataset labeled who sang each line in songs with multiple artists. Had I noticed this earlier, I would have removed these words when cleaning the dataset. However, I caught it later in the process and figured that filtering them out manually where needed would save time.
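A sketch of what that one-time cleanup could look like is shown below: the credit words go into a small tibble that is removed with `anti_join()`, the same way `stop_words` is removed at the start. The list itself is an assumption assembled from the filters used in the album sections, and the toy tokens are made up.

```r
library(dplyr)
library(tibble)

# Made-up tokens standing in for the output of unnest_tokens().
toy_words <- tibble(
  word = c("taylor", "swift", "love", "colbie", "time", "love")
)

# Assumed list of credit words, collected from the manual filters above.
credit_words <- tibble(
  word = c("taylor", "swift", "colbie", "caillat", "ed", "sheeran", "gary")
)

cleaned <- toy_words %>%
  anti_join(credit_words, by = "word")

print(cleaned)
```

Note that this drops the listed words everywhere, including any genuine lyric uses, which is also true of the manual filters. The same list could hold other tokens that turn out to be labels rather than lyrics; “pre”, which appears in every album’s word list, may be a remnant of section labels such as “Pre-Chorus”, though that is an assumption worth checking against the raw lyrics.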

Future Research

Because of how much music Taylor Swift has released, there is still so much to dive into, and the possible analyses of her music are endless. As noted earlier, I did not include any deluxe editions of her albums or their songs, so it would be interesting to see if there is a difference between these songs and songs initially released on an album. Also, because she is re-recording albums one through six and including songs from the “vault” that were never put on any album, it would be interesting to see if there are any differences between these songs and songs that were already released. I am also hoping to expand upon my analysis and include data from the Spotify API to see if it matches my findings in terms of an album’s sentiment.

References

I took inspiration and code from the following sources; however, all predictions, findings, and conclusions are my own.
https://rpubs.com/ebunceelon/patda
https://rpubs.com/emilyrogers/albumanalysis