Question and Background

Taylor Swift is an American pop and country singer-songwriter and one of the most successful and well-known artists working today. Her self-titled debut album was released in 2006, and she has since released 8 other albums as well as re-recordings of two of them, Red and Fearless. Together with her numerous singles, this adds up to 206 songs. Because Taylor Swift has been creating music for over 15 years, her writing style and genres have changed since she first began as an artist.

We thought it would be interesting to use data science methods to analyze how Taylor Swift has evolved as an artist across her albums. Our main question is: How have Taylor Swift’s writing style and music evolved over the course of her career?

To explore this, we use clustering and text mining to analyze different aspects of her songs. The audio data comes from Spotify’s Web API and contains audio features of her songs, which we use to draw conclusions about her music. We then run text analysis on her lyrics: sentiment analysis for each album, since the albums have different themes, and word counts to see which words are most common across her albums.

Taylor Swift Album Timeline

Spotify Audio Features

To fully capture Taylor’s evolution we wanted to consider both quantitative (audio features) and qualitative (natural language processing) aspects of her work. We hypothesized that we would see a progression in both her technical sound and the content of her songs as she pivoted from being a more acoustic, country artist to more of a pop artist.

To capture the technical sound, we used 11 quantitative audio features provided by Spotify: acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, and valence. For more information on these features, see the audio features reference in Spotify’s Web API documentation.

Natural Language Processing

To determine how Taylor Swift has evolved lyrically, we use natural language processing to examine the sentiment of each of her 9 albums. Using sentiment analysis with the AFINN, Bing, and NRC lexicons, we hope to see how her lyrics have changed as well as which themes are common across the 9 albums.

Initial Exploratory Analysis - Song Metrics and Spotify Features

To consider Taylor’s music evolution we focused our attention on audio features we suspected would have changed the most from album to album: danceability, valence, energy, and length. Below are plots showing the change in the features over different albums.

# Averages across all songs
# Average danceability = 0.5925
# Average valence = 0.4173
# Average energy = 0.5777
# Average length = 235662 milliseconds or 3.9 minutes

song_metrics$Album <- factor(song_metrics$Album,levels = c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989", "Reputation", "Lover", "Folklore", "Evermore"))

# Song length for each album 
length <- ggplot(song_metrics, mapping = aes(x = Album, y = Length)) +
    geom_boxplot(alpha = 0) +
    geom_jitter(alpha = 1, color = "green") +  # alpha must lie in [0, 1]
    labs(x='Album', y='Length (ms)', title="Boxplot of Song Lengths for Albums")
ggplotly(length)

# Song danceability for each album 
danceability <- ggplot(data = song_metrics, mapping = aes(x = Album, y = danceability)) +
    geom_boxplot(alpha = 0) +
    geom_jitter(alpha = 1, color = "tomato") +  # alpha must lie in [0, 1]
    labs(x='Album', y='Danceability', title="Boxplot of Song Danceability for Albums")
ggplotly(danceability)

# Song energy for each album 
energy <- ggplot(data = song_metrics, mapping = aes(x = Album, y = energy)) +
    geom_boxplot(alpha = 0) +
    geom_jitter(alpha = 1, color = "blue") +  # alpha must lie in [0, 1]
    labs(x='Album', y='Energy', title="Boxplot of Song Energy for Albums")
ggplotly(energy)

# Song valence for each album 
valence <- ggplot(data = song_metrics, mapping = aes(x = Album, y = valence)) +
    geom_boxplot(alpha = 0) +
    geom_jitter(alpha = 1, color = "orange") +  # alpha must lie in [0, 1]
    labs(x='Album', y='Valence', title="Boxplot of Song Valence for Albums")
ggplotly(valence)

# Relationships between variables

vl <- ggplot(data = song_metrics, aes(x = valence, y = Length)) +
  geom_point(alpha = 1, aes(color = Album)) +  # alpha must lie in [0, 1]
  labs(x='Valence', y='Length (ms)', title="Scatterplot of Valence vs. Length")
ggplotly(vl)

Songs with higher valence (a more positive sound) tend to be the shortest, while the longest songs have the lowest valence, meaning they sound sadder or angrier. Overall, each album contributes songs across the range of valence values.
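One way to put a number on the relationship described above is a rank correlation. The sketch below is only illustrative: `toy_songs` is a made-up data frame (invented values, not the project's data) that reuses the `valence` and `Length` column names, and Spearman's rho is our suggestion rather than a method used in the report.

```r
# Hypothetical toy data: column names mirror song_metrics, values are invented.
toy_songs <- data.frame(
  valence = c(0.82, 0.64, 0.55, 0.41, 0.30, 0.18),
  Length  = c(193000, 201000, 232000, 228000, 261000, 295000)  # milliseconds
)

# Spearman's rho measures monotone association using ranks, so a single
# unusually long track has limited influence on the result.
rho <- cor(toy_songs$valence, toy_songs$Length, method = "spearman")

# A negative rho means higher-valence songs tend to be shorter.
print(rho)
```

With the real data, the same `cor()` call on `song_metrics$valence` and `song_metrics$Length` would indicate how strong the pattern in the scatterplot actually is.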

raw <- read_csv(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/TaylorSwift_real.csv"))

## Rows: 160 Columns: 16

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): ID, Name, Album
## dbl (13): Length, danceability, energy, key, loudness, mode, speechiness, ac...

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

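# Min-max scaling helper (also defined in the Bing analysis below);
# repeated here so this chunk runs on its own
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Scale columns 3:14 to [0, 1], then average every numeric column by album
raw[, 3:14] <- lapply(raw[, 3:14], normalize)
grouped <- raw %>%
  group_by(Album) %>%
  summarise(across(where(is.numeric), mean))

df <- grouped %>%
  select(Release, danceability, valence, energy, Length) %>%
  pivot_longer(-Release, names_to = "audio_feature", values_to = "Value")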

# Progression over time for different audio features
value <- ggplot(df, aes(x = Release, y = Value))+
  geom_line(aes(color= audio_feature, linetype = audio_feature))+
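  # Named values keep each feature's color consistent with the boxplots above
  scale_color_manual(values = c(danceability = 'tomato', valence = 'orange', energy = 'blue', Length = 'green')) +
  labs(x='Release Year', y='Audio Feature Value', title="Line Plot of Average Audio Features over Time")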
ggplotly(value)

No audio feature shows a clear trend over time. Some albums score higher on some features and lower on others, and the values vary considerably from one release year to the next. The plot is still useful for spotting release years in which a particular feature departs sharply from the other years.
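The values in the line plot are album means of min-max scaled features. As a minimal worked sketch of that computation, the block below uses an invented four-song table (`toy`, with hypothetical albums "A" and "B") and base R's `aggregate()` in place of the dplyr pipeline.

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical toy data; albums, years, and values are invented.
toy <- data.frame(
  Album   = c("A", "A", "B", "B"),
  Release = c(2006, 2006, 2008, 2008),
  energy  = c(0.40, 0.60, 0.80, 0.50),
  Length  = c(200000, 240000, 220000, 300000)
)

# Scale each feature over ALL songs first, then average within each album.
toy$energy <- normalize(toy$energy)
toy$Length <- normalize(toy$Length)
album_means <- aggregate(cbind(energy, Length) ~ Album + Release,
                         data = toy, FUN = mean)
print(album_means)
```

Because scaling happens before averaging, an album mean near 0 or 1 says where that album sits relative to her whole catalog, not relative to Spotify's absolute scale.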

Clustering Analysis

After developing a general sense of how Taylor’s audio features changed over time, we wanted to investigate how similar her songs are through clustering analysis. Since our initial analysis showed very different mean values for danceability, energy, length, and valence versus release dates, we hypothesized that song audio features would result in distinct clusters for each of the 9 albums considered.

Before clustering the data we first tried to determine the optimal number of clusters using the elbow graph below:

explained_variance = function(data_in, k){
  
  # Running the kmeans algorithm.
  set.seed(1)
  kmeans_obj = kmeans(data_in, centers = k, algorithm = "Lloyd", iter.max = 30)
  
  # Variance accounted for by clusters:
  # var_exp = intercluster variance / total variance
  var_exp = kmeans_obj$betweenss / kmeans_obj$totss
  var_exp  
}

# Determining the Optimal Number of Clusters
# Use columns 3:14, the same audio-feature columns that were normalized earlier
input1 = audio_features[,c(3:14)]

explained_var = sapply(1:10, explained_variance, data_in = input1)
elbow_data = data.frame(k = 1:10, explained_var)

elbow <- ggplot(elbow_data, 
       aes(x = k,  
           y = explained_var)) + 
  geom_point(size = 4) +           #<- sets the size of the data points
  geom_line(size = 1) +            #<- sets the thickness of the line
  xlab('k') + 
  ylab('Inter-cluster Variance / Total Variance') + 
  theme_light()
ggplotly(elbow)

From the plot we can see that the plot begins to flatten out at k = 3. This is surprising as we had suspected that the data would cluster around the 9 albums. Additionally, we can see that even with 9 centers the clustering still had a relatively low explained variance of a little more than 0.5.
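Reading an elbow by eye can be backed up by looking at how much explained variance each additional cluster adds. The following is a sketch under stated assumptions: `toy` is synthetic data with three well-separated groups (not the Spotify features), and `nstart = 10` is our choice to reduce sensitivity to the random start.

```r
set.seed(1)

# Hypothetical toy data: three well-separated 2-D blobs of 30 points each.
toy <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2)
)

# Same ratio as explained_variance() in the report: betweenss / totss.
explained <- sapply(1:6, function(k) {
  fit <- kmeans(toy, centers = k, nstart = 10)
  fit$betweenss / fit$totss
})

# Marginal gain from adding one more cluster; the elbow is the k after
# which the gains become small.
gain <- diff(explained)
print(data.frame(k = 2:6, gain = round(gain, 3)))
```

On this toy data the gains collapse after k = 3, which is what a clean elbow looks like; the report's curve flattens far less sharply, consistent with its low explained variance.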

Using 3 Clusters

Using three centers and plotting energy vs. valence, we can see the three distinct clusters. Cluster 1 is characterized by low valence and low energy, Cluster 2 by high energy and high valence, and Cluster 3 by lower valence but higher energy than Cluster 1. From the color coding we can see that the clusters are not indicative of the albums; instead, these two qualities are distributed fairly evenly across multiple albums.

Similar to the previous plot, the clustering shows that energy and acousticness are not album-specific but are distributed fairly evenly across albums. Cluster 1 is characterized by low energy and high acousticness, while Clusters 2 and 3 are fairly similar, with lower acousticness and medium-high energy. The plot also helps show the non-linear nature of Taylor’s sound. For example, while her second album “Fearless” is highly acoustic, her next album “Speak Now” is on the opposite side of the graph. Taylor then went back to her earlier sound in “Red”, which clusters high in acousticness along with “Fearless”. This plot shows that Taylor’s albums are not necessarily trending a certain way when it comes to these audio features.

Next we decided to consider how each cluster varied from each other by creating bar charts of the grouped means values:

From the bar graphs, it seems that Acousticness and Release are major factors which distinguish group 1, while valence, energy, and danceability are what distinguish group 2.

From the three-center analysis, we can see that clustering on audio features does not do well at distinguishing different albums. To confirm this suspicion, we increase the number of centers to 9.
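A cross-tabulation of cluster against album makes the "clusters do not track albums" observation checkable without reading colors off a plot. This is a sketch on invented data: `toy` holds two hypothetical albums whose songs overlap in feature space, and the column names only mimic the report's.

```r
set.seed(1)

# Hypothetical toy data: two "albums" whose songs overlap in feature space.
toy <- data.frame(
  Album   = rep(c("Album A", "Album B"), each = 20),
  energy  = runif(40),
  valence = runif(40)
)

fit <- kmeans(toy[, c("energy", "valence")], centers = 3, nstart = 10)

# Rows are clusters, columns are albums.
tab <- table(cluster = fit$cluster, album = toy$Album)
print(tab)

# Share of each cluster's songs that come from each album (rows sum to 1).
cluster_share <- prop.table(tab, margin = 1)
print(round(cluster_share, 2))
```

If clusters corresponded to albums, each row would be dominated by a single album; rows that are split across albums indicate the mixing described above.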

Using 9 Clusters

Using 9 centers and recreating the graphs from the three center analysis, we can see that it becomes even harder to distinguish the albums from each other. In each cluster we have multiple different albums with very different release dates. Again, we recognize that Taylor’s albums span a variety of different audio features and are not necessarily trending a certain way.

Just like the three-center clustering, the nine-center clustering emphasizes how similar “Fearless” and “Red” are in their audio features. These albums appear mainly in Cluster 6. Most importantly, this clustering shows that Taylor varies in both valence and energy across albums regardless of release date, which matches our earlier conclusion.

Next, the acousticness vs. energy plot was created using the 9-center clustering data. Once again, “Fearless” and “Red” are highly concentrated in Cluster 6, which is characterized by lower energy and higher acousticness. We also see other albums such as “1989”, “Taylor Swift”, and “Speak Now” in this cluster, though less often.

While not cluster specific, this plot also shows that her most recent albums (“Reputation”, “Lover”, “Evermore”, and “Folklore”) are lower in acousticness and higher in energy. Additionally, the album “1989” acts almost as a transition album between the two distinct zones, as it spans a bit all over.

Clustering Conclusion

Key takeaways:

Clustering on song audio features was not very insightful for distinguishing albums, and the explained variance stayed below 0.6 even with 9 centers. This is likely because Taylor has used the same producers throughout her career and therefore achieves a similar balance of features across each album. Some albums, like “Fearless” and “Red”, were slightly easier to distinguish from other albums with clustering.
From clustering, we can see that the sound of Taylor’s earlier albums (“Taylor Swift”, “Fearless”, “Speak Now”, “Red”) had the greatest fluctuation in audio features, jumping between low energy and high acousticness and high energy and low acousticness.
Taylor’s album “1989” had the greatest variance across individual songs (in both clustering graphs) and acted almost as a transition album to her newer works which have concentrated in the medium to higher energy and lower acousticness zone.

Sentiment Analysis

We now explore the sentiments in Taylor Swift’s song lyrics across her 9 albums. We identify the range of sentiment in each album, find which words are most common in each album, and determine the specific sentiments each album expresses. Together these show which albums read as positive or negative and how her sentiments have changed over time.

# Taylor Swift Album
ts1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/tswift"))
ts <- tibble(ts1)
ts$ts <- as.character(ts$ts1)
ts <- ts %>%
  unnest_tokens(word, ts)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Fearless Album
fearless1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/fearless"))
fearless <- tibble(fearless1)
fearless$fearless <- as.character(fearless$fearless1)
fearless <- fearless %>%
  unnest_tokens(word, fearless)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Speak Now Album
speak1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/speak_now")) 
speak <- tibble(speak1)
speak$speak <- as.character(speak$speak1)
speak <- speak %>%
  unnest_tokens(word, speak)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Red Album
red1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/red"))
red <- tibble(red1)
red$red <- as.character(red$red1)
red <- red %>%
  unnest_tokens(word, red)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# 1989 Album
nineteen891 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/1989"))
nineteen89 <- tibble(nineteen891)
nineteen89$nineteen89 <- as.character(nineteen89$nineteen891)
nineteen89<- nineteen89 %>%
  unnest_tokens(word, nineteen89)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Reputation Album
rep1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/rep"))
rep <- tibble(rep1)
rep$rep <- as.character(rep$rep1)
rep <- rep %>%
  unnest_tokens(word, rep)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Lover Album
lover1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/lover"))
lover <- tibble(lover1)
lover$lover <- as.character(lover$lover1)
lover <- lover %>%
  unnest_tokens(word, lover)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Folklore Album 
folklore1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/folklore"))
folklore <- tibble(folklore1)
folklore$folklore <- as.character(folklore$folklore1)
folklore <- folklore %>%
  unnest_tokens(word, folklore)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)

# Evermore Album 
evermore1 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/evermore"))
evermore <- tibble(evermore1)
evermore$evermore <- as.character(evermore$evermore1)
evermore <- evermore %>%
  unnest_tokens(word, evermore)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
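The nine blocks above repeat the same three steps: split lyrics into words, drop stop words, and count what remains. As a hedged illustration of those steps, here is a base-R stand-in for `unnest_tokens()`, `anti_join(stop_words)`, and `count()`. The helper `count_words()`, the sample lines, and the tiny stop-word list are all hypothetical and are not the project's code or data.

```r
# Hypothetical stand-ins: invented lines and a tiny stop-word list.
toy_lines <- c("I love the rain, the rain loves me",
               "Love is time and time is love")
toy_stop  <- c("i", "the", "is", "and", "me")

# Base-R approximation of tokenizing, removing stop words, and counting.
count_words <- function(lines, stop_words) {
  # Lower-case, then split on anything that is not a letter or apostrophe.
  tokens <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  # Drop empty strings and stop words.
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(word = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

toy_counts <- count_words(toy_lines, toy_stop)
print(toy_counts)
```

Wrapping the steps in one function and calling it once per album would avoid nine near-identical blocks. Note that without stemming, "love" and "loves" are counted as different words, which is also true of the pipeline above.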

Sentiment Ranges For Each Album

# Taylor Swift Album
ts_affin <- ts %>%
  inner_join(get_sentiments("afinn"))
tsaffin <- ggplot(data = ts_affin, 
       aes(x=value)
        )+
  geom_histogram(color="seagreen", fill="powderblue", binwidth=1)+
  ggtitle("Taylor Swift Album")+
  theme_minimal()

# Fearless Album
fearless_affin <- fearless %>%
  inner_join(get_sentiments("afinn"))
feaffin <- ggplot(data = fearless_affin, 
       aes(x=value)
        )+
  geom_histogram(color="burlywood4", fill="lightgoldenrod2", binwidth=1)+
  ggtitle("Fearless Album")+
  theme_minimal()

# Speak Now Album
speak_affin <- speak %>%
  inner_join(get_sentiments("afinn"))
spaffin <- ggplot(data = speak_affin, 
       aes(x=value)
        )+
  geom_histogram(color="darkmagenta", fill="deeppink3", binwidth=1)+
  ggtitle("Speak Now Album")+
  theme_minimal()

# Red Album
red_affin <- red %>%
  inner_join(get_sentiments("afinn"))
redaffin <-ggplot(data = red_affin, 
       aes(x=value)
        )+
  geom_histogram(color="red4", fill="indianred", binwidth=1)+
  ggtitle("Red Album")+
  theme_minimal()

# 1989 Album
nineteen89_affin <- nineteen89 %>%
  inner_join(get_sentiments("afinn"))
niaffin <- ggplot(data = nineteen89_affin, 
       aes(x=value)
        )+
  geom_histogram(color="blueviolet", fill="thistle2", binwidth=1)+
  ggtitle("1989 Album")+
  theme_minimal()

# Reputation Album
rep_affin <- rep %>%
  inner_join(get_sentiments("afinn"))
repaffin <- ggplot(data = rep_affin, 
       aes(x=value)
        )+
  geom_histogram(color="gray19", fill="gray82", binwidth=1)+
  ggtitle("Reputation Album")+
  theme_minimal()

# Lover Album
lover_affin <- lover %>%
  inner_join(get_sentiments("afinn"))
loaffin <- ggplot(data = lover_affin, 
       aes(x=value)
        )+
  geom_histogram(color="lightskyblue", fill="pink", binwidth=1)+
  ggtitle("Lover Album")+
  theme_minimal()

# Folklore Album
folklore_affin <- folklore %>%
  inner_join(get_sentiments("afinn"))
foaffin <- ggplot(data = folklore_affin, 
       aes(x=value)
        )+
  geom_histogram(color="gray68", fill="gray93", binwidth=1)+
  ggtitle("Folklore Album")+
  theme_minimal()

# Evermore Album
evermore_affin <- evermore %>%
  inner_join(get_sentiments("afinn"))
evaffin <- ggplot(data = evermore_affin, 
       aes(x=value)
        )+
  geom_histogram(color="coral3", fill="navajowhite3", binwidth=1)+
  ggtitle("Evermore Album")+
  theme_minimal()

grid.arrange(tsaffin, feaffin, spaffin, redaffin, niaffin, repaffin, loaffin, foaffin, evaffin, ncol=3)
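Each histogram above is built from one row per distinct word, so a word sung fifty times counts the same as a word sung once. The sketch below contrasts that with an occurrence-weighted average. Both tables are hypothetical: `toy_counts` is invented, and `toy_lexicon` is a made-up mini-lexicon in the AFINN style of integer scores from -5 to 5 (real scores come from `get_sentiments("afinn")`).

```r
# Hypothetical word counts for one album.
toy_counts <- data.frame(word = c("love", "lost", "beautiful", "mad"),
                         n    = c(10, 4, 2, 1))

# Hypothetical mini-lexicon with AFINN-style integer scores.
toy_lexicon <- data.frame(word  = c("love", "lost", "beautiful", "mad", "hope"),
                          value = c(3, -3, 3, -3, 2))

# Inner join on word: only words present in both tables survive.
scored <- merge(toy_counts, toy_lexicon, by = "word")

# Each distinct word counted once (what the histograms above summarize).
unweighted_mean <- mean(scored$value)

# Each occurrence counted once (weights are the word frequencies).
weighted_mean <- weighted.mean(scored$value, scored$n)

print(c(unweighted = unweighted_mean, weighted = weighted_mean))
```

In this toy case the unweighted view looks perfectly balanced while the weighted view is clearly positive, because the most frequent word is a positive one. Which view is appropriate depends on whether the question is about vocabulary or about what listeners actually hear.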

Word Clouds

# Taylor Swift 
set.seed(42)
tscloud <- ggplot(ts[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "seagreen4", high = "turquoise3") + ggtitle("Taylor Swift Album")

# Fearless
set.seed(42)
fearcloud <- ggplot(fearless[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "goldenrod", high = "burlywood4")+ ggtitle("Fearless Album")

# Speak Now
set.seed(42)
speaknowcloud <- ggplot(speak[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "deeppink3", high = "darkmagenta") + ggtitle("Speak Now Album")

# Red
set.seed(42)
redcloud <- ggplot(red[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "indianred", high = "red4")+ ggtitle("Red Album")

# 1989
set.seed(42)
nineteen89cloud <-ggplot(nineteen89[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "mediumpurple1", high = "blueviolet")+ ggtitle ("1989 Album")

# Reputation
set.seed(42)
reputationcloud <- ggplot(rep[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "gray66", high = "gray19")+ggtitle("Reputation Album")

# Lover
set.seed(42)
lovercloud <- ggplot(lover[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "palevioletred1", high = "lightskyblue")+ ggtitle("Lover Album")

# Folklore
set.seed(42)
folklorecloud <- ggplot(folklore[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "gray68", high = "gray55")+ggtitle("Folklore Album")

# Evermore 
set.seed(42)
evermorecloud <- ggplot(evermore[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud_area(rm_outside =TRUE) +
  theme_minimal() + scale_color_gradient(low = "navajowhite3", high = "lightsalmon2") + ggtitle("Evermore Album")

library(gridExtra)  # provides grid.arrange()
grid.arrange(tscloud, fearcloud, speaknowcloud, redcloud, nineteen89cloud, reputationcloud, lovercloud, folklorecloud, evermorecloud, ncol=3)

Overall, the word clouds for each album are relatively different, but some words stand out across all of the albums. Words like “love”, “time”, and “baby” are fairly common throughout. There are, however, clear contrasts between certain albums. For example, in Taylor Swift’s self-titled debut album, “love” is accompanied by words like “beautiful” and “hope”, while in her most recent album, Evermore, “love” is accompanied by words like “died” and “lost”. While themes of love run through every album, each album portrays them differently as Taylor sings about different experiences in her life. This lyrical evolution is especially apparent in her two most recent albums, Folklore and Evermore, where we see words with darker, more serious sentiments like “died”, “mad”, “lost”, and “closure”.
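The comparison of shared and album-specific words can also be made with set operations instead of by eye. In this sketch the `top_words` list is hypothetical: the album labels and word vectors are invented for illustration and only loosely echo the words mentioned above.

```r
# Hypothetical top-word vectors for three albums (invented for illustration).
top_words <- list(
  debut    = c("love", "time", "beautiful", "hope", "baby"),
  middle   = c("love", "time", "baby", "night", "stay"),
  evermore = c("love", "time", "died", "lost", "baby")
)

# Words that appear in every album's top list.
shared <- Reduce(intersect, top_words)

# Words in the last album's list that appear in none of the earlier ones.
earlier <- unlist(top_words[c("debut", "middle")])
unique_to_evermore <- setdiff(top_words$evermore, earlier)

print(shared)
print(unique_to_evermore)
```

With the real data, each list element could be the `word` column of the first 50 rows of an album's count table, the same slice the word clouds use.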

Bing Analysis

# Bing Analysis
# TS Album
ts_bing <- ts %>%
  inner_join(get_sentiments("bing"))
# neg 42 pos 24

# Fearless
fearless_bing <- fearless %>%
  inner_join(get_sentiments("bing"))
# neg 47 pos 44

# Speak Now
speak_bing <- speak %>%
  inner_join(get_sentiments("bing"))
#neg 89 pos 47

# Red
red_bing <- red %>%
  inner_join(get_sentiments("bing"))
# neg 79 pos 59

# 1989
nineteen89_bing <- nineteen89 %>%
  inner_join(get_sentiments("bing"))
# neg 74 pos 27

# Reputation
rep_bing <- rep %>%
  inner_join(get_sentiments("bing"))
# neg 112 pos 53

# Lover
lover_bing <- lover %>%
  inner_join(get_sentiments("bing"))
# neg 99 pos 55

# Folklore
folklore_bing <- folklore %>%
  inner_join(get_sentiments("bing"))
# neg 103 pos 35

# Evermore
evermore_bing <- evermore %>%
  inner_join(get_sentiments("bing"))
# neg 87 pos 55

# Creating a dataframe with the negative and positive values for each album and release dates 
negative <- c(42, 47, 89, 79, 74, 112, 99, 103, 87)
positive <- c(24, 44, 47, 59, 27, 53, 55, 35, 55)
album <- c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989", "Reputation", "Lover", "Folklore", "Evermore")
release_date <- c(2006, 2008, 2010, 2012, 2014, 2017, 2019, 2020, 2020)
sentiment <- data.frame(album, release_date, negative, positive, stringsAsFactors=TRUE)
DT::datatable(sentiment)

# Normalizing the values for pos and neg
normalize <- function(x){
  (x - min(x)) / (max(x) - min(x))
}
sentiment$negative <- normalize(sentiment$negative)
sentiment$positive <- normalize(sentiment$positive)
DT::datatable(sentiment)

# Creating graph with just positive and negative values 
plot <- ggplot(sentiment, aes(x = positive, y = negative, color = album, label = album)) +
  geom_text() +
  theme_light() +
  theme(legend.position = "none") +
  labs(x = 'Positive', y = 'Negative', title = 'Sentiment Graph')
ggplotly(plot)

# Graphing values in 3D plot using 3 variables (neg, pos, and release date)
library(plotly)
fig <- plot_ly(sentiment, 
               type = "scatter3d",
               mode="markers",
               x = ~`release_date`, 
               y = ~`positive`, 
               z = ~`negative`,
               color = ~`album`,
               text = ~paste('Album:',album))
fig
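Min-max scaling the negative and positive counts separately shows how albums rank on each axis, but it hides the balance between the two within an album. As an alternative view (our suggestion, not a step in the original analysis), the sketch below computes the share of Bing-matched words that are negative, using the counts from the data frame above.

```r
# Bing word counts per album, copied from the data frame above.
album <- c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989",
           "Reputation", "Lover", "Folklore", "Evermore")
negative <- c(42, 47, 89, 79, 74, 112, 99, 103, 87)
positive <- c(24, 44, 47, 59, 27, 53, 55, 35, 55)

# Share of sentiment-bearing words that are negative; 0.5 means balanced.
negative_share <- negative / (negative + positive)
names(negative_share) <- album

# Albums ordered from most to least negative.
print(round(sort(negative_share, decreasing = TRUE), 3))
```

Every album comes out above 0.5, with Folklore the highest and Fearless the closest to balanced. Unlike the raw counts, this ratio does not grow simply because an album has more lyrics.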

NRC Analysis

Taylor Swift Album

ts_nrc <- ts %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(17, 24, 14, 18, 28, 34, 48, 21, 16, 28)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
tssentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(tssentiment)

Fearless Album

fearless_nrc <- fearless %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(22, 25, 10, 25, 38, 57, 67, 27, 19, 46)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
fearsentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(fearsentiment)

Speak Now Album

speak_nrc <- speak %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(33, 41, 19, 50, 48, 81, 74, 50, 32, 49)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
speaksentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(speaksentiment)

Red Album

red_nrc <- red %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(32, 36, 23, 39, 40, 68, 84, 36, 20, 44)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
redsentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(redsentiment)

1989 Album

nineteen89_nrc <- nineteen89 %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(27, 20, 17, 36, 22, 58, 39, 33, 12, 21)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
nineteen89sentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(nineteen89sentiment)

Reputation Album

rep_nrc <- rep %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(49, 32, 32, 62, 45, 99, 79, 47, 26, 43)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
repsentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(repsentiment)

Lover Album

lover_nrc <- lover %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(43, 44, 25, 63, 45, 89, 76, 44, 25, 50)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
loversentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(loversentiment)

Folklore Album

folklore_nrc <- folklore %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(51, 34, 34, 54, 41, 97, 90, 53, 24, 44)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
folksentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(folksentiment)

Evermore Album

evermore_nrc <- evermore %>%
  inner_join(get_sentiments("nrc"))

Sentiment_Value <- c(34, 50, 21, 42, 46, 83, 85, 44, 28, 44)
Sentiment <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
eversentiment <- data.frame(Sentiment, Sentiment_Value, stringsAsFactors=TRUE)
DT::datatable(eversentiment)
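The nine NRC tables above are easier to compare side by side, and as proportions rather than raw counts, since albums with more matched words will have larger counts in every category. The sketch below gathers the counts reported above into one matrix; the object names `nrc_counts` and `nrc_share` are our own, and converting to row proportions is a suggested view rather than part of the original analysis.

```r
sentiments <- c("anger", "anticipation", "disgust", "fear", "joy",
                "negative", "positive", "sadness", "surprise", "trust")

# NRC counts per album, copied from the tables above (one row per album).
nrc_counts <- rbind(
  "Taylor Swift" = c(17, 24, 14, 18, 28, 34, 48, 21, 16, 28),
  "Fearless"     = c(22, 25, 10, 25, 38, 57, 67, 27, 19, 46),
  "Speak Now"    = c(33, 41, 19, 50, 48, 81, 74, 50, 32, 49),
  "Red"          = c(32, 36, 23, 39, 40, 68, 84, 36, 20, 44),
  "1989"         = c(27, 20, 17, 36, 22, 58, 39, 33, 12, 21),
  "Reputation"   = c(49, 32, 32, 62, 45, 99, 79, 47, 26, 43),
  "Lover"        = c(43, 44, 25, 63, 45, 89, 76, 44, 25, 50),
  "Folklore"     = c(51, 34, 34, 54, 41, 97, 90, 53, 24, 44),
  "Evermore"     = c(34, 50, 21, 42, 46, 83, 85, 44, 28, 44)
)
colnames(nrc_counts) <- sentiments

# Row-wise proportions: each album's counts divided by that album's total.
nrc_share <- prop.table(nrc_counts, margin = 1)

# Compare the three sentiments discussed in the conclusion across albums.
print(round(nrc_share[, c("anger", "fear", "sadness")], 3))
```

Because a single word can carry several NRC labels, each row total is a count of labels rather than of distinct words, so the shares describe the mix of labels within an album.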

Sentiment Analysis Conclusion

Using the Bing lexicon, we were able to get an idea of the general sentiments conveyed by Taylor Swift in her music throughout the years. The lyrics were predominantly classified as “negative” for each of her albums. It is important to note that lexicon-based sentiment analysis has limitations when it comes to the nuance required to analyze certain texts: it scores words in isolation, so it cannot consistently interpret things like sarcasm, jokes, or exaggerations found in her lyrics. Failing to recognize this can lead to biased results and incorrect interpretations. That is why we found it important to use other methods of analysis to see if we could get results that were more specific and gave us a better idea of the sentiments portrayed in her songs.

Through the NRC analysis, we can see that her music has more anger, sadness, and fear as we progress through the albums. Some of the negative sentiments were much higher for her later albums than for her earlier ones. Many of her albums, however, were nearly as positive as negative. While the NRC analysis is more useful for seeing which specific sentiments are relevant to the lyrics, some words can still be interpreted incorrectly, skewing the results. For example, “Red” is identified as more positive than negative, but every Taylor Swift fan knows this was truly a “heartbreak” album, so one would expect it to have more of a negative sentiment. It would be interesting to dive deeper into which words are being counted toward specific sentiments. Based on this analysis, we can say that Taylor Swift writes with a variety of sentiments across her albums.

Conclusion and Future Work

Taylor Swift’s discography is marked by 9 distinct eras that showcase her versatility and growth throughout the years. Each album captures and reflects upon her experiences in life, from starting out at a young age in the music industry up until now as she dominates the charts. Her music has evolved from country to pop to a more alternative/indie style. This is seen in the differences observed in the Spotify audio features analysis as well as in the sentiment analyses of the lyrics in each of her 9 albums. While Taylor Swift has held onto common themes like love and romance across all of her albums, she has also explored different themes in her more recent albums. It will be interesting to see how she continues to evolve as an artist, and data science methods are a helpful way to see this evolution take place.

In the future, we could analyze Taylor Swift’s re-recordings of Fearless and Red and compare them to the original works. The re-recordings include additional songs “From the Vault”, so these could impact the albums’ sentiment analysis. Some of the re-recorded songs also have slightly different audio features, such as danceability, energy, and tempo, than their original versions. This could affect the Spotify audio features analysis and change how each album is perceived musically.

The Evolution of Taylor Swift

Jess Laudie, Julia Burek, Kara Koopman, Ruth Efrem

12/8/2021