A Brief Overview of Miss Americana
What To Know Going In:
Retrieving & Cleaning Our Spotify Data
Analyzing Spotify’s Data
Retrieving & Cleaning Lyric Data
- Lyric Wordcloud
- Location Bubble Map

A Brief Overview of Miss Americana

Singer, songwriter, director, actor - Taylor Swift is a household name and one of the biggest artists of all time. This is in large part due to her talent for reinventing herself often enough to hold the media’s attention, usually in step with the release of a new album. However, Taylor Swift is not in the music business - she is in the personality business, selling a different version of herself with each reinvention. This is objectively impressive, but it is worth asking why it is needed. Swift herself has remarked that she “[wants] to work really hard… while society is still tolerating [her] being successful,” pointing to the intense double standard that women face in trying to stay relevant to fans.¹

All of this is to say that Swift’s musical talent has enormous breadth. In fact, she has earned 3 Grammys for Album Of The Year for 3 albums of different genres, in 3 different decades.² Thus, I thought it would be interesting to take a deep dive into the data of Swift’s music. How similar are her albums? What changes over time? What keeps fans coming back for more after more than 15 years?

But first, we need some background for the uninitiated. For the Swifties - press here to skip to the analysis.

What To Know Going In:

Going into our analysis, there are a few things to know about Taylor Swift. She has 9 albums (her self-titled debut, Fearless, Speak Now, Red, 1989, reputation, Lover, folklore, and evermore). Special note - each album has a color theme associated with it, and I have used those colors to display the albums throughout this report. However, we are going to be analyzing 11 albums today. The discrepancy arises because she is rerecording her earlier albums, and 2 of the rerecordings have been released so far (Fearless Taylor’s Version and Red Taylor’s Version). I chose to include both the original and Taylor’s Versions of these albums, as her rerecording era is itself another reinvention. She isn’t pursuing these rerecordings just for fun, though. She had her original masters stolen from her, and is rerecording in order to own her own catalog of music and devalue the stolen masters. Along the way, she is also releasing songs “From The Vault”: songs she had written at the time, but that did not make the cut for the album. Thus, I thought it important to include the Taylor’s Versions so as to analyze how these vault tracks have shaped the profiles of Fearless and Red Taylor’s Versions respectively.

Retrieving & Cleaning Our Spotify Data

Skip to the analysis. I chose to import data on Taylor Swift from Spotify’s API. In doing this, we get a huge dataframe that is very cumbersome and has quite a few redundant variables. Here is the code I used to clean it up, along with comments explaining what we’re doing!

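# Packages used in this section: spotifyr for the Spotify API, dplyr for
# distinct_all()
library(spotifyr)
library(dplyr)

# Saving all of Spotify's data on Swift to object "ts". spotifyr needs API
# credentials, which it reads from the SPOTIFY_CLIENT_ID and
# SPOTIFY_CLIENT_SECRET environment variables
ts <- get_artist_audio_features("taylor swift")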


# Data Cleaning ####

# Setting up factors (multiple discrete values) and new (more useful) duration
# column. Also turning Boolean "explicit" column into a binary (0/1)
ts$album_name <- as.factor(ts$album_name)
ts$album_release_date <- as.Date(ts$album_release_date)
ts$key_name <- as.factor(ts$key_name)
ts$key_mode <- as.factor(ts$key_mode)
ts$mode_name <- as.factor(ts$mode_name)

ts$duration <- ts$duration_ms / 1000 / 60 # Milliseconds -> minutes

ts$explicit <- as.integer(ts$explicit) # TRUE/FALSE -> 1/0


# We have a lot of unnecessary columns (ex: the link to the song does not 
# matter to us). Let's remove these columns from our dataframe
ts <- subset(ts, select = -c(artist_id, album_id, album_type, album_images, 
                             artist_name, album_release_date_precision, 
                             track_id, analysis_url, available_markets, 
                             disc_number, track_href, is_local,
                             track_preview_url, type, track_uri, 
                             external_urls.spotify, duration_ms))


# Cleaning our dataset - TS has a lot of repeat songs uploaded due to deluxe 
# vs. normal editions, karaoke versions, extra "chapters" listed for 
# searchability, etc. We just want her main albums (but with all songs
# included - so deluxe versions!). These are the albums we keep:
mainAlbums <- c("1989 (Deluxe)",
                "evermore (deluxe version)",
                "Fearless Platinum Edition",
                "Fearless (Taylor's Version)",
                "folklore (deluxe version)",
                "Lover",
                "Red (Deluxe Edition)",
                "Red (Taylor's Version)",
                "reputation",
                "Speak Now (Deluxe Edition)",
                "Taylor Swift")
ts <- subset(ts, album_name %in% mainAlbums)


# Deleting non unique rows with the distinct_all() function
ts <- distinct_all(ts)


# We still have repeats - the number of songs we should see is around
# 175 - 225, not 292. Looking at the data, some albums are listed twice.
ts <- ts[-c(31, 32, 76:90, 108:124, 174:195, 218:238), ] # Removing repeat albums


# Reorder chronologically by album
ts$album_name <- reorder(ts$album_name, ts$album_release_date)
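
The hard-coded row numbers used above depend on the exact order in which Spotify returns tracks, which can shift whenever the catalog is updated. Below is a minimal sketch of an alternative, assuming the data frame has a `track_name` column and that a repeated album/track pair is always an unwanted duplicate. The toy data frame `toy` and its track names are made up for illustration.

```r
# Toy stand-in for the Spotify data (hypothetical rows, not real tracks).
toy <- data.frame(
  album_name = c("Lover", "Lover", "Lover", "reputation", "reputation"),
  track_name = c("Track A", "Track B", "Track A", "Track C", "Track C"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of every album/track pair and drop the rest.
is_repeat <- duplicated(toy[, c("album_name", "track_name")])
toy_unique <- toy[!is_repeat, ]

# Songs per album after de-duplication, as a quick sanity check.
songs_per_album <- table(toy_unique$album_name)
print(songs_per_album)
```

Keying on names rather than row positions means the same code should still work if the API returns the albums in a different order, although it would miss duplicates whose titles differ slightly.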

Analyzing Spotify’s Data

Now that our data is cleaned, we can finally dig into it and see what we find. Spotify gives us a few metrics to evaluate songs by - danceability, energy, speechiness, acousticness, instrumentalness, liveness, and valence. Spotify themselves use these metrics when giving people song recommendations, so I thought it would be interesting to examine a few of these metrics and see what we find across all of Swift’s albums. I also decided to include a few more features to analyze Swift’s albums.

Danceability

Danceability is defined as “how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.”³

I used a ridgeline plot to display the danceability of all of Swift’s albums.

library(ggplot2)
library(ggridges) # provides geom_density_ridges2()

ggplot(ts, aes(x = danceability, y = album_name, fill = album_name)) +
  geom_density_ridges2(scale = 1.6,
                       alpha = .95,
                       quantile_lines = TRUE,
                       quantiles = 2) + # quantiles = 2 draws the median line
  coord_cartesian(xlim = c(0.2, .98),
                  ylim = c(1.57, 12)) +
  guides(fill = "none") +
  # One color per album, in chronological order
  scale_fill_manual(values = c("#50a7e0", "#ddc477", "#923c81", "#951e1a", 
                               "azure2", "#353839", "hotpink", "#bababa", 
                               "#994914", "#f6ed95", "#951e1a")) +
  theme_light(base_size = 12) +
  labs(title = "Danceability by Taylor Swift Albums",
       x = "Danceability",
       y = "Album Name")

We can see that Lover clearly contains her most danceable songs, and with the wedding anthem that is the track Lover, this makes a lot of sense. Lover also seems to have the widest range of danceability, likely due to slow songs such as “Soon You’ll Get Better”, which focuses on her mother’s cancer diagnosis and leaves little room for dance. Joining Lover on the lower end of the danceability distribution are Swift’s sister folk albums, folklore and evermore. These are neither pop nor country albums, so their low danceability should be unsurprising. reputation seems to have the narrowest distribution, with all of its songs being relatively danceable - considering the heavy beats and large presence that this album brings, its danceability fits its identity well. Both Taylor’s Versions score noticeably lower than their original counterparts. This could be due either to the vault tracks being less danceable, or to Swift’s re-recordings of the originals being less danceable overall.
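
One way to tell those two explanations apart would be to split each Taylor’s Version into vault tracks and re-recorded tracks before comparing. Here is a rough sketch, assuming vault tracks can be recognized by the phrase “From The Vault” in a `track_name` column. The data frame `tv` and its danceability numbers are invented for illustration and are not Spotify’s values.

```r
# Hypothetical tracks from a Taylor's Version album (made-up numbers).
tv <- data.frame(
  track_name = c("Track A (Taylor's Version)",
                 "Track B (Taylor's Version)",
                 "Track C (Taylor's Version) (From The Vault)",
                 "Track D (Taylor's Version) (From The Vault)"),
  danceability = c(0.60, 0.50, 0.40, 0.30),
  stringsAsFactors = FALSE
)

# Flag vault tracks by their title (assumed naming convention).
tv$is_vault <- grepl("From The Vault", tv$track_name, fixed = TRUE)

# Median danceability for re-recorded tracks (FALSE) vs. vault tracks (TRUE).
vault_medians <- tapply(tv$danceability, tv$is_vault, median)
print(vault_medians)
```

If the re-recorded tracks alone already sit below the originals, the vault tracks cannot be the whole story.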

Energy

Energy is defined as “a perceptual measure of intensity and activity…energetic tracks feel fast, loud, and noisy.”⁴

Surprisingly, the energy distributions look quite different from the danceability distributions, even though one would assume the two go hand in hand. There are some similarities between the two, though - folklore is by far Swift’s least energetic album. 1989 scores as Swift’s most energetic album (think of hits such as Shake It Off, Blank Space, and How You Get The Girl), followed closely by the pop-rock Speak Now (think Haunted, The Story of Us, Superman). However, Speak Now also has low energy songs such as Last Kiss and Innocent, which we see in the somewhat long left tail. Both versions of Red and Lover have an extremely wide distribution, likely due to the scattershot nature of both albums. In fact, Swift has said that “Red resembled a heartbroken person” in that “it was all over the place, a fractured mosaic of feelings that somehow all fit together in the end.”⁵ Lover represented a more mature take on this disorderly composition, where Swift “[experiences] all the beautiful complexities, unwrapped feelings, and trials of [love].”⁶ In essence, Lover is an album about every type of love, and its diversity is a recurring theme in our analysis. The Taylor’s Versions are very similar in energy to the original versions, with similar medians and somewhat similar distributions.

Valence

Valence “[describes] the musical positiveness conveyed by a track. Tracks with high valence sound more positive…while tracks with low valence sound more negative.”⁷

In terms of valence, most albums have a very wide spread, spanning most of the 0 - 1 range. This means that Swift’s albums typically mix sad and happy songs. However, reputation seems to be an outlier: it has a far smaller range than the other albums, and sits on the lower end. Considering reputation’s dark and rebellious theme, it should be no surprise that it scores lowest on Spotify’s measure of positivity. 1989 has the highest median valence, slightly edging out the surprising contender Red. Red Taylor’s Version scores lower than the original Red, yet Fearless Taylor’s Version handily beats the original Fearless.

Tempo

Interestingly, Speak Now has the highest median tempo, and Lover the lowest. Lover makes sense at the bottom with its slower songs such as Soon You’ll Get Better and Lover. Speak Now also has slower songs, but is seemingly carried by its pop-rock anthems to the highest median tempo of all of Swift’s albums. Lover and the debut album have the widest distributions of tempos, with the original Red having the narrowest.
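
The medians and spreads discussed in these sections can also be read off numerically rather than by eye. Below is a small sketch using base R’s `aggregate()` on a made-up data frame. The album labels and `tempo` values are placeholders, not Spotify’s numbers.

```r
# Placeholder data: two hypothetical albums with four songs each.
toy <- data.frame(
  album_name = rep(c("Album X", "Album Y"), each = 4),
  tempo = c(100, 120, 140, 160, 90, 95, 100, 105)
)

# Median and interquartile range of tempo for each album.
tempo_summary <- aggregate(
  tempo ~ album_name,
  data = toy,
  FUN = function(x) c(median = median(x), iqr = IQR(x))
)

# aggregate() stores the two statistics in one matrix column; flatten it
# into ordinary columns named tempo.median and tempo.iqr.
tempo_summary <- do.call(data.frame, tempo_summary)
print(tempo_summary)
```

The same pattern should work for danceability, energy, or valence by swapping the column on the left-hand side of the formula.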

Song Length By Album

I also thought it would be interesting to see how Swift’s approach to song length has changed over time. Across the industry, song length has been declining since 1990, and some speculate that this is due to listeners’ decreasing attention spans and/or streaming algorithms’ preference for number of streams (thus favoring shorter, more re-playable songs).⁸

Looking at our data, we can see Swift increased song length up to Speak Now, which has the highest median song length of all her albums. After Speak Now, song length tapered off until Lover, her album with the shortest songs. From there, it increased again with folklore, evermore, and the Taylor’s Versions, especially with the extreme outlier on Red (Taylor’s Version) that is All Too Well Ten Minute Version. One interesting thing to note is that the Taylor’s Versions have very similar song lengths to their original counterparts. While this may seem like a no-brainer, it means that the new songs added From The Vault are typically around the same length as the songs on the original album.

Number of Songs Per Album

Here, instead of the length of each song, we are simply comparing how many songs are on each of Swift’s albums.

# Albums in chronological order of release
albumNames <- c("Taylor Swift", "Fearless Platinum Edition", 
                "Speak Now (Deluxe Edition)", "Red (Deluxe Edition)",
                "1989 (Deluxe)", "reputation", "Lover", 
                "folklore (deluxe version)", "evermore (deluxe version)", 
                "Fearless (Taylor's Version)", "Red (Taylor's Version)")

# Constructing a data frame that has the number of rows (songs) in ts for
# each album. album_name is a factor whose levels keep the order above
albumLength <- data.frame(
  album_name = factor(albumNames, levels = albumNames),
  album_length = vapply(albumNames,
                        function(a) sum(ts$album_name == a),
                        integer(1),
                        USE.NAMES = FALSE)
)

# Bar Plot of Album Length
ggplot(albumLength, aes(x = album_name, y = album_length, fill = album_length)) +
  geom_col(width = .75,
           col = "black",
           fill = c("#50a7e0", "#ddc477", "#923c81", "#951e1a", "azure2", "#353839", 
                    "hotpink", "#bababa", "#994914", "#f6ed95", "#951e1a")) +
  coord_flip() +
  labs(title = "Number of Songs by Taylor Swift Album",
       x = "Album Name",
       y = "# of Songs") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 31)) +
  theme_light()

As we can see from the bar chart, Red (Taylor’s Version) has the most songs of all of Swift’s albums at 30. In general, the Taylor’s Versions sweep this section (which makes sense considering the songs From The Vault). It will definitely be interesting to see how many songs Speak Now (Taylor’s Version) has, as the original is already one of her longer albums. Among the albums released so far, however, reputation has the fewest tracks at 15.

Keys Used

As we can see, Swift’s most used keys are (in order) G Major, C Major, and D Major. These align directly with the most popular keys on Spotify overall.⁹ These keys are typically favored because they have relatively few sharps and flats, which gives them a low barrier to entry and makes them more versatile than keys with more sharps and flats.
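
A ranking like this can be produced by tabulating the `key_mode` column and sorting the counts. The sketch below uses a small made-up vector of keys, and it assumes `key_mode` holds labels of the form “G major”.

```r
# Made-up key labels standing in for ts$key_mode.
toy <- data.frame(
  key_mode = c("G major", "C major", "G major", "D major", "G major", "C major"),
  stringsAsFactors = FALSE
)

# Count how often each key appears, most frequent first.
key_counts <- sort(table(toy$key_mode), decreasing = TRUE)
top_keys <- names(key_counts)
print(key_counts)
```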

Retrieving & Cleaning Lyric Data

Taylor Swift is also known for her amazing songwriting.

# Retrieving Song Lyric Data ####
# Packages: pdftools for pdf_text(), tidytext for unnest_tokens() and
# stop_words, dplyr for the pipe, anti_join() and count()
library(pdftools)
library(tidytext)
library(dplyr)

# Importing all of Swift's lyrics as a data frame (one row per PDF page)
tsLyrics <- as.data.frame(pdf_text(pdf = "lyrics.pdf"))

# Renaming the column to "value"
colnames(tsLyrics) <- "value"

# Separating our lyrics into a data frame where each row is one word
tidyLyrics <- tsLyrics %>% unnest_tokens(word, value)

# Loading the pre-selected "stop words" (ex: the, of, like, etc.)
data(stop_words)

# Removing all stop words from our dataframe
tidyLyrics <- tidyLyrics %>% anti_join(stop_words, by = "word")

# Counting the frequency of lyrics, sorting in descending order
lyricCount <- tidyLyrics %>% count(word, sort = TRUE)

However, we are not out of the woods yet - looking at the first few entries of our “lyricCount” object, we can see some stop words that weren’t caught by tidytext, such as “ooh”, “wanna”, and “yeah”.

# Deleting remaining stop word rows - we don't need to scour the entire
# df, as we are only showing 150 words in our cloud
lyricCount <- lyricCount[-c(3, 5:6, 9:10, 29, 50, 88, 123, 151), ]
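
Removing rows by position works for a one-off analysis, but the positions change if the lyrics file changes. As a sketch of an alternative, the leftover filler words can be dropped by name. The `fillers` vector is a hypothetical list containing only the three words named above, and the counts are the first six entries of `lyricCount` from this report.

```r
# First six entries of the word counts from this report.
lyric_counts <- data.frame(
  word = c("love", "time", "ooh", "baby", "wanna", "yeah"),
  n = c(320, 294, 208, 185, 164, 131),
  stringsAsFactors = FALSE
)

# Hypothetical list of filler words to drop (extend as needed).
fillers <- c("ooh", "wanna", "yeah")

# Keep only the rows whose word is not a filler.
lyric_counts_clean <- lyric_counts[!lyric_counts$word %in% fillers, ]
print(lyric_counts_clean)
```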

Lyric Wordcloud

Now that we have all of our lyrics tokenized and cleaned, we can put them into a wordcloud to highlight which words Swift uses the most in her entire discography. We correlate the size of the word with the number of times it is used.

# Setting display margins to be smaller to accommodate a large word cloud
par(mar = c(2, 2, 2, 2))

# Plotting our wordcloud
wordcloud(words = lyricCount$word, 
          freq = lyricCount$n,
          scale = c(6, .5),
          max.words = 150, 
          random.order = F)

Time and Love seem to be the largest focus in Swift’s music (no surprise there), and it seems easy to recognize why this may be. Songs such as This Love and You are in Love come to mind for “Love”, while “Time” seems to be a word that isn’t repeated in one song necessarily, but is referenced quite a lot in general.

Location Bubble Map

Swift is also known for referencing New York (among other places) in her songs, having multiple songs titled with locations in New York (Welcome To New York, Cornelia Street, coney island, etc.). She even gave the commencement speech for New York University’s Class of ’22! I thought it would be interesting to plot all the locations Swift has mentioned along with how many times they’ve been mentioned in her lyrics to see what other places she frequently mentions.

Luckily for me, the blog taylorswiftandx on Tumblr has compiled all the locations mentioned in Swift’s songs (among many, many other things).¹⁰ So, I put these into an Excel workbook and made a bubble map!

# Importing our dataframe
tsCounts <- read_excel("ts_locations.xlsx", "Counts")

# Text for the tooltip:
text <- paste("Location: ", tsCounts$Location, "<br/>",
              "Mentions: ", tsCounts$Count) %>%
  lapply(htmltools::HTML)

# Making our Bubble Map
leaflet(tsCounts) %>%
  addTiles() %>%
  setView(lat = 40, lng = 0, zoom = 2) %>%
  addProviderTiles("OneMapSG") %>%
  addCircleMarkers(lng = tsCounts$Longitude, 
                   lat = tsCounts$Latitude,
                   label = text,
                   radius = ~ Count,
                   stroke = T,
                   fillOpacity = .33,
                   color = "orchid")
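
A note on bubble size: `radius = ~ Count` uses the raw count as the marker radius, so a place mentioned four times gets a circle with sixteen times the area of a place mentioned once. If the area of a bubble, rather than its radius, should track the count, square-root scaling is one common choice. The helper `scale_radius()` and its `base` parameter below are hypothetical names, and the counts are made up.

```r
# Hypothetical helper: the radius grows with the square root of the count,
# so the circle's area grows in proportion to the count itself.
scale_radius <- function(count, base = 4) {
  base * sqrt(count)
}

counts <- c(1, 4, 16) # made-up mention counts
radii <- scale_radius(counts)
print(radii)
```

In the map code, this would presumably be used as `radius = ~ scale_radius(Count)`.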

In case you’re curious about the specific lyrics:

Our map tells quite the story - Miss Americana definitely focuses on America. However, Swift mentions London quite a lot, along with Windermere, a lake in England. This can be attributed to the fact that her current partner of 4 years, Joe Alwyn, is from England, and they have vacationed in Windermere together. London Boy also mentions many locations in London, contributing to its prominence on the map. Back in America, New York is obviously king, but Los Angeles is also quite prominent. The City of Angels is a hub for celebrities, and Swift owns property there. Tennessee is also mentioned a significant number of times, probably because Swift moved to Nashville to pursue her music career, and still owns property there as well. The other locations mentioned are one-offs, and typically occur in folklore and evermore, her two storytelling albums that aren’t necessarily about her own life experiences.

The first few entries of lyricCount, before the leftover filler words were removed:

word	love	time	ooh	baby	wanna	yeah
n	320	294	208	185	164	131

TayloR Swift - An Analysis

Shreyush Shankar

2022-06-25