Taylor Swift is one of the most successful artists in the world, breaking record after record and creating albums that span multiple genres. She has one of the biggest followings in musical history, and continuously ‘breaks’ Spotify’s servers when she releases new music because of the overwhelming number of ‘Swifties’ trying to listen. As a huge Taylor Swift fan, I developed an analysis that compares all 10 of Taylor Swift’s studio albums and how they differ in terms of danceability, popularity, energy, acousticness, valence, and length. Because Taylor Swift has been releasing music since 2006, this analysis shows how her music has shifted from genre to genre and how her albums differ over the years, and it explores theories that ‘Swifties’ have about her albums.
The dataset in this analysis has 1,265 songs and contains 21 variables, including fields such as song name and album name, and scores such as danceability, energy, and instrumentalness. It was pulled from Kaggle, but the data was originally sourced from Spotify’s API. The dataset contains every song returned by a Spotify search for ‘Taylor Swift’, so not every song in it was actually released by Taylor. For example, nearly 90% of the songs in this dataset are karaoke versions of albums, deluxe versions, albums in different languages, etc. Because I wanted to focus on the differences between Swift’s 10 studio albums, I filtered out much of the dataset and only included songs that were part of one of these 10 albums: Taylor Swift, Fearless (Taylor’s Version), Speak Now, Red (Taylor’s Version), 1989 (Deluxe Edition), reputation, Lover, folklore, evermore, and Midnights. Keeping all 21 variables, this left a dataset of 182 songs. Diving even further, I didn’t want an album such as Red (Taylor’s Version) to outweigh the others on summed scores (such as the sum of popularity) because of its sizable 30 tracks, so I only used the first 15 tracks from each album so that the albums could be compared on equal footing.
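The filtering described above could look roughly like the following. This is a minimal sketch, not the exact code used for the analysis: `raw_df` is a hypothetical, made-up stand-in for the Kaggle table, and it assumes the album title and track position live in columns named `album` and `track_number` (the names the later code uses on `new_df`).

```r
library(dplyr)

# Hypothetical stand-in for the raw Kaggle table (the real one has 21 columns).
raw_df <- data.frame(
  album = c(rep("Lover", 18), rep("Lover (Karaoke Version)", 3)),
  track_number = c(1:18, 1:3),
  popularity = 60
)

# The ten studio album titles, spelled as they appear in the dataset.
studio_albums <- c("Taylor Swift", "Fearless (Taylor's Version)", "Speak Now",
                   "Red (Taylor's Version)", "1989 (Deluxe Edition)", "reputation",
                   "Lover", "folklore", "evermore", "Midnights")

# Keep only the ten studio albums, then only the first 15 tracks of each.
new_df <- raw_df %>%
  filter(album %in% studio_albums, track_number <= 15)

nrow(new_df)
```

Capping every album at 15 tracks keeps summed scores comparable, at the cost of ignoring the later tracks on the longer albums.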
From a high-level perspective, there were many findings from comparing all 10 of Swift’s studio albums. In terms of album popularity, Swift’s ‘reputation’ ranked the highest, with 100% of its tracks having high danceability ratings. In terms of Swift’s musical shift, the energy and acousticness charts both showed a significant change in Taylor’s music, going from high energy to low energy, and from not acoustic to acoustic. In addition, I found that song length remained fairly consistent across the albums, with ‘Speak Now’ having slightly longer songs than the other albums. Lastly, I had the chance to dive into ‘Swiftie’ theories and found that track number 5 was the track with the lowest valence score (i.e., Taylor’s saddest song) on 50% of the albums, suggesting that track 5 may have special meaning to Swift.
This stacked bar chart ranks Taylor Swift’s albums by the sum of their tracks’ popularity scores. It answers the question, “What is Taylor Swift’s most popular album?” As shown, taken as a whole, ‘reputation’ has the most popular tracks, with ‘Lover’ and ‘Midnights’ following close behind. In addition to the popularity scores, each bar is split by how much of that popularity comes from tracks with high or low danceability. A track is considered “High Danceability” if its danceability score is greater than or equal to 0.5, and “Low Danceability” otherwise. As seen in the graph, ‘reputation’ is the only album where all of the songs are considered highly danceable. At the other end of the ranking, ‘Taylor Swift’ had the lowest popularity, which makes sense as this was her debut album, released before she had established herself in the music industry.
#group albums by sum of popularity
dance_tot <- new_df %>%
select(album, popularity) %>%
group_by(album) %>%
summarise(tot = sum(popularity), .groups="keep") %>%
arrange(-tot) %>%
data.frame()
#group albums based on danceability score
pop_tot <- new_df %>%
mutate(danceRating = ifelse(danceability>=0.5,"High Danceability","Low Danceability")) %>%
select(danceRating, album, popularity) %>%
group_by(album, danceRating) %>%
summarise(totpop=sum(popularity), .groups='keep') %>%
data.frame()
#visualization #1: bar chart of the 10 albums by total popularity (popularity is a score given from 0-100 based on how popular a song is)
#round_any() comes from plyr and comma from scales; both are called by namespace so neither package has to be attached
max_y <- plyr::round_any(max(dance_tot$tot), 300, ceiling)
#FUN=sum orders the bars by total popularity; the default (mean) would order by the average of the stacked segments
ggplot(data=pop_tot, aes(x=reorder(album, totpop, FUN=sum), y=totpop, fill=danceRating)) +
geom_bar(position = position_stack(reverse=TRUE), stat="identity") +
labs(title = "Taylor Swift Album Popularity by Dance Rating",
x="Album Title", y="Total Popularity Score", fill="Dance Rating",
caption="Dance Rating is High Danceability if track has a danceability score >= 0.5, otherwise it is Low Danceability.") +
theme_light() +
#theme() tweaks go after theme_light(), which would otherwise reset the title alignment
theme(plot.title = element_text(hjust = 0.5),
plot.caption = element_text(face="italic", hjust=0.5, size=8)) +
geom_text(data = dance_tot, aes(x = album, y = tot, label = scales::comma(tot), fill=NULL), hjust=0.4, size=4) +
coord_flip()+
scale_fill_brewer(palette ="RdPu") +
scale_y_continuous(labels = scales::comma,
breaks= seq(0, max_y, by = 250),
limits=c(0, max_y))
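The claim that an album is entirely “High Danceability” can also be checked numerically rather than read off the chart. The sketch below is illustrative only: `toy_df` is a made-up stand-in for `new_df`, and `dance_share` and `pct_high` are names introduced here, not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for new_df; the values are invented for illustration.
toy_df <- data.frame(
  album = c("A", "A", "B", "B"),
  danceability = c(0.62, 0.55, 0.71, 0.43)
)

# Percentage of each album's tracks at or above the 0.5 danceability cutoff.
dance_share <- toy_df %>%
  group_by(album) %>%
  summarise(pct_high = 100 * mean(danceability >= 0.5), .groups = "drop")

dance_share
```

An album with `pct_high` equal to 100 has every track above the cutoff, which is the property the chart shows for ‘reputation’.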
These bar charts look into the question, “Do energy levels change from album to album?” The energy rating is based on each track’s energy score. If a track has an energy score greater than or equal to 0.5, it is categorized as ‘High Energy’, and as ‘Low Energy’ otherwise. For her first 7 albums, nearly all the tracks are ‘High Energy’, while her three most recently released albums show a shift in energy, with nearly all the tracks being ‘Low Energy’. These graphs help illustrate the change in Swift’s musical direction, from country and pop toward more indie and folk genres.
energy_df <- new_df %>%
mutate(energyRating = ifelse(energy>=0.5,"High Energy","Low Energy")) %>%
data.frame()
myalbums <- c("Taylor Swift","Fearless (Taylor's Version)", "Speak Now", "Red (Taylor's Version)", "1989 (Deluxe Edition)","reputation", "Lover", "folklore","evermore", "Midnights")
energy_df$album <- factor(energy_df$album, levels=myalbums)
ggplot(energy_df, aes(x=track_number, y=energy, fill=energyRating)) +
geom_bar(stat="identity", position='dodge') +
theme_light() +
theme(plot.title = element_text(hjust=0.5),
strip.text = element_text(size = 5.5),
plot.caption = element_text(face="italic", hjust=0.5, size=8)) +
scale_y_continuous(labels=comma) +
labs(title="Energy Score by Album by Track Number",
x = "Track Number",
y = "Energy Score",
fill = "Energy Rating",
caption = "High Energy is any track with an Energy Score >= 0.5.
Low Energy is any track with an Energy Score < 0.5.") +
scale_fill_brewer(palette ="RdPu") +
facet_wrap(~album,ncol=5,nrow=2)
These pie charts look into the question, “How do acoustic levels change from album to album?” Acousticness scores range from 0.0 to 1.0, and I categorized a track as “Acoustic” if its acousticness score was greater than or equal to 0.5, and “Not Acoustic” otherwise. Nearly all of Swift’s albums are predominantly not acoustic, especially ‘Fearless (Taylor’s Version)’, which is 100% not acoustic. Standing out are the albums ‘folklore’ and ‘evermore’, as they are the only two albums that are almost entirely acoustic. Once again, this graph illustrates Swift’s shift from pop/country music to indie.
acoustic_df <- new_df %>%
mutate(acousticRating = ifelse(acousticness>=.5,"Acoustic","Not Acoustic")) %>%
data.frame()
pie_df <- acoustic_df %>%
select(acousticRating,album) %>%
group_by(album, acousticRating) %>%
summarise(n=length(acousticRating), .groups='keep') %>%
group_by(album) %>%
mutate(percent_of_total = round(100*n/sum(n),1)) %>%
ungroup() %>%
data.frame()
myalbums <- c("Taylor Swift","Fearless (Taylor's Version)", "Speak Now", "Red (Taylor's Version)", "1989 (Deluxe Edition)","reputation", "Lover", "folklore","evermore", "Midnights")
pie_df$album <- factor(pie_df$album, levels=myalbums)
pie_df$acousticRating = factor(pie_df$acousticRating, levels=c("Acoustic", "Not Acoustic"))
ggplot(data=pie_df, aes(x="",y=n, fill=acousticRating)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y", start=0) +
labs(fill="Acoustic Rating",x=NULL,y=NULL, title="Acoustic Rating by Album",
caption = "Acoustic is any track with an acoustic score >=0.5, and Not Acoustic if otherwise.")+
theme_light() +
theme(plot.title = element_text(hjust=0.5),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.caption = element_text(face="italic", hjust=0.5, size=8),
strip.text = element_text(size = 6)) +
facet_wrap(~album, ncol=5, nrow=2) +
scale_fill_brewer(palette="RdPu") +
geom_text(aes(x=1.7, label=ifelse(percent_of_total>5,paste0(percent_of_total, "%"),"")),
size=2,
position = position_fill(vjust=0.5))
This heatmap shows song length by album and track number. The field used for song length was duration_ms, which I converted to minutes for this analysis. I was curious to see whether any track numbers were consistently longer or shorter across all albums, or whether any albums had individual tracks that were significantly longer or shorter. From the heatmap, it seems as though ‘Speak Now’ had consistently longer tracks than the other albums, while all the other albums had relatively similar track lengths.
length_df <- new_df %>%
select(album, track_number, duration_ms) %>%
group_by(album, track_number) %>%
summarise(tot= round(sum(duration_ms)/1000/60,2), .groups='keep') %>%
data.frame()
myalbums <- c("Midnights","evermore","folklore","Lover", "reputation","1989 (Deluxe Edition)","Red (Taylor's Version)","Speak Now","Fearless (Taylor's Version)","Taylor Swift")
length_df$album <- factor(length_df$album, levels=myalbums)
breaks = c(seq(min(length_df$tot), max(length_df$tot), by=0.75))
x_axis_labels = min(length_df$track_number):max(length_df$track_number)
ggplot(length_df, aes(x=track_number, y=album, fill=tot)) +
geom_tile(color='black') +
geom_text(aes(label=(tot)), size=2) +
coord_equal(ratio=2) +
labs(title="Song Length by Track Number",
x="Track Number",
y="Album",
fill="Song Length in Minutes") +
theme_minimal() +
theme(plot.title=element_text(hjust=0.4)) +
scale_x_continuous(labels=x_axis_labels, breaks= x_axis_labels, minor_breaks=NULL) +
scale_y_discrete(limits=rev(levels(length_df$album))) +
scale_fill_continuous(low="white",high='maroon',breaks=breaks) +
guides(fill=guide_legend(reverse=FALSE, override.aes=list(colour="black")))
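The impression that one album runs longer than the rest can be backed up with a per-album average. This is a small sketch under stated assumptions: `toy_df` is invented data standing in for `new_df`, and `avg_length` and `mean_minutes` are names introduced here for illustration.

```r
library(dplyr)

# Toy stand-in for new_df; durations are invented and given in milliseconds.
toy_df <- data.frame(
  album = c("A", "A", "B", "B"),
  duration_ms = c(240000, 300000, 180000, 210000)
)

# Mean track length per album, converted from milliseconds to minutes
# (60,000 ms per minute), longest album first.
avg_length <- toy_df %>%
  group_by(album) %>%
  summarise(mean_minutes = mean(duration_ms) / 60000, .groups = "drop") %>%
  arrange(desc(mean_minutes))

avg_length
```

Run against the real data, the album at the top of `avg_length` would be the one with the longest songs on average.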
For the last part of this analysis, I thought it would be fun to answer the ‘Swiftie’ question, “Is track number 5 always Taylor Swift’s saddest and most personal song on the album?” This analysis looks at valence score by track number for each album. Spotify defines the valence score as “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).” As it turns out, 5 out of Taylor’s 10 albums have track number 5 as the song with the lowest valence score on the album: ‘Speak Now’, ‘Red (Taylor’s Version)’, ‘reputation’, ‘Lover’, and ‘folklore’. With 15 tracks per album in this analysis, that is far more often than any single track position would be expected to come out lowest by chance, which suggests the Swifties are on to something. Track 5 does seem to be something special for Miss Swift.
hi_lo <- new_df %>%
group_by(album) %>%
filter(valence == min(valence)) %>%
data.frame()
myalbums <- c("Taylor Swift","Fearless (Taylor's Version)", "Speak Now", "Red (Taylor's Version)", "1989 (Deluxe Edition)","reputation", "Lover", "folklore","evermore", "Midnights")
new_df$album <- factor(new_df$album, levels=myalbums)
hi_lo$album <- factor(hi_lo$album, levels=myalbums)
ggplot(new_df, aes(x=track_number, y=valence)) +
geom_line(color="darkgrey", linewidth=1) +
theme_light() +
theme(plot.title = element_text(hjust=0.5),
plot.caption = element_text(face="italic", hjust=0.5, size=8),
strip.text = element_text(size = 7)) +
geom_point(data = hi_lo, aes(x=track_number, y=valence), shape=21, size=4, fill='pink', color='pink') +
scale_y_continuous(labels=comma) +
labs(title="Album Track Number by Valence", x='Track Number', y='Valence Score',
caption='*Pink point marks the track number with the lowest valence on that album; i.e., the saddest song on the album.') +
scale_x_continuous(breaks=c(seq(0,15,by=5))) +
facet_wrap(~album, ncol=5, nrow=2)
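The share of albums whose lowest-valence song sits at track 5 can be computed directly instead of counted from the facets. The sketch below uses invented data: `toy_df` stands in for `new_df`, and `lowest` and `track5_share` are illustrative names. It uses `slice_min()` with `with_ties = FALSE`, so an album with a tie keeps only one row, which is a simplification compared with the `filter(valence == min(valence))` approach above.

```r
library(dplyr)

# Toy stand-in for new_df: two albums with five tracks each (made-up valence scores).
toy_df <- data.frame(
  album = rep(c("A", "B"), each = 5),
  track_number = rep(1:5, times = 2),
  valence = c(0.6, 0.5, 0.7, 0.4, 0.2,   # album A: track 5 is lowest
              0.3, 0.1, 0.5, 0.6, 0.4)   # album B: track 2 is lowest
)

# One row per album: the track with the lowest valence score.
lowest <- toy_df %>%
  group_by(album) %>%
  slice_min(valence, n = 1, with_ties = FALSE) %>%
  ungroup()

# Fraction of albums whose lowest-valence track is track 5.
track5_share <- mean(lowest$track_number == 5)
track5_share
```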
This analysis took a deeper dive into Taylor Swift’s albums and was able to visualize her musical shift from genre to genre using data on each song. Swift’s shift from country/pop genres to indie shows up in the progression of her albums, where energy has gone down and acousticness has gone up in her more recent albums. In terms of album popularity, another key takeaway from this analysis is that ‘reputation’ is Taylor Swift’s most popular album given Spotify’s popularity scores for each track. In addition, ‘reputation’ is the only album on which every track has a danceability score of at least 0.5. When comparing song length across all albums, most of Swift’s song lengths are consistently similar, with ‘Speak Now’ appearing to be the album with the longest songs. Lastly, another finding, especially interesting for ‘Swifties’, is that 50% of Taylor Swift’s albums have track number 5 as the track with the lowest valence score on the album, which supports the ‘Swiftie’ theory that track 5s are some of Swift’s saddest and most personal songs. These findings are interesting because there is actual data that models Swift’s transition between genres, something I had only heard and did not know could be visualized. This analysis can help bring insight into the differences between Swift’s albums, and maybe even support a ‘Swiftie’ theory or two along the way.