This document aims to demonstrate my knowledge of the R programming language, as covered in the Coding 3 course. The analysis uses an online dataset from TidyTuesday, available at the following link: TidyTuesdayData. Its aim is to explore how the attributes of Taylor Swift’s songs relate to her success and how these attributes have changed over the course of her career.
The analysis uses two separate datasets from TidyTuesdayData: one with details of the albums and one with the attributes of the songs. I used the ‘album_name’ variable as the key to merge the two datasets. For the rest of the analysis, I used the merged table in data.table format to make filtering and visualization smoother.
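As a minimal sketch of what the merge step does, the example below joins two tiny made-up tables (all names and values are hypothetical). Like `merge()` on data.tables, base R `merge()` performs an inner join by default, so a song whose `album_name` has no match in the album table, such as a track with a missing album name, would not appear in the merged result; `all.x = TRUE` would keep such rows instead.

```r
# Hypothetical toy tables mirroring the structure of the two source files
toy_songs <- data.frame(
  album_name = c("Album A", "Album A", "Album B", NA),
  track_name = c("Track 1", "Track 2", "Track 3", "Single X"),
  energy     = c(0.6, 0.7, 0.4, 0.8)
)
toy_albums <- data.frame(
  album_name = c("Album A", "Album B"),
  user_score = c(8.1, 7.5)
)

# Inner join on the shared key (the default): the song without an album is dropped
toy_inner <- merge(toy_songs, toy_albums, by = "album_name")

# Left join: every song is kept, album columns are NA where there is no match
toy_left <- merge(toy_songs, toy_albums, by = "album_name", all.x = TRUE)

nrow(toy_inner)  # 3
nrow(toy_left)   # 4
```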
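# Loading the required packages
library(data.table) # fread(), merge(), setnames(), :=
library(reactable) # Interactive tables
library(ggplot2) # Plots
library(ggrepel) # geom_text_repel()
library(dplyr) # group_by(), summarize(), mutate(), arrange(), slice()
library(tidyr) # pivot_longer()
# Loading data from GitHub and merging the two tables into a data.table
songs <- fread("https://raw.githubusercontent.com/torokpe/Coding3./refs/heads/main/taylor_all_songs.csv")
albums <- fread("https://raw.githubusercontent.com/torokpe/Coding3./refs/heads/main/taylor_albums.csv")
# 'ep' and 'album_release' also appear in the song table, so they are dropped from the album table to avoid duplicate columns
dt <- merge(songs, albums[, .SD, .SDcols = !c("ep", "album_release")], by = "album_name")
# Renaming the score columns to show that they are album-level scores
setnames(dt, old = c("metacritic_score", "user_score"), new = c("album_critic_score", "album_user_score"))
# Removing the 'lyrics' and 'artist' columns, which are not needed for the analysis
dt[, c("lyrics", "artist") := NULL]
# Display the full merged dataset (dt)
reactable(dt)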
In this section, I analyse Taylor Swift’s songs using data aggregation, filtering and grouping.
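The data.table expression `dt[i, j, by]` combines these three steps: `i` filters rows, `j` computes the aggregates and `by` defines the groups. As a rough base R equivalent, sketched on made-up rows, the split-apply-combine version below averages the album scores per album after dropping EPs. It assumes, as the division by 10 in the code below implies, that critic scores are on a 0-100 scale and user scores on a 0-10 scale. It also shows that an album whose scores are all missing yields `NaN` rather than `NA` when `na.rm = TRUE`.

```r
# Hypothetical toy rows: album-level scores repeat on every track of an album
toy_dt <- data.frame(
  album_name         = c("Album A", "Album A", "Album B", "Album B", "EP C"),
  ep                 = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  album_user_score   = c(8.0, 8.0, NA, NA, 6.5),
  album_critic_score = c(76, 76, 85, 85, NA)
)

# Filter: keep full-length albums only
toy_albums_only <- toy_dt[!toy_dt$ep, ]

# Group by album (split), aggregate each group (lapply), then combine (rbind)
toy_scores <- do.call(rbind, lapply(split(toy_albums_only, toy_albums_only$album_name), function(d) {
  data.frame(
    album_name        = d$album_name[1],
    mean_user_score   = mean(d$album_user_score, na.rm = TRUE),
    # Critic scores (0-100) divided by 10 to sit on the 0-10 user score scale
    mean_critic_score = mean(d$album_critic_score, na.rm = TRUE) / 10
  )
}))
print(toy_scores)
```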
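# Mean album scores of full-length albums (EPs excluded); the critic score
# (0-100) is divided by 10 to put it on the same 0-10 scale as the user score
reactable(dt[ep == FALSE, .(
mean_user_score = mean(album_user_score, na.rm = TRUE),
mean_critic_score = mean(album_critic_score, na.rm = TRUE) / 10
), by = album_name])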
# Extracting the release year from the 'YYYY-MM-DD' release date
dt[, track_year := as.integer(substring(track_release, 1, 4))]
# Number of tracks released per year
reactable(dt[, .(num_tracks = .N), by = track_year][order(track_year)])
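`track_year` is derived by taking the first four characters of `track_release`. The sketch below, on hypothetical dates, checks that this agrees with parsing the value as a date and formatting the year, and that a missing release date simply propagates as `NA`, which then forms its own group when tracks are counted per year. Whether `fread()` returns this column as text or as a date class may depend on the data.table version; both approaches work on either.

```r
# Hypothetical release dates as ISO 8601 strings, including a missing one
toy_release <- c("2006-10-24", "2014-10-27", NA, "2020-07-24")

# Approach used in the report: the first four characters are the year
toy_year_substring <- as.integer(substring(toy_release, 1, 4))

# Alternative: parse to Date and format the year; this also works
# if the column has already been read as a date class
toy_year_date <- as.integer(format(as.Date(toy_release), "%Y"))

# Tracks per year, with the missing year kept as its own group
toy_counts <- table(toy_year_substring, useNA = "ifany")
print(toy_counts)
```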
library(reactable)
# Mean danceability and energy per key, excluding tracks with a missing key_name
key_summary <- dt[!is.na(key_name), .(
mean_danceability = mean(danceability, na.rm = TRUE),
mean_energy = mean(energy, na.rm = TRUE)
), by = key_name]
reactable(key_summary)
# Share of explicit and non-explicit tracks per full-length album
explicit_summary <- dt[ep == FALSE, .(
percent_explicit = mean(explicit, na.rm = TRUE) * 100, # Mean of a logical vector = share of TRUE values, converted to %
percent_clean = (1 - mean(explicit, na.rm = TRUE)) * 100 # Complement of the explicit share
), by = album_name]
# Display the result as a table
reactable(explicit_summary,
columns = list(
percent_explicit = colDef(format = colFormat(digits = 1), name = "Explicit (%)"),
percent_clean = colDef(format = colFormat(digits = 1), name = "Non-explicit (%)")
),
bordered = TRUE, striped = TRUE, highlight = TRUE)
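The percentage columns rely on R treating `TRUE` as 1 and `FALSE` as 0 in arithmetic, so the mean of a logical vector is the share of `TRUE` values. A minimal sketch on hypothetical flags, assuming `explicit` is read as a logical column:

```r
# Hypothetical explicit flags (logical) for the tracks of two albums
toy_explicit <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
toy_album    <- c("A", "A", "A", "A", "B", "B")

# TRUE is treated as 1 and FALSE as 0, so the mean is the share of explicit tracks
toy_pct_explicit <- tapply(toy_explicit, toy_album, mean) * 100
toy_pct_clean    <- 100 - toy_pct_explicit

print(toy_pct_explicit)  # A: 25, B: 100
```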
In this section, I analyse Taylor Swift’s music at the album level using data visualization.
# One point per album: x is the album release date, y the mean across its tracks
ggplot(dt, aes(x = as.Date(album_release))) +
stat_summary(
aes(y = danceability, color = "Danceability"),
fun = mean, geom = "line", linewidth = 1
) +
stat_summary(
aes(y = danceability, color = "Danceability"),
fun = mean, geom = "point", size = 3
) +
stat_summary(
aes(y = energy, color = "Energy"),
fun = mean, geom = "line", linewidth = 1
) +
stat_summary(
aes(y = energy, color = "Energy"),
fun = mean, geom = "point", size = 3
) +
scale_color_manual(values = c("Danceability" = "#5ab4ac", "Energy" = "#d8b365")) +
labs(
title = "Danceability and Energy Trends at Album Level",
x = "Album Release Date",
y = "Average Value (0-1)",
color = "Attributes"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11/22,
base_rect_size = 11/22) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
This line chart shows the trends in danceability and energy in Taylor
Swift’s music at the album level over time. Each point represents the
average value of the respective attribute on a scale from 0 to 1, with
danceability in teal and energy in gold. The plot shows considerable
variability in both attributes, with some releases, such as those around
2015, combining high energy with moderate danceability. Notably,
danceability and energy tend to follow independent trajectories,
suggesting that Taylor Swift explores diverse musical styles and tones
across her discography.
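Each point in the chart is a mean that `stat_summary(fun = mean)` computes for one x value. When x is the album release date, and assuming each album has a single release date, these are album-level averages. The sketch below computes the same kind of per-date means directly with `aggregate()` on made-up values, which can be handy for checking the numbers behind the chart.

```r
# Hypothetical tracks from three albums (dates and values are made up)
toy_tracks <- data.frame(
  album_release = as.Date(c("2010-01-01", "2010-01-01", "2012-06-01",
                            "2012-06-01", "2015-03-01")),
  danceability  = c(0.50, 0.70, 0.60, 0.64, 0.75),
  energy        = c(0.80, 0.60, 0.40, 0.50, 0.70)
)

# One row per release date with the mean of each attribute;
# stat_summary(fun = mean) computes the same per-x means before drawing
toy_album_means <- aggregate(cbind(danceability, energy) ~ album_release,
                             data = toy_tracks, FUN = mean)
print(toy_album_means)
```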
dt_album <- dt %>%
group_by(album_name) %>%
summarize(
avg_energy = mean(energy, na.rm = TRUE),
avg_danceability = mean(danceability, na.rm = TRUE),
album_user_score = first(na.omit(album_user_score)) # Handle NAs in album_user_score
)
# Create the bubble chart
ggplot(dt_album, aes(x = avg_energy, y = avg_danceability, size = album_user_score)) +
geom_point(alpha = 0.7, color = "#5ab4ac") + # Bubbles with transparency
geom_text_repel(aes(label = album_name), size = 3, color = "#d8b365") + # Add album names
labs(
title = "Energy, Danceability and User Score by Album",
x = "Average track energy",
y = "Average track danceability",
size = "Album user score"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
Albums like “Reputation” and “Lover” exhibit both high energy and
danceability, reflecting their vibrant and upbeat nature. In contrast,
albums such as “Folklore” and “Evermore” have lower energy and
danceability, aligning with their more mellow and introspective tones.
The size of the bubbles indicates user ratings, with highly rated albums
like “1989” standing out for their combination of danceability and
energy. This visualization highlights the diverse musical styles present
across Taylor Swift’s discography.
ggplot(dt, aes(x = "", fill = factor(ep))) +
geom_bar(width = 1) + # Bar chart of track counts as the base
coord_polar(theta = "y") + # Convert to a pie chart
geom_text(
aes(
y = after_stat(count),
label = after_stat(paste0(round(count / sum(count) * 100, 1), "%"))
),
stat = "count", # Use stat = "count" to compute the percentages of tracks dynamically
position = position_stack(vjust = 0.5), # Position the labels in the middle of slices
color = "white",
size = 4
) +
scale_fill_manual(
values = c("TRUE" = "#d8b365", "FALSE" = "#5ab4ac"), # Custom colors
labels = c("TRUE" = "Extended play", "FALSE" = "Album release") # Custom labels
) +
labs(title = "Share of Tracks by EP Status", fill = "EP Status") +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
axis.text.y = element_blank(), # Remove y-axis labels
axis.text.x = element_blank(), # Remove x-axis labels
axis.ticks.y = element_blank(), # Remove y-axis ticks
axis.ticks.x = element_blank(), # Remove x-axis ticks
axis.title.y = element_blank(), # Remove y-axis title
axis.title.x = element_blank(), # Remove x-axis title
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
This pie chart illustrates the distribution of EP (extended play) status
across the tracks in the dataset. The vast majority of tracks (95.1%)
come from full-length albums, while only a small fraction (4.9%) come
from extended plays. Because the slices count tracks rather than
releases, longer albums weigh more heavily; even so, the split suggests
that Taylor Swift focuses on producing complete albums, which suit her
storytelling approach and artistic style.
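Because the pie chart counts rows of `dt`, its slices describe the share of tracks, not the share of releases, and long albums weigh more than short EPs. The made-up example below (album names and track counts are hypothetical) shows how far apart the two shares can be.

```r
# Hypothetical track table: three albums and one EP (names and sizes are made up)
toy_tracks <- data.frame(
  album_name = c(rep("Album A", 14), rep("Album B", 16), rep("Album C", 13), rep("EP D", 2)),
  ep         = c(rep(FALSE, 43), rep(TRUE, 2))
)

# Share of tracks: one row per track, as in the pie chart
toy_track_share <- prop.table(table(toy_tracks$ep)) * 100

# Share of releases: one row per distinct album or EP
toy_releases <- unique(toy_tracks[, c("album_name", "ep")])
toy_release_share <- prop.table(table(toy_releases$ep)) * 100

round(toy_track_share, 1)    # FALSE: 95.6, TRUE: 4.4
round(toy_release_share, 1)  # FALSE: 75, TRUE: 25
```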
ggplot(dt, aes(x = album_name, y = duration_ms / 60000)) + # Duration converted from ms to minutes
geom_boxplot(fill = "#5ab4ac", color = "#01665e", outlier.color = "#d8b365", outlier.shape = 16, alpha = 0.7) +
labs(
title = "Distribution of Song Durations by Album",
x = "Album Name",
y = "Duration (min)"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels for better readability
plot.title = element_text(hjust = 0.5, face = "bold", size = 14), # Center-align and style the title
axis.title = element_text(size = 12)
)
The duration of songs generally falls between 2 and 4 minutes across
most albums, with a few albums like “Red (Taylor’s Version)” and
“Fearless (Taylor’s Version)” showing a wider range, reflecting longer
songs or additional tracks. This visualization highlights the
consistency in song lengths while showcasing variability in specific
albums.
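The quantities behind each box can also be computed directly. The sketch below, on made-up durations, converts milliseconds to minutes, computes the quartiles with `quantile()`, whose default (type 7) is what ggplot2's `stat_boxplot()` uses, and flags as outliers the points lying more than 1.5 × IQR beyond the box, which is the rule `geom_boxplot()` applies by default.

```r
# Hypothetical durations (ms) for two albums of five tracks each
toy_duration_ms <- c(201000, 233000, 245000, 260000, 612000,
                     198000, 214000, 221000, 236000, 240000)
toy_album <- rep(c("Album A", "Album B"), each = 5)

# Milliseconds to minutes
toy_duration_min <- toy_duration_ms / 60000

# Quartiles per album with quantile()'s default type 7
toy_quartiles <- tapply(toy_duration_min, toy_album, quantile, probs = c(0.25, 0.5, 0.75))

# Points more than 1.5 * IQR below or above the box are drawn as outliers
toy_is_outlier <- as.logical(ave(toy_duration_min, toy_album, FUN = function(x) {
  q <- quantile(x, c(0.25, 0.75))
  x < q[[1]] - 1.5 * diff(q) | x > q[[2]] + 1.5 * diff(q)
}))
toy_duration_min[toy_is_outlier]  # the single 10.2-minute track
```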
In this section, I analyse Taylor Swift’s music at the track level using data visualization.
# Prepare the data
dt_cumulative <- dt %>%
arrange(track_release) %>% # Order by release date
mutate(cumulative_songs = row_number()) # Cumulative count
# Create the timeline plot
ggplot(dt_cumulative, aes(x = track_release, y = cumulative_songs, color = cumulative_songs)) +
geom_line(linewidth = 1) + # Line for cumulative count
geom_point(size = 2) + # Points for each release
scale_color_distiller(palette = "Set2", direction = 1) + # ColorBrewer palette
labs(
title = "Cumulative number of songs released over time",
x = "Year",
y = "Total songs released",
color = "Songs released" # Legend title: the color encodes the cumulative count
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11/22,
base_rect_size = 11/22
) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
Taylor Swift’s cumulative song releases show steady growth over time,
with marked increases during major album release periods, particularly
around 2014-2015 and 2020-2021. The sharp rise in recent years reflects
her prolific output, including re-recorded albums and new projects, and
is consistent with her sustained creativity and lasting presence in the
music industry.
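The cumulative curve is built by sorting tracks by release date and numbering them, so the k-th track released has a cumulative count of k. A small sketch on hypothetical dates, including two tracks released on the same day, shows this and an equivalent per-date version based on `cumsum()` that produces one step per release date instead of one per track.

```r
# Hypothetical release dates, deliberately out of order, with two songs on one day
toy_release <- as.Date(c("2012-10-22", "2006-10-24", "2008-11-11", "2006-10-24"))

# Sort by date, then number the rows: the k-th song released has cumulative count k
toy_order <- order(toy_release)
toy_timeline <- data.frame(
  track_release    = toy_release[toy_order],
  cumulative_songs = seq_along(toy_order)
)

# Alternative with one step per release date: cumulative sum of songs per date
toy_per_date <- as.vector(table(toy_release))
toy_cumulative_by_date <- cumsum(toy_per_date)  # 2, 3, 4
```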
# Step 1: Rank tracks by a combined attribute score, keep the top 12 and reshape them into long format
dt_top <- dt %>%
mutate(total_score = rowSums(across(c(danceability, energy, speechiness, acousticness, liveness, valence)), na.rm = TRUE)) %>% # Summing only the six 0-1 attributes plotted below
arrange(desc(total_score)) %>% # Sort by total score
slice(1:12) # Keep only the top 12 songs
dt_long <- dt_top %>%
pivot_longer(
cols = c(danceability, energy, speechiness, acousticness, liveness, valence),
names_to = "attribute",
values_to = "value"
)
# Step 2: Create the faceted bar chart
ggplot(dt_long, aes(x = attribute, y = value, fill = factor(attribute))) +
geom_col(alpha = 0.8) + # Bar chart of attribute values
facet_wrap(~track_name, scales = "free_x") + # Facet by track name
scale_fill_brewer(palette = "Set3") + # Use a ColorBrewer palette
labs(
title = "Top 12 Songs by Combined Score: Track Attributes Visualized by Song",
x = "Attributes",
y = "Value (0-1)",
fill = "Attribute"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14), # Center-align and style the title
axis.text.x = element_blank() # Remove x-axis labels
)
Generally, energy and danceability are considerably high among these 12
tracks, while speechiness and liveness tend to lag behind the other
attributes. Because the tracks were selected by the sum of these same
attributes rather than by popularity, high values are partly built into
the selection; the pattern is therefore at most consistent with the idea
that Taylor Swift intentionally crafts her songs with attributes expected
to resonate with her audience.
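The ranking behind this chart uses a summed attribute score, not a popularity measure. A column range such as `danceability:valence` selects every column between those two names; assuming the merged table keeps the column order of the source song file, that range would also include `key`, `loudness`, `mode` and `instrumentalness`, which are not on a 0-1 scale. The made-up example below shows how such columns can reorder the ranking compared with summing only the six plotted 0-1 attributes.

```r
# Hypothetical tracks; the column order mimics a song table in which key, loudness,
# mode and instrumentalness sit between danceability and valence
toy_tracks <- data.frame(
  track_name       = c("Song 1", "Song 2", "Song 3"),
  danceability     = c(0.70, 0.50, 0.60),
  energy           = c(0.80, 0.40, 0.60),
  key              = c(0, 11, 5),
  loudness         = c(-5, -9, -7),
  mode             = c(1, 1, 0),
  speechiness      = c(0.05, 0.03, 0.04),
  acousticness     = c(0.10, 0.80, 0.30),
  instrumentalness = c(0, 0, 0),
  liveness         = c(0.10, 0.12, 0.09),
  valence          = c(0.60, 0.30, 0.50),
  stringsAsFactors = FALSE
)

# Summing the whole range danceability:valence mixes in key (0-11) and loudness (dB)
toy_range_cols  <- names(toy_tracks)[match("danceability", names(toy_tracks)):match("valence", names(toy_tracks))]
toy_range_score <- rowSums(toy_tracks[, toy_range_cols])

# Summing only the six 0-1 attributes that the faceted chart plots
toy_attr_cols  <- c("danceability", "energy", "speechiness", "acousticness", "liveness", "valence")
toy_attr_score <- rowSums(toy_tracks[, toy_attr_cols], na.rm = TRUE)

# Top 2 tracks by each score: the rankings differ because key and loudness dominate the range sum
toy_tracks$track_name[order(toy_range_score, decreasing = TRUE)][1:2]  # "Song 2" "Song 3"
toy_tracks$track_name[order(toy_attr_score, decreasing = TRUE)][1:2]   # "Song 1" "Song 2"
```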
ggplot(dt, aes(x = track_year, y = energy)) +
stat_summary(
fun = mean,
geom = "col",
fill = "#5ab4ac"
) +
labs(
title = "Average energy score by year",
x = "Track release year",
y = "Average energy score of tracks (0-1)"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
Energy levels are relatively high in the early and mid-2010s, peaking
around 2015, which indicates a period in which her music had a more
energetic tone. Some other years show slightly lower average energy,
possibly reflecting stylistic shifts. Overall, the chart shows how the
energy of Taylor Swift’s songs has varied over time as her musical style
evolved.
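Each bar is an unweighted mean over the tracks released in that year, so a year with only one or two tracks can produce a bar that reflects a single song rather than a trend. A quick check, sketched on made-up values, is to report the number of tracks next to each yearly mean.

```r
# Hypothetical tracks: 2014 has several tracks, 2016 only one
toy_year   <- c(2014, 2014, 2014, 2014, 2016, 2017, 2017)
toy_energy <- c(0.70, 0.80, 0.60, 0.75, 0.95, 0.50, 0.60)

# Mean energy per year, as drawn by stat_summary(fun = mean, geom = "col")
toy_mean_energy <- tapply(toy_energy, toy_year, mean)

# Number of tracks behind each bar
toy_n_tracks <- tapply(toy_energy, toy_year, length)

toy_by_year <- data.frame(
  track_year  = as.integer(names(toy_mean_energy)),
  mean_energy = as.vector(toy_mean_energy),
  n_tracks    = as.vector(toy_n_tracks)
)
print(toy_by_year)
```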
ggplot(dt, aes(x = danceability, y = energy)) +
geom_point(color = "#5ab4ac", size = 3, alpha = 0.7) + # Scatterplot points
geom_smooth(method = "loess", formula = y ~ x, color = "#d8b365", se = TRUE, linewidth = 1) + # LOESS curve with 95% confidence band
labs(
title = "Relationship between danceability and energy of tracks",
x = "Danceability",
y = "Energy"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
The LOESS curve suggests a nonlinear relationship: energy initially
increases with danceability, peaks around a danceability score of 0.6,
and then gradually declines. The shaded gray region shows the 95%
confidence interval around the curve. This suggests that while higher
danceability often goes together with higher energy, the most danceable
tracks do not necessarily have the highest energy.
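`geom_smooth(method = "loess")` fits a local polynomial regression with `stats::loess()` and, to my understanding of ggplot2's implementation, draws the band as the fit plus or minus a t quantile times the standard error. The sketch below fits such a model directly on simulated data (not the real tracks) whose true curve is assumed to peak at a danceability of 0.6, and recovers the peak and an approximate 95% band.

```r
set.seed(42)
# Simulated data: energy rises and then falls with danceability,
# with the true curve assumed to peak at 0.6
toy_dance  <- runif(80, min = 0.3, max = 0.9)
toy_energy <- 0.9 - 2 * (toy_dance - 0.6)^2 + rnorm(80, sd = 0.03)
toy_tracks <- data.frame(danceability = toy_dance, energy = toy_energy)

# LOESS fit; span = 0.75 is the default in both stats::loess() and geom_smooth()
toy_fit <- loess(energy ~ danceability, data = toy_tracks, span = 0.75)

# Predictions on a grid within the observed range, with standard errors
toy_grid <- data.frame(danceability = seq(min(toy_dance), max(toy_dance), length.out = 50))
toy_pred <- predict(toy_fit, newdata = toy_grid, se = TRUE)

# 95% band: fit +/- t quantile * standard error
toy_ci    <- qt(0.975, toy_pred$df) * toy_pred$se.fit
toy_lower <- toy_pred$fit - toy_ci
toy_upper <- toy_pred$fit + toy_ci

# Danceability at which the fitted curve peaks (expected to be close to 0.6)
toy_peak <- toy_grid$danceability[which.max(toy_pred$fit)]
```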
ggplot(dt, aes(x = duration_ms/60000)) +
geom_histogram(binwidth = 0.25, fill = "#5ab4ac", color = "#d8b365", alpha = 0.7) + # 15-second bins instead of the default 30 bins
labs(
title = "Histogram of duration of songs",
x = "Duration (min)",
y = "Frequency"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
legend.position = "right", # Keep the legend on the right
plot.title = element_text(hjust = 0.5, face = "bold", size = 14) # Center-align and style the title
)
Most songs last between 3 and 5 minutes, with a peak around 4 minutes,
suggesting that this is the typical length of her tracks. A few outliers
exceed 6 minutes, but these are rare. This distribution points to a
standard song length in line with common industry practice, which keeps
her tracks accessible to listeners.
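Without a `binwidth` or `bins` argument, `geom_histogram()` uses 30 bins and prints a message suggesting a better value, so the visible peak can shift with the bin choice. Counting songs in explicit fixed-width bins, as sketched below on hypothetical durations, makes statements such as "most songs last between 3 and 5 minutes" easy to verify.

```r
# Hypothetical durations in minutes
toy_duration_min <- c(3.2, 3.5, 3.6, 3.9, 4.0, 4.1, 4.2, 4.4, 5.3, 6.5)

# Fixed 0.5-minute bins from 3 to 7 minutes; bins are right-closed, (a, b],
# and plot = FALSE returns the counts without drawing
toy_hist <- hist(toy_duration_min, breaks = seq(3, 7, by = 0.5), plot = FALSE)
toy_bins <- data.frame(
  from  = head(toy_hist$breaks, -1),
  to    = tail(toy_hist$breaks, -1),
  count = toy_hist$counts
)
print(toy_bins)

# Share of songs lasting between 3 and 5 minutes
toy_share_3_to_5 <- mean(toy_duration_min >= 3 & toy_duration_min <= 5)  # 0.8
```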
ggplot(dt, aes(x = key_name)) +
geom_bar(fill = "#5ab4ac", color = "#d8b365", alpha = 0.8) + # Bar chart for counts
labs(
title = "Number of tracks by keys",
x = "Key",
y = "Number of tracks"
) +
theme_linedraw( # Adding theme
base_size = 11,
base_family = "",
base_line_size = 11 / 22,
base_rect_size = 11 / 22
) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14), # Center-align and style the title
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.text = element_text(size = 10)
)
The bar chart above illustrates the distribution of Taylor Swift’s songs
across different musical keys. The horizontal axis represents the keys
(e.g., A, C, G#), while the vertical axis displays the number of tracks
in each key. It is evident that keys such as C and G are prominently
used, with over 40 tracks each, making them the most frequent keys in
her music. Conversely, keys like A#, D#, and G# are less commonly used.
The chart provides insight into the tonal preferences in Taylor Swift’s
discography, highlighting a tendency toward specific keys, which may
reflect her musical style and composition choices.
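Exact counts are hard to read off a bar chart, so a frequency table is a simple way to check which keys are most common. A sketch on made-up key names: `table()` with `useNA = "ifany"` also shows tracks whose key is missing, which `geom_bar()` would otherwise draw as a separate `NA` bar.

```r
# Hypothetical key names, including one track with a missing key
toy_key <- c("C", "G", "C", "D", NA, "G", "C", "A#", "E")

# Tracks per key; useNA = "ifany" keeps the missing key visible as its own entry
toy_key_counts <- table(toy_key, useNA = "ifany")

# Most to least common
toy_key_sorted <- sort(toy_key_counts, decreasing = TRUE)
print(toy_key_sorted)

# Share of each key among tracks whose key is known (table() drops NA by default)
toy_key_share <- prop.table(table(toy_key))
```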
Taylor Swift’s popularity may be linked to a conscious and intentional approach to songwriting, crafting tracks that resonate with her audience. The high levels of energy and danceability observed in her music, particularly on albums like “1989” and “Reputation”, are attributes widely considered appealing and engaging. At the same time, her ability to balance these with more introspective and emotionally rich albums like “Folklore” and “Evermore” showcases her versatility and her connection with her listeners. The consistency of her song durations and her sustained creative output further suggest a keen understanding of audience expectations and industry dynamics. Overall, this descriptive analysis suggests that her success stems from a deliberate effort to create music that blends emotional storytelling with accessible and impactful musical elements.