1. Executive summary

This document demonstrates the R programming skills I acquired in the Coding 3 course. The analysis uses an online dataset from TidyTuesday, which is available at the following link: TidyTuesdayData. The aim of the analysis is to find correlations between the attributes of Taylor Swift’s songs and her success, and to observe how these attributes have changed over the course of her career.


2. Loading, merging and cleaning data

The analysis uses two separate datasets from TidyTuesdayData: one that includes details of the albums and one that includes the attributes of the songs. I used the ‘album_name’ variable as the key to merge the two datasets. From this point on, I work with the merged table in data.table format to make visualization and filtering smoother.

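# Packages used for loading, merging and displaying the data
library(data.table)
library(reactable)

# Loading data from GitHub and merging the two tables into a data.table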
songs <- fread("https://raw.githubusercontent.com/torokpe/Coding3./refs/heads/main/taylor_all_songs.csv")
albums <- fread("https://raw.githubusercontent.com/torokpe/Coding3./refs/heads/main/taylor_albums.csv")

dt <- merge(songs, albums[, .SD, .SDcols = !c("ep", "album_release")], by = "album_name")

# Renaming columns to be precise
setnames(dt, old = "metacritic_score", new = "album_critic_score")
setnames(dt, old = "user_score", new = "album_user_score")


# Removing 'lyrics' and 'artist' columns because of redundancy
dt[, lyrics := NULL]
dt[, artist := NULL]

# Display the full merged dataset (dt)
reactable(dt)
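
One thing worth checking after the merge is whether any songs were lost. By default, merge() performs an inner join, so a song whose album_name has no match in the albums table is dropped. The sketch below uses made-up toy tables (songs_toy and albums_toy are hypothetical stand-ins, not the real data) to show the difference between the default and a left join, and how to list the rows that would be dropped.

```r
library(data.table)

# Toy stand-ins for the two tables (hypothetical values, not the real data)
songs_toy <- data.table(
  album_name = c("A", "A", "B", NA),
  track_name = c("t1", "t2", "t3", "t4")
)
albums_toy <- data.table(
  album_name = c("A", "B"),
  user_score = c(8.1, 7.5)
)

# Inner join (the default): songs without a matching album are dropped
inner <- merge(songs_toy, albums_toy, by = "album_name")

# Left join: every song is kept, unmatched songs get NA album scores
left <- merge(songs_toy, albums_toy, by = "album_name", all.x = TRUE)

# Anti-join: the songs that the inner join leaves out
dropped <- songs_toy[!albums_toy, on = "album_name"]
```

Whether an inner or a left join is appropriate depends on whether tracks without an album should be part of the analysis.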

3. Data aggregation and analysis

In this section I analyse Taylor Swift songs using data aggregation, filtering and grouping.

3.1 Comparing user and critic scores by albums (for albums that are not EPs)

reactable(dt[ep == FALSE, .(
  mean_user_score = mean(album_user_score, na.rm = TRUE),
  mean_critic_score = mean(album_critic_score, na.rm = TRUE) / 10  # Rescale the 0-100 critic score to the 0-10 user-score scale
), by = album_name]
)
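
Dividing the critic score by 10 puts it on the same 0-10 scale as the user score, which makes the two directly comparable. As a small illustration on made-up numbers (scores_toy and its values are hypothetical), the gap between the two ratings could be computed like this:

```r
library(data.table)

# Hypothetical album scores; the values are illustrative only
scores_toy <- data.table(
  album_name         = c("A", "B"),
  album_user_score   = c(8.0, 9.0),  # 0-10 scale
  album_critic_score = c(75, 86)     # 0-100 scale
)

# Bring the critic score onto the user-score scale, then take the difference
scores_toy[, critic_rescaled := album_critic_score / 10]
scores_toy[, score_gap := album_user_score - critic_rescaled]
```

A positive score_gap means that users rated the album higher than critics did.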

3.2 Checking the number of tracks per year

dt[, track_year := as.integer(substring(track_release, 1, 4))]
reactable(dt[, .(num_tracks = .N), by = track_year][order(track_year, decreasing = FALSE)])
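
The year can also be taken from the date itself instead of from the first four characters of its text form. A minimal sketch, assuming the release date is stored as a date column (releases_toy and its values are made up for illustration), using year() from data.table:

```r
library(data.table)

# Hypothetical release dates stored as dates
releases_toy <- data.table(
  track_release = as.IDate(c("2006-10-24", "2014-10-27", "2014-11-10"))
)

# Extract the calendar year from the date
releases_toy[, track_year := year(track_release)]

# keyby groups and sorts by year in one step
per_year <- releases_toy[, .(num_tracks = .N), keyby = track_year]
```

This variant does not depend on how the date happens to be printed.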

3.3 Checking average danceability and energy of tracks by musical keys

library(reactable)

# Filtering out missing values in key_name
filtered_dt <- dt[!is.na(key_name), .(
  mean_danceability = mean(danceability, na.rm = TRUE),
  mean_energy = mean(energy, na.rm = TRUE)
), by = key_name]

reactable(filtered_dt)

3.4 Checking the share of explicit and non-explicit tracks on album level

explicit_summary <- dt[ep == FALSE, .(
  percent_explicit = mean(explicit, na.rm = TRUE) * 100,  # Convert mean to percentage
  percent_implicit = (1 - mean(explicit, na.rm = TRUE)) * 100  # Complement of explicit
), by = album_name]

# Display the result as a table
reactable(explicit_summary, 
          columns = list(
            percent_explicit = colDef(format = colFormat(digits = 1), name = "Explicit (%)"),
            percent_implicit = colDef(format = colFormat(digits = 1), name = "Non-explicit (%)")
          ),
          bordered = TRUE, striped = TRUE, highlight = TRUE)
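
The percentages above rely on the fact that the mean of a logical vector is the proportion of TRUE values. A small base R sketch on a made-up vector (explicit_toy is hypothetical) shows this, and also how na.rm = TRUE changes the denominator:

```r
# Hypothetical explicit flags for five tracks, one of them unknown
explicit_toy <- c(TRUE, FALSE, FALSE, NA, TRUE)

# Share of explicit tracks among the four tracks with a known flag
share_explicit <- mean(explicit_toy, na.rm = TRUE) * 100

# Share of tracks whose flag is missing, worth reporting next to the result
share_missing <- mean(is.na(explicit_toy)) * 100
```

If many flags are missing for an album, its percentages describe only the tracks with known values.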

4. Data visualization

4.1 Album level insights

In this section I will analyse Taylor Swift’s art on album level using data visualization.

4.1.2 Visualizing the relationship between energy, danceability and user score on album level

library(dplyr)
library(ggplot2)
library(ggrepel)

# Aggregate to one row per album
dt_album <- dt %>%
  group_by(album_name) %>%
  summarize(
    avg_energy = mean(energy, na.rm = TRUE),
    avg_danceability = mean(danceability, na.rm = TRUE),
    album_user_score = first(na.omit(album_user_score))  # Handle NAs in album_user_score
  )

# Create the bubble chart
ggplot(dt_album, aes(x = avg_energy, y = avg_danceability, size = album_user_score)) +
  geom_point(alpha = 0.7, color = "#5ab4ac") +  # Bubbles with transparency
  geom_text_repel(aes(label = album_name), size = 3, color = "#d8b365") +  # Add album names
  labs(
    title = "Energy, Danceability and album user score of albums",
    x = "Average track energy",
    y = "Average track danceability",
    size = "Album user score"
  ) +
  theme_linedraw(  # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14)  # Center-align and style the title
  )

Albums like “Reputation” and “Lover” exhibit both high energy and danceability, reflecting their vibrant and upbeat nature. In contrast, albums such as “Folklore” and “Evermore” have lower energy and danceability, aligning with their more mellow and introspective tones. The size of the bubbles indicates user ratings, with highly rated albums like “1989” standing out for their combination of danceability and energy. This visualization highlights the diverse musical styles present across Taylor Swift’s discography.
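
Since the aim of the analysis is to find correlations, the visual impression from the bubble chart could be complemented with a correlation coefficient. A minimal sketch on made-up album-level values (album_toy and its numbers are illustrative, not taken from the dataset), using cor() from base R:

```r
# Hypothetical album-level summary; the values are illustrative only
album_toy <- data.frame(
  avg_energy       = c(0.5, 0.6, 0.7, 0.4),
  album_user_score = c(8.0, NA, 9.0, 7.0)
)

# Pearson correlation, ignoring albums with a missing user score
r_energy_score <- cor(
  album_toy$avg_energy,
  album_toy$album_user_score,
  use = "complete.obs"
)
```

With only a handful of albums, such a coefficient is very uncertain and should be read as descriptive rather than conclusive.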

4.1.3 Visualizing the share of tracks released on albums versus extended plays

ggplot(dt, aes(x = "", fill = factor(ep))) +
  geom_bar(width = 1) +  # Stacked bar of track counts as the base
  coord_polar(theta = "y") +  # Convert to a pie chart
  geom_text(
    aes(
      # after_stat() replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
      y = after_stat(count),
      label = paste0(round(after_stat(count) / sum(after_stat(count)) * 100, 1), "%")
    ),
    stat = "count",  # Use stat = "count" to compute percentages dynamically
    position = position_stack(vjust = 0.5),  # Position the labels in the middle of slices
    color = "white",
    size = 4
  ) +
  scale_fill_manual(
    values = c("TRUE" = "#d8b365", "FALSE" = "#5ab4ac"),  # Custom colors
    labels = c("TRUE" = "Extended play", "FALSE" = "Album release")  # Custom labels
  ) +
  labs(title = "Distribution of EP Status", fill = "EP Status") +
  theme_linedraw( # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    axis.text.y = element_blank(),  # Remove y-axis labels
    axis.text.x = element_blank(),  # Remove x-axis labels
    axis.ticks.y = element_blank(),  # Remove y-axis ticks
    axis.ticks.x = element_blank(),  # Remove x-axis ticks
    axis.title.y = element_blank(),  # Remove y-axis title
    axis.title.x = element_blank(),  # Remove x-axis title
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14)  # Center-align and style the title
  )

This pie chart illustrates the split of tracks by EP (extended play) status in Taylor Swift’s discography. The large majority of tracks (95.1%) were released on full-length albums, while only a small fraction (4.9%) appeared on extended plays. This emphasizes Taylor Swift’s focus on producing complete albums, which aligns with her storytelling approach and artistic style.
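
The percentages shown on the chart are shares of tracks, and they can be reproduced without plotting. A minimal base R sketch on a made-up vector (ep_toy is hypothetical):

```r
# Hypothetical EP flags for four tracks
ep_toy <- c(FALSE, FALSE, FALSE, TRUE)

# Count the tracks per status, convert to proportions, then to rounded percentages
ep_shares <- round(prop.table(table(ep_toy)) * 100, 1)
```

Here ep_shares holds one percentage per EP status, computed the same way as the labels on the pie chart.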

4.1.4 Visualizing track duration distribution on album level

ggplot(dt, aes(x = album_name, y = duration_ms)) +
  geom_boxplot(fill = "#5ab4ac", color = "#01665e", outlier.color = "#d8b365", outlier.shape = 16, alpha = 0.7) +
  labs(
    title = "Distribution of Song Durations by Album",
    x = "Album Name",
    y = "Duration (ms)"
  ) +
  theme_linedraw(  # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels for better readability
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),  # Center-align and style the title
    axis.title = element_text(size = 12)
  )

The duration of songs generally falls between 2 and 4 minutes across most albums, with a few albums like “Red (Taylor’s Version)” and “Fearless (Taylor’s Version)” showing a wider range, reflecting longer songs or additional tracks. This visualization highlights the consistency in song lengths while showcasing variability in specific albums.

4.2 Track level insights

In this section I will analyse Taylor Swift’s art on track level using data visualization.

4.2.1 Visualizing the cumulative number of songs released over time

# Prepare the data
dt_cumulative <- dt %>%
  arrange(track_release) %>%  # Order by release date
  mutate(cumulative_songs = row_number())  # Cumulative count

# Create the timeline plot
ggplot(dt_cumulative, aes(x = track_release, y = cumulative_songs, color = cumulative_songs)) +
  geom_line(size = 1) +  # Line for cumulative count
  geom_point(size = 2) +   # Points for each release
  scale_color_distiller(palette = "Set2", direction = 1) +  # ColorBrewer palette
  labs(
    title = "Cumulative number of songs released over time",
    x = "Year",
    y = "Total songs released",
    color = "Cumulative songs"  # Legend title (the color encodes the cumulative count)
  ) +
  theme_linedraw(  # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11/22,
    base_rect_size = 11/22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14)  # Center-align and style the title
  )

Taylor Swift’s cumulative song releases show steady growth over time, with significant increases during major album release periods, particularly around 2014-2015 and 2020-2021. The sharp rise in recent years reflects her prolific output, including re-recorded albums and new projects, demonstrating her sustained creativity and enduring impact in the music industry.
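
The cumulative count behind this chart is simply the row position after sorting by release date. As an alternative to the dplyr pipeline, the same step could be sketched in data.table syntax (rel_toy and its values are made up for illustration):

```r
library(data.table)

# Hypothetical tracks in arbitrary order
rel_toy <- data.table(
  track_name    = c("c", "a", "b"),
  track_release = as.IDate(c("2012-10-22", "2006-10-24", "2008-11-11"))
)

# Sort by release date in place, then number the rows
setorder(rel_toy, track_release)
rel_toy[, cumulative_songs := seq_len(.N)]
```

Tracks released on the same day receive consecutive numbers, which is what produces the vertical jumps at album releases.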

4.2.2 Visualizing the different attributes of top tracks

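# Step 1: Select the top tracks and reshape the data into long format
library(tidyr)

# The six audio attributes (all on a 0-1 scale) used for ranking and plotting
attribute_cols <- c("danceability", "energy", "speechiness", "acousticness", "liveness", "valence")

dt_top <- dt %>%
  rowwise() %>%
  # Sum only the 0-1 attributes; a range like danceability:valence would also sum
  # any other columns located between them, such as key or loudness
  mutate(total_score = sum(c_across(all_of(attribute_cols)), na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_score)) %>%  # Sort by total score
  slice(1:12)  # Keep only the top 12 songs

dt_long <- dt_top %>%
  pivot_longer(
    cols = all_of(attribute_cols),
    names_to = "attribute",
    values_to = "value"
  )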

# Step 2: Create the faceted bar chart
ggplot(dt_long, aes(x = attribute, y = value, fill = factor(attribute))) +
  geom_bar(stat = "identity", alpha = 0.8) +  # Bar chart
  facet_wrap(~track_name, scales = "free_x") +  # Facet by track name
  scale_fill_brewer(palette = "Set3") +  # Use a ColorBrewer palette
  labs(
    title = "Top 12 Songs: Track Attributes Visualized by Song",
    x = "Attributes",
    y = "Value (0-1)"
  ) +
  theme_linedraw( # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),  # Center-align and style the title
    axis.text.x = element_blank()  # Remove x-axis labels
  )

Generally, the energy level and danceability of these top songs (ranked by the sum of their audio attributes, not by popularity) are considerably high, while speechiness and liveness lag behind the other attributes. Because the ranking itself rewards high attribute values, this pattern is partly expected; it may nevertheless suggest that Taylor Swift deliberately crafts her songs around attributes that are expected to resonate with her audience.
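
The ranking used for this chart is a row-wise sum over a set of attribute columns. A minimal sketch of the same idea in data.table syntax, on made-up values (tracks_toy and attrs are hypothetical, and only three attributes are used to keep it short):

```r
library(data.table)

# Attribute columns to add up (a shortened, illustrative selection)
attrs <- c("danceability", "energy", "valence")

# Hypothetical tracks; the values are illustrative only
tracks_toy <- data.table(
  track_name   = c("x", "y", "z"),
  danceability = c(0.5, 0.8, 0.6),
  energy       = c(0.4, 0.9, NA),
  valence      = c(0.3, 0.7, 0.9)
)

# Row-wise sum over the selected columns, treating missing values as zero
tracks_toy[, total_score := rowSums(.SD, na.rm = TRUE), .SDcols = attrs]

# Keep the two highest-scoring tracks
top2 <- tracks_toy[order(-total_score)][1:2]
```

Listing the columns by name keeps the score limited to attributes that share a scale, and rowSums() avoids the slower rowwise() step.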

4.2.3 Visualizing the mean energy score of Taylor Swift tracks by year

ggplot(dt, aes(x = track_year, y = energy)) +
  stat_summary(
    fun = mean,  
    geom = "col", 
    fill = "#5ab4ac"
  ) +
  labs(
    title = "Average energy score by year",
    x = "Track release year",
    y = "Average energy score of tracks (0-1)"
  ) +
  theme_linedraw(  # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14)  # Center-align and style the title
  )

The energy levels are relatively high in the early and mid-2010s, peaking around 2015, indicating a period where her music featured a more energetic tone. Conversely, certain years show a slight drop in energy, reflecting possible stylistic shifts in her music. This visualization highlights how the energy in Taylor Swift’s songs has varied over time, reflecting evolving musical trends and artistic exploration.

4.2.4 Visualizing the danceability and energy of tracks

ggplot(dt, aes(x = danceability, y = energy)) +
  geom_point(color = "#5ab4ac", size = 3, alpha = 0.7) +  # Scatterplot points
  geom_smooth(method = "loess", color = "#d8b365", se = TRUE, size = 1) +  # LOESS smoothing curve with confidence band
  labs(
    title = "Relationship between danceability and energy of tracks",
    x = "Danceability",
    y = "Energy"
  ) +
  theme_linedraw( # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14)  # Center-align and style the title
  )

The LOESS curve reveals a nonlinear relationship: energy initially increases with danceability, peaks around a danceability score of 0.6, and then gradually declines. The shaded gray region indicates the confidence interval around the curve. This suggests that while higher danceability often corresponds to higher energy, the most danceable tracks are not necessarily the most energetic.
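
The location of the peak can also be read off numerically rather than by eye. geom_smooth(method = "loess") relies on loess() from the stats package, so the same kind of curve can be fitted directly. The sketch below uses simulated data with a known peak at 0.6 (the data, the seed and all variable names are assumptions for illustration, not the real tracks):

```r
# Simulated data: an inverted parabola peaking at x = 0.6, plus small noise
set.seed(1)
x <- seq(0.3, 0.9, length.out = 60)
y <- 0.8 - 4 * (x - 0.6)^2 + rnorm(60, sd = 0.01)

# Fit a LOESS curve with the default settings of loess()
fit <- loess(y ~ x)

# Evaluate the fitted curve on a fine grid inside the data range
grid <- data.frame(x = seq(0.35, 0.85, by = 0.01))
grid$pred <- predict(fit, newdata = grid)

# x value at which the fitted curve is highest
peak_x <- grid$x[which.max(grid$pred)]
```

Applied to the real data, x and y would be the danceability and energy columns, with rows containing missing values removed beforehand.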

4.2.5 Visualizing the distribution of track durations

ggplot(dt, aes(x = duration_ms/60000)) +
  geom_histogram(fill = "#5ab4ac", color = "#d8b365", alpha = 0.7) +
  labs(
    title = "Histogram of duration of songs",
    x = "Duration (min)",
    y = "Frequency"
  ) +
  theme_linedraw(  # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14)  # Center-align and style the title
  )

Most songs have a duration between 3 and 5 minutes, with a peak around 4 minutes, suggesting this is the typical length of her tracks. A few outliers exist with durations exceeding 6 minutes, but these are rare. This distribution highlights a standard song length consistent with industry norms, optimizing accessibility and listener engagement.
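
The statements about typical song length can be backed by simple summary numbers. A minimal base R sketch on made-up durations (duration_ms_toy is hypothetical), converting milliseconds to minutes in the same way as the histogram:

```r
# Hypothetical track durations in milliseconds
duration_ms_toy <- c(180000, 240000, 210000, 600000)

# Convert to minutes (60,000 ms per minute)
duration_min <- duration_ms_toy / 60000

# Typical length and the share of tracks between 3 and 5 minutes
median_min   <- median(duration_min)
share_3_to_5 <- mean(duration_min >= 3 & duration_min <= 5) * 100
```

The median is less sensitive than the mean to the few very long tracks visible in the right tail of the histogram.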

4.2.6 Visualizing the number of tracks per key

ggplot(dt, aes(x = key_name)) +
  geom_bar(fill = "#5ab4ac", color = "#d8b365", alpha = 0.8) +  # Bar chart for counts
  labs(
    title = "Number of tracks by key",
    x = "Key",
    y = "Number of tracks"
  ) +
  theme_linedraw( # Adding theme
    base_size = 11,
    base_family = "",
    base_line_size = 11 / 22,
    base_rect_size = 11 / 22
  ) +
  theme(
    legend.position = "right",  # Keep the legend on the right
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),  # Center-align and style the title
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

The bar chart above illustrates the distribution of Taylor Swift’s songs across different musical keys. The horizontal axis represents the keys (e.g., A, C, G#), while the vertical axis displays the number of tracks in each key. It is evident that keys such as C and G are prominently used, with over 40 tracks each, making them the most frequent keys in her music. Conversely, keys like A#, D#, and G# are less commonly used. The chart provides insight into the tonal preferences in Taylor Swift’s discography, highlighting a tendency toward specific keys, which may reflect her musical style and composition choices.


5. Conclusion

The analysis suggests that Taylor Swift’s popularity is linked to a conscious and intentional approach to songwriting, with tracks crafted to resonate with her audience. The high levels of energy and danceability observed in her music, particularly in albums like 1989 and Reputation, match attributes that are widely appealing and engaging. At the same time, her ability to balance these with more introspective and emotionally rich albums like Folklore and Evermore showcases her versatility and her connection with her listeners. The consistency in song durations and her sustained creative output further point to a keen understanding of audience expectations and industry dynamics. Overall, the results are consistent with the view that her success stems from a deliberate effort to create music that blends emotional storytelling with accessible and impactful musical elements.