Overview of Datasets & Sources

Datasets

This analysis uses three Spotify tables: a 2025 top songs table, a 2025 top artists table, and an all-time top songs table. The main analysis focuses on the two song tables because they share the same song-level variables and can be compared directly.

Key Variables:

streams - Stream count in billions (2025 streams in the 2025 table, total streams in the all-time table)

danceability, energy, valence, acousticness, bpm - Audio features

primary_genre, release_year, dataset - Context and grouping variables

Libraries & Loading Data

# Libraries
library(ggplot2)
library(plotly)
library(dplyr)

# Loading Data
top_songs_2025 <- read.csv("spotify_wrapped_2025_top50_songs.csv")
top_artists_2025 <- read.csv("spotify_wrapped_2025_top50_artists.csv")
top_songs_all <- read.csv("spotify_alltime_top100_songs.csv")

Cleaning Data

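# Give the stream columns a shared name so the two tables can be stacked.
# Both columns are measured in billions of streams.
top_songs_2025 <- top_songs_2025 %>%
  rename(streams = streams_2025_billions)
top_songs_all <- top_songs_all %>%
  rename(streams = total_streams_billions)

# Song-level columns present in both tables
common_cols <- c("song_title", "artist", "primary_genre", "release_year",
                 "bpm", "streams", "danceability", "energy", "valence",
                 "acousticness")

# Keep only the shared columns; all_of() errors if any column is missing
top_songs_2025_clean <- top_songs_2025 %>% select(all_of(common_cols))
top_songs_all_clean <- top_songs_all %>% select(all_of(common_cols))

# Label each row with its source table
top_songs_2025_clean$dataset <- "2025"
top_songs_all_clean$dataset <- "All Time"

# Stack the tables and fix the factor order so "2025" is listed first
combined_songs <- bind_rows(top_songs_2025_clean, top_songs_all_clean)
combined_songs$dataset <- factor(
  combined_songs$dataset, levels = c("2025", "All Time"))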
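After stacking two tables like this, it can be worth checking whether any song appears in both, since such a song would be counted in both groups in later comparisons. The sketch below is a minimal illustration using made-up toy data (the song and artist names are hypothetical) and base R's rbind() as a stand-in for bind_rows(); it is not part of the original analysis.

```r
# Toy stand-ins for the two cleaned tables; all names are hypothetical.
toy_2025 <- data.frame(
  song_title = c("Song A", "Song B", "Song C"),
  artist     = c("Artist X", "Artist Y", "Artist Z"),
  dataset    = "2025"
)
toy_all <- data.frame(
  song_title = c("Song B", "Song D"),
  artist     = c("Artist Y", "Artist W"),
  dataset    = "All Time"
)

# Stack the tables (base R equivalent of bind_rows for identical columns)
toy_combined <- rbind(toy_2025, toy_all)

# Row counts per source table
rows_per_dataset <- table(toy_combined$dataset)
print(rows_per_dataset)

# Build a title/artist key per table and find keys present in both
key_2025 <- paste(toy_2025$song_title, toy_2025$artist, sep = " | ")
key_all  <- paste(toy_all$song_title, toy_all$artist, sep = " | ")
in_both  <- intersect(key_2025, key_all)
print(in_both)
```

With the real data, the same check would use top_songs_2025_clean and top_songs_all_clean in place of the toy tables.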

3D Plotly: Danceability, Energy, Streams

3D Plot Analysis

Danceability - Most songs cluster at moderate to high danceability, so popular songs share broadly similar audio characteristics. Within that cluster, higher danceability does not coincide with visibly higher stream counts, which suggests that beyond a certain threshold, additional danceability adds little to popularity.

Energy - Energy is generally high in both datasets, so popular songs follow a similar energetic profile. With so little variation, energy alone cannot separate the most-streamed songs from the rest.

Streams - Despite similar audio characteristics, all-time songs show much higher stream counts. This suggests that differences in popularity are not driven by audio features alone, but are likely influenced by external factors such as time exposure, artist popularity, and platform dynamics.
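A 3D view of this kind can be sketched with plotly's plot_ly() using the scatter3d trace type. The example below is only an illustration under assumptions: the toy data frame and its values are made up, and the axis titles and color mapping are one plausible choice rather than the settings used for the original figure.

```r
library(plotly)

# Toy stand-in for combined_songs; all values are made up for illustration.
toy_songs <- data.frame(
  dataset      = factor(rep(c("2025", "All Time"), each = 4),
                        levels = c("2025", "All Time")),
  danceability = c(0.62, 0.70, 0.75, 0.81, 0.55, 0.60, 0.66, 0.72),
  energy       = c(0.58, 0.66, 0.71, 0.80, 0.52, 0.63, 0.69, 0.78),
  streams      = c(0.8, 1.1, 0.9, 1.3, 2.0, 3.1, 2.4, 2.9)
)

# One marker per song, colored by source table
fig_3d <- plot_ly(
  data = toy_songs,
  x = ~danceability, y = ~energy, z = ~streams,
  color = ~dataset,
  type = "scatter3d", mode = "markers"
)

# Label the three axes; streams are in billions in this document
fig_3d <- layout(
  fig_3d,
  scene = list(
    xaxis = list(title = "Danceability"),
    yaxis = list(title = "Energy"),
    zaxis = list(title = "Streams (billions)")
  )
)

# Printing fig_3d in an interactive session or R Markdown chunk renders it.
```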

Plotly Scatterplot: Danceability vs Streams

Scatter Plot Analysis: Danceability vs Streams

The scatter plot shows no clear correlation between danceability and streams, as high and low stream values occur across similar danceability levels. This suggests that danceability is not a strong predictor of popularity.

Songs from both datasets overlap in danceability, but all-time songs reach much higher stream counts, likely due to longer exposure and accumulation over time rather than differences in audio features alone.
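The visual impression of "no clear correlation" can be backed by a number. The sketch below computes a Spearman rank correlation between danceability and streams within each dataset; Spearman is used here as an assumption, chosen because stream counts tend to be skewed. The toy values are made up and stand in for combined_songs.

```r
# Toy stand-in for combined_songs; all values are made up for illustration.
toy_songs <- data.frame(
  dataset      = rep(c("2025", "All Time"), each = 5),
  danceability = c(0.55, 0.62, 0.70, 0.74, 0.81,
                   0.50, 0.58, 0.63, 0.69, 0.77),
  streams      = c(0.9, 1.3, 0.7, 1.1, 0.8,
                   2.1, 3.4, 1.9, 2.8, 2.3)
)

# Split by dataset and compute the rank correlation inside each group,
# so the gap in stream levels between groups does not drive the result
cor_by_dataset <- sapply(
  split(toy_songs, toy_songs$dataset),
  function(d) cor(d$danceability, d$streams, method = "spearman")
)
print(round(cor_by_dataset, 2))
```

Values near zero in both groups would be consistent with the reading above; computing the correlation per dataset avoids mixing the two very different stream scales.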

ggplot Bar Chart: Energy Distribution

Bar Chart Analysis: Energy Distribution

Energy levels in both datasets are concentrated in the mid-to-high range, so most popular songs have moderate to high energy. The 2025 and all-time distributions look alike, with no visible difference between them, suggesting that energy remains a consistent characteristic of popular music over time.
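One way to compare the two energy distributions is to bin energy into ranges and look at the share of songs per bin in each dataset. The sketch below uses made-up toy values and hypothetical bin edges and labels; shares are used instead of raw counts because the file names suggest the two tables differ in size (top 50 vs top 100).

```r
# Toy stand-in for combined_songs; all values are made up for illustration.
toy_energy <- data.frame(
  dataset = rep(c("2025", "All Time"), each = 6),
  energy  = c(0.45, 0.58, 0.66, 0.71, 0.79, 0.84,
              0.41, 0.55, 0.62, 0.68, 0.73, 0.88)
)

# Hypothetical bin edges and labels; adjust to taste
toy_energy$energy_bin <- cut(
  toy_energy$energy,
  breaks = c(0, 0.4, 0.6, 0.8, 1),
  labels = c("Low", "Mid", "High", "Very high"),
  include.lowest = TRUE
)

# Songs per bin in each dataset, then converted to within-dataset shares
bin_counts <- table(toy_energy$dataset, toy_energy$energy_bin)
bin_shares <- prop.table(bin_counts, margin = 1)
print(round(bin_shares, 2))
```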

ggplot Boxplot: Streams by Dataset
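A boxplot of streams by dataset can be sketched with ggplot2 as follows. This is a minimal illustration under assumptions: the toy values are made up, and the labels and theme are one plausible styling rather than the settings of the original figure.

```r
library(ggplot2)

# Toy stand-in for combined_songs; all values are made up for illustration.
toy_songs <- data.frame(
  dataset = factor(rep(c("2025", "All Time"), each = 5),
                   levels = c("2025", "All Time")),
  streams = c(0.6, 0.8, 0.9, 1.1, 1.4,
              1.6, 2.1, 2.5, 3.0, 3.9)
)

# One box per dataset; the legend is dropped because the x axis
# already names the groups
box_plot <- ggplot(toy_songs, aes(x = dataset, y = streams, fill = dataset)) +
  geom_boxplot() +
  labs(x = "Dataset", y = "Streams (billions)", title = "Streams by Dataset") +
  theme_minimal() +
  theme(legend.position = "none")

# Printing box_plot in an interactive session or R Markdown chunk draws it.
```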

Statistical Analysis

combined_songs %>%
  group_by(dataset) %>%
  summarise(
    Mean_Danceability = round(mean(danceability, na.rm = TRUE), 2),
    Mean_Energy = round(mean(energy, na.rm = TRUE), 2),
    Mean_Streams = round(mean(streams, na.rm = TRUE), 2),
    SD_Streams = round(sd(streams, na.rm = TRUE), 2)
  )
## # A tibble: 2 × 5
##   dataset  Mean_Danceability Mean_Energy Mean_Streams SD_Streams
##   <fct>                <dbl>       <dbl>        <dbl>      <dbl>
## 1 2025                  0.68        0.65         0.93       0.3 
## 2 All Time              0.61        0.64         2.52       0.85

Statistical Analysis Interpretation

Comprehensive Analysis:

  • Danceability & Energy: 2025 songs have slightly higher average danceability (0.68 vs 0.61) and nearly identical average energy (0.65 vs 0.64), indicating modern songs are marginally more danceable.

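  • Streams: All-time songs have substantially higher average streams (2.52 vs 0.93 billion). The all-time figure is a cumulative total while the 2025 figure covers 2025 streams only, so the gap largely reflects long-term accumulation of popularity.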

  • Variation: All-time songs show much higher absolute variability in streams (SD = 0.85 vs 0.30 billion), suggesting a wider spread in popularity, although relative to each group's mean the spread is similar.

  • Conclusion: While audio features remain relatively consistent across datasets, popularity differs greatly, with older songs benefiting from sustained exposure over time.
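The stream comparisons above can be put on a relative footing with two quick calculations on the reported summary values: the ratio of the mean streams, and the coefficient of variation (SD divided by mean) for each dataset. The numbers are copied from the summary table; the variable names are introduced here for illustration only.

```r
# Summary values copied from the Statistical Analysis output above
summary_stats <- data.frame(
  dataset      = c("2025", "All Time"),
  mean_streams = c(0.93, 2.52),
  sd_streams   = c(0.30, 0.85)
)

# Coefficient of variation: spread relative to each group's mean
summary_stats$cv <- summary_stats$sd_streams / summary_stats$mean_streams

# How many times larger the all-time mean is than the 2025 mean
mean_ratio <- summary_stats$mean_streams[2] / summary_stats$mean_streams[1]

print(round(summary_stats$cv, 2))
print(round(mean_ratio, 2))
```

Because the inputs are already rounded to two decimals, these results are approximate. They indicate an all-time mean roughly 2.7 times the 2025 mean, with relative spreads that are close to each other even though the absolute SDs differ.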

Key Findings

  • Danceability & Energy: Both 2025 and all-time songs show similar levels, with 2025 songs being slightly more danceable. This suggests modern songs follow similar audio patterns to past hits.

  • Streams: All-time songs have substantially higher streams, as shown across the scatter plot, 3D plot, and boxplot. This is largely due to longer exposure over time.

  • Variation in Popularity: All-time songs exhibit much greater absolute variability in streams (SD = 0.85 vs 0.30 billion), indicating a wider range of stream counts than among 2025 songs.

  • Relationship between Features & Streams: The scatter plot shows no clear correlation between danceability and streams, suggesting that popularity is not driven by these audio features alone.

  • Overall Insight: While song characteristics remain fairly consistent across time, popularity is influenced more by external factors (e.g., time, exposure, trends) than by audio features alone.

Thank You