2025-11-02

The Dataset

  • For this project, I have chosen a Spotify data set from Kaggle.

  • The data set contains over 900 songs, along with audio features from Spotify’s API (e.g., danceability, energy, and valence).

  • The goal of this project is to analyze what makes a song popular and how a track’s popularity correlates with its danceability, energy, and valence.

Data

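library(dplyr)

# Load the Kaggle CSV (sep = "," and header = TRUE are read.csv defaults)
spotify_data <- read.csv("spotify-2023.csv", sep = ",", header = TRUE,
                         fileEncoding = "Latin1", stringsAsFactors = FALSE)
names(spotify_data)

# Drop rows without a title, then keep only the first row for each track_name
spotify_cleaned <- spotify_data %>%
  filter(!is.na(track_name)) %>%
  distinct(track_name, .keep_all = TRUE)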

##  [1] "track_name"           "artist.s._name"       "artist_count"        
##  [4] "released_year"        "released_month"       "released_day"        
##  [7] "in_spotify_playlists" "in_spotify_charts"    "streams"             
## [10] "in_apple_playlists"   "in_apple_charts"      "in_deezer_playlists" 
## [13] "in_deezer_charts"     "in_shazam_charts"     "bpm"                 
## [16] "key"                  "mode"                 "danceability_."      
## [19] "valence_."            "energy_."             "acousticness_."      
## [22] "instrumentalness_."   "liveness_."           "speechiness_."

Table of Contents

  • Scatter Plot: Shows the top 50 most-streamed songs up to 2023, with their chart rankings and release years.

  • Pie Chart: Illustrates the top artists behind the Top 50 most-streamed songs.

  • Bar Chart: Shows the top 10 songs by valence.

  • Line Plot: Shows how the danceability of the top 10 songs varies by release year, compared with their valence.

  • Statistical Analysis: Shows the average valence, energy, and danceability of the top 50 songs.

Get Top 50

After the data has been cleaned, keep the 50 tracks with the highest stream counts.

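top_50 <- spotify_cleaned %>%
  # Convert streams to numbers, removing any thousands separators first
  mutate(streams = as.numeric(gsub(",", "", streams)),
         streams = round(streams, 0)) %>%
  # Values that could not be parsed become NA and are dropped
  filter(!is.na(streams)) %>%
  arrange(desc(streams)) %>%
  slice_head(n = 50)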

Scatter Plot

In this scatter plot, I have plotted three variables, comparing the 50 most-streamed songs up to 2023 against their Spotify chart rankings, with each point colored by release year.
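The plotting code is not shown in the report, so the following is only a base-R sketch of a plot like this one. It assumes the x-axis is `streams`, the y-axis is the `in_spotify_charts` column, and the color encodes `released_year`. The function name `plot_top50_scatter` is hypothetical.

```r
# Hypothetical helper: scatter plot of streams vs. Spotify chart position,
# coloured by release year (assumed mapping; the original code is not shown)
plot_top50_scatter <- function(df) {
  # Map each distinct release year to one colour from a viridis palette
  years <- sort(unique(df$released_year))
  pal <- hcl.colors(length(years), "viridis")
  point_cols <- pal[match(df$released_year, years)]

  plot(df$streams / 1e9, df$in_spotify_charts,
       col = point_cols, pch = 19,
       xlab = "Streams (billions)",
       ylab = "Spotify chart position (in_spotify_charts)",
       main = "Top 50 most-streamed songs")
  legend("topright", legend = years, col = pal, pch = 19,
         cex = 0.6, title = "Released")

  # Return the per-point colours invisibly so the mapping can be inspected
  invisible(point_cols)
}
```

Usage: `plot_top50_scatter(top_50)`. Dividing streams by 10^9 just keeps the axis labels short.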

Pie Chart

The following pie chart shows the top 10 artists behind the songs in the Top 50 most streamed.
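One way to build such a chart, sketched here under the assumption that "top artists" means the artists credited on the most Top 50 songs, is to count the `artist.s._name` values and keep the 10 largest counts. This sketch treats each credit string as-is, so a collaboration such as "A, B" is counted separately from "A"; the original chart may have split collaborations instead. The function name `top_artists_pie` is hypothetical.

```r
# Hypothetical helper: pie chart of the n artists credited on the most songs
top_artists_pie <- function(df, n = 10) {
  # Count songs per artist credit string (collaborations stay as one credit)
  counts <- sort(table(df$artist.s._name), decreasing = TRUE)
  top <- head(counts, n)

  pie(as.vector(top), labels = names(top),
      main = sprintf("Top %d artists in the Top 50", n))

  # Return the counts so the chart's numbers can be checked
  top
}
```

Usage: `top_artists_pie(top_50)`.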

Bar Chart

Top 10 songs by valence, where valence measures the positivity of a song’s musical content.
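The report does not show how the 10 songs were chosen. This base-R sketch assumes they are the 10 most-streamed songs, and plots each one's `valence_.` value (a percentage) as a horizontal bar. The function name `plot_valence_bars` is hypothetical.

```r
# Hypothetical helper: horizontal bar chart of valence for the n most-streamed songs
plot_valence_bars <- function(df, n = 10) {
  # Assumption: "top 10" means the n rows with the highest stream counts
  top <- head(df[order(df$streams, decreasing = TRUE), ], n)
  # Sort ascending so the most positive song is drawn at the top
  top <- top[order(top$valence_.), ]

  # Widen the left margin for song titles, and restore it afterwards
  old_par <- par(mar = c(5, 12, 4, 2))
  on.exit(par(old_par))

  barplot(top$valence_., names.arg = top$track_name, horiz = TRUE, las = 1,
          xlim = c(0, 100), xlab = "Valence (%)",
          main = "Valence of the top 10 songs")
  invisible(top)
}
```

Usage: `plot_valence_bars(top_50)`.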

Bar Chart Analysis

Upon further investigation, an interesting trend emerges in this bar chart. Although this data set shows the top of the Spotify charts for 2023, none of the most-streamed songs were released in 2023. In addition, the valence of these songs varies, and there is no direct correlation between how positive a song is and its popularity.
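Both observations can be checked numerically instead of by eye. This sketch computes the Pearson and Spearman correlations between `streams` and `valence_.` and counts songs with `released_year == 2023`. A correlation near 0 would support the "no direct correlation" reading. The function name `check_valence_popularity` is hypothetical, and no results are claimed here for the actual data.

```r
# Hypothetical helper: quantify the bar-chart observations
check_valence_popularity <- function(df) {
  list(
    # Linear association between stream count and valence
    pearson = cor(df$streams, df$valence_., use = "complete.obs"),
    # Rank-based association, less sensitive to a few very large stream counts
    spearman = cor(df$streams, df$valence_., method = "spearman",
                   use = "complete.obs"),
    # Number of songs released in 2023
    released_2023 = sum(df$released_year == 2023, na.rm = TRUE)
  )
}
```

Usage: `check_valence_popularity(top_50)`. Spearman is included because stream counts are heavily skewed, and a rank correlation is not dominated by the few largest values.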

Line Plot

How danceability changes across release years.
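The line plot's code is also not shown. This base-R sketch assumes the plot takes the 10 most-streamed songs, orders them by `released_year`, and draws `danceability_.` and `valence_.` (both percentages) as two lines for comparison. The function name `plot_danceability_by_year` is hypothetical.

```r
# Hypothetical helper: danceability and valence of the n most-streamed songs by release year
plot_danceability_by_year <- function(df, n = 10) {
  # Assumption: "top 10" means the n rows with the highest stream counts
  top <- head(df[order(df$streams, decreasing = TRUE), ], n)
  # Order along the x-axis so each line reads left to right
  top <- top[order(top$released_year), ]

  plot(top$released_year, top$danceability_., type = "b", pch = 19,
       ylim = c(0, 100), xlab = "Release year", ylab = "Percentage (%)",
       main = "Danceability vs. valence by release year")
  lines(top$released_year, top$valence_., type = "b", pch = 17, lty = 2)
  legend("bottomright", legend = c("Danceability", "Valence"),
         pch = c(19, 17), lty = c(1, 2))
  invisible(top)
}
```

Usage: `plot_danceability_by_year(top_50)`.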

Statistical analysis

Average valence, danceability, and energy of the most-streamed songs up to 2023.

##   average_valence average_danceability average_energy
## 1           47.92                 61.2          60.74
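The column names in the output above suggest a simple mean over each audio-feature column. This base-R sketch produces a one-row data frame with the same three column names. The function name `summarise_audio_features` is hypothetical, and the report's own code may have used `dplyr::summarise()` instead.

```r
# Hypothetical helper: mean valence, danceability, and energy as a one-row data frame
summarise_audio_features <- function(df) {
  data.frame(
    average_valence      = mean(df$valence_., na.rm = TRUE),
    average_danceability = mean(df$danceability_., na.rm = TRUE),
    average_energy       = mean(df$energy_., na.rm = TRUE)
  )
}
```

Usage: `summarise_audio_features(top_50)`.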

Conclusion

My initial theory was that valence, energy, and danceability would be directly correlated among the top songs, but these values seem to vary without a consistent pattern. A song’s valence does not determine its danceability.
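The conclusion can be tested directly with a correlation matrix of the three features. Values close to 0 off the diagonal would mean the features vary largely independently. Values near 1 or -1 would indicate the direct relationship the initial theory expected. The function name `feature_correlations` is hypothetical.

```r
# Hypothetical helper: pairwise Pearson correlations between the three audio features
feature_correlations <- function(df) {
  features <- df[, c("valence_.", "energy_.", "danceability_.")]
  # Use all available pairs if some values are missing; round for readability
  round(cor(features, use = "pairwise.complete.obs"), 2)
}
```

Usage: `feature_correlations(top_50)`.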