An Analysis of Music Streaming Data

Author

J. Farmer

Introduction

I analyze streaming data for the most streamed songs of 2023 on Spotify. Each row is a track. The data set includes chart and playlist data for both Spotify and Apple Music, artists, track name, release date, key, beats per minute (bpm), mode, and several audio metrics for each song (expressed as percentages), such as danceability, valence, and energy. More information can be found here: https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024. The code below reads the data directly from its download link.

library(tidyverse)  # provides read_csv(), the dplyr verbs, and ggplot2
spotify_data <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/farmerj5_xavier_edu/Ee5fUBDiG7ZDg7DOyp1lhWMBwAKYi8Vat0P4eioOhsCmeg?download=1")

Research Question

Does release month have any effect on streams? I want to know whether the month in which a song was released is associated with its number of streams. To explore this, I plot release month on the x-axis and the average number of streams for songs released in that month on the y-axis.

# Coerce streams to numeric; values that cannot be parsed become NA
spotify_data$streams <- as.numeric(spotify_data$streams)
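
The conversion above turns any value that cannot be parsed as a number into NA, and those rows are then dropped from the averages by na.rm = TRUE. As a minimal sketch, assuming the streams column was read as text, the following shows one way to see which entries fail to convert. The vector raw_streams is hypothetical and only stands in for the real column.

```r
# Hypothetical raw values standing in for spotify_data$streams read as text
raw_streams <- c("141381703", "133716286", "not a number", "800840817")

# as.numeric() returns NA (with a warning) for strings it cannot parse
parsed <- suppressWarnings(as.numeric(raw_streams))

# Entries that were present in the raw data but failed to parse
bad <- raw_streams[is.na(parsed) & !is.na(raw_streams)]
bad
```

Running the same check on the real column before overwriting it would show how many tracks are excluded from the monthly averages.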

library(lubridate)  # provides month() for month labels

spotify_data %>% 
  group_by(released_month) %>% 
  # Mean streams per release month, ignoring tracks with missing stream counts
  summarize(`Average Streams` = mean(streams, na.rm = TRUE)) %>% 
  # Convert month numbers to abbreviated month labels
  mutate(released_month = month(released_month, label = TRUE)) %>% 
  # Order the bars from highest to lowest average
  arrange(desc(`Average Streams`)) %>% 
  mutate(released_month = factor(released_month, levels = unique(released_month))) %>%
  ggplot(aes(x = released_month, y = `Average Streams`)) +
  geom_col() +
  scale_y_continuous(name = "Average number of Streams", 
                     labels = scales::comma) +
  scale_x_discrete(name = "Release Month") +
  ggtitle("Streams and Release Month", 
          subtitle = "Average number of Streams by Release Month")

We see that songs released in September and January have the highest average number of streams. Songs released in months like December and February have lower average streams.
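
Because a mean can be pulled up by a handful of very popular tracks, it may help to look at the number of tracks and the median alongside the average for each month. The following is a rough sketch of that check on a small hypothetical data frame (toy, with made-up values). The column names released_month and streams match the data set, but the numbers do not come from it.

```r
library(dplyr)

# Hypothetical toy data; the real analysis would use spotify_data instead
toy <- data.frame(
  released_month = c(1, 1, 1, 9, 9, 12, 12, 12),
  streams = c(100, 200, 3000, 500, 700, 50, 60, 70)
)

# Track count, mean, and median of streams for each release month
monthly <- toy %>%
  group_by(released_month) %>%
  summarize(
    n_tracks = n(),
    mean_streams = mean(streams, na.rm = TRUE),
    median_streams = median(streams, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_streams))

monthly
```

In this toy example, month 1 has the highest mean (1100) only because of a single track with 3000 streams, while its median is 200. A similar gap between mean and median in the real data would suggest that a month's ranking is driven by a few hits rather than by the month itself.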