An Analysis of Music Streaming Data

Author

J. Farmer

Introduction

I analyze streaming data for the most streamed songs of 2023 on Spotify. Each row is a track. The data set includes chart and playlist information for both Spotify and Apple Music, along with artists, track name, release date, key, beats per minute (bpm), mode, and several metrics for each song expressed as percentages, such as danceability, valence, and energy. More information can be found here: https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024. The data can be downloaded from the URL passed to read_csv below.

# Load the tidyverse (readr, dplyr, ggplot2, lubridate) before reading the data
library(tidyverse)
spotify_data <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/farmerj5_xavier_edu/Ee5fUBDiG7ZDg7DOyp1lhWMBwAKYi8Vat0P4eioOhsCmeg?download=1")

Research Question

Does release month have any effect on streams? I want to know whether the month in which a song was released is related to the number of streams it receives. I plan to put release month on the x-axis and the average number of streams, grouped by release month, on the y-axis.

# Convert streams to numeric; as.numeric() has no na.rm argument, and values that cannot be parsed become NA
spotify_data$streams <- as.numeric(spotify_data$streams)
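The conversion above suggests that streams may be read in as text, in which case any entry that is not a clean number would silently turn into NA. A minimal sketch of how one might count and inspect such entries, using a made-up tibble (the name toy and all of its values are hypothetical, not taken from the data set):

```r
library(dplyr)

# Hypothetical toy data: the third value mimics a malformed entry
toy <- tibble(
  track_name = c("a", "b", "c"),
  streams    = c("1000", "2500", "not_a_number")
)

# Coerce to numeric; entries that cannot be parsed become NA
# (suppressWarnings() hides the coercion warning R would otherwise print)
toy <- toy %>%
  mutate(streams_num = suppressWarnings(as.numeric(streams)))

# Count the values that were present as text but failed conversion
n_failed <- sum(is.na(toy$streams_num) & !is.na(toy$streams))

# Keep the offending rows for inspection
failed_rows <- filter(toy, is.na(streams_num))
```

Running the same check on spotify_data before overwriting the column would show how many tracks drop out of the averages because of na.rm = TRUE.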

# Average streams per release month, plotted from highest to lowest average
spotify_data %>% 
  group_by(released_month) %>% 
  summarize(`Average Streams` = mean(streams, na.rm = TRUE)) %>% 
  # Convert month numbers (1-12) to abbreviated month names
  mutate(released_month = month(released_month, label = TRUE)) %>% 
  # Order the factor levels by average streams so the bars appear in descending order
  arrange(desc(`Average Streams`)) %>% 
  mutate(released_month = factor(released_month, levels = unique(released_month))) %>%
  ggplot(aes(x = released_month, y = `Average Streams`)) +
  # geom_col() draws bar heights directly from the y values
  geom_col() +
  scale_y_continuous(name = "Average number of Streams", 
                     labels = scales::comma) +
  scale_x_discrete(name = "Release Month") +
  ggtitle("Streams and Release Month", 
          subtitle = "Average number of Streams by Release Month")

Songs released in September and January have the highest average number of streams. Songs released in months such as December and February have lower average streams.
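Because a mean can be pulled upward by a few very popular tracks, it may be worth looking at the number of tracks and the median for each month alongside the average. The following is only a sketch on made-up numbers (the tibble toy, its values, and the column names n_tracks, mean_streams, and median_streams are assumptions for illustration, not results from the data set):

```r
library(dplyr)

# Hypothetical toy data; month numbers and stream counts are made up
toy <- tibble(
  released_month = c(1, 1, 1, 9, 9, 12),
  streams        = c(100, 200, 3000, 500, 700, 400)
)

# Summarize each release month by track count, mean, and median
month_summary <- toy %>%
  group_by(released_month) %>%
  summarize(
    n_tracks       = n(),
    mean_streams   = mean(streams, na.rm = TRUE),
    median_streams = median(streams, na.rm = TRUE)
  ) %>%
  # Highest average first, matching the ordering used in the plot
  arrange(desc(mean_streams))
```

In this toy example, month 1 has the highest mean only because of a single large value, while its median is the lowest of the three months. Applying the same summary to spotify_data would show whether the ranking of months holds up under the median.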
