I’m interested in exploring global music streaming trends and music consumption across different platforms. YouTube and Spotify are two of the most influential platforms in today’s music space. I’m curious about cross-platform trends and influences, specifically if artists with a strong YouTube presence also have a large listener base on Spotify.
While I plan to explore other aspects of the data, my core research question is, “Is there a relationship between artist engagement on YouTube and streams on Spotify?”
Approach to Answering Question
To conduct my exploration, I used https://kworb.net/, a site containing comprehensive music streaming data that is updated on an ongoing basis. Importantly, this site relies on chart data, so its streaming figures are estimates: not every stream is counted, particularly for less popular artists who don’t chart.
I also collected YouTube data from this site, choosing to focus specifically on the top ten artists in terms of YouTube viewership. I based my analysis on the premise that these artists have high visibility and a strong cross-platform presence that will enable me to explore trends. This focus also enabled me to compare top performers on Spotify to popular artists on YouTube so I could identify any overlap or discrepancies between the platforms.
To ensure ethical scraping, I reviewed the site’s robots.txt file for any restrictions on scraping the page. I also set a user agent to clearly identify myself before scraping the site.
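The robots.txt check described above can be sketched in base R. This is a simplified illustration, not the code used for this report: the helper names `is_disallowed` and `fetch_with_contact`, the example rules, and the example paths are all hypothetical, and the parser only handles plain `Disallow` prefixes for the `User-agent: *` group (no `Allow` rules or wildcards).

```r
# Simplified robots.txt check (assumption: only plain Disallow prefixes matter).
# Returns TRUE when `path` falls under a Disallow rule in the "User-agent: *" group.
is_disallowed <- function(robots_lines, path) {
  lines <- trimws(sub("#.*$", "", robots_lines))  # drop comments and whitespace
  lines <- lines[nzchar(lines)]                   # drop empty lines
  in_star_group <- FALSE
  prev_was_agent <- FALSE
  for (line in lines) {
    field <- tolower(trimws(sub(":.*$", "", line)))
    value <- trimws(sub("^[^:]*:", "", line))
    if (field == "user-agent") {
      # consecutive User-agent lines belong to the same group
      in_star_group <- (prev_was_agent && in_star_group) || identical(value, "*")
      prev_was_agent <- TRUE
    } else {
      prev_was_agent <- FALSE
      if (field == "disallow" && in_star_group && nzchar(value) &&
          startsWith(path, value)) {
        return(TRUE)
      }
    }
  }
  FALSE
}

# Hypothetical rules for illustration; not the contents of any real site's file.
example_robots <- c(
  "User-agent: *",
  "Disallow: /private/",
  "",
  "User-agent: BadBot",
  "Disallow: /"
)

allowed_page <- is_disallowed(example_robots, "/charts/page.html")
blocked_page <- is_disallowed(example_robots, "/private/page.html")

# Identifying yourself: httr::user_agent() attaches a User-Agent header.
# Defined here but not called, so nothing is requested when this block runs.
fetch_with_contact <- function(url, contact) {
  httr::GET(url, httr::user_agent(contact))
}
```

Here `allowed_page` is `FALSE` and `blocked_page` is `TRUE`. In practice the rules would be read from the site's own robots.txt, for example with `readLines()`, before any page is requested.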
Importing Data
library(readr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ purrr 1.1.0
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 3000 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Artist
dbl (1): Daily
num (4): Streams, As.lead, Solo, As.feature
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:
• `` -> `...1`
Rows: 10 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): artist
dbl (1): ...1
num (2): Total_Views, Daily_Avg
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Data Wrangling
Data Wrangling: Artists Data
# rename columns
artists_data <- artists_data %>%
  rename(Streams_as_Lead = As.lead) %>%
  rename(Streams_as_Solo = Solo) %>%
  rename(Streams_as_Feature = As.feature)

# calculate proportion of total streams as solo artist
artists_data <- artists_data %>%
  mutate(Proportion_as_Solo = round(Streams_as_Solo / Streams, 2))
# create vector and named vector
YouTube_views <- YouTube_stats$Total_Views_Billions
names(YouTube_views) <- YouTube_stats$Artist

# add column to artists_data
artists_data$Total_Views_Billions <- YouTube_views[artists_data$Artist]

# make column numeric
artists_data <- artists_data %>%
  mutate(Total_Views_Billions = as.numeric(Total_Views_Billions))
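One property of the named-vector lookup above is that it matches names exactly, so an artist whose name differs only in capitalization between the two sources ends up with NA. The sketch below illustrates this with made-up values (the artist names and view counts are hypothetical, not the scraped figures) and shows a case-insensitive alternative using `match()`. The correlation output later in this report shows df = 6, which corresponds to 8 paired observations; that would be consistent with two names not matching, although the report does not confirm the cause.

```r
# Hypothetical toy values for illustration only; not the scraped figures.
spotify <- data.frame(
  Artist = c("Artist A", "Artist B", "Artist C"),
  stringsAsFactors = FALSE
)
views <- c("Artist A" = 12.5, "ARTIST C" = 30.1)

# Exact-name lookup: names that are absent or differ in case come back as NA
spotify$views_exact <- unname(views[spotify$Artist])

# Case-insensitive lookup: match lower-cased names, then index by position
idx <- match(tolower(spotify$Artist), tolower(names(views)))
spotify$views_ci <- unname(views[idx])

# How many artists were actually paired under each approach
n_matched_exact <- sum(!is.na(spotify$views_exact))
n_matched_ci <- sum(!is.na(spotify$views_ci))
```

With these toy values, the exact lookup pairs one artist and the case-insensitive lookup pairs two. Counting the non-missing values after a lookup like this is a quick way to see how many artists feed into later plots and tests.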
Analysis and Results
Spotify Total Streams vs. Daily Streams
artists_data %>%
  ggplot(aes(x = Streams, y = Daily)) +
  geom_point() +
  labs(title = "Relationship between Streams and Daily", x = "Streams", y = "Daily")
For a preliminary analysis, I investigated the relationship between total and daily streams on Spotify. My goal was to understand how daily engagement contributes to overall streaming success and why certain artists consistently rank across multiple metrics. As I expected, there appears to be a strong positive correlation between total Spotify streams and daily streams. This makes logical sense, as higher daily listener engagement will drive higher total streaming counts.
Top 10 Artists on Spotify
artists_data %>%
  arrange(desc(Streams)) %>%
  slice_max(Streams, n = 10) %>%
  ggplot(aes(x = reorder(Artist, Streams), y = Streams)) +
  geom_col(fill = "green") +
  coord_flip() +
  labs(title = "Top 10 Artists in Spotify Streams", y = "Streams (in Millions)")
Drake, Taylor Swift, and Bad Bunny have a strong presence on Spotify in terms of listeners. Other top performers on the platform include The Weeknd, Justin Bieber, Ariana Grande, Ed Sheeran, Travis Scott, Eminem, and Kanye West. This ranking demonstrates which artists dominate listener engagement on Spotify. This visualization serves as a foundation for exploring cross-platform influences.
YouTube Views for Top Artists
# visualizing YouTube viewership for top artists
YouTube_stats %>%
  ggplot(aes(x = reorder(Artist, Total_Views_Billions), y = Total_Views_Billions)) +
  geom_col(fill = "red") +
  coord_flip() +
  labs(title = "Total YouTube Views for Top 10 Artists", x = "Artist", y = "Total YouTube Views")
Bad Bunny and Taylor Swift rank highly on both YouTube and Spotify, indicating strong cross-platform appeal. Justin Bieber, Ed Sheeran, and Eminem also appear in both rankings, suggesting some alignment between Spotify and YouTube engagement. However, artists like Katy Perry and Karol G have strong YouTube viewership but do not appear among the top 10 Spotify artists. While there are some cross-engagement trends, these discrepancies indicate that cross-platform influence varies by artist.
Spotify Streams for Artists with Strongest YouTube Presence
# visualizing total streams for top YouTube artists
artists_data %>%
  filter(tolower(Artist) %in% tolower(artist_names)) %>% # used tolower to filter without being case sensitive
  ggplot(aes(x = reorder(Artist, Streams), y = Streams)) +
  geom_col(fill = "green") +
  coord_flip() +
  labs(title = "Spotify Streams for Top 10 YouTube Artists", x = "Artist", y = "Total Spotify Streams (in Millions)")
Here, I ranked the top artists with a strong YouTube presence by their Spotify streaming numbers. This visualization reinforces that top artists on YouTube - like Taylor Swift and Bad Bunny - also have strong streaming numbers on Spotify.
I find it interesting that some artists who are popular on YouTube do not perform as well on Spotify. For instance, Blackpink has significantly fewer Spotify streams than the other artists, despite having the 7th-highest YouTube views.
On the flip side, there are several artists with strong Spotify streams - like Drake, The Weeknd, and Ariana Grande - that are not in the top 10 YouTube artists. This suggests that artists do not necessarily need a strong YouTube engagement to achieve streaming success. Factors like audience preferences and music style could potentially impact how artists engage with their listeners.
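The overlap between the two rankings can also be checked with base R set operations. The sketch below uses only artist names that appear in this report's text: the ten Spotify leaders listed earlier, and the nine YouTube artists mentioned by name. The tenth YouTube artist is not named in the text, so it is left out rather than guessed, which means the overlap shown here is a lower bound. The variable names are illustrative.

```r
# Top 10 Spotify artists as listed in the text
spotify_top10 <- c("Drake", "Taylor Swift", "Bad Bunny", "The Weeknd",
                   "Justin Bieber", "Ariana Grande", "Ed Sheeran",
                   "Travis Scott", "Eminem", "Kanye West")

# YouTube top-10 artists named in the text (nine of ten; the tenth is not named)
youtube_named <- c("Bad Bunny", "Taylor Swift", "Justin Bieber", "Ed Sheeran",
                   "Eminem", "Katy Perry", "Karol G", "Blackpink", "BTS")

# Artists that appear in both lists, in Spotify rank order
on_both <- intersect(spotify_top10, youtube_named)

# Named YouTube artists that are absent from the Spotify top 10
youtube_only <- setdiff(youtube_named, spotify_top10)
```

Here `on_both` contains Taylor Swift, Bad Bunny, Justin Bieber, Ed Sheeran, and Eminem, while `youtube_only` contains Katy Perry, Karol G, Blackpink, and BTS. Both functions compare strings exactly, so names would need consistent capitalization across sources for this to work on the scraped data.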
Proportion of Spotify Streams as Solo Artist for Artists with Strong YouTube Presence
# visualizing proportion of streams as solo artist
artists_data %>%
  filter(Artist %in% artist_names) %>%
  ggplot(aes(x = reorder(Artist, Proportion_as_Solo), y = Proportion_as_Solo)) +
  geom_col(fill = "green") +
  coord_flip() +
  labs(title = "Proportion of Spotify Streams as Solo Artist", x = "Artist", y = "Proportion as Solo")
I also examined the proportion of artists’ streams as a solo artist to add context to my analysis. This visualization reveals that artists like Taylor Swift, BTS, and Ed Sheeran have high proportions of solo streams - meaning that much of their streaming success comes from their solo work. Bad Bunny, in contrast, has a lower proportion of solo streams. This indicates that some artists may gain visibility and engagement through collaborations.
Relationship between Spotify Streams and YouTube Views
artists_data %>%
  filter(Artist %in% artist_names) %>%
  ggplot(aes(x = Streams, y = Total_Views_Billions, color = Artist)) +
  geom_point() +
  labs(title = "Relationship between Spotify Streams and YouTube Views for Top Artists", x = "Spotify Streams (in Millions)", y = "Total YouTube Views (in Billions)")
# correlation test between Spotify streams and YouTube views
cor_test <- cor.test(artists_data$Streams, artists_data$Total_Views_Billions)
cor_test
Pearson's product-moment correlation
data: artists_data$Streams and artists_data$Total_Views_Billions
t = 1.5695, df = 6, p-value = 0.1676
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2664806 0.9014630
sample estimates:
cor
0.5395007
This scatterplot visualizes the relationship between Spotify streams and YouTube views. Generally, the points slope slightly upward, indicating that artists with more Spotify streams also tend to have more YouTube views. Notably, the points are spread out rather than clustered together, so some artists are more popular on one platform than the other. To quantify this relationship, I calculated the Pearson correlation, which came out to roughly 0.54. This suggests a moderate positive relationship between Spotify streams and YouTube views in this sample. However, the test is based on only 8 paired observations (df = 6), the p-value of 0.17 is above the conventional 0.05 threshold, and the 95% confidence interval (-0.27 to 0.90) includes zero, so the relationship is not statistically significant. Overall, this reinforces that cross-platform popularity varies by artist: success on Spotify does not necessarily translate into popularity on YouTube.
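The figures in the `cor.test` output can be reproduced from the reported correlation and degrees of freedom alone, which makes it easier to see where the t statistic, p-value, and confidence interval come from. The sketch below uses the standard formulas for a Pearson correlation test (a t statistic with n - 2 degrees of freedom and a Fisher z-transformation for the interval); only the two input values are taken from the output above, and the variable names are illustrative.

```r
r  <- 0.5395007  # sample correlation reported by cor.test
df <- 6          # degrees of freedom reported by cor.test
n  <- df + 2     # number of paired observations (8)

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

These values agree with the reported t = 1.5695, p-value = 0.1676, and interval of about -0.266 to 0.901. With so few pairs, the standard error on the z scale is 1 / sqrt(5), roughly 0.45, which is why the interval is so wide.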
Conclusion
My analysis shows some alignment between YouTube viewership and Spotify streams among the top 10 artists with a strong YouTube presence. While some of the most popular artists - like Taylor Swift and Bad Bunny - dominate both platforms, other artists perform differently on each platform. Taylor Swift and Bad Bunny’s domination suggests that having a strong presence and engagement on both platforms is characteristic of top performance and popularity.
One of the biggest limitations of my analysis was the small sample of 10 artists, a result of web scraping challenges I ran into when trying to collect a larger volume of data. In addition, while YouTube is one of the top platforms for video consumption, Spotify is the only music streaming platform in the data, so the results may not reflect the listening habits of audiences who use other services.
As for future analysis, I would be interested in incorporating more artists. I would like to compare smaller, emerging artists with the more established performers I examined here. I would also love to examine whether factors like genre or marketing tactics affect cross-platform engagement.