An Exploration of Music Streaming Behaviors
Introduction to Analysis
As someone who loves listening to music and digging into the statistics behind my streaming habits, I’ve always been intrigued by how people listen to music, how streaming behaviors evolve, and how artists find success across different platforms.
The rise of major streaming networks has only deepened my interest in the music streaming landscape. Streaming has transformed the music industry, becoming the primary way most people access their favorite songs. In fact, the number of music streaming subscribers has recently climbed to new heights, with data showing that over 700 million listeners now discover or revisit their favorite artists through online platforms (Statista).
As an avid music listener, I’m particularly drawn to the intersection of artists, platforms, and the numbers that reflect listening behavior. I enjoy breaking down the data behind popular tracks and trending artists.
I’m especially curious about how listener preferences shape an artist’s overall popularity and reach. My core research question therefore centers on how listening habits contribute to artists’ success and streaming performance.
Beyond my main research question, I’m also interested in exploring broader trends, such as top artists and genres, subscription types, and how artist popularity differs by country. My exploration will also address several secondary questions, including the following:
Who are some of the most popular artists globally?
Does the choice of streaming platform or subscription type influence listening time and engagement?
Do listener behaviors and preferences align with top Spotify artists’ global streaming performance?
To explore these questions, I will analyze a multi-platform dataset that focuses on individual listeners. I will supplement this data with Spotify-specific artist streaming data to compare individual listening preferences and platform-wide streaming performance.
Data Source
Overview of Data
The dataset that I am using for my exploration is sourced from Kaggle - a platform containing a large repository of public datasets. The dataset I selected provides insights into global music streaming trends, spanning 2018 to 2024. This data was collected across popular streaming platforms such as Apple Music, Spotify, and YouTube Music. It contains information regarding streaming preferences and behaviors for individual listeners. Not only does it cover engagement metrics like streaming minutes and liked songs, but it also contains demographic information such as a user’s age and country of residence.
The dataset contains 12 variables (7 categorical and 5 continuous) and 5,000 observations. Given this size and its mix of behavioral and demographic variables, I believe the data is robust enough to provide valuable insight into the music industry, global streaming trends, and listening behaviors.
Data Dictionary
The dataset includes the following variables:
User ID
Age
Country
Streaming Platform
Top Genre
Minutes Streamed Per Day
Number of Songs Liked
Most Played Artist
Subscription Type
Listening Time (Morning/Afternoon/Night)
Discover Weekly Engagement (%)
Repeat Song Rate (%)
A full data dictionary can be found here: Data Dictionary.xlsx
Summary Statistics
| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Age | 13.00 | 25.00 | 37.00 | 36.66 | 49.00 | 60.00 |
| Minutes Streamed Per Day | 10.0 | 161.0 | 316.0 | 309.2 | 457.2 | 600.0 |
| Number of Songs Liked | 1.0 | 126.0 | 254.0 | 253.5 | 382.0 | 500.0 |
| Discover Weekly Engagement (%) | 10.02 | 30.15 | 50.42 | 50.30 | 70.34 | 89.99 |
| Repeat Song Rate (%) | 5.00 | 24.20 | 41.96 | 42.39 | 60.74 | 79.99 |
Character columns (length 5000 each): User_ID, Country, Streaming Platform, Top Genre, Most Played Artist, Subscription Type, Listening Time (Morning/Afternoon/Night)
| Name | global_music_data |
| Number of rows | 5000 |
| Number of columns | 12 |
| _______________________ | |
| Column type frequency: | |
| character | 7 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| User_ID | 0 | 1 | 5 | 5 | 0 | 5000 | 0 |
| Country | 0 | 1 | 2 | 11 | 0 | 10 | 0 |
| Streaming Platform | 0 | 1 | 5 | 12 | 0 | 6 | 0 |
| Top Genre | 0 | 1 | 3 | 9 | 0 | 10 | 0 |
| Most Played Artist | 0 | 1 | 3 | 13 | 0 | 10 | 0 |
| Subscription Type | 0 | 1 | 4 | 7 | 0 | 2 | 0 |
| Listening Time (Morning/Afternoon/Night) | 0 | 1 | 5 | 9 | 0 | 3 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 0 | 1 | 36.66 | 13.76 | 13.00 | 25.00 | 37.00 | 49.00 | 60.00 | ▇▇▇▇▇ |
| Minutes Streamed Per Day | 0 | 1 | 309.24 | 172.03 | 10.00 | 161.00 | 316.00 | 457.25 | 600.00 | ▇▇▇▇▇ |
| Number of Songs Liked | 0 | 1 | 253.52 | 146.37 | 1.00 | 126.00 | 254.00 | 382.00 | 500.00 | ▇▇▇▇▇ |
| Discover Weekly Engagement (%) | 0 | 1 | 50.30 | 23.17 | 10.02 | 30.15 | 50.42 | 70.34 | 89.99 | ▇▇▇▇▇ |
| Repeat Song Rate (%) | 0 | 1 | 42.39 | 21.44 | 5.00 | 24.20 | 41.96 | 60.74 | 79.99 | ▇▇▇▇▇ |
Based on the summary statistics, the dataset is complete, with no missing values in any of the 12 variables, so no imputation or row removal is needed before analysis.
Music Streaming Time: On average, users stream approximately 309 minutes per day (~5 hours). The median (316 minutes) is close to the mean and values range only from 10 to 600 minutes, so the average is not being pulled by extreme listeners. Even so, the standard deviation of 172 minutes points to substantial variability in daily listening habits.
Listener Engagement: Users like an average of 254 songs, and the standard deviation of 146 again shows wide variation in engagement. The average Discover Weekly Engagement is 50.3%, indicating moderate engagement with content recommended by platforms’ algorithms. The average repeat song rate is 42.4% (standard deviation 21.4%), which suggests that a large share of streams are replays. Thus, there is a balance between revisiting songs and exploring new music.
Listener Demographics: The user base spans a broad age range, from 13 to 60, with an average age of about 37 years.
Overall, the summary statistics show a fairly engaged and diverse user base with high variation in listening time and engagement.
Descriptive Analysis
# A tibble: 10 × 2
`Most Played Artist` `Number of Listeners`
<chr> <int>
1 Bad Bunny 528
2 Adele 519
3 Post Malone 519
4 Dua Lipa 502
5 Ed Sheeran 500
6 Taylor Swift 496
7 The Weeknd 490
8 BTS 488
9 Drake 481
10 Billie Eilish 477
This table highlights the top ten most played artists amongst listeners in the dataset: Bad Bunny, Adele, Post Malone, Dua Lipa, Ed Sheeran, Taylor Swift, The Weeknd, BTS, Drake, and Billie Eilish. These artists appear to be popular among individual users, representing favorites of many listeners. Later in the analysis, I will explore whether this popularity aligns with broader streaming trends on platforms like Spotify, or if it reflects the preferences of more dedicated, individual users rather than platform-wide listening patterns. This list also reflects diverse genre tastes, including rap, hip-hop, reggaeton, and pop, suggesting that listeners engage with a variety of musical styles.
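A table like the one above can be produced with a grouped count. The following is a minimal sketch rather than the code used for this report: it assumes dplyr is installed, the six-row `listeners` data frame is a made-up stand-in for the 5,000-row dataset, and only the column name `Most Played Artist` comes from the data itself.

```r
library(dplyr)

# Hypothetical stand-in for the listener data (the real file has 5,000 rows).
listeners <- data.frame(
  `Most Played Artist` = c("Adele", "Bad Bunny", "Bad Bunny",
                           "Drake", "Adele", "Bad Bunny"),
  check.names = FALSE
)

# Count listeners per artist, sort from most to least, keep the top ten.
top_artists <- listeners %>%
  count(`Most Played Artist`, sort = TRUE, name = "Number of Listeners") %>%
  slice_head(n = 10)

print(top_artists)
```

In the real table the counts sit in a narrow band (477 to 528 listeners per artist), so the ordering within the top ten rests on small differences.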
I analyzed the top genres by the number of listeners who identified each as their favorite. While no single genre overwhelmingly dominates, reggae, jazz, and EDM emerge as the most popular among listeners, suggesting a fairly diverse range of music preferences overall.
Out of curiosity, I decided to explore geographic differences in streaming habits; I opted to narrow my focus to the U.S. and Canada due to their proximity. The visualization shows that in Canada, the most popular top genres are EDM and jazz, while metal and classical are the least popular. In the U.S., EDM and rock lead in popularity, whereas hip-hop and R&B are the least popular. Both countries show similar listener numbers for EDM and reggae. A notable difference is R&B, which is more popular in Canada than in the U.S., where it ranks as the second least popular genre.
To assess user engagement, I calculated an engagement rate by dividing the number of songs liked by minutes streamed per day. I used this value rather than “Discover Weekly Engagement” because it gauges how much users interact with music relative to their listening time, instead of relying on a single playlist-based metric. I then grouped the data by subscription type and computed the average engagement rate for each group, which let me compare how free and premium subscribers interact with the platform relative to the time they spend listening. Premium subscribers show the higher average engagement rate. This is consistent with the idea that they are more invested in music and are therefore willing to pay for better streaming access or listening perks.
To investigate differences in streaming time across platforms, I created a boxplot showing the distribution of minutes streamed per day for each platform. The medians and spreads are similar across platforms, with the middle 50% of users generally falling between roughly 150 and 450 minutes. This indicates that users spend similar amounts of time streaming music regardless of their chosen platform.
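The engagement-rate calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the report's actual code: it assumes dplyr is installed, the four rows are invented so the arithmetic is easy to follow, and the numbers say nothing about the real result. The column names match the dataset.

```r
library(dplyr)

# Hypothetical toy rows; values are invented for illustration only.
listeners <- data.frame(
  `Subscription Type`        = c("Free", "Free", "Premium", "Premium"),
  `Number of Songs Liked`    = c(100, 50, 300, 120),
  `Minutes Streamed Per Day` = c(200, 100, 300, 60),
  check.names = FALSE
)

# Engagement rate = songs liked per minute streamed per day,
# then averaged within each subscription type.
engagement_by_plan <- listeners %>%
  mutate(engagement_rate =
           `Number of Songs Liked` / `Minutes Streamed Per Day`) %>%
  group_by(`Subscription Type`) %>%
  summarise(mean_engagement_rate = mean(engagement_rate), .groups = "drop")

print(engagement_by_plan)
```

One design point worth keeping in mind: because the denominator can be as low as 10 minutes in this dataset, a few light listeners with many liked songs can pull a group's mean rate upward, so comparing group medians alongside the means is a reasonable check.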
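The quantities a boxplot draws (quartiles and median per group) can also be tabulated directly, which makes the "similar across platforms" comparison easier to check. The sketch below uses base R only; the `listeners` data frame, its column names, and its values are hypothetical.

```r
# Hypothetical toy data: two platforms, five listeners each.
listeners <- data.frame(
  platform = rep(c("Spotify", "Apple Music"), each = 5),
  minutes  = c(100, 200, 300, 400, 500,
               150, 250, 300, 350, 450)
)

# Split minutes by platform, then compute the boxplot statistics per group.
by_platform <- split(listeners$minutes, listeners$platform)

platform_summary <- data.frame(
  platform = names(by_platform),
  q1       = sapply(by_platform, quantile, probs = 0.25, names = FALSE),
  median   = sapply(by_platform, median),
  q3       = sapply(by_platform, quantile, probs = 0.75, names = FALSE),
  row.names = NULL
)

print(platform_summary)
```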
I also wanted to examine whether streaming minutes vary based on time of day when users tend to listen the most. The boxplot shows that there is not much difference in listening time across preferred listening periods. If anything, users who primarily listen in the morning show slightly less variability in their streaming minutes. Overall, this suggests that a user’s preferred listening time does not strongly influence the amount of time they spend streaming.
[1] 0.004165853
I initially hypothesized that streaming minutes might vary with age, expecting younger individuals to stream more. However, the scatterplot and correlation analysis do not support this idea. With a correlation of just 0.004, there is essentially no relationship between age and streaming minutes, suggesting that music streaming habits are relatively consistent across age groups.
[1] 0.02348691
Beyond demographic factors like age, I also explored the relationship between streaming minutes and the number of songs a user liked, expecting a positive correlation. However, the correlation was only 0.02, indicating virtually no relationship.
[1] -0.004280388
The correlation between streaming minutes and Discover Weekly Engagement was even lower, at -0.004, further suggesting that engagement metrics do not strongly relate to total listening time.
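The three correlations above can be computed in one step with a correlation matrix. The sketch below uses base R and simulated data: the `listeners` data frame and its short column names are hypothetical, and the values are independent random draws, so the correlations it prints are near zero by construction and are not the report's figures.

```r
# Simulated, hypothetical data: independent draws over the observed ranges.
set.seed(1)
n <- 500
listeners <- data.frame(
  age             = sample(13:60, n, replace = TRUE),
  minutes_per_day = sample(10:600, n, replace = TRUE),
  songs_liked     = sample(1:500, n, replace = TRUE)
)

# Pairwise Pearson correlations for all numeric columns at once.
cor_matrix <- cor(listeners)
print(round(cor_matrix, 3))

# cor.test() adds a confidence interval and p-value for a single pair.
age_test <- cor.test(listeners$age, listeners$minutes_per_day)
print(age_test$conf.int)
```

Reporting the confidence interval alongside the coefficient helps show whether a value such as 0.004 is distinguishable from zero at all.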
Secondary Data Source
The data source I chose to supplement my analysis focuses on artists. Since the primary dataset features data for individual users, I collected additional data on artist streaming numbers to provide a broader perspective. I sourced this data from https://kworb.net/, a site containing comprehensive music streaming data that is updated on an ongoing basis. The site is built on chart data and offers insight into the streaming success of the most-streamed artists of all time.
Notably, this data is limited to Spotify, so it does not capture streaming behavior on other platforms. Despite this limitation, it provides valuable information, including total Spotify streams, average daily streams, and a breakdown of streams by the artist’s role: solo artist, primary lead, or featured artist.
With Spotify being the most popular music streaming provider globally and accounting for over 30% of all streaming subscribers (Statista), this data is relevant for understanding artist-level performance and music streaming trends.
I collected this data using ethical scraping practices: before scraping, I checked the site’s robots.txt file for any restrictions on scraping the page, and I set a user agent that clearly identified me.
This dataset contains 7 variables (2 categorical and 5 continuous) and 3,000 observations. It includes the following variables:
Artist Ranking - based on total number of streams
Artist
Streams - total Spotify streams in millions
Daily - average daily Spotify streams in millions per day
As.lead - streams on songs where the artist is the primary lead
Solo - streams on songs where the artist is the only performer
As.feature - streams on songs where the artist is featured
Note: Through data wrangling, I added an additional variable to represent the proportion of an artist’s total streams that stem from solo songs.
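The added variable can be sketched as solo streams divided by total streams. This is a minimal illustration assuming dplyr is installed: the three artists and their stream counts are made up, and rounding to two decimals is an assumption based on how the values appear in the summary output.

```r
library(dplyr)

# Hypothetical artists and stream counts (in millions), for illustration only.
artists <- data.frame(
  Artist  = c("Artist A", "Artist B", "Artist C"),
  Streams = c(1000, 2000, 1500),
  Solo    = c(800, 500, NA)
)

# Share of each artist's total streams that comes from solo songs.
# A missing Solo value propagates to a missing proportion.
artists <- artists %>%
  mutate(Proportion_as_Solo = round(Solo / Streams, 2))

print(artists)
```

Because `NA` propagates through the division, the new column has the same number of missing values as the solo-streams column, which matches the 123 missing values reported for both in the summary.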
Data Wrangling
| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA's |
|---|---|---|---|---|---|---|---|
| Artist Ranking | 1.0 | 750.8 | 1500.5 | 1500.5 | 2250.2 | 3000.0 | 0 |
| Streams | 1209 | 1606 | 2382 | 4586 | 4375 | 123555 | 0 |
| Daily | 0.082 | 0.636 | 1.179 | 2.247 | 2.384 | 65.124 | 0 |
| Streams_as_Lead | 0.3 | 1194.8 | 1809.8 | 3351.2 | 3315.0 | 111989.9 | 26 |
| Streams_as_Solo | 0.1 | 509.7 | 1289.6 | 2402.3 | 2469.2 | 102299.7 | 123 |
| Streams_as_Feature | 0.1 | 124.8 | 570.3 | 1387.8 | 1444.9 | 39244.5 | 269 |
| Proportion_as_Solo | 0.0000 | 0.2100 | 0.5400 | 0.5344 | 0.8900 | 1.0000 | 123 |
Character column (length 3000): Artist
| Name | artists |
| Number of rows | 3000 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Artist | 0 | 1 | 1 | 45 | 0 | 2998 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Artist Ranking | 0 | 1.00 | 1500.50 | 866.17 | 1.00 | 750.75 | 1500.50 | 2250.25 | 3000.00 | ▇▇▇▇▇ |
| Streams | 0 | 1.00 | 4585.58 | 7310.67 | 1208.70 | 1606.08 | 2381.75 | 4374.92 | 123555.10 | ▇▁▁▁▁ |
| Daily | 0 | 1.00 | 2.25 | 3.61 | 0.08 | 0.64 | 1.18 | 2.38 | 65.12 | ▇▁▁▁▁ |
| Streams_as_Lead | 26 | 0.99 | 3351.22 | 5620.19 | 0.30 | 1194.75 | 1809.80 | 3315.02 | 111989.90 | ▇▁▁▁▁ |
| Streams_as_Solo | 123 | 0.96 | 2402.30 | 4393.09 | 0.10 | 509.70 | 1289.60 | 2469.20 | 102299.70 | ▇▁▁▁▁ |
| Streams_as_Feature | 269 | 0.91 | 1387.84 | 2783.06 | 0.10 | 124.85 | 570.30 | 1444.90 | 39244.50 | ▇▁▁▁▁ |
| Proportion_as_Solo | 123 | 0.96 | 0.53 | 0.35 | 0.00 | 0.21 | 0.54 | 0.89 | 1.00 | ▆▃▃▃▇ |
Notably, some values are missing for the variables representing an artist’s role on a song: 26 for lead streams, 123 for solo streams, and 269 for feature streams. This may occur because an artist has no songs in that role. For example, certain artists appear exclusively as featured performers and have no solo tracks, which leaves those categories empty.
[1] 0.8745859
For a preliminary analysis, I investigated the relationship between total streams and average daily streams on Spotify. My goal was to understand how daily engagement contributes to overall streaming success and why certain artists consistently rank highly across multiple metrics. As I expected, the two are closely linked: the correlation is 0.87, a strong positive relationship. This makes sense, as higher daily listening drives higher total stream counts.
Drake, Taylor Swift, and Bad Bunny have the strongest presence on Spotify in terms of total streams. Other top performers on the platform include The Weeknd, Justin Bieber, Ariana Grande, Ed Sheeran, Travis Scott, Eminem, and Kanye West. This ranking demonstrates which artists dominate listener engagement on Spotify.
I wanted to identify what proportion of popular artists’ streams come from solo songs versus collaborations, as this can provide insight into how individual popularity and partnerships each contribute to overall streaming success. For instance, Taylor Swift derives the majority of her streams from solo songs, indicating that her own fanbase strongly drives her streaming numbers. Ed Sheeran and Ariana Grande also have high proportions of streams from solo tracks, which suggests that these artists’ individual appeal plays a role in their popularity on Spotify. In contrast, Justin Bieber, Drake, and Bad Bunny derive most of their streams from collaborations or features, suggesting that these artists may benefit from other artists’ visibility. Looking at these proportions helps contextualize how much of an artist’s success on Spotify is driven by solo work versus collaborations.
I investigated whether users’ most played artists align with streaming trends on Spotify. In the global user streaming data, which includes multiple platforms, Bad Bunny, Adele, and Post Malone were the most popular artists. In contrast, the Spotify-specific artist data shows that Drake, Taylor Swift, and Bad Bunny are the top-streamed artists. Notably, Bad Bunny, Drake, Taylor Swift, The Weeknd, and Ed Sheeran appear in both groups, suggesting strong popularity both among individual listeners and on Spotify. However, Post Malone, Adele, Dua Lipa, BTS, and Billie Eilish rank highly in the multi-platform user data but are not represented in Spotify’s top 10. This indicates that these artists could be popular among a smaller group of dedicated fans, or that they may derive more of their streams from platforms other than Spotify. Conversely, Justin Bieber, Ariana Grande, Travis Scott, Eminem, and Kanye West had high streaming numbers on Spotify but were not among users’ most played artists, showing that some artists’ Spotify streams may not fully reflect broader user listening habits.
Overall, this suggests that individual users’ favorite artists do not always match the artists with the highest total streams on Spotify. Some artists, like Drake and Taylor Swift, may have widespread streaming across many listeners, while others, like Post Malone or Adele, may be highly favored by certain individuals but have lower total streaming numbers on Spotify. This highlights how personal listening habits may not always reflect platform-wide streaming popularity.
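The overlap between the two top-ten lists can be checked with base R set operations. The sketch below hard-codes the artist names reported earlier in this analysis: the user list follows the most-played-artist table, and the Spotify list follows the order in which the artists are named in the text. The variable names are my own.

```r
# Top ten most played artists in the multi-platform listener data.
user_top10 <- c("Bad Bunny", "Adele", "Post Malone", "Dua Lipa", "Ed Sheeran",
                "Taylor Swift", "The Weeknd", "BTS", "Drake", "Billie Eilish")

# Top ten artists by total Spotify streams, as named in the text.
spotify_top10 <- c("Drake", "Taylor Swift", "Bad Bunny", "The Weeknd",
                   "Justin Bieber", "Ariana Grande", "Ed Sheeran",
                   "Travis Scott", "Eminem", "Kanye West")

in_both      <- intersect(user_top10, spotify_top10)  # appear in both lists
users_only   <- setdiff(user_top10, spotify_top10)    # listener data only
spotify_only <- setdiff(spotify_top10, user_top10)    # Spotify data only

print(in_both)
print(users_only)
print(spotify_only)
```

Each of the three groups contains five artists, so the two top-ten lists share exactly half of their entries.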
Conclusion & Key Takeaways
Through this analysis, I explored music streaming behavior through the lens of both listeners and artists. I supplemented global streaming data on listener preferences with streaming data from Spotify, one of the largest and most relevant streaming platforms. I wanted to understand how listening habits, engagement, and other factors relate to artist popularity and overall platform performance. Altogether, my analysis provided insight into top artists and into listener engagement patterns and preferences.
While my analysis covered a wide range of different music streaming topics, it faced several limitations. Firstly, the global streaming data was limited to 5,000 listeners. Although sizable, this ultimately represents a small fraction of the total streaming population, meaning that underrepresented demographics or listening patterns could have skewed the results. A second limitation is that the secondary dataset only focused on Spotify, whereas the primary dataset covered multiple platforms. Despite these constraints, I was still able to conduct a thorough analysis and extract meaningful insights about listening behaviors and artist popularity.
Overall, my analysis resulted in several key takeaways:
Listener preferences are diverse, with no one genre dominating user preferences significantly.
Premium subscribers are more engaged with music relative to their total listening time compared to free users.
Streaming minutes do not vary meaningfully by platform, preferred listening time, or age. Additionally, total streaming minutes show minimal correlation with engagement metrics like liked songs and Discover Weekly playlist interaction. This suggests that listening time and engagement are two distinct dimensions of listening behavior.
Popularity among individual users does not always align with overall platform-wide streams.
- For instance, Adele and Post Malone’s strong popularity amongst individual users did not match Spotify’s streaming numbers.
Some artists derive most of their streams from solo songs, suggesting that individual popularity strongly drives their Spotify success.
- Taylor Swift and Ed Sheeran are prime examples of this.