I learned that Spotify will send users their extended streaming history on request. Since I’m excited about data science, of course I had to ask for mine. My idea with this script is to slowly write code over time, whenever I have a spare moment, to analyze my data. I also plan on periodically requesting my streaming data again.
In case you’d like to try it for yourself, feel free to request your own extended streaming data and copy any chunks of code I wrote. As a disclaimer, I’m not trying to write the most efficient code here. It’s just supposed to be fun. Don’t judge my code!
The data are in multiple .json files with the same prefix. So, first I read the files and merged them into a single data frame, and then removed unnecessary columns.
library(jsonlite)
library(tidyverse)
library(viridis)
library(treemapify)
library(ggrepel)
'%&%' = function(a,b) paste (a,b,sep='')
# list of subfiles to read
list_of_json_files <- list.files(pattern='Streaming_History_Audio')
# read each file and stack everything into a single data frame
# (built from scratch, so re-running the script does not append duplicates)
full_df <- list_of_json_files %>% map(fromJSON) %>% bind_rows()
# only keep columns I want
full_df <- full_df %>% select(master_metadata_album_artist_name, master_metadata_album_album_name, master_metadata_track_name, ts, ms_played) %>% drop_na()
# make sure the timestamp column is in the correct format
full_df$ts <- as_datetime(full_df$ts)
# renaming first three columns (names are too big!)
colnames(full_df)[1:3] <- c('artist_name','album_name','track_name')
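If you want to check the read-and-merge step without touching your real export, here is a minimal sketch on made-up data. The two tiny files, the temporary folder, and the `demo_` names are all hypothetical; only the file prefix matches the one used above.

```r
library(jsonlite)
library(dplyr)

# Write two tiny, made-up JSON files to a temporary folder
tmp_dir <- tempfile('spotify_demo_')
dir.create(tmp_dir)
write_json(data.frame(ts='2024-01-01T10:00:00Z', ms_played=180000L),
           file.path(tmp_dir, 'Streaming_History_Audio_0.json'))
write_json(data.frame(ts='2024-01-02T11:30:00Z', ms_played=4000L),
           file.path(tmp_dir, 'Streaming_History_Audio_1.json'))

# Read every file with the shared prefix and stack them into one data frame
demo_files <- list.files(tmp_dir, pattern='Streaming_History_Audio', full.names=TRUE)
demo_df <- bind_rows(lapply(demo_files, fromJSON))

# One row per stream, across both files
nrow(demo_df)
```

Each file becomes its own data frame first, and `bind_rows()` stacks them in one go, so nothing depends on whether a merged object already exists in the session.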
The original data contain a ‘skipped’ column with Boolean values, but most of its cells were empty. So I decided to implement my own ‘Did I skip this song?’ rule: if I listened to a song for less than 10 seconds, I count it as skipped.
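A minimal sketch of that rule on made-up streams, assuming the same 10-second cutoff. The `toy_streams` table and the `skip_threshold_ms` name are only for illustration.

```r
library(dplyr)

# Assumption taken from the text: anything under 10 seconds counts as a skip
skip_threshold_ms <- 10000

# Made-up streams, including the two edge cases around the cutoff
toy_streams <- tibble(
  track_name = c('Song A', 'Song A', 'Song B', 'Song C'),
  ms_played  = c(2500, 215000, 9999, 10000)
)

# Flag each stream as skipped or not
toy_streams <- toy_streams %>% mutate(skipped = ms_played < skip_threshold_ms)

# How many streams fall on each side of the cutoff
toy_streams %>% count(skipped)
```

Keeping the cutoff in one variable makes it easy to try a different definition of "skipped" later, for example 30 seconds.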
# keeps songs that I listened to for less than 10 seconds
songs_skipped <- full_df %>% filter(ms_played<=9999)
# analyze how much I've skipped songs
songs_skipped_summary <- songs_skipped %>% group_by(track_name, artist_name) %>%
summarise(times_skipped=n()) %>% unique()
# plot a histogram
ggplot(songs_skipped_summary, aes(x=times_skipped)) + geom_histogram() +
xlab('Number of times I have skipped a song') + ylab('Count') +
ggtitle('How many times I have skipped songs') + theme_bw()
As we can see, I have skipped most songs only a few times. Those are probably songs Spotify recommended to me once or twice, and I skipped them. However, there are songs I have skipped a lot! Let’s see the top 10 most skipped songs of all time:
# 10 most skipped songs of all time
songs_skipped_summary %>% arrange(desc(times_skipped)) %>% head(n=10)
## # A tibble: 10 × 3
## # Groups: track_name [10]
## track_name artist_name times_skipped
## <chr> <chr> <int>
## 1 Bubblegum Bitch MARINA 361
## 2 Daddy Issues The Neighbourhood 300
## 3 Sweater Weather The Neighbourhood 296
## 4 How to Be a Heartbreaker MARINA 259
## 5 Primadonna MARINA 257
## 6 Why'd You Only Call Me When You're High? Arctic Monkeys 250
## 7 Oh No! MARINA 248
## 8 Diet Mountain Dew Lana Del Rey 213
## 9 Toxic Britney Spears 212
## 10 jealousy, jealousy Olivia Rodrigo 210
Bingo! Those are songs I definitely enjoy, but sometimes I just don’t wanna listen to them.
OK, so now that I was able to briefly analyze the data corresponding to my skipped songs, let’s take a look at the ones I actually listened to. First, let’s see my top 100 most streamed songs of all my time as a Spotify user.
# keeps songs that I listened to for at least 10 seconds
songs_listened <- full_df %>% filter(ms_played>=10000)
# analyze how much I've streamed songs
songs_listened_summary <- songs_listened %>% group_by(track_name, artist_name) %>%
summarise(times_streamed=n()) %>% unique()
# 100 most streamed songs of all time
songs_listened_summary %>% arrange(desc(times_streamed)) %>% print(n=100)
## # A tibble: 17,002 × 3
## # Groups: track_name [15,507]
## track_name artist_name times_streamed
## <chr> <chr> <int>
## 1 Mariners Apartment Complex Lana Del Rey 478
## 2 American Money BØRNS 451
## 3 Daddy Issues The Neighbo… 447
## 4 Crier tout bas Cœur De Pir… 389
## 5 Moondust Jaymes Young 386
## 6 Pretty When You Cry Lana Del Rey 369
## 7 Sweater Weather The Neighbo… 353
## 8 Young And Beautiful Lana Del Rey 353
## 9 Bubblegum Bitch MARINA 350
## 10 Ultraviolence Lana Del Rey 346
## 11 Cinnamon Girl Lana Del Rey 341
## 12 Hypnotic Zella Day 323
## 13 Doin' Time Lana Del Rey 321
## 14 The Emotion BØRNS 318
## 15 Diet Mountain Dew Lana Del Rey 317
## 16 Place de la République Cœur De Pir… 309
## 17 Liability Lorde 307
## 18 People Watching Conan Gray 300
## 19 Background Barcelona 293
## 20 Lunchbox Friends Melanie Mar… 292
## 21 Ride Lana Del Rey 291
## 22 Why'd You Only Call Me When You're High? Arctic Monk… 291
## 23 Cherry Lana Del Rey 289
## 24 Snap Out Of It Arctic Monk… 283
## 25 Somebody Else The 1975 283
## 26 West Coast Lana Del Rey 279
## 27 I'll Be Good Jaymes Young 276
## 28 jealousy, jealousy Olivia Rodr… 268
## 29 Fake Happy Paramore 267
## 30 National Anthem Lana Del Rey 263
## 31 Softcore The Neighbo… 263
## 32 Dernière danse Indila 260
## 33 Chemtrails Over The Country Club Lana Del Rey 259
## 34 Therapy All Time Low 259
## 35 How to Be a Heartbreaker MARINA 257
## 36 Somewhere Only We Know Keane 256
## 37 Dark Paradise Lana Del Rey 254
## 38 Oh No! MARINA 247
## 39 Cough Syrup Young the G… 242
## 40 Tourner Dans Le Vide Indila 242
## 41 Infinity Jaymes Young 240
## 42 Sur la lune Bigflo & Oli 240
## 43 Electric Love BØRNS 238
## 44 Mini World Indila 238
## 45 Elle me dit MIKA 236
## 46 Supermassive Black Hole Muse 234
## 47 Before the Worst The Script 231
## 48 All I Wanted Paramore 229
## 49 Circus Britney Spe… 229
## 50 The Blackest Day Lana Del Rey 228
## 51 Icy Kim Petras 226
## 52 Thnks fr th Mmrs Fall Out Boy 226
## 53 Friends Chase Atlan… 225
## 54 J'suis pas dupe Pomme 223
## 55 Buzzcut Season Lorde 222
## 56 Man's World MARINA 222
## 57 Physical Dua Lipa 222
## 58 Past Lives BØRNS 221
## 59 Judas Lady Gaga 220
## 60 Primadonna MARINA 220
## 61 Sex, Drugs, Etc. Beach Weath… 220
## 62 Venus Fly Trap MARINA 220
## 63 S.O.S Indila 219
## 64 Comment je vais faire Hoshi 217
## 65 Six Degrees of Separation The Script 215
## 66 Take Me To Church Hozier 215
## 67 Wires The Neighbo… 215
## 68 yes, and? Ariana Gran… 215
## 69 Emperor's New Clothes Panic! At T… 213
## 70 Chasing Cars Snow Patrol 212
## 71 Complicated Avril Lavig… 212
## 72 Remembering Sunday All Time Low 211
## 73 Brooklyn Baby Lana Del Rey 208
## 74 High School Sweethearts Melanie Mar… 208
## 75 Don’t Blame Me Taylor Swift 205
## 76 When You're Gone Avril Lavig… 205
## 77 Happy MARINA 202
## 78 My Immortal Evanescence 202
## 79 Criminal Britney Spe… 201
## 80 More Than a Friend girli 201
## 81 Hard Times Paramore 200
## 82 Misery Business Paramore 200
## 83 Nightcall London Gram… 200
## 84 The Climb Miley Cyrus 200
## 85 7 rings Ariana Gran… 199
## 86 Born To Die Lana Del Rey 198
## 87 Norman fucking Rockwell Lana Del Rey 198
## 88 Washing Machine Heart Mitski 198
## 89 White Mustang Lana Del Rey 198
## 90 Feel Something Jaymes Young 197
## 91 Maniac Conan Gray 197
## 92 Sign of the Times Harry Styles 197
## 93 Lies MARINA 196
## 94 Fuck it I love you Lana Del Rey 195
## 95 brutal Olivia Rodr… 195
## 96 Sofisticated Fuck Princess Please Leave Me Alone Carissa's W… 194
## 97 Toxic Britney Spe… 194
## 98 Womanizer Britney Spe… 194
## 99 Stargirl Interlude The Weeknd 193
## 100 All You Wanna Do SIX 192
## # ℹ 16,902 more rows
I feel like any comment about my top 100 list would be TMI. It is what it is.
Now, let’s see the top 10 songs per year.
# sum amount of times I've streamed each song per year
songs_listened_summary <- songs_listened %>% group_by(track_name, artist_name, year(ts)) %>%
summarise(times_streamed=n()) %>% unique()
# rename "year(ts)" column
colnames(songs_listened_summary)[3] <- c('year')
# get top 10 per year
top10songs_listened <- songs_listened_summary %>% group_by(year) %>% arrange(desc(times_streamed)) %>%
slice(1:10)
# print full data frame
top10songs_listened %>% arrange(desc(times_streamed)) %>% print(n=nrow(top10songs_listened))
## # A tibble: 97 × 4
## # Groups: year [11]
## track_name artist_name year times_streamed
## <chr> <chr> <dbl> <int>
## 1 yes, and? Ariana Gra… 2024 215
## 2 Good Luck, Babe! Chappell R… 2024 179
## 3 American Money BØRNS 2022 177
## 4 J'suis pas dupe Pomme 2020 168
## 5 Icy Kim Petras 2022 166
## 6 JOYRIDE Kesha 2024 164
## 7 Crier tout bas Cœur De Pi… 2020 163
## 8 we can't be friends (wait for your love) Ariana Gra… 2024 160
## 9 Houdini Dua Lipa 2024 154
## 10 Comment je vais faire Hoshi 2020 149
## 11 Mini World Indila 2020 145
## 12 jealousy, jealousy Olivia Rod… 2024 145
## 13 Dernière danse Indila 2020 140
## 14 Diet Mountain Dew Lana Del R… 2024 140
## 15 Washing Machine Heart Mitski 2024 140
## 16 the boy is mine Ariana Gra… 2024 137
## 17 Cinnamon Girl Lana Del R… 2024 132
## 18 Moondust Jaymes You… 2022 130
## 19 Avenir Louane 2020 129
## 20 Sur la lune Bigflo & O… 2020 129
## 21 THE QUIET Troye Sivan 2022 128
## 22 National Anthem Lana Del R… 2023 127
## 23 All You Wanna Do SIX 2022 125
## 24 Get Down SIX 2022 125
## 25 Ta reine Angèle 2020 124
## 26 Lunchbox Friends Melanie Ma… 2019 122
## 27 Balance ton quoi Angèle 2020 121
## 28 Écoute Chérie Vendredi s… 2020 120
## 29 Doin' Time Lana Del R… 2023 119
## 30 Do Me a Favour Arctic Mon… 2022 116
## 31 Softcore The Neighb… 2022 116
## 32 Why'd You Only Call Me When You're High? Arctic Mon… 2023 116
## 33 Crier tout bas Cœur De Pi… 2019 114
## 34 Diet Mountain Dew Lana Del R… 2023 114
## 35 Infinity Jaymes You… 2021 110
## 36 Worth It for the Feeling Rebecca Bl… 2022 109
## 37 Liability Lorde 2019 106
## 38 Man's World MARINA 2021 104
## 39 Moondust Jaymes You… 2021 101
## 40 Mariners Apartment Complex Lana Del R… 2019 100
## 41 Bubblegum Bitch MARINA 2022 100
## 42 Somewhere Only We Know Keane 2019 99
## 43 Don’t Blame Me Taylor Swi… 2023 99
## 44 More Than a Friend girli 2021 97
## 45 Nightcall London Gra… 2019 93
## 46 Sign of the Times Harry Styl… 2019 93
## 47 Dernière danse Indila 2019 92
## 48 Place de la République Cœur De Pi… 2019 92
## 49 Snap Out Of It Arctic Mon… 2023 92
## 50 Friends Chase Atla… 2021 91
## 51 Karma Taylor Swi… 2023 91
## 52 Softcore The Neighb… 2023 91
## 53 My Immortal Evanescence 2019 90
## 54 Buzzcut Season Lorde 2023 90
## 55 Judas Lady Gaga 2023 90
## 56 I'll Be Good Jaymes You… 2021 87
## 57 La baie Clara Luci… 2021 87
## 58 Crier tout bas Cœur De Pi… 2018 85
## 59 Hypnotic Zella Day 2021 82
## 60 Venus Fly Trap MARINA 2021 81
## 61 American Money BØRNS 2021 79
## 62 When You're Gone Avril Lavi… 2018 78
## 63 Place de la République Cœur De Pi… 2018 74
## 64 Wires The Neighb… 2018 73
## 65 Mariners Apartment Complex Lana Del R… 2018 69
## 66 Before the Worst The Script 2018 67
## 67 Six Degrees of Separation The Script 2018 66
## 68 Ultraviolence Lana Del R… 2018 66
## 69 Russian Roulette Rihanna 2018 65
## 70 The Blackest Day Lana Del R… 2018 65
## 71 Partition Beyoncé 2015 5
## 72 Booty Jennifer L… 2015 2
## 73 Don't Cha The Pussyc… 2015 2
## 74 Flawless Remix (feat. Nicki Minaj) Beyoncé 2015 2
## 75 G.U.Y. Lady Gaga 2015 2
## 76 Gasolina - DJ Buddah Remix Daddy Yank… 2015 2
## 77 Rabiosa Shakira 2015 2
## 78 Sweet Dreams Beyoncé 2015 2
## 79 Your Body Christina … 2015 2
## 80 How Long Tove Lo 2025 2
## 81 Mr. Brightside The Killers 2025 2
## 82 Toy Soldier Britney Sp… 2025 2
## 83 WUTD Genesis Ow… 2025 2
## 84 4 Minutes (feat. Justin Timberlake and Timb… Madonna 2015 1
## 85 Back To Black Amy Wineho… 2016 1
## 86 Far Side Of The Moon Tinashe 2016 1
## 87 Impossible - Main Shontelle 2016 1
## 88 Somebody That I Used To Know Gotye 2016 1
## 89 Til It Happens To You Lady Gaga 2016 1
## 90 Always On the Run Yuksek 2017 1
## 91 Je bois et puis je danse Aline 2017 1
## 92 A Year Without Rain Selena Gom… 2025 1
## 93 Accidentally In Love Counting C… 2025 1
## 94 Adoleta Kelly Key 2025 1
## 95 All The Things She Said t.A.T.u. 2025 1
## 96 Amor de Que (Brega Funk Remix) Pabllo Vit… 2025 1
## 97 Apologize Timbaland 2025 1
It is interesting to see my top 10 songs of each year, because I can definitely see how they change over the years. It is also clear that I basically did not use Spotify between 2015 and 2017, so I will make sure to remove those years from my data frame. I think I should also remove the current year (2025) from the list.
# only keep streams from 2018 onward, and drop the current year (2025)
songs_listened <- songs_listened %>% filter(year(ts)>2017, year(ts)!=2025)
Now, what are the artists I listen to the most? I have my own guesses, but let’s see what the data tell us.
# get the total amount of times I've listened to each artist
artists_frequency <- songs_listened %>% group_by(artist_name) %>%
summarise(times_listened=n())
# print the top 100
artists_frequency %>% arrange(desc(times_listened)) %>% print(n=100)
## # A tibble: 4,443 × 2
## artist_name times_listened
## <chr> <int>
## 1 Lana Del Rey 12307
## 2 MARINA 4519
## 3 Britney Spears 4367
## 4 All Time Low 4309
## 5 Paramore 4240
## 6 Taylor Swift 3888
## 7 The Neighbourhood 3365
## 8 Lady Gaga 3073
## 9 Barcelona 2891
## 10 Melanie Martinez 2878
## 11 Fall Out Boy 2820
## 12 Arctic Monkeys 2743
## 13 Jaymes Young 2664
## 14 Dua Lipa 2466
## 15 BØRNS 2410
## 16 The Script 2212
## 17 Ariana Grande 2068
## 18 Billie Eilish 2007
## 19 Olivia Rodrigo 2006
## 20 Panic! At The Disco 1948
## 21 Keane 1915
## 22 Conan Gray 1851
## 23 Kim Petras 1776
## 24 The Cab 1760
## 25 MIKA 1600
## 26 Angèle 1551
## 27 Lorde 1532
## 28 Cœur De Pirate 1516
## 29 Florence + The Machine 1388
## 30 Charli xcx 1331
## 31 Troye Sivan 1250
## 32 Indila 1219
## 33 Miley Cyrus 1217
## 34 Beyoncé 1173
## 35 Luísa Sonza 1158
## 36 Rihanna 1138
## 37 Avril Lavigne 1044
## 38 Mikky Ekko 1036
## 39 Birdy 1006
## 40 The Killers 986
## 41 Chappell Roan 941
## 42 IAMX 937
## 43 Katy Perry 892
## 44 SIX 882
## 45 Evanescence 873
## 46 La Roux 857
## 47 Sandy e Junior 832
## 48 Harry Styles 829
## 49 The Weeknd 801
## 50 Linkin Park 796
## 51 Adele 789
## 52 Tove Lo 784
## 53 Louane 783
## 54 Weathers 774
## 55 girl in red 770
## 56 Hoshi 733
## 57 Pabllo Vittar 700
## 58 Kesha 687
## 59 Bilal Hassani 673
## 60 Muse 657
## 61 Pomme 621
## 62 Rina Sawayama 621
## 63 Kylie Minogue 620
## 64 My Chemical Romance 610
## 65 Carissa's Wierd 601
## 66 Chase Atlantic 593
## 67 High School Musical Cast 570
## 68 Selena Gomez 557
## 69 Bigflo & Oli 552
## 70 Zella Day 552
## 71 Wallows 527
## 72 Pitty 517
## 73 Mitski 505
## 74 Vendredi sur Mer 501
## 75 Hozier 500
## 76 Stephen Schwartz 499
## 77 Stromae 487
## 78 RBD 477
## 79 Allie X 470
## 80 Sabrina Carpenter 468
## 81 Lily Allen 447
## 82 Carla 434
## 83 Demi Lovato 427
## 84 Beach Weather 424
## 85 Tate McRae 421
## 86 The 1975 411
## 87 The Ready Set 411
## 88 The Pussycat Dolls 407
## 89 Sufjan Stevens 405
## 90 Hey Violet 401
## 91 Troy 394
## 92 Halsey 388
## 93 NX Zero 387
## 94 Glee Cast 384
## 95 Adam Lambert 380
## 96 The Veronicas 375
## 97 Tom Odell 362
## 98 Bentley Robles 354
## 99 Seafret 352
## 100 OneRepublic 340
## # ℹ 4,343 more rows
Am I surprised Lana Del Rey is in the first place? No. Am I shocked by the difference between her and the second place? A bit. But overall, the top 100 fairly represents my musical taste (duh!).
Let’s do top 10 per year now.
# get the total amount of times I've listened to each artist per year
artists_frequency <- songs_listened %>% group_by(artist_name, year(ts)) %>%
summarise(times_listened=n())
# rename "year(ts)" column
colnames(artists_frequency)[2] <- c('year')
# get top 10 per year
top10artists_listened <- artists_frequency %>% group_by(year) %>% arrange(desc(times_listened)) %>%
slice(1:10)
# print full data frame
top10artists_listened %>% arrange(desc(times_listened)) %>% print(n=nrow(top10artists_listened))
## # A tibble: 70 × 3
## # Groups: year [7]
## artist_name year times_listened
## <chr> <dbl> <int>
## 1 Lana Del Rey 2024 2840
## 2 Lana Del Rey 2023 2622
## 3 Lana Del Rey 2022 2243
## 4 Britney Spears 2024 2076
## 5 Lana Del Rey 2019 1553
## 6 All Time Low 2019 1472
## 7 Jaymes Young 2020 1448
## 8 All Time Low 2020 1446
## 9 Lana Del Rey 2021 1324
## 10 Barcelona 2020 1291
## 11 Taylor Swift 2023 1251
## 12 Melanie Martinez 2019 1228
## 13 The Cab 2020 1227
## 14 Paramore 2023 1227
## 15 Olivia Rodrigo 2024 1168
## 16 Britney Spears 2023 1164
## 17 Ariana Grande 2024 1117
## 18 MARINA 2021 1097
## 19 Angèle 2020 1070
## 20 MIKA 2020 1054
## 21 Billie Eilish 2024 1016
## 22 Dua Lipa 2024 970
## 23 Charli xcx 2024 958
## 24 Taylor Swift 2022 948
## 25 Chappell Roan 2024 937
## 26 The Neighbourhood 2023 919
## 27 Lana Del Rey 2020 909
## 28 Paramore 2022 890
## 29 Lady Gaga 2024 862
## 30 Arctic Monkeys 2023 859
## 31 Dua Lipa 2020 832
## 32 MARINA 2024 819
## 33 Lana Del Rey 2018 816
## 34 Fall Out Boy 2023 811
## 35 The Neighbourhood 2022 810
## 36 BØRNS 2020 798
## 37 The Script 2020 762
## 38 Arctic Monkeys 2022 762
## 39 Jaymes Young 2021 749
## 40 MARINA 2022 710
## 41 MARINA 2023 692
## 42 BØRNS 2022 691
## 43 Conan Gray 2023 684
## 44 Kim Petras 2022 673
## 45 The Script 2019 651
## 46 Britney Spears 2022 644
## 47 Beyoncé 2022 636
## 48 Florence + The Machine 2021 626
## 49 Keane 2021 566
## 50 Birdy 2019 553
## 51 Lady Gaga 2021 553
## 52 Lady Gaga 2023 540
## 53 Weathers 2021 532
## 54 The Neighbourhood 2018 481
## 55 Fall Out Boy 2019 460
## 56 Indila 2019 455
## 57 The Script 2021 422
## 58 Olivia Rodrigo 2021 411
## 59 Britney Spears 2021 368
## 60 IAMX 2019 359
## 61 Cœur De Pirate 2019 355
## 62 Keane 2019 331
## 63 The Script 2018 309
## 64 Avril Lavigne 2018 275
## 65 Keane 2018 269
## 66 Paramore 2018 264
## 67 Fall Out Boy 2018 261
## 68 Cœur De Pirate 2018 260
## 69 All Time Low 2018 244
## 70 Panic! At The Disco 2018 238
I feel like the breakdown per year does not have as much information as the previous one, but it is possible to see how in some years I was more into certain music genres than others.
Just out of curiosity, I would like to know how often the same artists are found within my top 10 across all years.
# count how many times an artist appears in a top 10
top10artists_frequency <- top10artists_listened %>% group_by(artist_name) %>% summarise(times_in_top10=n())
# make a treemap
ggplot(top10artists_frequency, aes(area=times_in_top10, fill=times_in_top10, label=artist_name, subgroup=times_in_top10)) +
labs(fill='Artist') + geom_treemap() + geom_treemap_text() +
geom_treemap_subgroup_border(color='black') +
geom_treemap_subgroup_text(place='centre', grow=TRUE, alpha=0.6) +
theme(legend.position='none') + scale_fill_viridis()
In the treemap above, gray numbers represent how many times the artists in each subgroup (same color) are found in a top 10. Lana Del Rey is the only artist that has appeared in a top 10 every year. No surprises there.
Now, let’s see what my most streamed albums are. To do this, I will group by both album name and artist name, as albums released by different artists might have the same name.
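To see why grouping by album name alone could be misleading, here is a small sketch with made-up albums. The album and artist names are hypothetical.

```r
library(dplyr)

# Two different artists each have an album called 'Greatest Hits'
toy_albums <- tibble(
  album_name  = c('Greatest Hits', 'Greatest Hits', 'Greatest Hits', 'Debut'),
  artist_name = c('Artist X', 'Artist X', 'Artist Y', 'Artist X')
)

# Grouping by album name only merges the two 'Greatest Hits' albums
by_album_only <- toy_albums %>% count(album_name, name='times_listened')

# Grouping by album and artist keeps them apart
by_album_and_artist <- toy_albums %>% count(album_name, artist_name, name='times_listened')

by_album_only
by_album_and_artist
```

The first table has one row for 'Greatest Hits', while the second has one row per artist, which is the behavior wanted here.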
# get the total amount of times I've listened to each album
album_frequency <- songs_listened %>% group_by(album_name, artist_name) %>%
summarise(times_listened=n())
# print the top 100
album_frequency %>% arrange(desc(times_listened)) %>% print(n=100)
## # A tibble: 10,580 × 3
## # Groups: album_name [9,737]
## album_name artist_name times_listened
## <chr> <chr> <int>
## 1 Born To Die - The Paradise Edition Lana Del R… 2675
## 2 Norman Fucking Rockwell! Lana Del R… 2492
## 3 K-12 Melanie Ma… 1859
## 4 Dopamine BØRNS 1819
## 5 Ultraviolence Lana Del R… 1767
## 6 Future Nostalgia Dua Lipa 1392
## 7 AM Arctic Mon… 1391
## 8 Chemtrails Over The Country Club Lana Del R… 1340
## 9 Electra Heart MARINA 1312
## 10 Lust For Life Lana Del R… 1281
## 11 Mini World Indila 1185
## 12 SOUR Olivia Rod… 1059
## 13 Circus (Deluxe Version) Britney Sp… 1015
## 14 Ancient Dreams In A Modern Land MARINA 991
## 15 Wiped Out! The Neighb… 945
## 16 Brol Angèle 942
## 17 Feel Something Jaymes You… 933
## 18 Hard To Imagine The Neighbourhood Ever Changing The Neighb… 921
## 19 Blonde Cœur De Pi… 913
## 20 Honeymoon Lana Del R… 891
## 21 folklore Taylor Swi… 857
## 22 I Love You. The Neighb… 838
## 23 Melodrama Lorde 822
## 24 MANIA Fall Out B… 818
## 25 Blackout Britney Sp… 812
## 26 Cry Baby Melanie Ma… 797
## 27 After Laughter Paramore 786
## 28 This Is Why Paramore 777
## 29 The Rise and Fall of a Midwest Princess Chappell R… 758
## 30 Brand New Eyes Paramore 741
## 31 reputation Taylor Swi… 730
## 32 Chambre 12 Louane 706
## 33 Slut Pop Kim Petras 705
## 34 Absolutes Barcelona 699
## 35 The Family Jewels MARINA 690
## 36 DOCE 22 Luísa Sonza 684
## 37 Riot! Paramore 680
## 38 Froot MARINA 669
## 39 Favourite Worst Nightmare Arctic Mon… 667
## 40 Death of a Bachelor Panic! At … 655
## 41 Dark Star Jaymes You… 641
## 42 La Roux La Roux 638
## 43 SIX: LIVE ON OPENING NIGHT (Original Broadway Ca… SIX 637
## 44 Last Young Renegade All Time L… 629
## 45 RENAISSANCE Beyoncé 629
## 46 Lock Me Up The Cab 619
## 47 Superache Conan Gray 619
## 48 GUTS Olivia Rod… 616
## 49 Il suffit d'y croire Hoshi 593
## 50 Pure Heroine Lorde 589
## 51 Femme Fatale (Deluxe Version) Britney Sp… 584
## 52 Paramore Paramore 582
## 53 Kid Krow Conan Gray 581
## 54 Love + Fear MARINA 573
## 55 #3 Deluxe Version The Script 565
## 56 Hopes And Fears Keane 565
## 57 Symphony Soldier The Cab 554
## 58 The Fame Lady Gaga 547
## 59 Not Quite Yours Barcelona 545
## 60 Chromatica Lady Gaga 540
## 61 The Script The Script 534
## 62 ARTPOP Lady Gaga 533
## 63 Kicker Zella Day 524
## 64 Brol La Suite Angèle 521
## 65 Habits of My Heart Jaymes You… 519
## 66 eternal sunshine Ariana Gra… 518
## 67 thank u, next Ariana Gra… 515
## 68 Too Weird to Live, Too Rare to Die! Panic! At … 508
## 69 Did you know that there's a tunnel under Ocean B… Lana Del R… 503
## 70 Lungs Florence +… 502
## 71 Wicked Stephen Sc… 499
## 72 Premiers émois Vendredi s… 497
## 73 Born This Way Lady Gaga 493
## 74 Dirty Work All Time L… 490
## 75 Future Hearts All Time L… 488
## 76 Love Me Barcelona 483
## 77 racine carrée Stromae 479
## 78 Fine Line Harry Styl… 469
## 79 Midnights Taylor Swi… 469
## 80 Blue Neighbourhood - Deluxe Troye Sivan 461
## 81 The Fame Monster Lady Gaga 460
## 82 Save Rock And Roll Fall Out B… 456
## 83 Fallen Evanescence 451
## 84 The Origin Of Love MIKA 451
## 85 Kingdom Bilal Hass… 447
## 86 Time Mikky Ekko 443
## 87 Hot Fuss The Killers 433
## 88 Nothing Personal (Deluxe Version) All Time L… 430
## 89 In The Zone Britney Sp… 423
## 90 Happier Than Ever Billie Eil… 422
## 91 No Place In Heaven MIKA 418
## 92 Radical Optimism Dua Lipa 414
## 93 Basic Man Barcelona 413
## 94 Let Go Avril Lavi… 411
## 95 La vie de rêve Bigflo & O… 403
## 96 Roses Cœur De Pi… 403
## 97 No Sound Without Silence The Script 387
## 98 American Beauty/American Psycho Fall Out B… 384
## 99 Freedom Child The Script 375
## 100 WHEN WE ALL FALL ASLEEP, WHERE DO WE GO? Billie Eil… 367
## # ℹ 10,480 more rows
To be honest, I did not expect some of the albums in the top 10 to be so high on this list!
Contrary to what most people believe, Spotify does not pay artist royalties at a fixed per-play or per-stream rate. It is more complicated than that. However… what if they did? According to some totally not in-depth, not scientific research I did, most artists are paid roughly 0.004 USD per stream. So, using a little bit of math, let’s see how much money artists have made from my streams alone.
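The math really is just streams times rate. As a quick sanity check, here it is by hand for the two artists at the top of the artist table earlier in this post, assuming the rough 0.004 USD figure (which is not an official rate).

```r
# Rough per-stream rate quoted in the text; an assumption, not an official number
rate_usd_per_stream <- 0.004

# Stream counts copied from the top of the artist table above
streams <- c('Lana Del Rey'=12307, 'MARINA'=4519)

# Hypothetical revenue in USD
revenue <- streams * rate_usd_per_stream
round(revenue, 2)
```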
# get the total amount of times I've listened to each artist and multiply by the "payment rate"
artists_revenue <- songs_listened %>% group_by(artist_name) %>%
summarise(revenue=n()*0.004)
# print the top 100
artists_revenue %>% arrange(desc(revenue)) %>% print(n=100)
## # A tibble: 4,443 × 2
## artist_name revenue
## <chr> <dbl>
## 1 Lana Del Rey 49.2
## 2 MARINA 18.1
## 3 Britney Spears 17.5
## 4 All Time Low 17.2
## 5 Paramore 17.0
## 6 Taylor Swift 15.6
## 7 The Neighbourhood 13.5
## 8 Lady Gaga 12.3
## 9 Barcelona 11.6
## 10 Melanie Martinez 11.5
## 11 Fall Out Boy 11.3
## 12 Arctic Monkeys 11.0
## 13 Jaymes Young 10.7
## 14 Dua Lipa 9.86
## 15 BØRNS 9.64
## 16 The Script 8.85
## 17 Ariana Grande 8.27
## 18 Billie Eilish 8.03
## 19 Olivia Rodrigo 8.02
## 20 Panic! At The Disco 7.79
## 21 Keane 7.66
## 22 Conan Gray 7.40
## 23 Kim Petras 7.10
## 24 The Cab 7.04
## 25 MIKA 6.4
## 26 Angèle 6.20
## 27 Lorde 6.13
## 28 Cœur De Pirate 6.06
## 29 Florence + The Machine 5.55
## 30 Charli xcx 5.32
## 31 Troye Sivan 5
## 32 Indila 4.88
## 33 Miley Cyrus 4.87
## 34 Beyoncé 4.69
## 35 Luísa Sonza 4.63
## 36 Rihanna 4.55
## 37 Avril Lavigne 4.18
## 38 Mikky Ekko 4.14
## 39 Birdy 4.02
## 40 The Killers 3.94
## 41 Chappell Roan 3.76
## 42 IAMX 3.75
## 43 Katy Perry 3.57
## 44 SIX 3.53
## 45 Evanescence 3.49
## 46 La Roux 3.43
## 47 Sandy e Junior 3.33
## 48 Harry Styles 3.32
## 49 The Weeknd 3.20
## 50 Linkin Park 3.18
## 51 Adele 3.16
## 52 Tove Lo 3.14
## 53 Louane 3.13
## 54 Weathers 3.10
## 55 girl in red 3.08
## 56 Hoshi 2.93
## 57 Pabllo Vittar 2.8
## 58 Kesha 2.75
## 59 Bilal Hassani 2.69
## 60 Muse 2.63
## 61 Pomme 2.48
## 62 Rina Sawayama 2.48
## 63 Kylie Minogue 2.48
## 64 My Chemical Romance 2.44
## 65 Carissa's Wierd 2.40
## 66 Chase Atlantic 2.37
## 67 High School Musical Cast 2.28
## 68 Selena Gomez 2.23
## 69 Bigflo & Oli 2.21
## 70 Zella Day 2.21
## 71 Wallows 2.11
## 72 Pitty 2.07
## 73 Mitski 2.02
## 74 Vendredi sur Mer 2.00
## 75 Hozier 2
## 76 Stephen Schwartz 2.00
## 77 Stromae 1.95
## 78 RBD 1.91
## 79 Allie X 1.88
## 80 Sabrina Carpenter 1.87
## 81 Lily Allen 1.79
## 82 Carla 1.74
## 83 Demi Lovato 1.71
## 84 Beach Weather 1.70
## 85 Tate McRae 1.68
## 86 The 1975 1.64
## 87 The Ready Set 1.64
## 88 The Pussycat Dolls 1.63
## 89 Sufjan Stevens 1.62
## 90 Hey Violet 1.60
## 91 Troy 1.58
## 92 Halsey 1.55
## 93 NX Zero 1.55
## 94 Glee Cast 1.54
## 95 Adam Lambert 1.52
## 96 The Veronicas 1.5
## 97 Tom Odell 1.45
## 98 Bentley Robles 1.42
## 99 Seafret 1.41
## 100 OneRepublic 1.36
## # ℹ 4,343 more rows
This result honestly made me laugh. The revenue values are very low. However, I am just one person, right? Given the huge number of Spotify users, artists make much more money than that. They also have other sources of income.
For this section, I would like to know how many days per year I spent listening to music.
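The code in this section converts milliseconds to days with a hardcoded constant. Here is a short sketch of where that constant comes from. The variable names are mine and only for illustration.

```r
# Milliseconds in one day: 1000 ms x 60 s x 60 min x 24 h
ms_per_day <- 1000 * 60 * 60 * 24

# Days per millisecond, the constant used in the conversion
days_per_ms <- 1 / ms_per_day
days_per_ms

# Example: three hours of listening expressed in days
three_hours_ms <- 3 * 60 * 60 * 1000
three_hours_ms / ms_per_day
```

Dividing by `ms_per_day` gives the same result as multiplying by the rounded constant, without having to remember the decimal places.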
# make a 'days_played' column based on the 'ms_played' column
# 1 millisecond = 1.15741e-8 days
songs_listened <- songs_listened %>% mutate(days_played=ms_played*1.15741e-8)
# for every year, sum the days_played column
time_spent_w_songs <- songs_listened %>% group_by(year(ts)) %>% summarise(days_listened=sum(days_played))
# rename "year(ts)" column
colnames(time_spent_w_songs)[1] <- c('year')
# let's plot it!
ggplot(time_spent_w_songs, aes(x=as.factor(year), y=days_listened)) + geom_col() +
xlab('Year') + ylab('Number of days') + ggtitle('Days spent listening to music in each year') + theme_bw()
So many days listening to music in 2020! I blame COVID-19. As for 2024, who knows.
Now that we know how many days per year I have spent streaming songs, let’s do something similar but now with the artists in my top 10 overall.
# get the names of the artists in my top 10
top10artists <- songs_listened %>% group_by(artist_name) %>%
summarise(times_listened=n()) %>% arrange(desc(times_listened)) %>%
slice(1:10) %>% pull(artist_name)
# filter main data frame so it only contains artists in my top 10
top10artists_streamingtime <- songs_listened %>% filter(artist_name %in% top10artists)
# compute total streaming time
top10artists_streamingtime <- top10artists_streamingtime %>% group_by(artist_name) %>%
summarise(streaming_time=sum(days_played))
# plot
ggplot(top10artists_streamingtime, aes(x=reorder(artist_name, -streaming_time), y=streaming_time)) +
geom_col() + coord_flip() + xlab('Artist') + ylab('Number of days') +
ggtitle('Days spent listening to music by artists in my top 10') + theme_bw()
I must admit, that is a lot of days listening to Lana Del Rey. But the results look so good visualized this way!
–
Okay, it is clear that I love Lana Del Rey. So, now I will rank her albums by times streamed in decreasing order.
# count streams per album and keep only Lana Del Rey's albums
ldr_albums <- songs_listened %>% group_by(album_name, artist_name) %>%
summarise(times_listened=n()) %>% filter(artist_name=='Lana Del Rey') %>%
select(album_name, times_listened)
# print full data frame
ldr_albums %>% arrange(desc(times_listened)) %>% print(n=nrow(ldr_albums))
## # A tibble: 52 × 2
## # Groups: album_name [52]
## album_name times_listened
## <chr> <int>
## 1 "Born To Die - The Paradise Edition" 2675
## 2 "Norman Fucking Rockwell!" 2492
## 3 "Ultraviolence" 1767
## 4 "Chemtrails Over The Country Club" 1340
## 5 "Lust For Life" 1281
## 6 "Honeymoon" 891
## 7 "Did you know that there's a tunnel under Ocean Blvd" 503
## 8 "Young And Beautiful" 353
## 9 "Blue Banisters" 238
## 10 "Born To Die – Paradise Edition" 150
## 11 "Mariners Apartment Complex" 125
## 12 "Blue Jeans Remixes" 61
## 13 "Born To Die" 50
## 14 "Once Upon a Dream (from \"Maleficent\")" 43
## 15 "Watercolor Eyes" 37
## 16 "Say Yes To Heaven" 25
## 17 "Paradise" 22
## 18 "Season Of The Witch" 22
## 19 "Summertime Sadness (Lana Del Rey Vs. Cedric Gervais)" 22
## 20 "Video Games Remixes" 16
## 21 "Looking For America" 15
## 22 "The End Of The Storm" 14
## 23 "Summertime The Gershwin Version" 12
## 24 "Summer Bummer" 11
## 25 "Unmasked: The Platinum Collection" 11
## 26 "Unmasked: The Platinum Collection - Deluxe" 11
## 27 "Blue Jeans (Kris Menace Remix)" 10
## 28 "A&W" 8
## 29 "Text Book" 8
## 30 "Venice Bitch" 8
## 31 "Arcadia" 7
## 32 "Let Me Love You Like A Woman" 7
## 33 "hope is a dangerous thing for a woman like me to have - but … 7
## 34 "Big Eyes: Music From The Original Motion Picture" 6
## 35 "Doin' Time" 6
## 36 "Summertime Sadness [Lana Del Rey vs. Cedric Gervais]" 6
## 37 "Blue Jeans" 5
## 38 "The Art of Sampling" 5
## 39 "West Coast" 5
## 40 "Ride" 4
## 41 "Summertime Sadness (Asadinho Remixes)" 4
## 42 "Wildflower Wildfire" 4
## 43 "Music From Baz Luhrmann's Film The Great Gatsby" 3
## 44 "Take Me Home, Country Roads" 3
## 45 "Video Games" 3
## 46 "Buddy's Rendezvous" 2
## 47 "Dark Paradise" 2
## 48 "Once Upon a Dream" 2
## 49 "Summertime Sadness" 2
## 50 "Chill Your Mind" 1
## 51 "LA Who Am I To Love You" 1
## 52 "Lust for Life (with The Weeknd)" 1
If you were to ask me what my favorite album is, I would say NFR. However, that is not my most streamed album. BTD may be above NFR simply because it was released years earlier, so it has had more time to ‘collect’ streams. Perhaps in the future I will compute the number of streams per year since each album was released. Also, the low numbers in the table probably represent EPs.
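A rough sketch of that streams-per-year idea, using the two album counts from the table above. The release years are typed in by hand and should be double-checked, and counting whole years up to 2024 is my assumption, not something the data provide.

```r
library(dplyr)

# Assumption: 2024 is the last full year kept in the data
reference_year <- 2024

ldr_toy <- tibble(
  album_name     = c('Born To Die - The Paradise Edition', 'Norman Fucking Rockwell!'),
  times_listened = c(2675, 2492),
  release_year   = c(2012, 2019)  # entered by hand, not part of the Spotify export
) %>%
  mutate(years_out        = reference_year - release_year,
         streams_per_year = times_listened / years_out)

ldr_toy %>% arrange(desc(streams_per_year))
```

One caveat: my streaming history effectively starts in 2018, so for older albums it might be fairer to count years from 2018 instead of from the release year.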
I wanna try and plot who my top artists were each week, across all my years as a Spotify user. What I hope to see is how my most listened to artists per week change according to some life events, such as concerts, plays, new album releases, etc.
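The ranking step is the tricky part, so here is a minimal sketch of it on made-up data. The `toy_weeks` table is hypothetical: week 1 has a three-way tie for second place, and week 2 has only two artists.

```r
library(dplyr)

# Made-up listening time (ms) per artist and week
toy_weeks <- tibble(
  year        = 2022,
  week        = c(1, 1, 1, 1, 2, 2),
  artist_name = c('A', 'B', 'C', 'D', 'A', 'B'),
  time        = c(500, 300, 300, 300, 900, 100)
)

# Keep at most 3 artists per week and number them by rank
toy_top3 <- toy_weeks %>%
  group_by(year, week) %>%
  slice_max(time, n=3, with_ties=FALSE) %>%  # never more than 3 rows per week
  mutate(position=row_number()) %>%          # works with fewer than 3 rows too
  ungroup()

toy_top3
```

With `with_ties=FALSE`, tied artists are cut off at three rows, and `row_number()` adapts to however many rows a week ends up with.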
# for every year and week, sum the time spent listening to every artist
topartists_week <- songs_listened %>% group_by(year(ts), week(ts), artist_name) %>% summarise(time=sum(ms_played))
# rename lubridate-made columns
colnames(topartists_week)[1:2] <- c('year', 'week')
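# for every year and week, select the top 3 artists and add a new column with their respective position in the top 3
# (with_ties=FALSE and row_number() keep this working when a week has ties or fewer than 3 artists)
topartists_week <- topartists_week %>% group_by(year, week) %>% slice_max(time, n=3, with_ties=FALSE) %>% mutate(position=row_number())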
# plot it!
ggplot(topartists_week, aes(x=week, y=position, color=artist_name, label=artist_name)) +
geom_point(show.legend=FALSE) + geom_line(show.legend=FALSE) + geom_label_repel(show.legend=FALSE) +
facet_wrap(~year, ncol=1) + scale_y_continuous(breaks=c(1,2,3)) +
scale_x_reverse(breaks=c(1,10,20,30,40,50)) + coord_flip() +
xlab('Week') + ylab('Rank') + ggtitle('Top 3 artists per week in every year') + theme_bw()
Ok… this isn’t my most beautiful work. The plot is too busy, but I don’t think I could improve it further. Still, it’s possible to draw some information from it. For instance, my weekly top artists shift in weeks when I saw a musical in the theater (such as SIX on May 10th, 2022), when an artist released a new album (for example, Dua Lipa’s Future Nostalgia on March 27th, 2020 and Beyoncé’s Renaissance on July 29th, 2022), or when I discovered an artist and streamed them a bunch (such as Weathers on September 6th-12th).
Overall, the plot tells an interesting story. But to be fair, this plot is probably only that interesting to me.
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.utf8 LC_CTYPE=Portuguese_Brazil.utf8
## [3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C
## [5] LC_TIME=Portuguese_Brazil.utf8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggrepel_0.9.6 treemapify_2.5.6 viridis_0.6.5 viridisLite_0.4.2
## [5] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [9] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
## [13] ggplot2_3.5.1 tidyverse_2.0.0 jsonlite_1.8.9
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 compiler_4.4.2 Rcpp_1.0.13-1 tidyselect_1.2.1
## [5] gridExtra_2.3 jquerylib_0.1.4 scales_1.3.0 ggfittext_0.10.2
## [9] yaml_2.3.10 fastmap_1.2.0 R6_2.5.1 labeling_0.4.3
## [13] generics_0.1.3 knitr_1.49 munsell_0.5.1 bslib_0.8.0
## [17] pillar_1.10.1 tzdb_0.4.0 rlang_1.1.4 utf8_1.2.4
## [21] cachem_1.1.0 stringi_1.8.4 xfun_0.49 sass_0.4.9
## [25] timechange_0.3.0 cli_3.6.3 withr_3.0.2 magrittr_2.0.3
## [29] digest_0.6.37 grid_4.4.2 rstudioapi_0.17.1 hms_1.1.3
## [33] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1 glue_1.8.0
## [37] farver_2.1.2 colorspace_2.1-1 rmarkdown_2.29 tools_4.4.2
## [41] pkgconfig_2.0.3 htmltools_0.5.8.1