I learned that, if requested, Spotify sends users their extended streaming history data. Since I’m excited about data science, of course I had to do it. My idea with this script is to slowly write code over time, as I have spare time, to analyze my data. Additionally, I plan on periodically requesting my streaming data.
In case you’d like to try it for yourself, feel free to request your own extended streaming data and copy any chunks of code I wrote. As a disclaimer, I’m not trying to write the most efficient code here. It’s just supposed to be fun. Don’t judge my code!
The data are in multiple .json files with the same prefix. So, first I read the files and merged them into a single data frame, and then removed unnecessary columns.
library(jsonlite)
library(tidyverse)
library(viridis)
library(treemapify)
library(ggrepel)
'%&%' = function(a,b) paste (a,b,sep='')
# list of subfiles to read
list_of_json_files <- list.files(pattern='Streaming_History_Audio')
# read each file and append to a merged data frame
for (f in list_of_json_files){
tmp <- fromJSON(readLines(f))
if (exists('full_df')){
full_df <- rbind(full_df, tmp)
rm(tmp)
} else { full_df <- tmp }
}
# only keep columns I want
full_df <- full_df %>% select(master_metadata_album_artist_name, master_metadata_album_album_name, master_metadata_track_name, ts, ms_played) %>% drop_na()
# make sure the timestamp column is in the correct format
full_df$ts <- as_datetime(full_df$ts)
# renaming first three columns (names are too big!)
colnames(full_df)[1:3] <- c('artist_name','album_name','track_name')
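The loop above relies on `exists('full_df')`, so running it twice in the same R session would append every file a second time. One possible alternative, shown here only as a sketch, reads all files and stacks them in a single step. The temporary folder, the two tiny demo files, and the `demo_*` names are made up for illustration; only `ts` and `ms_played` mirror the real export.

```r
library(jsonlite)
library(dplyr)

# Hypothetical demo: write two tiny JSON files to a temporary folder
demo_dir <- tempfile('spotify_demo')
dir.create(demo_dir)
write_json(data.frame(ts='2024-01-01T10:00:00Z', ms_played=215000),
           file.path(demo_dir, 'Streaming_History_Audio_0.json'))
write_json(data.frame(ts='2024-01-02T11:30:00Z', ms_played=4000),
           file.path(demo_dir, 'Streaming_History_Audio_1.json'))

# list the files, read each one, and stack them in a single step
demo_files <- list.files(demo_dir, pattern='Streaming_History_Audio', full.names=TRUE)
demo_df <- bind_rows(lapply(demo_files, fromJSON))

# one row per stream, regardless of how many times this chunk is re-run
nrow(demo_df)
```

Because `demo_df` is rebuilt from scratch on every run, there is no leftover state to clean up between runs.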
The original data contains a ‘skipped’ column with Boolean values, but most cells were empty. Thus, I decided to implement my own ‘Did I skip this song?’ algorithm, which is basically this: if I listened to a song for less than 10 seconds, it counts as skipped.
# keeps songs that I listened to for less than 10 seconds
songs_skipped <- full_df %>% filter(ms_played<=9999)
# analyze how much I've skipped songs
songs_skipped_summary <- songs_skipped %>% group_by(track_name, artist_name) %>%
summarise(times_skipped=n()) %>% unique()
# plot a histogram
ggplot(songs_skipped_summary, aes(x=times_skipped)) + geom_histogram() +
xlab('Number of times I have skipped a song') + ylab('Count') +
ggtitle('How many times I have skipped songs') + theme_minimal()
As we can see, I have skipped most songs only a few times. Those are probably songs Spotify recommended to me once or twice, and I skipped them. However, there are songs I have skipped a lot! Let’s see the top 10 most skipped songs of all time:
# 10 most skipped songs of all time
songs_skipped_summary %>% arrange(desc(times_skipped)) %>% head(n=10)
## # A tibble: 10 × 3
## # Groups: track_name [10]
## track_name artist_name times_skipped
## <chr> <chr> <int>
## 1 Bubblegum Bitch MARINA 365
## 2 Daddy Issues The Neighbourhood 325
## 3 Sweater Weather The Neighbourhood 306
## 4 How to Be a Heartbreaker MARINA 264
## 5 Primadonna MARINA 264
## 6 Why'd You Only Call Me When You're High? Arctic Monkeys 255
## 7 Oh No! MARINA 251
## 8 Toxic Britney Spears 222
## 9 Diet Mountain Dew Lana Del Rey 219
## 10 American Money BØRNS 213
Bingo! Those are songs I definitely enjoy, but sometimes I just don’t wanna listen to them.
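The 10-second rule can also be written as a single labelled column instead of two separate filters, which makes the cutoff easy to change later. This is only a sketch on made-up streams: `toy_streams`, `skip_threshold_ms`, and the song names are hypothetical.

```r
library(dplyr)

# Hypothetical toy streams; ms_played is in milliseconds, as in the Spotify export
toy_streams <- tibble(
  track_name = c('Song A', 'Song A', 'Song B', 'Song B', 'Song B'),
  ms_played  = c(3000, 180000, 9999, 10000, 500)
)

# Assumed cutoff: anything under 10 seconds counts as a skip
skip_threshold_ms <- 10000

# label every stream once
toy_streams <- toy_streams %>%
  mutate(skipped = ms_played < skip_threshold_ms)

# count skips and actual listens per song in one pass
toy_summary <- toy_streams %>%
  group_by(track_name) %>%
  summarise(times_skipped = sum(skipped), times_streamed = sum(!skipped))
toy_summary
```

With the label in place, the same song can show up with both a skip count and a stream count, which is the pattern in the tables here (songs I enjoy but also skip a lot).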
OK, so now that I was able to briefly analyze the data corresponding to my skipped songs, let’s take a look at the ones I actually listened to. First, let’s see my top 100 most streamed songs of all my time as a Spotify user.
# keeps songs that I listened to for at least 10 seconds
songs_listened <- full_df %>% filter(ms_played>=10000)
# analyze how much I've streamed songs
songs_listened_summary <- songs_listened %>% group_by(track_name, artist_name) %>%
summarise(times_streamed=n()) %>% unique()
# 100 most streamed songs of all time
songs_listened_summary %>% arrange(desc(times_streamed)) %>% print(n=100)
## # A tibble: 18,966 × 3
## # Groups: track_name [17,217]
## track_name artist_name times_streamed
## <chr> <chr> <int>
## 1 Mariners Apartment Complex Lana Del Rey 544
## 2 Daddy Issues The Neighbo… 524
## 3 American Money BØRNS 474
## 4 Pretty When You Cry Lana Del Rey 457
## 5 Young And Beautiful Lana Del Rey 437
## 6 Crier tout bas Cœur De Pir… 435
## 7 Cinnamon Girl Lana Del Rey 434
## 8 Ultraviolence Lana Del Rey 425
## 9 Moondust Jaymes Young 403
## 10 I'll Be Good Jaymes Young 385
## 11 The Emotion BØRNS 383
## 12 Ride Lana Del Rey 368
## 13 Liability Lorde 363
## 14 Sweater Weather The Neighbo… 362
## 15 Bubblegum Bitch MARINA 361
## 16 Lunchbox Friends Melanie Mar… 360
## 17 Place de la République Cœur De Pir… 351
## 18 Hypnotic Zella Day 347
## 19 Background Barcelona 346
## 20 Doin' Time Lana Del Rey 341
## 21 Somebody Else The 1975 336
## 22 Dark Paradise Lana Del Rey 330
## 23 Diet Mountain Dew Lana Del Rey 330
## 24 Fake Happy Paramore 324
## 25 Cherry Lana Del Rey 312
## 26 Somewhere Only We Know Keane 312
## 27 Therapy All Time Low 311
## 28 West Coast Lana Del Rey 308
## 29 People Watching Conan Gray 302
## 30 Friends Chase Atlan… 298
## 31 Infinity Jaymes Young 296
## 32 Why'd You Only Call Me When You're High? Arctic Monk… 296
## 33 The Blackest Day Lana Del Rey 294
## 34 Cough Syrup Young the G… 291
## 35 Before the Worst The Script 288
## 36 Nightcall London Gram… 288
## 37 Snap Out Of It Arctic Monk… 287
## 38 Chemtrails Over The Country Club Lana Del Rey 284
## 39 National Anthem Lana Del Rey 284
## 40 How to Be a Heartbreaker MARINA 283
## 41 Sur la lune Bigflo & Oli 280
## 42 Softcore The Neighbo… 279
## 43 Six Degrees of Separation The Script 276
## 44 J'suis pas dupe Pomme 275
## 45 jealousy, jealousy Olivia Rodr… 275
## 46 When You're Gone Avril Lavig… 273
## 47 My Immortal Evanescence 269
## 48 Wires The Neighbo… 268
## 49 Buzzcut Season Lorde 265
## 50 Chasing Cars Snow Patrol 264
## 51 Complicated Avril Lavig… 261
## 52 Dernière danse Indila 261
## 53 Remembering Sunday All Time Low 261
## 54 S.O.S Indila 261
## 55 The Climb Miley Cyrus 260
## 56 All I Wanted Paramore 257
## 57 Circus Britney Spe… 257
## 58 Lies MARINA 253
## 59 Oh No! MARINA 251
## 60 Sofisticated Fuck Princess Please Leave Me Alone Carissa's W… 249
## 61 Sign of the Times Harry Styles 248
## 62 Happy MARINA 247
## 63 Electric Love BØRNS 246
## 64 Numb Linkin Park 245
## 65 Supermassive Black Hole Muse 245
## 66 Far Too Young to Die Panic! At T… 244
## 67 Happiness IAMX 244
## 68 Lithium Evanescence 244
## 69 Tourner Dans Le Vide Indila 244
## 70 Judas Lady Gaga 243
## 71 Physical Dua Lipa 243
## 72 White Mustang Lana Del Rey 243
## 73 Dark But Just A Game Lana Del Rey 240
## 74 Hometown Glory Adele 239
## 75 Icy Kim Petras 239
## 76 Mini World Indila 239
## 77 Born To Die Lana Del Rey 238
## 78 Elle me dit MIKA 237
## 79 Say Hello Melancholia IAMX 237
## 80 Primadonna MARINA 236
## 81 Nobody's Home Avril Lavig… 234
## 82 Skyfall Adele 234
## 83 yes, and? Ariana Gran… 234
## 84 Sex, Drugs, Etc. Beach Weath… 233
## 85 Strange Birds Birdy 233
## 86 Ta reine Angèle 233
## 87 Thnks fr th Mmrs Fall Out Boy 233
## 88 Boulevard of Broken Dreams Green Day 232
## 89 Russian Roulette Rihanna 232
## 90 Flares The Script 230
## 91 Here Alessia Cara 229
## 92 We Are Broken Paramore 229
## 93 Womanizer Britney Spe… 228
## 94 I'm a Ruin MARINA 226
## 95 A Bad Dream Keane 225
## 96 Comment je vais faire Hoshi 225
## 97 Brooklyn Baby Lana Del Rey 224
## 98 Fuck it I love you Lana Del Rey 224
## 99 In the End Linkin Park 224
## 100 Une miss s'immisce Exotica 224
## # ℹ 18,866 more rows
I feel like any comment about my top 100 list would be TMI. It is what it is.
Now, let’s see the top 10 songs per year.
# sum amount of times I've streamed each song per year
songs_listened_summary <- songs_listened %>% group_by(track_name, artist_name, year(ts)) %>%
summarise(times_streamed=n()) %>% unique()
# rename "year(ts)" column
colnames(songs_listened_summary)[3] <- c('year')
# get top 10 per year
top10songs_listened <- songs_listened_summary %>% group_by(year) %>% arrange(desc(times_streamed)) %>%
slice(1:10)
# print full data frame
top10songs_listened %>% arrange(desc(times_streamed)) %>% print(n=nrow(top10songs_listened))
## # A tibble: 107 × 4
## # Groups: year [12]
## track_name artist_name year times_streamed
## <chr> <chr> <dbl> <int>
## 1 yes, and? Ariana Gra… 2024 215
## 2 Good Luck, Babe! Chappell R… 2024 179
## 3 American Money BØRNS 2022 177
## 4 J'suis pas dupe Pomme 2020 168
## 5 Icy Kim Petras 2022 166
## 6 JOYRIDE Kesha 2024 164
## 7 Crier tout bas Cœur De Pi… 2020 163
## 8 we can't be friends (wait for your love) Ariana Gra… 2024 160
## 9 Houdini Dua Lipa 2024 154
## 10 Comment je vais faire Hoshi 2020 149
## 11 Mini World Indila 2020 145
## 12 jealousy, jealousy Olivia Rod… 2024 145
## 13 Dernière danse Indila 2020 140
## 14 Diet Mountain Dew Lana Del R… 2024 140
## 15 Washing Machine Heart Mitski 2024 140
## 16 the boy is mine Ariana Gra… 2024 137
## 17 Cinnamon Girl Lana Del R… 2024 132
## 18 Moondust Jaymes You… 2022 130
## 19 Avenir Louane 2020 129
## 20 Sur la lune Bigflo & O… 2020 129
## 21 THE QUIET Troye Sivan 2022 128
## 22 National Anthem Lana Del R… 2023 127
## 23 All You Wanna Do SIX 2022 125
## 24 Get Down SIX 2022 125
## 25 Ta reine Angèle 2020 124
## 26 Lunchbox Friends Melanie Ma… 2019 122
## 27 Balance ton quoi Angèle 2020 121
## 28 Écoute Chérie Vendredi s… 2020 120
## 29 Doin' Time Lana Del R… 2023 119
## 30 Do Me a Favour Arctic Mon… 2022 116
## 31 Softcore The Neighb… 2022 116
## 32 Why'd You Only Call Me When You're High? Arctic Mon… 2023 116
## 33 Crier tout bas Cœur De Pi… 2019 114
## 34 Diet Mountain Dew Lana Del R… 2023 114
## 35 Infinity Jaymes You… 2021 110
## 36 Worth It for the Feeling Rebecca Bl… 2022 109
## 37 I'll Be Good Jaymes You… 2025 109
## 38 CHIHIRO Billie Eil… 2025 108
## 39 Parachute Hayley Wil… 2025 108
## 40 Liability Lorde 2019 106
## 41 Man's World MARINA 2021 104
## 42 Moondust Jaymes You… 2021 101
## 43 Mariners Apartment Complex Lana Del R… 2019 100
## 44 Bubblegum Bitch MARINA 2022 100
## 45 I Miss You Adele 2025 100
## 46 Somewhere Only We Know Keane 2019 99
## 47 Don’t Blame Me Taylor Swi… 2023 99
## 48 Sports car Tate McRae 2025 98
## 49 More Than a Friend girli 2021 97
## 50 Nightcall London Gra… 2019 93
## 51 Sign of the Times Harry Styl… 2019 93
## 52 Cinnamon Girl Lana Del R… 2025 93
## 53 Dernière danse Indila 2019 92
## 54 Place de la République Cœur De Pi… 2019 92
## 55 Snap Out Of It Arctic Mon… 2023 92
## 56 Friends Chase Atla… 2021 91
## 57 Karma Taylor Swi… 2023 91
## 58 Softcore The Neighb… 2023 91
## 59 My Immortal Evanescence 2019 90
## 60 Buzzcut Season Lorde 2023 90
## 61 Judas Lady Gaga 2023 90
## 62 Lithium Evanescence 2025 88
## 63 Nightcall London Gra… 2025 88
## 64 Pretty When You Cry Lana Del R… 2025 88
## 65 I'll Be Good Jaymes You… 2021 87
## 66 La baie Clara Luci… 2021 87
## 67 The Light Behind Your Eyes My Chemica… 2025 87
## 68 Crier tout bas Cœur De Pi… 2018 85
## 69 Hypnotic Zella Day 2021 82
## 70 Venus Fly Trap MARINA 2021 81
## 71 American Money BØRNS 2021 79
## 72 When You're Gone Avril Lavi… 2018 78
## 73 Place de la République Cœur De Pi… 2018 74
## 74 Wires The Neighb… 2018 73
## 75 Mariners Apartment Complex Lana Del R… 2018 69
## 76 Before the Worst The Script 2018 67
## 77 Six Degrees of Separation The Script 2018 66
## 78 Ultraviolence Lana Del R… 2018 66
## 79 Russian Roulette Rihanna 2018 65
## 80 The Blackest Day Lana Del R… 2018 65
## 81 Partition Beyoncé 2015 5
## 82 Booty Jennifer L… 2015 2
## 83 Don't Cha The Pussyc… 2015 2
## 84 Flawless Remix (feat. Nicki Minaj) Beyoncé 2015 2
## 85 G.U.Y. Lady Gaga 2015 2
## 86 Gasolina - DJ Buddah Remix Daddy Yank… 2015 2
## 87 Rabiosa Shakira 2015 2
## 88 Sweet Dreams Beyoncé 2015 2
## 89 Your Body Christina … 2015 2
## 90 Berghain ROSALÍA 2026 2
## 91 Better By Myself Hey Violet 2026 2
## 92 Fruityloop Lily Allen 2026 2
## 93 Starring Role MARINA 2026 2
## 94 4 Minutes (feat. Justin Timberlake and Tim… Madonna 2015 1
## 95 Back To Black Amy Wineho… 2016 1
## 96 Far Side Of The Moon Tinashe 2016 1
## 97 Impossible - Main Shontelle 2016 1
## 98 Somebody That I Used To Know Gotye 2016 1
## 99 Til It Happens To You Lady Gaga 2016 1
## 100 Always On the Run Yuksek 2017 1
## 101 Je bois et puis je danse Aline 2017 1
## 102 1TRAGO Danna Paola 2026 1
## 103 22 Lily Allen 2026 1
## 104 4ever The Veroni… 2026 1
## 105 ALIEN SUPERSTAR Beyoncé 2026 1
## 106 Ai Doutor Bibi Babyd… 2026 1
## 107 Amor de P... Bibi Babyd… 2026 1
It is interesting to see my top 10 songs of each year, because I definitely see how it changes over the years. It is also possible to notice that I basically did not use Spotify between 2015 and 2017, so I will make sure to remove those years from my data frame.
# only keep streams that occurred in 2018 or after
songs_listened <- songs_listened %>% filter(year(ts)>2017)
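As a side note, the "top N per year" pattern used above can be sketched more compactly with `count()` and `slice_max()`, naming the year column inline so no renaming step is needed. The data below are made up, and it keeps a top 2 instead of a top 10 to stay short.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy streams: one row per stream
toy <- tibble(
  track_name = c('A', 'A', 'A', 'B', 'B', 'C', 'A', 'C', 'C'),
  ts = as_datetime(c('2023-01-05', '2023-02-01', '2023-03-01',
                     '2023-01-09', '2023-04-02', '2023-05-01',
                     '2024-01-01', '2024-02-01', '2024-03-01'))
)

top2_per_year <- toy %>%
  # count streams per year and song, naming both new columns directly
  count(year = year(ts), track_name, name = 'times_streamed') %>%
  group_by(year) %>%
  # keep the 2 most streamed songs of each year
  slice_max(times_streamed, n = 2, with_ties = FALSE) %>%
  ungroup()
top2_per_year
```

Setting `with_ties = FALSE` is a design choice: it caps every year at exactly N rows, at the cost of dropping songs that are tied with the last one kept.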
Now, what are the artists I listen to the most? I have my own guesses, but let’s see what the data tell us.
# get the total amount of times I've listened to each artist
artists_frequency <- songs_listened %>% group_by(artist_name) %>%
summarise(times_listened=n())
# print the top 100
artists_frequency %>% arrange(desc(times_listened)) %>% print(n=100)
## # A tibble: 4,820 × 2
## artist_name times_listened
## <chr> <int>
## 1 Lana Del Rey 15360
## 2 Britney Spears 5170
## 3 MARINA 5035
## 4 Paramore 4705
## 5 All Time Low 4471
## 6 Taylor Swift 3950
## 7 Lady Gaga 3656
## 8 The Neighbourhood 3649
## 9 Melanie Martinez 3075
## 10 Barcelona 3032
## 11 Jaymes Young 2885
## 12 Arctic Monkeys 2840
## 13 Fall Out Boy 2839
## 14 Dua Lipa 2614
## 15 BØRNS 2543
## 16 Billie Eilish 2508
## 17 The Script 2473
## 18 Ariana Grande 2294
## 19 Keane 2247
## 20 Olivia Rodrigo 2200
## 21 Panic! At The Disco 2034
## 22 Lorde 1940
## 23 Conan Gray 1910
## 24 Kim Petras 1846
## 25 The Cab 1768
## 26 Florence + The Machine 1665
## 27 Angèle 1632
## 28 Cœur De Pirate 1622
## 29 Lily Allen 1621
## 30 MIKA 1601
## 31 Charli xcx 1560
## 32 Miley Cyrus 1433
## 33 Avril Lavigne 1325
## 34 Troye Sivan 1322
## 35 Evanescence 1302
## 36 Rihanna 1291
## 37 Indila 1273
## 38 Birdy 1241
## 39 IAMX 1231
## 40 Beyoncé 1223
## 41 Luísa Sonza 1203
## 42 The Killers 1157
## 43 Adele 1116
## 44 Mikky Ekko 1106
## 45 Linkin Park 1090
## 46 Tove Lo 1073
## 47 Chappell Roan 1065
## 48 La Roux 990
## 49 The Weeknd 965
## 50 Katy Perry 950
## 51 Harry Styles 933
## 52 Sandy e Junior 911
## 53 SIX 902
## 54 Louane 891
## 55 Tate McRae 873
## 56 Carissa's Wierd 869
## 57 My Chemical Romance 845
## 58 Hoshi 825
## 59 Kesha 799
## 60 Pabllo Vittar 794
## 61 girl in red 786
## 62 Weathers 774
## 63 Pomme 772
## 64 Muse 757
## 65 Bilal Hassani 722
## 66 Shakira 720
## 67 Chase Atlantic 712
## 68 Selena Gomez 679
## 69 Kylie Minogue 668
## 70 Rina Sawayama 667
## 71 Demi Lovato 656
## 72 RBD 639
## 73 Pitty 618
## 74 Hozier 610
## 75 Zella Day 599
## 76 Bigflo & Oli 595
## 77 High School Musical Cast 590
## 78 NX Zero 567
## 79 Sufjan Stevens 541
## 80 Tom Odell 534
## 81 Wallows 530
## 82 Mitski 526
## 83 Allie X 522
## 84 Vendredi sur Mer 512
## 85 Stephen Schwartz 510
## 86 Hey Violet 504
## 87 Sabrina Carpenter 501
## 88 Stromae 490
## 89 OneRepublic 480
## 90 The 1975 470
## 91 SYML 458
## 92 Andrew Belle 453
## 93 Halsey 450
## 94 Bentley Robles 445
## 95 Carla 440
## 96 Hayley Williams 439
## 97 Beach Weather 437
## 98 The Pussycat Dolls 437
## 99 Seafret 428
## 100 The Ready Set 424
## # ℹ 4,720 more rows
Am I surprised Lana Del Rey is in first place? No. Am I shocked by the difference between her and second place? A bit. But overall, the top 100 fairly represents my musical taste (duh!).
Let’s do top 10 per year now.
# get the total amount of times I've listened to each artist per year
artists_frequency <- songs_listened %>% group_by(artist_name, year(ts)) %>%
summarise(times_listened=n())
# rename "year(ts)" column
colnames(artists_frequency)[2] <- c('year')
# get top 10 per year
top10artists_listened <- artists_frequency %>% group_by(year) %>% arrange(desc(times_listened)) %>%
slice(1:10)
# print full data frame
top10artists_listened %>% arrange(desc(times_listened)) %>% print(n=nrow(top10artists_listened))
## # A tibble: 90 × 3
## # Groups: year [9]
## artist_name year times_listened
## <chr> <dbl> <int>
## 1 Lana Del Rey 2025 3053
## 2 Lana Del Rey 2024 2840
## 3 Lana Del Rey 2023 2622
## 4 Lana Del Rey 2022 2243
## 5 Britney Spears 2024 2076
## 6 Lana Del Rey 2019 1553
## 7 All Time Low 2019 1472
## 8 Jaymes Young 2020 1448
## 9 All Time Low 2020 1446
## 10 Lana Del Rey 2021 1324
## 11 Barcelona 2020 1291
## 12 Taylor Swift 2023 1251
## 13 Melanie Martinez 2019 1228
## 14 The Cab 2020 1227
## 15 Paramore 2023 1227
## 16 Olivia Rodrigo 2024 1168
## 17 Britney Spears 2023 1164
## 18 Lily Allen 2025 1157
## 19 Ariana Grande 2024 1117
## 20 MARINA 2021 1097
## 21 Angèle 2020 1070
## 22 MIKA 2020 1054
## 23 Billie Eilish 2024 1016
## 24 Dua Lipa 2024 970
## 25 Charli xcx 2024 958
## 26 Taylor Swift 2022 948
## 27 Chappell Roan 2024 937
## 28 The Neighbourhood 2023 919
## 29 Lana Del Rey 2020 909
## 30 Paramore 2022 890
## 31 Lady Gaga 2024 862
## 32 Arctic Monkeys 2023 859
## 33 Dua Lipa 2020 832
## 34 MARINA 2024 819
## 35 Lana Del Rey 2018 816
## 36 Fall Out Boy 2023 811
## 37 The Neighbourhood 2022 810
## 38 Britney Spears 2025 801
## 39 BØRNS 2020 798
## 40 The Script 2020 762
## 41 Arctic Monkeys 2022 762
## 42 Jaymes Young 2021 749
## 43 MARINA 2022 710
## 44 MARINA 2023 692
## 45 BØRNS 2022 691
## 46 Conan Gray 2023 684
## 47 Kim Petras 2022 673
## 48 The Script 2019 651
## 49 Britney Spears 2022 644
## 50 Beyoncé 2022 636
## 51 Florence + The Machine 2021 626
## 52 Lady Gaga 2025 580
## 53 Keane 2021 566
## 54 Birdy 2019 553
## 55 Lady Gaga 2021 553
## 56 Lady Gaga 2023 540
## 57 Weathers 2021 532
## 58 MARINA 2025 508
## 59 Billie Eilish 2025 501
## 60 The Neighbourhood 2018 481
## 61 Paramore 2025 465
## 62 Fall Out Boy 2019 460
## 63 Indila 2019 455
## 64 Tate McRae 2025 452
## 65 Shakira 2025 430
## 66 Evanescence 2025 429
## 67 The Script 2021 422
## 68 Olivia Rodrigo 2021 411
## 69 Britney Spears 2021 368
## 70 IAMX 2019 359
## 71 Cœur De Pirate 2019 355
## 72 Keane 2019 331
## 73 The Script 2018 309
## 74 Avril Lavigne 2018 275
## 75 Keane 2018 269
## 76 Paramore 2018 264
## 77 Fall Out Boy 2018 261
## 78 Cœur De Pirate 2018 260
## 79 All Time Low 2018 244
## 80 Panic! At The Disco 2018 238
## 81 Lily Allen 2026 17
## 82 MARINA 2026 8
## 83 Bibi Babydoll 2026 6
## 84 Halsey 2026 6
## 85 Danna Paola 2026 5
## 86 Florence + The Machine 2026 5
## 87 La Roux 2026 5
## 88 RBD 2026 5
## 89 Hey Violet 2026 4
## 90 Hilary Duff 2026 4
I feel like the breakdown per year does not have as much information as the previous one, but it is possible to see how in some years I was more into certain music genres than others.
Just out of curiosity, I would like to know how often the same artists are found within my top 10 across all years.
# count how many times an artist appears in a top 10
top10artists_frequency <- top10artists_listened %>% group_by(artist_name) %>% summarise(times_in_top10=n())
# make a treemap
ggplot(top10artists_frequency, aes(area=times_in_top10, fill=times_in_top10, label=artist_name, subgroup=times_in_top10)) +
labs(fill='Artist') + geom_treemap() + geom_treemap_text() +
geom_treemap_subgroup_border(color='black') +
geom_treemap_subgroup_text(place='centre', grow=T, alpha=0.6) +
theme(legend.position='none') + scale_fill_viridis()
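In the treemap above, gray numbers represent how many times the artists in each subgroup (same color) are found in a top 10. Judging by the table printed earlier, Lana Del Rey is the only artist that has appeared in the top 10 of every year from 2018 to 2025 (she is only missing from 2026, which has just a handful of streams so far). No surprises there.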
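The "who shows up every year" question can also be checked in code rather than by reading the treemap. Here is a minimal sketch on made-up data, with top 2 lists instead of top 10; `toy_top`, `n_years`, and the artist names are hypothetical.

```r
library(dplyr)

# Hypothetical per-year top lists (top 2 instead of top 10 to keep it short)
toy_top <- tibble(
  year = c(2022, 2022, 2023, 2023, 2024, 2024),
  artist_name = c('Artist X', 'Artist Y', 'Artist X', 'Artist Z', 'Artist X', 'Artist Y')
)

# number of distinct years covered by the lists
n_years <- n_distinct(toy_top$year)

# artists whose number of appearances equals the number of years
always_in_top <- toy_top %>%
  count(artist_name, name = 'times_in_top') %>%
  filter(times_in_top == n_years) %>%
  pull(artist_name)
always_in_top
```

A partial year in the data counts toward `n_years` too, so it may be worth filtering it out before running a check like this.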
Now, let’s see what my most streamed albums are. To do this, I will group by both album name and artist name, as there could be albums released by different artists that have the same name.
# get the total amount of times I've listened to each album
album_frequency <- songs_listened %>% group_by(album_name, artist_name) %>%
summarise(times_listened=n())
# print the top 100
album_frequency %>% arrange(desc(times_listened)) %>% print(n=100)
## # A tibble: 11,743 × 3
## # Groups: album_name [10,799]
## album_name artist_name times_listened
## <chr> <chr> <int>
## 1 Norman Fucking Rockwell! Lana Del R… 2931
## 2 Born To Die - The Paradise Edition Lana Del R… 2708
## 3 Ultraviolence Lana Del R… 2142
## 4 K-12 Melanie Ma… 1954
## 5 Dopamine BØRNS 1918
## 6 Chemtrails Over The Country Club Lana Del R… 1587
## 7 Lust For Life Lana Del R… 1573
## 8 Electra Heart MARINA 1508
## 9 Future Nostalgia Dua Lipa 1441
## 10 AM Arctic Mon… 1420
## 11 Mini World Indila 1239
## 12 Honeymoon Lana Del R… 1179
## 13 Circus (Deluxe Version) Britney Sp… 1159
## 14 SOUR Olivia Rod… 1158
## 15 Wiped Out! The Neighb… 1096
## 16 Feel Something Jaymes You… 1063
## 17 Brol Angèle 1015
## 18 Ancient Dreams In A Modern Land MARINA 997
## 19 Blonde Cœur De Pi… 965
## 20 Melodrama Lorde 960
## 21 Hard To Imagine The Neighbourhood Ever Changing The Neighb… 957
## 22 Blackout Britney Sp… 912
## 23 Cry Baby Melanie Ma… 891
## 24 I Love You. The Neighb… 882
## 25 folklore Taylor Swi… 871
## 26 After Laughter Paramore 867
## 27 Brand New Eyes Paramore 857
## 28 Did you know that there's a tunnel under Ocean B… Lana Del R… 855
## 29 The Rise and Fall of a Midwest Princess Chappell R… 832
## 30 MANIA Fall Out B… 818
## 31 This Is Why Paramore 815
## 32 Chambre 12 Louane 814
## 33 Froot MARINA 797
## 34 Riot! Paramore 785
## 35 La Roux La Roux 761
## 36 The Family Jewels MARINA 760
## 37 Hopes And Fears Keane 751
## 38 Pure Heroine Lorde 743
## 39 Slut Pop Kim Petras 739
## 40 reputation Taylor Swi… 739
## 41 Favourite Worst Nightmare Arctic Mon… 733
## 42 Absolutes Barcelona 732
## 43 DOCE 22 Luísa Sonza 714
## 44 Femme Fatale (Deluxe Version) Britney Sp… 682
## 45 West End Girl Lily Allen 675
## 46 Superache Conan Gray 665
## 47 Dark Star Jaymes You… 661
## 48 Il suffit d'y croire Hoshi 658
## 49 Death of a Bachelor Panic! At … 657
## 50 The Script The Script 654
## 51 SIX: LIVE ON OPENING NIGHT (Original Broadway Ca… SIX 653
## 52 RENAISSANCE Beyoncé 648
## 53 GUTS Olivia Rod… 639
## 54 Born To Die – Paradise Edition Lana Del R… 631
## 55 Last Young Renegade All Time L… 629
## 56 Paramore Paramore 625
## 57 Lock Me Up The Cab 624
## 58 Fallen Evanescence 619
## 59 The Fame Lady Gaga 602
## 60 Kid Krow Conan Gray 593
## 61 ARTPOP Lady Gaga 590
## 62 eternal sunshine Ariana Gra… 583
## 63 Not Quite Yours Barcelona 581
## 64 Habits of My Heart Jaymes You… 580
## 65 Chromatica Lady Gaga 575
## 66 Love + Fear MARINA 574
## 67 Too Weird to Live, Too Rare to Die! Panic! At … 573
## 68 Kicker Zella Day 570
## 69 #3 Deluxe Version The Script 565
## 70 Lungs Florence +… 561
## 71 Symphony Soldier The Cab 556
## 72 Dirty Work All Time L… 549
## 73 HIT ME HARD AND SOFT Billie Eil… 549
## 74 Born This Way Lady Gaga 546
## 75 thank u, next Ariana Gra… 544
## 76 It's Not Me, It's You Lily Allen 543
## 77 Love Me Barcelona 537
## 78 Let Go Avril Lavi… 533
## 79 Brol La Suite Angèle 527
## 80 Songs About Leaving Carissa's … 523
## 81 Hot Fuss The Killers 515
## 82 Fine Line Harry Styl… 513
## 83 The Fame Monster Lady Gaga 510
## 84 Wicked Stephen Sc… 510
## 85 Premiers émois Vendredi s… 507
## 86 In The Zone Britney Sp… 506
## 87 Metanoia IAMX 501
## 88 Kingdom Bilal Hass… 494
## 89 Happier Than Ever Billie Eil… 492
## 90 Future Hearts All Time L… 488
## 91 Midnights Taylor Swi… 482
## 92 Nothing Personal (Deluxe Version) All Time L… 482
## 93 racine carrée Stromae 479
## 94 Fire Within Birdy 463
## 95 Blue Neighbourhood - Deluxe Troye Sivan 461
## 96 Save Rock And Roll Fall Out B… 457
## 97 The Origin Of Love MIKA 452
## 98 Roses Cœur De Pi… 450
## 99 Time Mikky Ekko 450
## 100 No Sound Without Silence The Script 446
## # ℹ 11,643 more rows
To be honest, I did not expect some of the albums in my top 10 to rank so high on this list!
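The header of the output above hints at why grouping by both columns matters: it reports 11,743 album and artist pairs but only 10,799 distinct album names, so some names are shared. A small sketch with made-up albums shows what grouping by album name alone would do.

```r
library(dplyr)

# Hypothetical toy streams: two different artists share an album title
toy_albums <- tibble(
  album_name  = c('Greatest Hits', 'Greatest Hits', 'Greatest Hits', 'Debut'),
  artist_name = c('Artist X', 'Artist X', 'Artist Y', 'Artist Z')
)

# grouping by album name alone merges both 'Greatest Hits' albums
by_album_only <- toy_albums %>%
  count(album_name, name = 'times_listened')

# grouping by album and artist keeps them apart
by_album_and_artist <- toy_albums %>%
  count(album_name, artist_name, name = 'times_listened')

by_album_only
by_album_and_artist
```

The reverse problem is not handled by either version: the same album listed under slightly different titles (for example, two editions) still ends up split into separate rows.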
Contrary to what most people believe, Spotify does not pay artist royalties according to a per-play or per-stream rate. It is more complicated than that. However… what if they did? According to some totally not in-depth, not scientific research I did, most artists are paid roughly 0.004 USD per stream. So using a little bit of math, let’s see how much money artists have made from my streams alone.
# get the total amount of times I've listened to each artist and multiply by the "payment rate"
artists_revenue <- songs_listened %>% group_by(artist_name) %>%
summarise(revenue=n()*0.004)
# print the top 100
artists_revenue %>% arrange(desc(revenue)) %>% print(n=100)
## # A tibble: 4,820 × 2
## artist_name revenue
## <chr> <dbl>
## 1 Lana Del Rey 61.4
## 2 Britney Spears 20.7
## 3 MARINA 20.1
## 4 Paramore 18.8
## 5 All Time Low 17.9
## 6 Taylor Swift 15.8
## 7 Lady Gaga 14.6
## 8 The Neighbourhood 14.6
## 9 Melanie Martinez 12.3
## 10 Barcelona 12.1
## 11 Jaymes Young 11.5
## 12 Arctic Monkeys 11.4
## 13 Fall Out Boy 11.4
## 14 Dua Lipa 10.5
## 15 BØRNS 10.2
## 16 Billie Eilish 10.0
## 17 The Script 9.89
## 18 Ariana Grande 9.18
## 19 Keane 8.99
## 20 Olivia Rodrigo 8.8
## 21 Panic! At The Disco 8.14
## 22 Lorde 7.76
## 23 Conan Gray 7.64
## 24 Kim Petras 7.38
## 25 The Cab 7.07
## 26 Florence + The Machine 6.66
## 27 Angèle 6.53
## 28 Cœur De Pirate 6.49
## 29 Lily Allen 6.48
## 30 MIKA 6.40
## 31 Charli xcx 6.24
## 32 Miley Cyrus 5.73
## 33 Avril Lavigne 5.3
## 34 Troye Sivan 5.29
## 35 Evanescence 5.21
## 36 Rihanna 5.16
## 37 Indila 5.09
## 38 Birdy 4.96
## 39 IAMX 4.92
## 40 Beyoncé 4.89
## 41 Luísa Sonza 4.81
## 42 The Killers 4.63
## 43 Adele 4.46
## 44 Mikky Ekko 4.42
## 45 Linkin Park 4.36
## 46 Tove Lo 4.29
## 47 Chappell Roan 4.26
## 48 La Roux 3.96
## 49 The Weeknd 3.86
## 50 Katy Perry 3.8
## 51 Harry Styles 3.73
## 52 Sandy e Junior 3.64
## 53 SIX 3.61
## 54 Louane 3.56
## 55 Tate McRae 3.49
## 56 Carissa's Wierd 3.48
## 57 My Chemical Romance 3.38
## 58 Hoshi 3.3
## 59 Kesha 3.20
## 60 Pabllo Vittar 3.18
## 61 girl in red 3.14
## 62 Weathers 3.10
## 63 Pomme 3.09
## 64 Muse 3.03
## 65 Bilal Hassani 2.89
## 66 Shakira 2.88
## 67 Chase Atlantic 2.85
## 68 Selena Gomez 2.72
## 69 Kylie Minogue 2.67
## 70 Rina Sawayama 2.67
## 71 Demi Lovato 2.62
## 72 RBD 2.56
## 73 Pitty 2.47
## 74 Hozier 2.44
## 75 Zella Day 2.40
## 76 Bigflo & Oli 2.38
## 77 High School Musical Cast 2.36
## 78 NX Zero 2.27
## 79 Sufjan Stevens 2.16
## 80 Tom Odell 2.14
## 81 Wallows 2.12
## 82 Mitski 2.10
## 83 Allie X 2.09
## 84 Vendredi sur Mer 2.05
## 85 Stephen Schwartz 2.04
## 86 Hey Violet 2.02
## 87 Sabrina Carpenter 2.00
## 88 Stromae 1.96
## 89 OneRepublic 1.92
## 90 The 1975 1.88
## 91 SYML 1.83
## 92 Andrew Belle 1.81
## 93 Halsey 1.8
## 94 Bentley Robles 1.78
## 95 Carla 1.76
## 96 Hayley Williams 1.76
## 97 Beach Weather 1.75
## 98 The Pussycat Dolls 1.75
## 99 Seafret 1.71
## 100 The Ready Set 1.70
## # ℹ 4,720 more rows
This result honestly made me laugh. The revenue values are very low. However, I am just one person, right? Given the huge number of Spotify users, artists make much more money than that. Additionally, they have other sources of income as well.
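To make the arithmetic concrete, here is the calculation for the two top artists, using the stream counts from the artist table printed earlier. The 0.004 USD rate is the same rough assumption as above, not an actual Spotify payout rule, and the variable names are made up for this sketch.

```r
# Assumed flat rate from the text above; real payouts do not work this way
rate_usd_per_stream <- 0.004

# stream counts taken from the artist table printed earlier
streams <- c('Lana Del Rey' = 15360, 'Britney Spears' = 5170)

# hypothetical revenue in USD: streams times rate
revenue <- streams * rate_usd_per_stream
revenue

# how many streams one listener would need to generate 100 USD at this rate
streams_for_100_usd <- 100 / rate_usd_per_stream
streams_for_100_usd
```

At this rate, 15,360 streams come out to about 61.44 USD, and it would take roughly 25,000 streams from a single listener to reach 100 USD.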
For this section, I would like to know how many days per year I spent listening to music.
# make a 'days_played' column based on the 'ms_played' column
# 1 millisecond = 1.15741e-8 days
songs_listened <- songs_listened %>% mutate(days_played=ms_played*1.15741e-8)
# for every year, sum the days_played column
time_spent_w_songs <- songs_listened %>% group_by(year(ts)) %>% summarise(days_listened=sum(days_played))
# rename "year(ts)" column
colnames(time_spent_w_songs)[1] <- c('year')
# turn it into a factor
time_spent_w_songs$year <- as.factor(time_spent_w_songs$year)
# let's plot it!
ggplot(time_spent_w_songs, aes(x=year, y=days_listened)) + geom_col() +
xlab('Year') + ylab('Number of days') + ggtitle('Days spent listening to music in each year') + theme_minimal()
So many days listening to music in 2020! I blame COVID-19.
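For anyone wondering where the 1.15741e-8 conversion factor comes from, it is just one divided by the number of milliseconds in a day. A quick sketch of the derivation follows; the variable names and the 3.5-minute example song are made up.

```r
# milliseconds in one day: 1000 ms * 60 s * 60 min * 24 h
ms_per_day <- 1000 * 60 * 60 * 24

# days per millisecond, which rounds to the 1.15741e-8 used above
days_per_ms <- 1 / ms_per_day
signif(days_per_ms, 6)

# sanity check: a 3.5-minute song (210,000 ms) is a tiny fraction of a day
song_days <- 210000 / ms_per_day
song_days
```

Dividing by `ms_per_day` directly, instead of multiplying by the rounded constant, would avoid a very small rounding error, although it makes no visible difference in the plot.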
Now that we know how many days per year I have spent streaming songs, let’s do something similar but now with the artists in my top 10 overall.
# get the names of the artists in my top 10
top10artists <- songs_listened %>% group_by(artist_name) %>%
summarise(times_listened=n()) %>% arrange(desc(times_listened)) %>%
slice(1:10) %>% pull(artist_name)
# filter main data frame so it only contains artists in my top 10
top10artists_streamingtime <- songs_listened %>% filter(artist_name %in% top10artists)
# compute total streaming time
top10artists_streamingtime <- top10artists_streamingtime %>% group_by(artist_name) %>%
summarise(streaming_time=sum(days_played))
# plot
ggplot(top10artists_streamingtime, aes(x=reorder(artist_name, -streaming_time), y=streaming_time)) +
geom_col() + coord_flip() + xlab('Artist') + ylab('Number of days') +
ggtitle('Days spent listening to music by artists in my top 10') + theme_minimal()
I must admit, that is a lot of days listening to Lana Del Rey. But the results look so good visualized this way!
I wanna try to plot my top artists per week, across all my years as a Spotify user. What I hope to see is how my most listened to artists per week change according to some life events, such as concerts, plays, new album releases, etc.
# for every year and week, sum the time spent listening to every artist
topartists_week <- songs_listened %>% group_by(year(ts), week(ts), artist_name) %>% summarise(time=sum(ms_played))
# rename lubridate-made columns
colnames(topartists_week)[1:2] <- c('year', 'week')
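# for every year and week, select the top 3 artists and add a new column with their respective position in the top 3
# with_ties=FALSE caps each week at 3 rows, and row_number() also handles weeks with fewer than 3 artists
topartists_week <- topartists_week %>% group_by(year, week) %>% slice_max(time, n=3, with_ties=FALSE) %>% mutate(position=row_number())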
# plot it!
ggplot(topartists_week, aes(x=week, y=position, color=artist_name, label=artist_name)) +
geom_point(show.legend=F) + geom_line(show.legend=F) + geom_label_repel(show.legend=F) +
facet_wrap(~year, ncol=1) + scale_y_continuous(breaks=c(1,2,3)) +
scale_x_reverse(breaks=c(1,10,20,30,40,50)) + coord_flip() +
xlab('Week') + ylab('Rank') + ggtitle('Top 3 artists per week in every year') + theme_minimal()
OK… this isn’t my most beautiful work. The plot is too busy, but I don’t think I could improve it further. Still, it’s possible to draw some information from it. For instance, my top artists of the week clearly shift in weeks when I saw a musical in the theater (such as SIX on May 10th, 2022), when an artist released a new album (for example, Dua Lipa’s Future Nostalgia on March 27th, 2020 and Beyoncé’s Renaissance on July 29th, 2022), or when I discovered an artist for the first time and streamed them a bunch (such as Weathers on September 6th-12th).
Overall, the plot tells an interesting story. But to be fair, this plot is probably only that interesting to me.
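One detail worth watching in a weekly ranking like this is that not every week has exactly three artists, and two artists can tie on listening time. The sketch below uses made-up data to show a ranking step that copes with both cases; `toy_weeks` and the artist names are hypothetical.

```r
library(dplyr)

# Hypothetical weekly totals: week 1 has a tie for second place, week 2 has only two artists
toy_weeks <- tibble(
  week = c(1, 1, 1, 1, 2, 2),
  artist_name = c('Artist X', 'Artist Y', 'Artist Z', 'Artist W', 'Artist X', 'Artist Y'),
  time = c(500, 300, 300, 100, 400, 50)
)

toy_rank <- toy_weeks %>%
  group_by(week) %>%
  # keep at most 3 rows per week, even when there are ties
  slice_max(time, n = 3, with_ties = FALSE) %>%
  # number the rows within each week, however many there are
  mutate(position = row_number()) %>%
  ungroup()
toy_rank
```

Here week 1 gets positions 1 to 3 and week 2 gets positions 1 and 2, so the ranking never has to assume a fixed group size.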
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggrepel_0.9.6 treemapify_2.5.6 viridis_0.6.5 viridisLite_0.4.2
## [5] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [9] purrr_1.0.4 readr_2.1.5 tidyr_1.3.1 tibble_3.3.0
## [13] ggplot2_3.5.2 tidyverse_2.0.0 jsonlite_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 compiler_4.4.2 Rcpp_1.0.14 tidyselect_1.2.1
## [5] gridExtra_2.3 jquerylib_0.1.4 scales_1.4.0 ggfittext_0.10.2
## [9] yaml_2.3.10 fastmap_1.2.0 R6_2.6.1 labeling_0.4.3
## [13] generics_0.1.4 knitr_1.50 bslib_0.9.0 pillar_1.10.2
## [17] RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.1.6 utf8_1.2.6
## [21] cachem_1.1.0 stringi_1.8.7 xfun_0.52 sass_0.4.10
## [25] timechange_0.3.0 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
## [29] digest_0.6.37 grid_4.4.2 rstudioapi_0.17.1 hms_1.1.3
## [33] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0
## [37] farver_2.1.2 rmarkdown_2.29 tools_4.4.2 pkgconfig_2.0.3
## [41] htmltools_0.5.8.1