In the era of digital streaming, Spotify has become a symbol of how technology mediates everyday cultural experiences. Millions of users engage with the platform on a daily basis, curating playlists that accompany them through work, leisure, and social interactions. Creating a playlist is no longer a purely technical action, but a form of self-expression that reflects personal taste and identity.
From a data-driven perspective, each playlist offers a unique opportunity to explore the underlying structure of musical choices made by users. A playlist can be treated as a set of co-occurring artists, which makes it a natural and intuitive playground for modelling relationships in data. In this project, we focus on identifying which artists tend to appear together on playlists and whether these co-occurrences follow consistent and interpretable patterns.
To achieve this, we employ association rule mining, a method designed to capture dependencies between items within a set. In the context of Spotify playlists, association rules allow us to examine how the presence of one artist on a playlist affects the probability of another artist being included as well. Using a dataset consisting of over 69,000 playlists created by more than 9,000 Spotify users, we investigate two main aspects. First, we identify the most common and influential association rules that shape artists’ co-occurrence on playlists. Second, we narrow the analysis to a more personalized case by focusing on the author’s favourite artist — the rapper 2Pac — and explore which artists are most frequently included alongside him.
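The three measures that association rule mining relies on (support, confidence and lift) can be illustrated on a toy example. The sketch below is not part of the analysis itself: the playlists in `toy_playlists` are made up, and `rule_metrics()` is a hypothetical helper written in base R for illustration only.

```r
# Toy playlists (made-up data, for illustration only)
toy_playlists <- list(
  c("A", "B"),
  c("A", "B"),
  c("A", "B", "C"),
  c("C"),
  c("C", "D"),
  c("A")
)

# Hypothetical helper: support, confidence and lift of the rule lhs => rhs
rule_metrics <- function(playlists, lhs, rhs) {
  has_lhs <- vapply(playlists, function(p) all(lhs %in% p), logical(1))
  has_rhs <- vapply(playlists, function(p) all(rhs %in% p), logical(1))
  support    <- mean(has_lhs & has_rhs)    # share of playlists containing lhs and rhs
  confidence <- support / mean(has_lhs)    # share of lhs playlists that also contain rhs
  lift       <- confidence / mean(has_rhs) # confidence relative to the overall share of rhs
  c(support = support, confidence = confidence, lift = lift)
}

toy_metrics <- rule_metrics(toy_playlists, lhs = "A", rhs = "B")
print(toy_metrics)
```

In this toy data, artist B appears in half of all playlists but in three out of four playlists that contain A, so the rule {A} => {B} has a lift above 1.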
To pursue the goal of this project, we use a dataset obtained from kaggle.com (https://www.kaggle.com/datasets/andrewmvd/spotify-playlists). It is based on the subset of users in the #nowplaying dataset who published their #nowplaying tweets via Spotify up to 2021. In principle, the dataset holds users, their playlists and the tracks contained in these playlists. In total, the full dataset contains over 6.6 million observations.
Mining association rules at the individual track level would certainly be interesting; in practice, however, it would be unlikely to yield meaningful results. A preliminary inspection of the dataset reveals that most playlists contain a large number of tracks by the same artists, which is intuitively reasonable. Users often include entire albums by a single artist to create a kind of “mega-mix”, or add bundles of popular tracks by different artists within the same playlist. Presumably, this structure would lead to extremely high confidence values for associations between tracks by the same artist, which would dominate the results and potentially obscure more interesting cross-artist relationships.
Before embarking on the analysis of association rules governing artist inclusion in playlists, some data cleaning is required. The original dataset weighs over 1.1 GB, which makes working with it extremely time-consuming (especially during the compilation of this document). For this reason, we proceed with a lighter data_clean dataset. In this version, the column holding individual track titles has been removed, and a substantial amount of noise, most likely originating from the web-scraping data collection process, has been filtered out. In particular, we keep only rows whose user ID is exactly 32 characters long. In this dataset, valid user IDs are 32-character hexadecimal strings, and all inspected observations with different ID formats turned out to be invalid or corrupted entries. To further reduce the risk of such noise spilling over into other variables, we additionally require that every playlist name and artist name is between 1 and 32 characters long. While these restrictions may lead to some loss of information, they constitute an acceptable trade-off between data reliability and computational feasibility for the purposes of this analysis. Finally, to further narrow the scope and improve performance, the dataset is limited to the 1,000 most frequently occurring artists.
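library(dplyr)     # %>%, group_by(), summarise()
library(stringr)   # str_detect()
library(arules)    # transactions, apriori(), inspect()
library(arulesViz) # plot() methods for association rules

data <- read.csv("spotify_dataset.csv")
data_clean <- data[nchar(data$user_id) == 32,] # keep user IDs of exactly 32 characters
data_clean <- data_clean[,-3] # drop the third column (titles of track records)
# NOTE: `pattern` is not defined in the original code; this regex is an assumption
# based on the 32-character hexadecimal ID format described above.
pattern <- "^[0-9a-f]{32}$"
data_clean <- data_clean[str_detect(data_clean$user_id, pattern),]
# keep artist and playlist names that are between 1 and 32 characters long
data_clean <- data_clean[nchar(data_clean$artistname) %in% 1:32 & nchar(data_clean$playlistname) %in% 1:32,]
data_clean <- na.omit(data_clean)
# number of rows (track entries) per artist
artist_freq <- data_clean %>%
  group_by(artistname) %>%
  summarise(count = n(), .groups = "drop")
artist_freq <- sort_by(artist_freq, artist_freq$count, decreasing = TRUE) # sort_by() requires R >= 4.4.0
artist_list <- artist_freq$artistname[1:1000] # the 1,000 most frequent artists
data_clean <- data_clean[data_clean$artistname %in% artist_list,]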
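The ID and length filters can be sanity-checked on a few toy values. This is a minimal sketch in base R: all IDs except the first are made up, and the regular expression `hex32` is an assumption based on the 32-character hexadecimal format described above, not necessarily the pattern used in the analysis.

```r
# Toy IDs: only the first one is a 32-character hexadecimal string
toy_ids <- c(
  "9cc0cfd4d7d7885102480dd99e7a90d6",    # valid: 32 lowercase hex characters
  "9cc0cfd4d7d7885102480dd99e7a90",      # too short (30 characters)
  "zzc0cfd4d7d7885102480dd99e7a90d6",    # right length, but not hexadecimal
  "some scraped text that is not an id"  # noise
)
hex32 <- "^[0-9a-f]{32}$"                # assumed pattern
keep_id <- nchar(toy_ids) == 32 & grepl(hex32, toy_ids)
print(keep_id)

# Length filter for names: empty and overly long strings are dropped
toy_names <- c("", "Pearl Jam", strrep("x", 40))
keep_name <- nchar(toy_names) %in% 1:32
print(keep_name)
```

Note that the length check alone would let the third ID through, which is why a pattern check is applied as well.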
Here you can take a glimpse into the structure of the dataset.
head(data_clean, 20)
## user_id artistname
## 1 9cc0cfd4d7d7885102480dd99e7a90d6 Elvis Costello
## 2 9cc0cfd4d7d7885102480dd99e7a90d6 Elvis Costello & The Attractions
## 3 9cc0cfd4d7d7885102480dd99e7a90d6 Elvis Costello & The Attractions
## 4 9cc0cfd4d7d7885102480dd99e7a90d6 Elvis Costello
## 5 9cc0cfd4d7d7885102480dd99e7a90d6 Paul McCartney
## 6 9cc0cfd4d7d7885102480dd99e7a90d6 Paul McCartney
## 7 9cc0cfd4d7d7885102480dd99e7a90d6 Paul McCartney
## 8 9cc0cfd4d7d7885102480dd99e7a90d6 Joshua Radin
## 9 9cc0cfd4d7d7885102480dd99e7a90d6 Joshua Radin
## 10 9cc0cfd4d7d7885102480dd99e7a90d6 Paul McCartney
## 11 9cc0cfd4d7d7885102480dd99e7a90d6 Elvis Costello & The Attractions
## 12 9cc0cfd4d7d7885102480dd99e7a90d6 Joshua Radin
## 13 9cc0cfd4d7d7885102480dd99e7a90d6 Elvis Costello
## 14 9cc0cfd4d7d7885102480dd99e7a90d6 Joshua Radin
## 15 9cc0cfd4d7d7885102480dd99e7a90d6 Joshua Radin
## 16 9cc0cfd4d7d7885102480dd99e7a90d6 Noah And The Whale
## 17 9cc0cfd4d7d7885102480dd99e7a90d6 Pearl Jam
## 18 9cc0cfd4d7d7885102480dd99e7a90d6 Tom Petty And The Heartbreakers
## 19 9cc0cfd4d7d7885102480dd99e7a90d6 Bruce Springsteen
## 20 9cc0cfd4d7d7885102480dd99e7a90d6 Madness
## playlistname
## 1 HARD ROCK 2010
## 2 HARD ROCK 2010
## 3 HARD ROCK 2010
## 4 HARD ROCK 2010
## 5 HARD ROCK 2010
## 6 HARD ROCK 2010
## 7 HARD ROCK 2010
## 8 HARD ROCK 2010
## 9 HARD ROCK 2010
## 10 HARD ROCK 2010
## 11 HARD ROCK 2010
## 12 HARD ROCK 2010
## 13 HARD ROCK 2010
## 14 HARD ROCK 2010
## 15 HARD ROCK 2010
## 16 IOW 2012
## 17 IOW 2012
## 18 IOW 2012
## 19 IOW 2012
## 20 IOW 2012
str(data_clean)
## 'data.frame': 2557789 obs. of 3 variables:
## $ user_id : chr "9cc0cfd4d7d7885102480dd99e7a90d6" "9cc0cfd4d7d7885102480dd99e7a90d6" "9cc0cfd4d7d7885102480dd99e7a90d6" "9cc0cfd4d7d7885102480dd99e7a90d6" ...
## $ artistname : chr "Elvis Costello" "Elvis Costello & The Attractions" "Elvis Costello & The Attractions" "Elvis Costello" ...
## $ playlistname: chr "HARD ROCK 2010" "HARD ROCK 2010" "HARD ROCK 2010" "HARD ROCK 2010" ...
At this stage, we construct a transaction representation of the data by treating each playlist as an individual transaction. Artists included in a given playlist are then interpreted as items co-occurring within the same transaction basket.
artist_playlist <- data_clean %>%
group_by(user_id, playlistname) %>%
summarize(artists = paste(unique(artistname), collapse = ";"), .groups = "drop")
artist_list <- strsplit(artist_playlist$artists, ";")
artist_trans <- as(artist_list, "transactions")
summary(artist_trans)
## transactions as itemMatrix in sparse format with
## 69279 rows (elements/itemsets/transactions) and
## 1000 columns (items) and a density of 0.01079034
##
## most frequent items:
## Coldplay Daft Punk Rihanna David Guetta Calvin Harris
## 4210 4044 3859 3515 3351
## (Other)
## 728565
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 26115 5656 3764 3157 2643 2269 2108 1808 1662 1427 1345 1154 1107
## 14 15 16 17 18 19 20 21 22 23 24 25 26
## 959 854 800 742 665 603 518 510 493 480 453 415 370
## 27 28 29 30 31 32 33 34 35 36 37 38 39
## 349 319 281 269 268 255 227 211 185 189 208 194 183
## 40 41 42 43 44 45 46 47 48 49 50 51 52
## 173 137 162 125 104 109 98 132 109 103 99 93 74
## 53 54 55 56 57 58 59 60 61 62 63 64 65
## 82 82 60 78 71 57 57 59 56 48 51 61 48
## 66 67 68 69 70 71 72 73 74 75 76 77 78
## 52 51 38 36 34 38 47 41 33 38 27 27 39
## 79 80 81 82 83 84 85 86 87 88 89 90 91
## 33 29 20 34 33 21 26 22 18 26 26 19 24
## 92 93 94 95 96 97 98 99 100 101 102 103 104
## 20 19 16 14 20 16 18 11 17 15 23 13 31
## 105 106 107 108 109 110 111 112 113 114 115 116 117
## 14 13 14 15 14 9 11 13 16 8 12 16 9
## 118 119 120 121 122 123 124 125 126 127 128 129 130
## 9 14 6 4 7 11 8 12 2 8 5 7 10
## 131 132 133 134 135 136 137 138 139 140 141 142 143
## 5 7 8 10 6 6 8 6 6 7 6 6 7
## 144 145 146 147 148 149 150 151 152 153 154 155 156
## 7 6 6 4 7 6 3 11 3 2 3 2 10
## 157 158 159 160 161 162 163 164 165 166 167 168 169
## 5 11 7 6 4 5 4 1 9 7 2 4 4
## 170 171 172 174 175 176 177 178 179 180 181 182 183
## 2 3 3 3 1 7 3 5 3 2 3 2 3
## 184 185 186 187 188 189 191 192 193 195 196 198 200
## 2 2 1 2 5 3 3 6 2 4 1 4 2
## 201 202 203 205 206 207 208 209 210 214 215 217 219
## 5 1 2 1 4 2 2 5 3 1 2 2 3
## 220 221 223 224 225 226 227 232 233 234 235 236 237
## 1 2 1 2 2 2 1 2 1 2 1 2 1
## 241 243 245 250 251 252 253 254 255 256 261 270 273
## 1 2 2 2 1 1 1 2 1 1 1 2 1
## 277 278 281 282 283 287 291 293 299 301 303 307 309
## 1 4 2 1 1 1 1 1 1 1 1 1 1
## 318 336 343 372 378 410 447 497 523 525 564 588
## 1 1 1 1 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 3.00 10.79 12.00 588.00
##
## includes extended item information - examples:
## labels
## 1 2 Chainz
## 2 2Pac
## 3 3 Doors Down
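The transaction construction and the reported density can be illustrated on a toy data frame. The sketch below uses base R only and made-up rows; `toy`, `baskets` and `incidence` are illustrative names, and the real analysis relies on the arules transactions class rather than a plain matrix.

```r
# Toy data with the same three columns as data_clean (made-up rows)
toy <- data.frame(
  user_id      = c("u1", "u1", "u1", "u1", "u2", "u2"),
  playlistname = c("rock", "rock", "rock", "pop", "rock", "rock"),
  artistname   = c("Pearl Jam", "Pearl Jam", "Madness", "Rihanna", "Madness", "Rihanna")
)

# One basket per (user, playlist) pair, with each artist counted once
baskets <- lapply(
  split(toy$artistname, list(toy$user_id, toy$playlistname), drop = TRUE),
  unique
)

# Binary incidence matrix: rows are playlists, columns are artists
artists <- sort(unique(toy$artistname))
incidence <- t(vapply(baskets, function(b) artists %in% b, logical(length(artists))))
colnames(incidence) <- artists

sizes <- rowSums(incidence)   # number of distinct artists per playlist
density <- mean(incidence)    # share of filled cells in the matrix
print(incidence)
print(c(mean_size = mean(sizes), density = density))
```

Density equals the mean basket size divided by the number of items. The same relation holds in the summary above: a mean of 10.79 artists per playlist over 1,000 items gives a density of roughly 0.0108.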
A few interesting conclusions can already be drawn at this point. First, let us look at the cumulative distribution and density of artist occurrences.
cummulative_frequency <- data.frame(
artist = artist_freq$artistname[1:1000],
freq = artist_freq$count[1:1000]
)
cummulative_frequency$cumfreq <- cumsum(cummulative_frequency$freq) /
sum(cummulative_frequency$freq)
cummulative_frequency$rank <- seq_len(nrow(cummulative_frequency))
thresholds <- c(0.1, 0.25, 0.5, 0.75, 0.9)
cutoffs <- sapply(thresholds, function(t) {
min(cummulative_frequency$rank[cummulative_frequency$cumfreq >= t])
})
| Share of all occurrences (threshold) | Number of top-ranked artists needed (cutoff) |
|---|---|
| 10% | 22 |
| 25% | 74 |
| 50% | 229 |
| 75% | 504 |
| 90% | 768 |
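The cutoff logic can be traced on a small made-up vector. In this sketch, `toy_counts` is hypothetical data sorted in decreasing order, and each cutoff is the smallest rank at which the cumulative share reaches the given threshold.

```r
# Made-up occurrence counts, sorted from most to least frequent (total = 100)
toy_counts <- c(40, 30, 10, 8, 5, 4, 2, 1)
cum_share <- cumsum(toy_counts) / sum(toy_counts)

toy_thresholds <- c(0.5, 0.75, 0.9)
# Smallest rank whose cumulative share reaches each threshold
toy_cutoffs <- sapply(toy_thresholds, function(t) min(which(cum_share >= t)))
print(data.frame(threshold = toy_thresholds, cutoff = toy_cutoffs))
```

Here the two most frequent items already cover half of all occurrences, while five of the eight items are needed to reach 90%.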
As can be observed, artist occurrences are concentrated among the top-ranked artists. Only 22 of the 1,000 artists account for 10% of all occurrences (counted as track entries in the cleaned data), and 229 artists account for 50%. The distribution is strongly right-skewed and shows a pronounced long tail: a small number of artists account for a disproportionate share of occurrences, while most artists appear much less frequently.
These observations allow us to make a few preliminary assumptions in our exploration of association rules. We can expect that the strongest and most frequent rules will cluster around the top-ranked artists, who dominate playlist appearances, while combinations involving less popular artists may have very low support and risk being underrepresented. Careful tuning of the support and confidence thresholds is therefore necessary; otherwise, many subtle but potentially interesting relationships in the long tail might remain hidden.
These observations can be further reinforced by examining the binary incidence matrix of artists and playlists. Given that the dataset contains over 69 thousand unique playlists, we restrict the visualization to 1,000 randomly selected playlists to ensure better readability.
set.seed(1916)
image(sample(artist_trans, size = 1000), main = "Artist–playlist incidence matrix", xlab = "Artists (Items)", ylab = "Playlists (Transactions)")
Several conclusions can be drawn from this matrix. First, only a small number of distinct vertical stripes are visible in the plot. The scarcity of such stripes reinforces the observation that only a small fraction of artists appear in a large share of playlists. Second, numerous horizontal stripes can be observed, indicating the presence of playlists that include a relatively large number of the artists considered in this analysis. This pattern is also reflected in the summary(artist_trans) output, where four playlists were found to contain more than half of the studied artists. Finally, the presence of large empty areas within the matrix highlights the fact that many artists remain underrepresented in the overall dataset, appearing only sporadically across playlists.
To achieve the objective of this analysis, we employ the Apriori algorithm, as implemented in apriori() from the arules package, a well-established method for mining association rules in transactional data. The choice of this algorithm over alternative approaches, such as Eclat or FP-Growth, is motivated by its conceptual simplicity and greater transparency in parameter selection, which is particularly important for carefully controlling the support and confidence thresholds in the studied dataset.
In the first step, we generate association rules across the full set of analyzed artists while systematically tuning the model parameters. The analysis begins with conservative parameter restrictions, which are gradually relaxed in subsequent iterations. When considering the first parameter, support, it is important to note that even the most frequently occurring artist (Coldplay) appears in only about 6% of playlists (4,210 of 69,279), while the other most prominent artists reach roughly 5 to 6%. Combinations of several artists are necessarily much rarer than any single artist, so the support threshold has to be set well below these values.
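A relative support threshold translates into an absolute number of playlists. The arithmetic sketch below uses the 69,279 transactions reported by summary(artist_trans) and the three support thresholds tried in this report; the variable names are illustrative, and truncation is an assumption made because it reproduces the "Absolute minimum support count" lines printed by apriori().

```r
n_playlists <- 69279                  # transactions reported by summary(artist_trans)
min_support <- c(0.003, 0.004, 0.01)  # support thresholds used in this report

# Truncated product: matches the "Absolute minimum support count" in the apriori() output
abs_min_count <- floor(min_support * n_playlists)

# Smallest whole number of playlists whose relative support reaches the threshold
smallest_qualifying <- ceiling(min_support * n_playlists)

print(data.frame(min_support, abs_min_count, smallest_qualifying))
```

At a support of 0.4%, for example, an itemset has to appear in at least 278 playlists, which agrees with the minimum count reported in the rule summary for that threshold.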
itemFrequencyPlot(artist_trans, topN = 20)
artist_rules1 <- apriori(artist_trans, parameter = list(support = 0.004, confidence = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.004 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 277
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1000 item(s), 69279 transaction(s)] done [0.06s].
## sorting and recoding items ... [869 item(s)] done [0.01s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 done [0.08s].
## writing ... [287 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(artist_rules1)
## set of 287 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4 5
## 39 218 30
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 4.000 4.000 3.969 4.000 5.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.004013 Min. :0.8000 Min. :0.004345 Min. :14.36
## 1st Qu.:0.004157 1st Qu.:0.8111 1st Qu.:0.004980 1st Qu.:15.20
## Median :0.004388 Median :0.8290 Median :0.005312 Median :16.35
## Mean :0.004612 Mean :0.8340 Mean :0.005538 Mean :16.35
## 3rd Qu.:0.004886 3rd Qu.:0.8490 3rd Qu.:0.005875 3rd Qu.:17.06
## Max. :0.007405 Max. :0.9468 Max. :0.009094 Max. :24.56
## count
## Min. :278.0
## 1st Qu.:288.0
## Median :304.0
## Mean :319.5
## 3rd Qu.:338.5
## Max. :513.0
##
## mining info:
## data ntransactions support confidence
## artist_trans 69279 0.004 0.8
## call
## apriori(data = artist_trans, parameter = list(support = 0.004, confidence = 0.8, minlen = 2))
inspect(sort(artist_rules1, by = "lift")[1:50])
## lhs rhs support confidence coverage lift count
## [1] {Kendrick Lamar,
## Lil Wayne} => {Drake} 0.004056063 0.8005698 0.005066470 24.56274 281
## [2] {David Guetta,
## Katy Perry,
## Ke$ha,
## Rihanna} => {Lady Gaga} 0.004041629 0.8022923 0.005037602 21.83111 280
## [3] {Calvin Harris,
## David Guetta,
## Pitbull,
## Swedish House Mafia} => {Avicii} 0.004157104 0.8520710 0.004878823 19.20320 288
## [4] {Armin van Buuren,
## Calvin Harris,
## David Guetta} => {Avicii} 0.004171538 0.8500000 0.004907692 19.15652 289
## [5] {Calvin Harris,
## Macklemore & Ryan Lewis,
## Pitbull} => {Avicii} 0.004431357 0.8297297 0.005340724 18.69969 307
## [6] {Lady Gaga,
## Miley Cyrus,
## Rihanna} => {Katy Perry} 0.004056063 0.8192420 0.004950995 18.68827 281
## [7] {David Guetta,
## Ke$ha,
## Lady Gaga,
## Rihanna} => {Katy Perry} 0.004041629 0.8187135 0.004936561 18.67621 280
## [8] {Flo Rida,
## Pitbull,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.004113801 0.9468439 0.004344751 18.66185 285
## [9] {Calvin Harris,
## Imagine Dragons,
## Pitbull} => {Avicii} 0.004229276 0.8276836 0.005109774 18.65358 293
## [10] {David Guetta,
## Swedish House Mafia,
## will.i.am} => {Avicii} 0.004012760 0.8273810 0.004849955 18.64676 278
## [11] {Calvin Harris,
## Pitbull,
## Swedish House Mafia} => {Avicii} 0.004691176 0.8269720 0.005672715 18.63754 325
## [12] {Flo Rida,
## LMFAO,
## Swedish House Mafia} => {David Guetta} 0.004171538 0.9444444 0.004416923 18.61456 289
## [13] {Calvin Harris,
## David Guetta,
## Tiësto} => {Avicii} 0.005759321 0.8243802 0.006986244 18.57913 399
## [14] {Calvin Harris,
## OneRepublic,
## Pitbull} => {Avicii} 0.004229276 0.8230337 0.005138642 18.54878 293
## [15] {Calvin Harris,
## David Guetta,
## OneRepublic} => {Avicii} 0.005369593 0.8230088 0.006524344 18.54822 372
## [16] {Drake,
## JAY Z,
## Kendrick Lamar} => {Kanye West} 0.004777783 0.8826667 0.005412896 18.54724 331
## [17] {Calvin Harris,
## David Guetta,
## Icona Pop} => {Avicii} 0.004979864 0.8194774 0.006076878 18.46863 345
## [18] {Calvin Harris,
## David Guetta,
## Macklemore & Ryan Lewis} => {Avicii} 0.005311855 0.8177778 0.006495475 18.43033 368
## [19] {Calvin Harris,
## David Guetta,
## Rihanna,
## Swedish House Mafia} => {Avicii} 0.004445792 0.8169761 0.005441764 18.41226 308
## [20] {Britney Spears,
## Ke$ha,
## Rihanna} => {Katy Perry} 0.004287013 0.8070652 0.005311855 18.41049 297
## [21] {Drake,
## JAY Z,
## T.I.} => {Kanye West} 0.004041629 0.8750000 0.004619004 18.38615 280
## [22] {Calvin Harris,
## Pitbull,
## will.i.am} => {Avicii} 0.004402488 0.8155080 0.005398461 18.37917 305
## [23] {Calvin Harris,
## Ellie Goulding,
## Pitbull} => {Avicii} 0.004489095 0.8141361 0.005513936 18.34826 311
## [24] {Flo Rida,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.005124208 0.9293194 0.005513936 18.31645 355
## [25] {Flo Rida,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.005470633 0.9289216 0.005889231 18.30861 379
## [26] {Drake,
## Kanye West,
## Snoop Dogg} => {JAY Z} 0.004287013 0.8342697 0.005138642 18.26141 297
## [27] {Lady Gaga,
## P!nk,
## Rihanna} => {Katy Perry} 0.004517964 0.8005115 0.005643846 18.26099 313
## [28] {Eminem,
## Kanye West,
## Snoop Dogg} => {JAY Z} 0.004402488 0.8333333 0.005282986 18.24092 305
## [29] {A$AP Rocky,
## Drake,
## JAY Z} => {Kanye West} 0.004142669 0.8670695 0.004777783 18.21950 287
## [30] {Drake,
## Eminem,
## Kanye West} => {JAY Z} 0.004994298 0.8317308 0.006004706 18.20584 346
## [31] {Afrojack,
## Calvin Harris,
## David Guetta} => {Avicii} 0.004676742 0.8039702 0.005817059 18.11915 324
## [32] {Calvin Harris,
## David Guetta,
## Zedd} => {Avicii} 0.004748914 0.8024390 0.005918099 18.08464 329
## [33] {Swedish House Mafia,
## Zedd} => {Avicii} 0.004315882 0.8016086 0.005384027 18.06592 299
## [34] {LMFAO,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.004416923 0.9161677 0.004821086 18.05723 306
## [35] {Calvin Harris,
## Flo Rida,
## Swedish House Mafia} => {Avicii} 0.004128235 0.8011204 0.005153077 18.05492 286
## [36] {Flo Rida,
## LMFAO,
## Pitbull,
## Rihanna} => {David Guetta} 0.004229276 0.9156250 0.004619004 18.04654 293
## [37] {Calvin Harris,
## David Guetta,
## Swedish House Mafia} => {Avicii} 0.007274932 0.8000000 0.009093665 18.02967 504
## [38] {Drake,
## Kanye West,
## T.I.} => {JAY Z} 0.004041629 0.8235294 0.004907692 18.02632 280
## [39] {Eminem,
## JAY Z,
## Lil Wayne} => {Kanye West} 0.004229276 0.8542274 0.004950995 17.94966 293
## [40] {LMFAO,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.004041629 0.9090909 0.004445792 17.91776 280
## [41] {Drake,
## JAY Z,
## Lil Wayne} => {Kanye West} 0.005153077 0.8500000 0.006062443 17.86083 357
## [42] {Avicii,
## David Guetta,
## Zedd} => {Calvin Harris} 0.004748914 0.8635171 0.005499502 17.85246 329
## [43] {Drake,
## JAY Z,
## Snoop Dogg} => {Kanye West} 0.004287013 0.8461538 0.005066470 17.78001 297
## [44] {JAY Z,
## Rick Ross} => {Kanye West} 0.004157104 0.8445748 0.004922126 17.74683 288
## [45] {Avicii,
## Flo Rida,
## Swedish House Mafia} => {David Guetta} 0.004763348 0.8991826 0.005297421 17.72247 330
## [46] {Calvin Harris,
## Flo Rida,
## Swedish House Mafia} => {David Guetta} 0.004633439 0.8991597 0.005153077 17.72201 321
## [47] {Flo Rida,
## Katy Perry,
## Swedish House Mafia} => {David Guetta} 0.004243710 0.8990826 0.004720045 17.72050 294
## [48] {Big Sean,
## JAY Z} => {Kanye West} 0.004388054 0.8397790 0.005225249 17.64606 304
## [49] {Armin van Buuren,
## Avicii,
## David Guetta} => {Calvin Harris} 0.004171538 0.8525074 0.004893258 17.62485 289
## [50] {Eminem,
## Kanye West,
## Lil Wayne} => {JAY Z} 0.004229276 0.8049451 0.005254117 17.61952 293
plot(artist_rules1, method = "grouped", measure = "lift")
# Network of co-occurring artists in playlists.
# `title` is not among the control parameters accepted by this plot method, so it is omitted.
plot(artist_rules1, method = "graph")
plot(artist_rules1[1:100], method = "paracoord") #restriction on number of rules for better performance
print(paste0("Number of: significant rules - ",sum(is.significant(artist_rules1)), "; insignificant rules: ", sum(!is.significant(artist_rules1)), "."))
## [1] "Number of: significant rules - 287; insignificant rules: 0."
With a support threshold of 0.4% and a confidence threshold of 80%, we obtained a set of 287 rules exhibiting very high lift values (minimum lift = 14.36). This indicates that the presence of the artists on the left-hand side of a rule strongly predicts the inclusion of the artist on the right-hand side. Interestingly, the algorithm did not identify any rules of length 2, while most rules (218 of 287) involve 4 artists. In other words, no frequent pair of artists reaches 80% confidence; only combinations of two or more artists on the left-hand side do. This is in line with the low overall frequency of individual artists in the analysed playlists. We can also see that every rule passed the significance test applied by is.significant(), which uses Fisher's exact test by default.
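The quality measures of a single rule can be recomputed by hand from the values printed by inspect(). The sketch below does this for the rule with the highest lift, {Kendrick Lamar, Lil Wayne} => {Drake}. The number of playlists containing Drake is not reported directly, so it is derived here from confidence and lift and should be read as an approximation. The one-sided Fisher's exact test is likewise only an illustration; the exact settings used by is.significant() may differ.

```r
n             <- 69279        # playlists (transactions)
n_both        <- 281          # "count": playlists containing all three artists
coverage      <- 0.005066470  # share of playlists containing the left-hand side
lift_reported <- 24.56274

n_lhs       <- round(coverage * n)        # playlists with Kendrick Lamar and Lil Wayne
support     <- n_both / n
confidence  <- n_both / n_lhs
rhs_support <- confidence / lift_reported # implied share of playlists with Drake
n_rhs       <- round(rhs_support * n)     # derived, not reported in the output

# 2x2 contingency table: left-hand side present or absent vs. Drake present or absent
tab <- matrix(
  c(n_both,         n_lhs - n_both,
    n_rhs - n_both, n - n_lhs - n_rhs + n_both),
  nrow = 2, byrow = TRUE,
  dimnames = list(lhs = c("yes", "no"), rhs = c("yes", "no"))
)
p_value <- fisher.test(tab, alternative = "greater")$p.value

print(round(c(support = support, confidence = confidence, rhs_support = rhs_support), 6))
print(tab)
print(p_value < 0.01)
```

The recomputed support and confidence agree with the printed values (0.004056 and 0.8006), and the lift simply relates the 80% confidence to Drake's overall share of about 3.3% of playlists.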
Several noteworthy patterns emerge from these rules. First, the analysis suggests that inclusion of artists from the same music genre in a playlist tends to imply the inclusion of additional artists from the same genre. For example, the rule with the highest lift indicates that if a playlist contains rappers Kendrick Lamar and Lil Wayne, Drake appears on the same playlist approximately 25 times more often than would be expected if he were included independently of them (lift = 24.56). This observation is intuitive, as playlists are often genre-specific, reflecting users’ preference to group artists of the same style rather than mixing genres.
When examining the rules with the highest support and confidence, it is striking that they are largely permutations of a few key artists within the party-pop genre, including David Guetta, Pitbull, Swedish House Mafia, Rihanna, Calvin Harris, and LMFAO. This emphasizes that, in practice, popular playlists often feature repeated combinations of the same core set of artists, reflecting prevailing trends within a specific genre.
It is also worth emphasizing that most rules pivot around only a handful of artists. From the network and parallel coordinates plots, we can identify pivots centered on five main artists belonging to two distinct music genres. The first group corresponds to hip-hop, where the inclusion of most artists tends to lead to Kanye West, who serves as the consequent (right-hand side) in the strongest rules. The second group is associated with the previously mentioned party-pop genre, with the most common consequents being Avicii, Calvin Harris, David Guetta, and Rihanna. These patterns reinforce the earlier observations that a relatively small set of popular artists dominates co-occurrence patterns in playlists, while the majority of artists appear only sporadically.
This approach focuses on applying the most “sensible” parameters, determined through an extensive trial-and-error process, which yield meaningful findings and align well with the underlying data structure. Let us now examine what happens when these parameters are modified.
First, we will increase the support threshold while lowering the confidence threshold.
artist_rules2 <- apriori(artist_trans, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 692
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1000 item(s), 69279 transaction(s)] done [0.06s].
## sorting and recoding items ... [387 item(s)] done [0.01s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.02s].
## writing ... [39 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(artist_rules2)
## set of 39 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3
## 13 26
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 3.000 2.667 3.000 3.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01006 Min. :0.5000 Min. :0.01361 Min. : 8.976
## 1st Qu.:0.01028 1st Qu.:0.5216 1st Qu.:0.01625 1st Qu.:11.370
## Median :0.01091 Median :0.5860 Median :0.01898 Median :12.635
## Mean :0.01165 Mean :0.5988 Mean :0.01986 Mean :12.631
## 3rd Qu.:0.01215 3rd Qu.:0.6699 3rd Qu.:0.02164 3rd Qu.:14.261
## Max. :0.01833 Max. :0.7391 Max. :0.03523 Max. :17.107
## count
## Min. : 697.0
## 1st Qu.: 712.0
## Median : 756.0
## Mean : 806.8
## 3rd Qu.: 841.5
## Max. :1270.0
##
## mining info:
## data ntransactions support confidence
## artist_trans 69279 0.01 0.5
## call
## apriori(data = artist_trans, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
inspect(sort(artist_rules2, by = "lift"))
## lhs rhs support
## [1] {David Guetta, Rihanna} => {Flo Rida} 0.01102787
## [2] {Calvin Harris, Pitbull} => {Avicii} 0.01007520
## [3] {Drake, JAY Z} => {Kanye West} 0.01006077
## [4] {Drake, Kanye West} => {JAY Z} 0.01006077
## [5] {Katy Perry, Rihanna} => {Lady Gaga} 0.01078249
## [6] {The Who} => {The Rolling Stones} 0.01036389
## [7] {Calvin Harris, David Guetta} => {Avicii} 0.01401579
## [8] {Avicii, David Guetta} => {Pitbull} 0.01027728
## [9] {David Guetta, Rihanna} => {Pitbull} 0.01091240
## [10] {Lady Gaga, Rihanna} => {Katy Perry} 0.01078249
## [11] {Avicii, David Guetta} => {Calvin Harris} 0.01401579
## [12] {Flo Rida, Rihanna} => {David Guetta} 0.01102787
## [13] {Pitbull, Rihanna} => {David Guetta} 0.01091240
## [14] {Calvin Harris, Pitbull} => {David Guetta} 0.01014738
## [15] {Avicii, Pitbull} => {Calvin Harris} 0.01007520
## [16] {Avicii, Rihanna} => {David Guetta} 0.01007520
## [17] {Avicii, Pitbull} => {David Guetta} 0.01027728
## [18] {Calvin Harris, Rihanna} => {David Guetta} 0.01050823
## [19] {Avicii, Calvin Harris} => {David Guetta} 0.01401579
## [20] {David Guetta, Pitbull} => {Avicii} 0.01027728
## [21] {David Guetta, Katy Perry} => {Rihanna} 0.01131656
## [22] {David Guetta, Rihanna} => {Katy Perry} 0.01131656
## [23] {Katy Perry, Lady Gaga} => {Rihanna} 0.01078249
## [24] {LMFAO} => {David Guetta} 0.01073919
## [25] {Swedish House Mafia} => {Avicii} 0.01273113
## [26] {David Guetta, Flo Rida} => {Rihanna} 0.01102787
## [27] {Swedish House Mafia} => {David Guetta} 0.01446326
## [28] {Miley Cyrus} => {Katy Perry} 0.01034946
## [29] {David Guetta, Pitbull} => {Calvin Harris} 0.01014738
## [30] {Flo Rida} => {David Guetta} 0.01707588
## [31] {Katy Perry, Rihanna} => {David Guetta} 0.01131656
## [32] {David Guetta, Pitbull} => {Rihanna} 0.01091240
## [33] {Swedish House Mafia} => {Calvin Harris} 0.01239914
## [34] {Pitbull} => {David Guetta} 0.01833167
## [35] {Ke$ha} => {Rihanna} 0.01082579
## [36] {LMFAO} => {Rihanna} 0.01023398
## [37] {Jason Derulo} => {David Guetta} 0.01189394
## [38] {Flo Rida} => {Rihanna} 0.01538706
## [39] {Nicki Minaj} => {Rihanna} 0.01326520
## confidence coverage lift count
## [1] 0.5096731 0.02163715 17.107386 764
## [2] 0.6959123 0.01447769 15.683834 698
## [3] 0.7391304 0.01361163 15.531155 697
## [4] 0.6734300 0.01493959 14.740775 697
## [5] 0.5393502 0.01999163 14.676214 747
## [6] 0.5402558 0.01918330 14.643343 718
## [7] 0.6486306 0.02160828 14.618243 971
## [8] 0.5046067 0.02036692 14.321444 712
## [9] 0.5043362 0.02163715 14.313769 756
## [10] 0.6266779 0.01720579 14.295560 747
## [11] 0.6881644 0.02036692 14.227199 971
## [12] 0.7166979 0.01538706 14.125780 764
## [13] 0.7032558 0.01551697 13.860842 756
## [14] 0.7008973 0.01447769 13.814357 703
## [15] 0.6679426 0.01508394 13.809130 698
## [16] 0.6863324 0.01467977 13.527288 698
## [17] 0.6813397 0.01508394 13.428886 712
## [18] 0.6535009 0.01607991 12.880196 728
## [19] 0.6464714 0.02168045 12.741647 971
## [20] 0.5606299 0.01833167 12.634964 712
## [21] 0.6718081 0.01684493 12.060687 784
## [22] 0.5230153 0.02163715 11.930846 784
## [23] 0.6564148 0.01642633 11.784337 747
## [24] 0.5971108 0.01798525 11.768773 744
## [25] 0.5157895 0.02468280 11.624391 882
## [26] 0.6458157 0.01707588 11.594057 764
## [27] 0.5859649 0.02468280 11.549093 1002
## [28] 0.5042194 0.02052570 11.502080 717
## [29] 0.5535433 0.01833167 11.444025 703
## [30] 0.5731589 0.02979258 11.296693 1183
## [31] 0.5660650 0.01999163 11.156875 784
## [32] 0.5952756 0.01833167 10.686732 756
## [33] 0.5023392 0.02468280 10.385424 859
## [34] 0.5202786 0.03523434 10.254446 1270
## [35] 0.5703422 0.01898122 10.239113 750
## [36] 0.5690209 0.01798525 10.215392 709
## [37] 0.5012165 0.02373013 9.878743 824
## [38] 0.5164729 0.02979258 9.272020 1066
## [39] 0.5000000 0.02653041 8.976289 919
plot(artist_rules2, method = "grouped", measure = "lift")
# Network of co-occurring artists in playlists.
# `title` is not among the control parameters accepted by this plot method, so it is omitted.
plot(artist_rules2, method = "graph")
plot(artist_rules2, method = "paracoord")
Lowering the confidence threshold to 50% while simultaneously increasing the support threshold to 1% yields a new set of 39 rules. This time, the set also contains rules involving only two artists (13 of the 39 rules have length 2). What is particularly noteworthy is that many of the artists appearing in this set also occurred in the previous set generated with a stricter confidence threshold, although they are now represented in shorter rules. This pattern suggests that popular playlists are often composed of recurring combinations of the same core artists, reflecting consistent trends in user-generated playlists.
Next, we move in the opposite direction, lowering the support threshold to 0.3% while raising the confidence threshold to 90%.
artist_rules3 <- apriori(artist_trans, parameter = list(support = 0.003, confidence = 0.9, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.003 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 207
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1000 item(s), 69279 transaction(s)] done [0.07s].
## sorting and recoding items ... [944 item(s)] done [0.01s].
## creating transaction tree ... done [0.02s].
## checking subsets of size 1 2 3 4 5 6 done [0.16s].
## writing ... [138 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(artist_rules3)
## set of 138 rules
##
## rule length distribution (lhs + rhs):sizes
## 4 5 6
## 40 92 6
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.000 4.000 5.000 4.754 5.000 6.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.003002 Min. :0.9004 Min. :0.003190 Min. :16.16
## 1st Qu.:0.003089 1st Qu.:0.9051 1st Qu.:0.003363 1st Qu.:17.75
## Median :0.003248 Median :0.9124 Median :0.003551 Median :17.98
## Mean :0.003338 Mean :0.9170 Mean :0.003641 Mean :17.80
## 3rd Qu.:0.003446 3rd Qu.:0.9257 3rd Qu.:0.003767 3rd Qu.:18.35
## Max. :0.005471 Max. :0.9638 Max. :0.005889 Max. :20.57
## count
## Min. :208.0
## 1st Qu.:214.0
## Median :225.0
## Mean :231.2
## 3rd Qu.:238.8
## Max. :379.0
##
## mining info:
## data ntransactions support confidence
## artist_trans 69279 0.003 0.9
## call
## apriori(data = artist_trans, parameter = list(support = 0.003, confidence = 0.9, minlen = 2))
inspect(sort(artist_rules3, by = "lift")[1:50])
## lhs rhs support confidence coverage lift count
## [1] {Calvin Harris,
## David Guetta,
## Swedish House Mafia,
## Tiësto} => {Avicii} 0.003478688 0.9128788 0.003810679 20.57363 241
## [2] {Armin van Buuren,
## Calvin Harris,
## Swedish House Mafia} => {Avicii} 0.003319909 0.9019608 0.003680769 20.32757 230
## [3] {A$AP Rocky,
## JAY Z,
## Lil Wayne} => {Kanye West} 0.003161131 0.9163180 0.003449819 19.25435 219
## [4] {Drake,
## JAY Z,
## Lil Wayne,
## T.I.} => {Kanye West} 0.003002353 0.9162996 0.003276606 19.25396 208
## [5] {Drake,
## Eminem,
## JAY Z,
## Lil Wayne} => {Kanye West} 0.003233303 0.9105691 0.003550860 19.13355 224
## [6] {A$AP Rocky,
## Drake,
## JAY Z,
## Kendrick Lamar} => {Kanye West} 0.003074525 0.9102564 0.003377647 19.12698 213
## [7] {Dr. Dre,
## Drake,
## JAY Z} => {Kanye West} 0.003348778 0.9062500 0.003695203 19.04279 232
## [8] {JAY Z,
## Kendrick Lamar,
## Lil Wayne} => {Kanye West} 0.003305475 0.9051383 0.003651900 19.01944 229
## [9] {Katy Perry,
## LMFAO,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003074525 0.9638009 0.003190000 18.99606 213
## [10] {Flo Rida,
## LMFAO,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.003291041 0.9620253 0.003420950 18.96107 228
## [11] {Flo Rida,
## LMFAO,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003507556 0.9604743 0.003651900 18.93050 243
## [12] {Flo Rida,
## Lady Gaga,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003146697 0.9603524 0.003276606 18.92810 218
## [13] {Avicii,
## Calvin Harris,
## Flo Rida,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003031222 0.9502262 0.003190000 18.72851 210
## [14] {Avicii,
## Tiësto,
## Zedd} => {Calvin Harris} 0.003464253 0.9056604 0.003825113 18.72374 240
## [15] {Calvin Harris,
## Flo Rida,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003651900 0.9475655 0.003853982 18.67607 253
## [16] {Avicii,
## David Guetta,
## Ellie Goulding,
## Swedish House Mafia} => {Calvin Harris} 0.003233303 0.9032258 0.003579728 18.67341 224
## [17] {Flo Rida,
## Pitbull,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.004113801 0.9468439 0.004344751 18.66185 285
## [18] {Avicii,
## David Guetta,
## Swedish House Mafia,
## Tiësto} => {Calvin Harris} 0.003478688 0.9026217 0.003853982 18.66092 241
## [19] {Avicii,
## David Guetta,
## Swedish House Mafia,
## Zedd} => {Calvin Harris} 0.003146697 0.9008264 0.003493122 18.62380 218
## [20] {Flo Rida,
## LMFAO,
## Swedish House Mafia} => {David Guetta} 0.004171538 0.9444444 0.004416923 18.61456 289
## [21] {Enrique Iglesias,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.003117828 0.9432314 0.003305475 18.59065 216
## [22] {Calvin Harris,
## Rihanna,
## Swedish House Mafia,
## will.i.am} => {David Guetta} 0.003031222 0.9417040 0.003218869 18.56054 210
## [23] {Calvin Harris,
## Flo Rida,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.003420950 0.9404762 0.003637466 18.53634 237
## [24] {Avicii,
## Flo Rida,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.003594163 0.9396226 0.003825113 18.51952 249
## [25] {Flo Rida,
## Maroon 5,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003117828 0.9391304 0.003319909 18.50982 216
## [26] {Flo Rida,
## Katy Perry,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003680769 0.9375000 0.003926154 18.47768 255
## [27] {Flo Rida,
## Swedish House Mafia,
## The Black Eyed Peas} => {David Guetta} 0.003406516 0.9365079 0.003637466 18.45813 236
## [28] {Flo Rida,
## Katy Perry,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.003348778 0.9354839 0.003579728 18.43795 232
## [29] {Avicii,
## Flo Rida,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003752941 0.9352518 0.004012760 18.43337 260
## [30] {Flo Rida,
## Lady Gaga,
## LMFAO,
## Pitbull} => {David Guetta} 0.003088959 0.9344978 0.003305475 18.41851 214
## [31] {Avicii,
## Rihanna,
## Swedish House Mafia,
## will.i.am} => {David Guetta} 0.003045656 0.9336283 0.003262172 18.40138 211
## [32] {Flo Rida,
## Katy Perry,
## Lady Gaga,
## LMFAO} => {David Guetta} 0.003016787 0.9330357 0.003233303 18.38970 209
## [33] {Avicii,
## Flo Rida,
## Jason Derulo,
## Rihanna} => {David Guetta} 0.003002353 0.9327354 0.003218869 18.38378 208
## [34] {Avicii,
## Flo Rida,
## LMFAO,
## Rihanna} => {David Guetta} 0.003348778 0.9317269 0.003594163 18.36390 232
## [35] {LMFAO,
## Pitbull,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003319909 0.9311741 0.003565294 18.35300 230
## [36] {Calvin Harris,
## Flo Rida,
## Pitbull,
## will.i.am} => {David Guetta} 0.003074525 0.9301310 0.003305475 18.33245 213
## [37] {Flo Rida,
## Pitbull,
## Swedish House Mafia} => {David Guetta} 0.005124208 0.9293194 0.005513936 18.31645 355
## [38] {Lady Gaga,
## Pitbull,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003031222 0.9292035 0.003262172 18.31417 210
## [39] {Flo Rida,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.005470633 0.9289216 0.005889231 18.30861 379
## [40] {Calvin Harris,
## Flo Rida,
## Rihanna,
## will.i.am} => {David Guetta} 0.003305475 0.9271255 0.003565294 18.27321 229
## [41] {Katy Perry,
## Pitbull,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003651900 0.9267399 0.003940588 18.26561 253
## [42] {Flo Rida,
## LMFAO,
## Snoop Dogg} => {David Guetta} 0.003103394 0.9267241 0.003348778 18.26530 215
## [43] {Katy Perry,
## LMFAO,
## Swedish House Mafia} => {David Guetta} 0.003449819 0.9263566 0.003724072 18.25805 239
## [44] {Pitbull,
## Swedish House Mafia,
## The Black Eyed Peas} => {David Guetta} 0.003262172 0.9262295 0.003521991 18.25555 226
## [45] {Pitbull,
## Swedish House Mafia,
## Taio Cruz} => {David Guetta} 0.003074525 0.9260870 0.003319909 18.25274 213
## [46] {Flo Rida,
## Katy Perry,
## LMFAO,
## Pitbull,
## Rihanna} => {David Guetta} 0.003060090 0.9257642 0.003305475 18.24638 212
## [47] {Flo Rida,
## Swedish House Mafia,
## Taio Cruz} => {David Guetta} 0.003233303 0.9256198 0.003493122 18.24353 224
## [48] {Flo Rida,
## LMFAO,
## Rihanna,
## The Black Eyed Peas} => {David Guetta} 0.003045656 0.9254386 0.003291041 18.23996 211
## [49] {Enrique Iglesias,
## Rihanna,
## Swedish House Mafia} => {David Guetta} 0.003117828 0.9230769 0.003377647 18.19341 216
## [50] {Flo Rida,
## Jason Derulo,
## Swedish House Mafia} => {David Guetta} 0.003088959 0.9224138 0.003348778 18.18034 214
plot(artist_rules3, method = "grouped", measure = "lift")
plot(artist_rules3, method = "graph", control = list(title = "Network of co-occurring artists in playlists"))
plot(artist_rules3[1:100], method = "paracoord")
In this iteration, we lowered the support threshold to 0.3% while simultaneously raising the confidence threshold to 90%. This yields a set of 138 rules, none containing fewer than four artists and some containing as many as six. Raising the confidence threshold eliminates many previously strong rules in which a wider range of artists served as consequents, leaving David Guetta as by far the most prominent pivot; among the 50 rules with the highest lift, only Kanye West, Calvin Harris and Avicii also appear on the right-hand side. The distinction between the hip-hop and party-pop subgroups is now less pronounced, with more interactions observed between them. These results reinforce the previous findings that association rules describing artist co-occurrences in playlists are primarily driven by the most popular artists within similar genres, although there are some exceptions to that trend.
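The figures reported for this rule set can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers printed in the output above (the transaction count, the support threshold, and the confidence and lift of rule [1]); the variable names are illustrative.

```r
n_playlists <- 69279   # transactions reported by apriori()
min_support <- 0.003   # support threshold used for artist_rules3

# Truncating the product reproduces the "Absolute minimum support count"
# of 207 printed in the apriori() log.
min_count <- floor(min_support * n_playlists)

# 207 playlists fall just below the 0.3% threshold, which is consistent
# with the smallest count in the summary being 208.
share_207 <- 207 / n_playlists
share_208 <- 208 / n_playlists

# Rule [1] (consequent: Avicii): confidence and lift as printed by inspect().
confidence <- 0.9128788
lift       <- 20.57363

# Since lift = confidence / support(rhs), the consequent's base rate is:
base_rate_avicii <- confidence / lift

print(c(min_count = min_count, base_rate_avicii = base_rate_avicii))
```

Under these numbers Avicii appears in roughly 4.4% of all playlists, so a confidence above 90% corresponds to a lift of about 20.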
To conclude the first part of this study, the analysis of association rules in Spotify playlists allows for several general observations. High confidence and lift values indicate that certain combinations of artists strongly predict the presence of others (especially when they create music in the same genre), suggesting clear patterns of co-occurrence within user-generated playlists. However, most of these associations are based on relatively low support, meaning that they occur in only a small fraction of all playlists. As a result, while the rules are statistically strong, their overall prevalence in the dataset is limited, and they may primarily reflect the behavior of playlists centered around the most popular artists. Nevertheless, it is reasonable to assume that these findings reflect reality, as it is highly plausible that many playlists consist of songs by the same handful of highly popular artists working in the same genre.
We now continue our analysis, this time focusing on association rules involving a single artist — 2Pac. Narrowing the scope allows us to examine less popular artists, resulting in fewer rules and a more straightforward interpretation. As before, we employ the apriori() algorithm to identify these rules. In this case, the research question is simpler: which artists’ inclusion on a playlist increases the likelihood of 2Pac being included?
After fine-tuning the parameters, the following call yielded the best results.
shakur_rules <- apriori(artist_trans, parameter = list(support = 0.001, confidence = 0.75, minlen = 2),
appearance = list(default = "lhs", rhs = "2Pac"), control = list(verbose=F))
summary(shakur_rules)
## set of 80 rules
##
## rule length distribution (lhs + rhs):sizes
## 4 5
## 30 50
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.000 4.000 5.000 4.625 5.000 5.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001010 Min. :0.7500 Min. :0.001227 Min. :53.29
## 1st Qu.:0.001039 1st Qu.:0.7608 1st Qu.:0.001328 1st Qu.:54.06
## Median :0.001090 Median :0.7784 Median :0.001386 Median :55.31
## Mean :0.001138 Mean :0.7865 Mean :0.001449 Mean :55.89
## 3rd Qu.:0.001202 3rd Qu.:0.8050 3rd Qu.:0.001544 3rd Qu.:57.20
## Max. :0.001674 Max. :0.8824 Max. :0.002223 Max. :62.70
## count
## Min. : 70.00
## 1st Qu.: 72.00
## Median : 75.50
## Mean : 78.83
## 3rd Qu.: 83.25
## Max. :116.00
##
## mining info:
## data ntransactions support confidence
## artist_trans 69279 0.001 0.75
## call
## apriori(data = artist_trans, parameter = list(support = 0.001, confidence = 0.75, minlen = 2), appearance = list(default = "lhs", rhs = "2Pac"), control = list(verbose = F))
inspect(sort(shakur_rules, by = "lift"))
## lhs rhs support confidence coverage lift count
## [1] {Ice Cube,
## Nas,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001082579 0.8823529 0.001226923 62.69593 75
## [2] {Dr. Dre,
## Ice Cube,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001097014 0.8636364 0.001270226 61.36601 76
## [3] {Ice Cube,
## Snoop Dogg,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001255792 0.8613861 0.001457873 61.20612 87
## [4] {Dr. Dre,
## Ice Cube,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001068145 0.8505747 0.001255792 60.43791 74
## [5] {Dr. Dre,
## Ice Cube,
## Nas,
## Snoop Dogg} => {2Pac} 0.001053710 0.8488372 0.001241357 60.31445 73
## [6] {Ice Cube,
## Nas,
## OutKast,
## Snoop Dogg} => {2Pac} 0.001039276 0.8372093 0.001241357 59.48823 72
## [7] {Ice Cube,
## Nas,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001226923 0.8333333 0.001472308 59.21282 85
## [8] {Eminem,
## Ice Cube,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001140317 0.8315789 0.001371267 59.08816 79
## [9] {Eminem,
## Nas,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001140317 0.8315789 0.001371267 59.08816 79
## [10] {Ice Cube,
## OutKast,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001039276 0.8275862 0.001255792 58.80446 72
## [11] {Ice Cube,
## JAY Z,
## Nas,
## Snoop Dogg} => {2Pac} 0.001169185 0.8265306 0.001414570 58.72945 81
## [12] {Dr. Dre,
## Ice Cube,
## JAY Z,
## Wu-Tang Clan} => {2Pac} 0.001024842 0.8255814 0.001241357 58.66200 71
## [13] {Cypress Hill,
## JAY Z,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001024842 0.8255814 0.001241357 58.66200 71
## [14] {Ice Cube,
## JAY Z,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001183620 0.8200000 0.001443439 58.26542 82
## [15] {Nas,
## Snoop Dogg,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001284661 0.8165138 0.001573348 58.01770 89
## [16] {Ice Cube,
## JAY Z,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001212489 0.8155340 0.001486742 57.94808 84
## [17] {Ice Cube,
## Method Man,
## Wu-Tang Clan} => {2Pac} 0.001010407 0.8139535 0.001241357 57.83578 70
## [18] {Ice Cube,
## Nas,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001111448 0.8105263 0.001371267 57.59226 77
## [19] {Ice Cube,
## Nas,
## Snoop Dogg} => {2Pac} 0.001400136 0.8083333 0.001732127 57.43644 97
## [20] {DMX,
## Ice Cube,
## Snoop Dogg} => {2Pac} 0.001140317 0.8061224 0.001414570 57.27934 79
## [21] {Nas,
## OutKast,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001010407 0.8045977 0.001255792 57.17100 70
## [22] {Dr. Dre,
## Ice Cube,
## Wu-Tang Clan} => {2Pac} 0.001270226 0.8000000 0.001587783 56.84431 88
## [23] {Ice Cube,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001443439 0.8000000 0.001804299 56.84431 100
## [24] {Eminem,
## Ice Cube,
## Wu-Tang Clan} => {2Pac} 0.001039276 0.8000000 0.001299095 56.84431 72
## [25] {Ice Cube,
## The Game,
## The Notorious B.I.G.} => {2Pac} 0.001024842 0.7977528 0.001284661 56.68463 71
## [26] {OutKast,
## Snoop Dogg,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001068145 0.7956989 0.001342398 56.53869 74
## [27] {Ice Cube,
## Method Man,
## Snoop Dogg} => {2Pac} 0.001010407 0.7954545 0.001270226 56.52133 70
## [28] {Ice Cube,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001515611 0.7954545 0.001905339 56.52133 105
## [29] {Dr. Dre,
## Ice Cube,
## JAY Z,
## Nas} => {2Pac} 0.001010407 0.7954545 0.001270226 56.52133 70
## [30] {Dr. Dre,
## Nas,
## OutKast,
## The Notorious B.I.G.} => {2Pac} 0.001097014 0.7916667 0.001385701 56.25218 76
## [31] {Dr. Dre,
## Snoop Dogg,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001140317 0.7900000 0.001443439 56.13375 79
## [32] {Ice Cube,
## JAY Z,
## Method Man} => {2Pac} 0.001010407 0.7865169 0.001284661 55.88626 70
## [33] {Method Man,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001097014 0.7835052 0.001400136 55.67226 76
## [34] {Ice Cube,
## Snoop Dogg,
## The Game} => {2Pac} 0.001097014 0.7835052 0.001400136 55.67226 76
## [35] {Method Man,
## Nas,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001039276 0.7826087 0.001327964 55.60856 72
## [36] {Dr. Dre,
## Nas,
## OutKast,
## Snoop Dogg} => {2Pac} 0.001039276 0.7826087 0.001327964 55.60856 72
## [37] {Dr. Dre,
## Eminem,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001342398 0.7815126 0.001717692 55.53068 93
## [38] {Ice Cube,
## Method Man,
## The Notorious B.I.G.} => {2Pac} 0.001082579 0.7812500 0.001385701 55.51202 75
## [39] {DMX,
## Nas,
## Snoop Dogg} => {2Pac} 0.001024842 0.7802198 0.001313529 55.43882 71
## [40] {Busta Rhymes,
## Ice Cube,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001068145 0.7789474 0.001371267 55.34840 74
## [41] {Cypress Hill,
## Ice Cube,
## JAY Z} => {2Pac} 0.001010407 0.7777778 0.001299095 55.26530 70
## [42] {Busta Rhymes,
## Eminem,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001010407 0.7777778 0.001299095 55.26530 70
## [43] {50 Cent,
## Nas,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001010407 0.7777778 0.001299095 55.26530 70
## [44] {Nas,
## OutKast,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001299095 0.7758621 0.001674389 55.12918 90
## [45] {50 Cent,
## Eminem,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001140317 0.7745098 0.001472308 55.03309 79
## [46] {Ice Cube,
## JAY Z,
## Nas,
## Wu-Tang Clan} => {2Pac} 0.001039276 0.7741935 0.001342398 55.01062 72
## [47] {Dr. Dre,
## Nas,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001212489 0.7706422 0.001573348 54.75828 84
## [48] {Dr. Dre,
## JAY Z,
## Nas,
## OutKast} => {2Pac} 0.001154751 0.7692308 0.001501176 54.65799 80
## [49] {DMX,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001342398 0.7685950 0.001746561 54.61282 93
## [50] {Dr. Dre,
## Method Man,
## Snoop Dogg} => {2Pac} 0.001053710 0.7684211 0.001371267 54.60045 73
## [51] {Dr. Dre,
## Eminem,
## Ice Cube,
## The Notorious B.I.G.} => {2Pac} 0.001053710 0.7684211 0.001371267 54.60045 73
## [52] {Dr. Dre,
## Lil Wayne,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001053710 0.7684211 0.001371267 54.60045 73
## [53] {Dr. Dre,
## Ice Cube,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001385701 0.7680000 0.001804299 54.57054 96
## [54] {Cypress Hill,
## JAY Z,
## The Notorious B.I.G.} => {2Pac} 0.001183620 0.7663551 0.001544480 54.45366 82
## [55] {Ice Cube,
## OutKast,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001226923 0.7657658 0.001602217 54.41178 85
## [56] {Dr. Dre,
## Ice Cube,
## Nas,
## The Notorious B.I.G.} => {2Pac} 0.001068145 0.7628866 0.001400136 54.20720 74
## [57] {JAY Z,
## Snoop Dogg,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001299095 0.7627119 0.001703258 54.19478 90
## [58] {Ice Cube,
## OutKast,
## Wu-Tang Clan} => {2Pac} 0.001198054 0.7614679 0.001573348 54.10639 83
## [59] {Cypress Hill,
## Ice Cube,
## The Notorious B.I.G.} => {2Pac} 0.001010407 0.7608696 0.001327964 54.06388 70
## [60] {Cypress Hill,
## Nas,
## Snoop Dogg} => {2Pac} 0.001010407 0.7608696 0.001327964 54.06388 70
## [61] {Ice Cube,
## JAY Z,
## OutKast,
## Wu-Tang Clan} => {2Pac} 0.001053710 0.7604167 0.001385701 54.03170 73
## [62] {Busta Rhymes,
## OutKast,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001053710 0.7604167 0.001385701 54.03170 73
## [63] {Cypress Hill,
## Ice Cube,
## Snoop Dogg} => {2Pac} 0.001140317 0.7596154 0.001501176 53.97476 79
## [64] {Busta Rhymes,
## Nas,
## Snoop Dogg} => {2Pac} 0.001226923 0.7589286 0.001616652 53.92596 85
## [65] {Cypress Hill,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001226923 0.7589286 0.001616652 53.92596 85
## [66] {Busta Rhymes,
## Nas,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001039276 0.7578947 0.001371267 53.85250 72
## [67] {DMX,
## JAY Z,
## Snoop Dogg,
## The Notorious B.I.G.} => {2Pac} 0.001082579 0.7575758 0.001429004 53.82984 75
## [68] {50 Cent,
## Ice Cube,
## The Notorious B.I.G.} => {2Pac} 0.001169185 0.7570093 0.001544480 53.78959 81
## [69] {Nas,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001429004 0.7557252 0.001890905 53.69834 99
## [70] {Dr. Dre,
## Kanye West,
## The Game,
## The Notorious B.I.G.} => {2Pac} 0.001024842 0.7553191 0.001356833 53.66949 71
## [71] {JAY Z,
## OutKast,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001068145 0.7551020 0.001414570 53.65407 74
## [72] {Snoop Dogg,
## The Notorious B.I.G.,
## Wu-Tang Clan} => {2Pac} 0.001674389 0.7532468 0.002222896 53.52224 116
## [73] {Method Man,
## OutKast,
## Snoop Dogg} => {2Pac} 0.001010407 0.7526882 0.001342398 53.48255 70
## [74] {JAY Z,
## Nas,
## Snoop Dogg,
## Wu-Tang Clan} => {2Pac} 0.001140317 0.7523810 0.001515611 53.46072 79
## [75] {Dr. Dre,
## Eminem,
## Ice Cube,
## Snoop Dogg} => {2Pac} 0.001140317 0.7523810 0.001515611 53.46072 79
## [76] {Dr. Dre,
## JAY Z,
## Nas,
## The Notorious B.I.G.} => {2Pac} 0.001313529 0.7520661 0.001746561 53.43835 91
## [77] {Method Man,
## OutKast,
## The Notorious B.I.G.} => {2Pac} 0.001039276 0.7500000 0.001385701 53.29154 72
## [78] {DMX,
## OutKast,
## The Notorious B.I.G.} => {2Pac} 0.001082579 0.7500000 0.001443439 53.29154 75
## [79] {Kanye West,
## Snoop Dogg,
## The Game,
## The Notorious B.I.G.} => {2Pac} 0.001169185 0.7500000 0.001558914 53.29154 81
## [80] {Drake,
## Ice Cube,
## Kanye West,
## Snoop Dogg} => {2Pac} 0.001039276 0.7500000 0.001385701 53.29154 72
The resulting rules are relatively homogeneous in structure: all 80 rules consist of either four or five artists, with no shorter rules identified. This suggests that the inclusion of 2Pac in a playlist is rarely associated with the presence of a single artist or a simple pair, but rather emerges in richer, more densely connected artist combinations. Such patterns are consistent with the notion that 2Pac appears primarily in thematically coherent playlists focused on “old school gangsta rap” rather than in minimal or loosely curated collections.
From the perspective of quality measures, the rules exhibit exceptionally high lift values, ranging from approximately 53 to almost 63. These magnitudes imply that 2Pac appears on playlists containing these specific artist combinations more than fifty times as often as would be expected under independence, pointing to a very strong associative relationship with fellow hip-hop artists: West Coast-affiliated acts such as Ice Cube, Snoop Dogg and Dr. Dre, East Coast acts such as the Wu-Tang Clan and Nas, and even his greatest rival from the East Coast, The Notorious B.I.G. Closer examination of these rules shows that no artist from a genre other than hip-hop appears in any rule connected with 2Pac.
Despite the very high lift values and a relatively high confidence threshold, these results should be interpreted with caution because of the low support levels (as in the previous example). Every rule is supported by only 70 to 116 playlists out of more than 69,000 transactions. This implies that while the associations are extremely strong when they do occur, they characterize a specific subset of playlists rather than broad, population-wide listening behavior. Consequently, these rules are best understood as describing niche but highly consistent playlist archetypes, rather than general trends across the entire Spotify ecosystem.
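The lift and support figures discussed above can be reproduced by hand for the strongest rule. The sketch below takes the count, coverage and lift of rule [1] from the inspect() output and the transaction count from the summary; the variable names are illustrative, and the rounding step assumes that coverage times the number of playlists is a whole number of playlists.

```r
n_playlists <- 69279        # transactions in artist_trans
count_rule  <- 75           # playlists containing the lhs and 2Pac (rule [1])
coverage    <- 0.001226923  # share of playlists containing the lhs
lift        <- 62.69593

# Number of playlists that contain the whole left-hand side.
count_lhs <- round(coverage * n_playlists)

# Confidence: share of those playlists that also contain 2Pac.
confidence <- count_rule / count_lhs

# Since lift = confidence / support(rhs), 2Pac's overall base rate is:
base_rate_2pac <- confidence / lift

# Playlists with 2Pac we would expect among the lhs playlists
# if 2Pac were independent of the lhs artists.
expected_under_independence <- count_lhs * base_rate_2pac

print(c(count_lhs = count_lhs,
        confidence = confidence,
        base_rate_2pac = base_rate_2pac,
        expected = expected_under_independence))
```

With a base rate of roughly 1.4%, only about one of the 85 playlists containing the left-hand side would be expected to include 2Pac under independence, against the 75 actually observed. The same numbers show why support is low: 75 playlists are about 0.1% of the dataset.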
plot(shakur_rules, method = "grouped", measure = "lift")
plot(shakur_rules, method = "graph", control = list(title = "Network of co-occurring artists in playlists"))
plot(shakur_rules, method = "paracoord")
Overall, the artist-specific analysis reinforces earlier findings from the global rule mining exercise: association rules in playlist data tend to be driven by small, genre-coherent clusters of highly recognizable artists. In the case of 2Pac, his presence is neither random nor weakly associated, but strongly embedded within a well-defined network of classic hip-hop artists, reflecting both user curation habits and the enduring cultural positioning of the artist as a champion of a specific genre.
In this study, we applied association rule mining techniques to uncover structural patterns in user-generated Spotify playlists with respect to the artists included in them. The analysis shows that playlists are highly diverse and that no single artist is popular enough to appear in the majority of playlists. However, this does not hold for groups of artists. The distribution of the 1,000 most popular artists is strongly right-skewed, indicating that while individual artists appear relatively infrequently, combinations of artists collectively account for a substantial share of playlist content.
Using the apriori() algorithm, we demonstrated that many artist appearances on playlists can be explained by their co-occurrence with other artists, particularly within the same music genre. This pattern is evident both in the general analysis covering all artists in the dataset and in the focused case study centered on the rapper 2Pac and his strong associations with the broader hip-hop genre. Moreover, most high-quality co-occurrence patterns are concentrated within two dominant genres: hip-hop and party-pop. Within these genres, a small number of artists, such as David Guetta in party-pop and Kanye West in hip-hop, emerge as particularly frequent rule consequents, suggesting that many playlists are organized around a limited set of central artists.
At the same time, it is important to emphasize that despite the high confidence and lift values observed for many rules, their support remains relatively low. This indicates that while the identified associations are strong and meaningful when they occur, they apply only to a small fraction of all playlists. Consequently, the discovered rules should be interpreted as describing niche but consistent user curation habits rather than universal patterns across the entire platform.
This study is not without limitations. The dataset used in the analysis constitutes a sample collected via web scraping from Twitter and may therefore not be fully representative of the broader Spotify user base. As a result, the findings may be biased toward the preferences and behaviors of Twitter users. Additionally, several restrictions were imposed during data cleaning and preprocessing to improve computational performance, which may have reduced the overall variability of the dataset and led to the loss of some information.
Future research could address these limitations by employing more representative data sources and relaxing some of the preprocessing constraints. Further insights could also be gained by narrowing the scope of the analysis - for example, by focusing on specific music genres - or by extending it to the level of individual tracks rather than artists. Such extensions may provide a more nuanced understanding of playlist formation and user curation behavior on music streaming platforms.