Objective:

Using a small LastFM dataset that lists up to 50 favorite artists per user, find pairs of artists that users tend to like together.

Solution:

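Pairs of artists that are often liked together can be found with the Apriori algorithm, which is commonly used in market basket analysis to find items that are bought together. The algorithm counts how often each item, and each combination of items, appears across the transactions (here, the users' lists). It keeps only the combinations that pass user-defined thresholds such as minimum support and minimum confidence. Choosing those thresholds is where the trade-off happens: lower values return more rules but take longer to compute and admit weaker associations.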
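The three measures Apriori works with can be illustrated in base R on a hypothetical toy set of four users' lists. This is a minimal sketch of the definitions only, not the arules implementation; the function name `rule_measures` and the toy data are invented for the example.

```r
# Hypothetical toy data: each element is one user's list of favorite artists
toy <- list(
  c("Muse", "Radiohead", "Coldplay"),
  c("Muse", "The Killers"),
  c("Radiohead", "The Beatles"),
  c("Muse", "Radiohead")
)

# Support, confidence and lift of the rule {lhs} => {rhs}
rule_measures <- function(transactions, lhs, rhs) {
  n <- length(transactions)
  has_lhs  <- vapply(transactions, function(x) lhs %in% x, logical(1))
  has_rhs  <- vapply(transactions, function(x) rhs %in% x, logical(1))
  has_both <- has_lhs & has_rhs
  support    <- sum(has_both) / n                # share of lists containing both artists
  confidence <- sum(has_both) / sum(has_lhs)     # share of lhs lists that also contain rhs
  lift       <- confidence / (sum(has_rhs) / n)  # > 1: liked together more often than by chance
  c(support = support, confidence = confidence, lift = lift)
}

rule_measures(toy, "Radiohead", "Muse")
```

For these toy lists the result is support 0.5, confidence 2/3 and lift 8/9: the two artists co-occur slightly less often than independence would predict, so a lift below 1 flags a pair that is not a good recommendation.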

Loading required libraries

library(arules)

Loading the dataset

raw <- read.csv("Artist_lists_small.txt", sep = ",", header = FALSE, stringsAsFactors = FALSE)
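Because each user lists a different number of artists, it helps to know how `read.csv` handles ragged rows. `read.table` decides the number of columns from the first five lines, or from `col.names` when that is longer, and with `fill = TRUE` (the `read.csv` default) it pads shorter rows; character columns are padded with `""`. In the LastFM file the first row already has 50 artists (the `V50` column in the output below), so the call above gets all 50 columns. Passing `col.names = paste0("V", 1:50)` would be a safer way to guarantee that. The sketch below shows the padding on a hypothetical two-user input.

```r
# Hypothetical two-user file: the first line is shorter than the widest one
toy_text <- "Muse,Coldplay\nRadiohead,Muse,The Beatles\n"

# col.names fixes the number of columns up front; the shorter row is padded
# with "" because V3 is a character column
toy_raw <- read.csv(text = toy_text, header = FALSE, stringsAsFactors = FALSE,
                    col.names = paste0("V", 1:3))
toy_raw
```

The padding `""` is not an artist, so it has to be removed before the lists become transactions.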

Exploratory analysis

head(raw, 1)
##              V1         V2       V3           V4              V5
## 1 Michael Bublé Jason Mraz 光田康典 The National Sarah McLachlan
##               V6        V7     V8       V9       V10          V11
## 1 Claude Debussy Lady Gaga 周杰倫 植松伸夫 Indochine Rise Against
##               V12       V13                   V14          V15
## 1 City and Colour Radiohead Red Hot Chili Peppers Alexisonfire
##             V16            V17                  V18                 V19
## 1 Bebo & Cigala The Mars Volta Chick Corea & Hiromi Theory of a Deadman
##                 V20          V21            V22       V23
## 1 Cantaloupe Island Adam's Apple A Brighter Day Afrodisia
##                 V24       V25          V26     V27        V28     V29
## 1 Mambo De La Pinta Greg Osby James Newton Testify Art Blakey Caravan
##               V30  V31        V32                V33          V34    V35
## 1 Tsuyoshi Sekito Mira Temptation A Night In Tunisia Serj Tankian Winter
##            V36                       V37           V38            V39
## 1 Duke Pearson Cantaloop (Flip Fantasia) Cæcilie Norby El Cumbanchero
##                           V40          V41        V42            V43
## 1 You Don't Know What Love Is Death Letter Art Taylor Closer to Home
##            V44                                      V45        V46
## 1 Indio Gitano Solomon Ilori and his Afro-Drum Ensemble Black Byrd
##        V47              V48        V49               V50
## 1 Far West Hi-Heel Sneakers Congalegra There Is The Bomb
#results suppressed
str(raw)
summary(raw)

Preparing the dataset

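# transpose so each user (row) becomes a column, then split the columns into a list
artist_lists <- as.list(as.data.frame(t(raw), stringsAsFactors = FALSE))
# drop the empty strings and NAs that pad shorter rows, then remove duplicates
list_u <- lapply(artist_lists, function(x) unique(x[!is.na(x) & x != ""]))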
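The preparation steps can be checked on a small hypothetical table shaped like `raw` (one row per user, padded with `""`). The names `toy_raw`, `toy_lists` and `toy_clean` are illustrative only.

```r
# Hypothetical raw table shaped like `raw`: one row per user, padded with ""
toy_raw <- data.frame(V1 = c("Muse", "Radiohead"),
                      V2 = c("Coldplay", "Muse"),
                      V3 = c("", "Muse"),
                      stringsAsFactors = FALSE)

# t() turns each user into a column; as.list() then yields one vector per user
toy_lists <- as.list(as.data.frame(t(toy_raw), stringsAsFactors = FALSE))

# Drop the padding and any artist repeated within the same user's list
toy_clean <- lapply(toy_lists, function(x) unique(x[!is.na(x) & x != ""]))
str(toy_clean)
```

The first user keeps `Muse` and `Coldplay` without the empty padding, and the second user's duplicated `Muse` is kept only once.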

Transform into transactions

txn <- as(list_u, "transactions")
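Conceptually, a transactions object is a binary incidence matrix with one row per user and one column per distinct artist. arules stores it in a sparse form; the dense base-R version below is only a sketch on hypothetical cleaned lists, to show that an artist's support is just the mean of its column.

```r
# Hypothetical cleaned lists (as produced by the preparation step)
toy_clean <- list(c("Muse", "Coldplay"), c("Radiohead", "Muse"), c("Radiohead"))

# One row per user, one column per distinct artist; TRUE where the user lists the artist
artists <- sort(unique(unlist(toy_clean)))
incidence <- t(vapply(toy_clean, function(x) artists %in% x, logical(length(artists))))
colnames(incidence) <- artists

# Support of each single artist = share of users whose list contains it
colMeans(incidence)
```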

Parameters defined:

  • support: not specified in the objective, so we keep only artists and pairs of artists that appear in at least 10% of the users' lists;
supp = 0.1
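  • confidence: the requirement is that a rule appears in at least 50 transactions. Strictly, a count of transactions is a support threshold (50/904 ≈ 0.055 of the 904 users), whereas confidence is the share of lists containing the LHS artist that also contain the RHS artist. Setting the minimum confidence to 50/904 therefore only asks for a weak conditional association; the 50-transaction requirement is already met by the support threshold, since 10% of 904 lists is more than 90.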
confi = 50/904
  • rule length: we want pairs of artists. The rules apriori generates always have a single item on the RHS, so limiting the LHS to one item as well gives rules of length 2 (minlen = maxlen = 2).
rlen = 2
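The thresholds above can be translated into numbers of shared lists. This is plain arithmetic on the values defined above; the LHS count of 153 is only an example, and `min_shared_for_conf` is a hypothetical helper.

```r
n     <- 904      # number of transactions (users)
supp  <- 0.1
confi <- 50 / 904

# Support >= 0.1 means count / 904 >= 0.1, i.e. at least 90.4, so 91 shared lists
ceiling(supp * n)

# Confidence is relative to the LHS: for an LHS artist liked by `lhs_count`
# users, confidence >= confi only needs this many shared lists
min_shared_for_conf <- function(lhs_count) ceiling(confi * lhs_count)
min_shared_for_conf(153)
```

For an LHS artist in 153 lists, the confidence threshold needs only 9 shared lists, while the support threshold needs 91, so in this setup support is the binding constraint.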

Running the apriori algorithm

basket_rules <- apriori(txn, parameter = list(supp = supp, conf = confi, target = "rules", minlen = rlen, maxlen = rlen))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport support minlen maxlen
##  0.05530973    0.1    1 none FALSE            TRUE     0.1      2      2
##  target   ext
##   rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 90 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10673 item(s), 904 transaction(s)] done [0.01s].
## sorting and recoding items ... [33 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [12 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
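The log line "checking subsets of size 1 2" reflects the two passes needed when rules are limited to length 2. The base-R sketch below reproduces that idea on hypothetical toy lists: it first keeps frequent single artists, then counts only pairs built from them (any subset of a frequent set is frequent, so pairs containing an infrequent artist cannot qualify). It is an illustration, not how arules does it; arules calls a compiled C implementation. `pair_rules` and the toy data are invented for the example.

```r
# Hypothetical toy transactions (one character vector per user)
toy <- list(
  c("Muse", "Radiohead", "Coldplay"),
  c("Muse", "The Killers"),
  c("Radiohead", "The Beatles", "Muse"),
  c("Muse", "Radiohead"),
  c("Coldplay")
)

# Rules {a} => {b} between single artists, as with minlen = maxlen = 2
pair_rules <- function(transactions, supp, conf) {
  n <- length(transactions)
  # Pass 1: count each artist once per list and keep the frequent ones
  tab <- table(unlist(lapply(transactions, unique)))
  item_count <- setNames(as.vector(tab), names(tab))
  frequent <- sort(names(item_count)[item_count / n >= supp])
  if (length(frequent) < 2) return(NULL)
  # Pass 2: candidate pairs are built from frequent artists only
  cand <- combn(frequent, 2)
  rules <- list()
  for (j in seq_len(ncol(cand))) {
    a <- cand[1, j]
    b <- cand[2, j]
    both <- sum(vapply(transactions, function(x) all(c(a, b) %in% x), logical(1)))
    if (both / n < supp) next
    # A frequent pair gives two candidate rules, one per direction
    for (r in list(c(a, b), c(b, a))) {
      cf <- both / item_count[[r[1]]]
      if (cf >= conf) {
        rules[[length(rules) + 1]] <- data.frame(
          lhs = r[1], rhs = r[2], support = both / n, confidence = cf,
          lift = cf / (item_count[[r[2]]] / n), stringsAsFactors = FALSE)
      }
    }
  }
  do.call(rbind, rules)  # NULL when no rule passes the thresholds
}

pair_rules(toy, supp = 0.3, conf = 0.5)
```

With these toy lists only Muse and Radiohead form a frequent pair, which yields both directions of the rule with lift 1.25, mirroring how each pair in the real output appears twice.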

Conclusion:

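With the parameters above, only 33 of the 10,673 distinct artists are frequent enough to be considered, and apriori returns 12 rules. These correspond to 6 distinct pairs of artists, since each pair appears once in each direction. All of them have a lift above 1, meaning the two artists appear together more often than they would if users' tastes were independent.

List of pairs returned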

inspect(basket_rules)
##    lhs                 rhs              support   confidence lift    
## 1  {The Killers}    => {Muse}           0.1028761 0.6078431  2.005439
## 2  {Muse}           => {The Killers}    0.1028761 0.3394161  2.005439
## 3  {Arctic Monkeys} => {Muse}           0.1205752 0.6300578  2.078731
## 4  {Muse}           => {Arctic Monkeys} 0.1205752 0.3978102  2.078731
## 5  {Coldplay}       => {Muse}           0.1194690 0.5510204  1.817965
## 6  {Muse}           => {Coldplay}       0.1194690 0.3941606  1.817965
## 7  {Radiohead}      => {The Beatles}    0.1161504 0.4468085  1.669070
## 8  {The Beatles}    => {Radiohead}      0.1161504 0.4338843  1.669070
## 9  {Radiohead}      => {Muse}           0.1316372 0.5063830  1.670694
## 10 {Muse}           => {Radiohead}      0.1316372 0.4343066  1.670694
## 11 {The Beatles}    => {Muse}           0.1128319 0.4214876  1.390601
## 12 {Muse}           => {The Beatles}    0.1128319 0.3722628  1.390601
#export to file
write(basket_rules, file = "rules.txt")
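To see how the columns of `inspect()` relate, the first rule can be recomputed by hand. The counts below are back-calculated from the printed support and confidence, not read from the data: 0.1028761 × 904 = 93 shared lists, 93 / 0.6078431 = 153 lists with The Killers, and 93 / 0.3394161 = 274 lists with Muse.

```r
n         <- 904
n_both    <- 93   # lists with both The Killers and Muse (back-calculated)
n_killers <- 153  # lists with The Killers (back-calculated)
n_muse    <- 274  # lists with Muse (back-calculated)

support    <- n_both / n               # share of all lists with both artists
confidence <- n_both / n_killers       # share of The Killers lists that also have Muse
lift       <- confidence / (n_muse / n)
round(c(support = support, confidence = confidence, lift = lift), 6)
```

The rounded values match rule 1 (support 0.102876, confidence 0.607843, lift 2.005439). Swapping the LHS count for `n_muse` gives the confidence of rule 2, while support and lift stay the same, which is why each pair shows identical support and lift in both directions.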