Using a small dataset from LastFM with up to 50 favorite musical artists per user, the goal is to find pairs of bands that users tend to like together.
Such pairs can be found with the Apriori algorithm, which is often used to relate items in market basket analysis. The algorithm uses the frequency of each item, as well as the frequency of the items occurring together, and has parameters (minimum support and minimum confidence) that control the results; this is where the trade-off happens.
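To make the three measures reported later (support, confidence, lift) concrete, here is a minimal base R sketch on a made-up toy dataset. The baskets, the item names `A` to `D`, and the helper `support()` are hypothetical and are not part of the LastFM data or of the arules package.

```r
# Toy data (hypothetical): each element is one user's set of artists.
baskets <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B"),
  c("A", "B", "D")
)

# Fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

supp_ab <- support(c("A", "B"), baskets)    # share of baskets with both A and B
conf_ab <- supp_ab / support("A", baskets)  # share of A-baskets that also hold B
lift_ab <- conf_ab / support("B", baskets)  # confidence relative to B's base rate

print(c(support = supp_ab, confidence = conf_ab, lift = lift_ab))
```

A lift above 1 means the two items appear together more often than their individual frequencies would suggest; a lift below 1 means less often.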
library(arules)
raw <- read.csv("Artist_lists_small.txt", sep = ",", header = FALSE, stringsAsFactors = FALSE)
head(raw, 1)
## V1 V2 V3 V4 V5
## 1 Michael Bublé Jason Mraz 光田康典 The National Sarah McLachlan
## V6 V7 V8 V9 V10 V11
## 1 Claude Debussy Lady Gaga 周杰倫 植松伸夫 Indochine Rise Against
## V12 V13 V14 V15
## 1 City and Colour Radiohead Red Hot Chili Peppers Alexisonfire
## V16 V17 V18 V19
## 1 Bebo & Cigala The Mars Volta Chick Corea & Hiromi Theory of a Deadman
## V20 V21 V22 V23
## 1 Cantaloupe Island Adam's Apple A Brighter Day Afrodisia
## V24 V25 V26 V27 V28 V29
## 1 Mambo De La Pinta Greg Osby James Newton Testify Art Blakey Caravan
## V30 V31 V32 V33 V34 V35
## 1 Tsuyoshi Sekito Mira Temptation A Night In Tunisia Serj Tankian Winter
## V36 V37 V38 V39
## 1 Duke Pearson Cantaloop (Flip Fantasia) Cæcilie Norby El Cumbanchero
## V40 V41 V42 V43
## 1 You Don't Know What Love Is Death Letter Art Taylor Closer to Home
## V44 V45 V46
## 1 Indio Gitano Solomon Ilori and his Afro-Drum Ensemble Black Byrd
## V47 V48 V49 V50
## 1 Far West Hi-Heel Sneakers Congalegra There Is The Bomb
#results suppressed
str(raw)
summary(raw)
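# transform the data into a list with one character vector per user (row)
artist_lists <- as.list(as.data.frame(t(raw), stringsAsFactors = FALSE))
# remove duplicate artists within each user's list
list_u <- lapply(artist_lists, unique)
# coerce to the arules "transactions" class
txn <- as(list_u, "transactions")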
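The reshaping step above can be checked on a tiny stand-in for `raw`. This is only a sketch: the data frame `toy` and its contents are invented for illustration, and only base R is used.

```r
# Hypothetical stand-in for `raw`: two users (rows), three artist columns.
toy <- data.frame(V1 = c("Muse", "Coldplay"),
                  V2 = c("Radiohead", "Coldplay"),
                  V3 = c("Muse", "Muse"),
                  stringsAsFactors = FALSE)

# Transpose so each user becomes a column, then split the columns into a list.
per_user <- as.list(as.data.frame(t(toy), stringsAsFactors = FALSE))

# Drop repeated artists inside each user's vector.
per_user_u <- lapply(per_user, unique)

print(per_user_u)
```

Each list element ends up holding one user's distinct artists, which is the shape that `as(..., "transactions")` expects.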
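supp <- 0.1       # minimum support: a pair must appear in at least 10% of the lists
confi <- 50/904   # minimum confidence (about 0.055)
rlen <- 2         # rule length fixed at two items, i.e. pairs
basket_rules <- apriori(txn, parameter = list(support = supp, confidence = confi, target = "rules", minlen = rlen, maxlen = rlen))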
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.05530973 0.1 1 none FALSE TRUE 0.1 2 2
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 90
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10673 item(s), 904 transaction(s)] done [0.01s].
## sorting and recoding items ... [33 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [12 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Using parameters chosen to match the objective (rules of exactly two artists), while also weighing the trade-off needed to keep the computation manageable, as seen in the Apriori output above, we obtained the following likely pairings of musical tastes:
### List of pairs returned
inspect(basket_rules)
## lhs rhs support confidence lift
## 1 {The Killers} => {Muse} 0.1028761 0.6078431 2.005439
## 2 {Muse} => {The Killers} 0.1028761 0.3394161 2.005439
## 3 {Arctic Monkeys} => {Muse} 0.1205752 0.6300578 2.078731
## 4 {Muse} => {Arctic Monkeys} 0.1205752 0.3978102 2.078731
## 5 {Coldplay} => {Muse} 0.1194690 0.5510204 1.817965
## 6 {Muse} => {Coldplay} 0.1194690 0.3941606 1.817965
## 7 {Radiohead} => {The Beatles} 0.1161504 0.4468085 1.669070
## 8 {The Beatles} => {Radiohead} 0.1161504 0.4338843 1.669070
## 9 {Radiohead} => {Muse} 0.1316372 0.5063830 1.670694
## 10 {Muse} => {Radiohead} 0.1316372 0.4343066 1.670694
## 11 {The Beatles} => {Muse} 0.1128319 0.4214876 1.390601
## 12 {Muse} => {The Beatles} 0.1128319 0.3722628 1.390601
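Each pair appears twice in the output, once per direction, with the same support and lift. Since the objective is pairs rather than directed rules, the two directions can be collapsed. The sketch below does this in base R on values transcribed by hand from the table above; the names `rules`, `pair`, and `unique_pairs` are illustrative and are not arules objects.

```r
# Rules transcribed from the inspect() output (confidence omitted, since it
# is the only measure that differs between the two directions).
rules <- data.frame(
  lhs = c("The Killers", "Muse", "Arctic Monkeys", "Muse", "Coldplay", "Muse",
          "Radiohead", "The Beatles", "Radiohead", "Muse", "The Beatles", "Muse"),
  rhs = c("Muse", "The Killers", "Muse", "Arctic Monkeys", "Muse", "Coldplay",
          "The Beatles", "Radiohead", "Muse", "Radiohead", "Muse", "The Beatles"),
  support = c(0.1028761, 0.1028761, 0.1205752, 0.1205752, 0.1194690, 0.1194690,
              0.1161504, 0.1161504, 0.1316372, 0.1316372, 0.1128319, 0.1128319),
  lift = c(2.005439, 2.005439, 2.078731, 2.078731, 1.817965, 1.817965,
           1.669070, 1.669070, 1.670694, 1.670694, 1.390601, 1.390601),
  stringsAsFactors = FALSE
)

# Order-independent key: the two artist names sorted, then joined.
rules$pair <- mapply(function(a, b) paste(sort(c(a, b)), collapse = " + "),
                     rules$lhs, rules$rhs, USE.NAMES = FALSE)

# Keep the first occurrence of each pair and rank by lift, strongest first.
unique_pairs <- rules[!duplicated(rules$pair), c("pair", "support", "lift")]
unique_pairs <- unique_pairs[order(-unique_pairs$lift), ]

print(unique_pairs, row.names = FALSE)
```

Ranking by lift rather than support puts pairs that co-occur more than expected ahead of pairs that are merely popular; Muse appears in five of the six pairs largely because it is so frequent in the dataset.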
#export to file
write(basket_rules, file = "rules.txt")