Recommender System for Music Artists

A recommender system for music artists that infers user preferences from listening frequencies (called weights), which are used as ratings.

The Last.fm dataset is the official song tag and song similarity dataset of the Million Song Dataset (more than 940,000 matched tracks).

Last.fm provides a dataset for music recommendations. For each user, it contains a list of their most-listened-to artists, including the number of times each artist was played. It also includes user-applied tags, which can be used to build a content vector.

This dataset contains social networking, tagging, and music artist listening information from a set of 2K users of the Last.fm online music system (http://www.last.fm). It was released in the framework of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011, http://ir.ii.uam.es/hetrec2011).

Data Statistics:

1892 users

17632 artists

92834 user-listened artist relations, i.e. tuples [user, artist, listeningCount]

11946 tags

186479 tag assignments (tas), i.e. tuples [user, tag, artist]
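As a quick sanity check on these figures, the counts above imply a very sparse user-artist matrix. The sketch below only does the arithmetic; the variable names are illustrative and are not part of the dataset.

```python
# Counts copied from the data statistics above.
n_users = 1892
n_artists = 17632
n_relations = 92834  # [user, artist, listeningCount] tuples

# Share of all possible user-artist pairs that actually appear in the data.
density = n_relations / (n_users * n_artists)

# Average number of listened artists recorded per user.
artists_per_user = n_relations / n_users

print(f"density: {density:.4%}")
print(f"artists per user: {artists_per_user:.1f}")
```

This works out to roughly 0.28% of all possible pairs and about 49 artists per user.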

Data collection, exploration and preparation

First rows of the artists table:

##   id              name
## 1  1      MALICE MIZER
## 2  2   Diary of Dreams
## 3  3 Carpathian Forest
## 4  4      Moi dix Mois
## 5  5       Bella Morte
## 6  6         Moonspell

First rows of the user-artist listening table (weight is the listening count):

##   userID artistID weight
## 1      2       51  13883
## 2      2       52  11690
## 3      2       53  11351
## 4      2       54  10300
## 5      2       55   8983
## 6      2       56   6152
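
Dimensions of the full user-artist table, then of the reduced matrix used for modeling:

## [1]  1892 17633
## [1]  407 2000

Sample of the reduced matrix (rows are users, columns are artists, values are listening counts):
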
##      288   71   88  291  497   66 287 700 226  299 332 343   377 678  294
## 1     NA 2654 1553   NA   NA 3312  NA  NA  NA   NA  NA  NA    NA  NA   NA
## 3     NA   NA   NA   NA   NA   NA  NA  NA  NA   NA  NA  NA    NA  NA   NA
## 4     NA   NA   NA   NA   NA   NA  NA  NA 139   NA  NA  NA    NA  NA   NA
## 6  43864   NA   NA 5379   NA   NA  NA  NA  NA 1780 510  NA    NA  NA 3251
## 9     NA   NA   NA   NA   NA   NA  NA  NA  NA   NA  NA  NA    NA  NA   NA
## 10   749   NA   NA   NA   NA   NA  NA  NA  NA   NA  NA  NA    NA  NA  264
## 11  3072   NA   NA   NA 3226   NA  NA  NA  NA   NA  NA  NA 59695  NA   NA
## 12    NA   NA   NA   NA   NA   NA  NA  NA  NA    4  NA  NA    NA  NA   NA
## 14    NA   NA   NA   NA   NA   NA  NA  NA  NA   NA  NA  NA    NA  NA   NA
## 16   129   NA   NA   NA   NA   NA  NA 116  NA   NA  NA  NA    NA  NA   NA

Quantiles of the listening counts (quartiles, then the 90th percentile):

##     0%    25%    50%    75%   100% 
##      1    151    343    842 128654
##  90% 
## 1872

The same sample after converting listening counts to ratings from 1 to 5:

##    288 71 88 291 497 66 287 700 226 299 332 343 377 678 294
## 1   NA  5  4  NA  NA  5  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 3   NA NA NA  NA  NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 4   NA NA NA  NA  NA NA  NA  NA   1  NA  NA  NA  NA  NA  NA
## 6    5 NA NA   5  NA NA  NA  NA  NA   4   3  NA  NA  NA   5
## 9   NA NA NA  NA  NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 10   3 NA NA  NA  NA NA  NA  NA  NA  NA  NA  NA  NA  NA   2
## 11   5 NA NA  NA   5 NA  NA  NA  NA  NA  NA  NA   5  NA  NA
## 12  NA NA NA  NA  NA NA  NA  NA  NA   1  NA  NA  NA  NA  NA
## 14  NA NA NA  NA  NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 16   1 NA NA  NA  NA NA  NA   1  NA  NA  NA  NA  NA  NA  NA

Resulting rating matrix:

## 407 x 2000 rating matrix of class 'realRatingMatrix' with 13909 ratings.

Distribution of rating values (0 counts the cells with no rating):

## ratings_vec
##      0      1      2      3      4      5 
## 800091   3466   3485   3480   2086   1392
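The quantiles and the two matrix samples above are consistent with ratings obtained by binning listening counts at the 25%, 50%, 75% and 90% quantiles (151, 343, 842 and 1872). The sketch below illustrates that reading; it is an inference from the printed output rather than the original code, and the function name and the inclusive handling of boundary values are assumptions.

```python
from bisect import bisect_left

# Cut points read off the printed quantiles (25%, 50%, 75%, 90%).
# Treating each boundary as belonging to the lower bin is an assumption.
CUTS = [151, 343, 842, 1872]

def weight_to_rating(weight):
    """Map a listening count to a rating from 1 (lowest) to 5 (highest)."""
    return bisect_left(CUTS, weight) + 1

# (listening count, printed rating) pairs taken from the two matrix samples above.
samples = [(2654, 5), (1553, 4), (139, 1), (510, 3), (264, 2), (43864, 5)]
for weight, printed in samples:
    print(weight, weight_to_rating(weight), printed)
```

Binning at these quantiles would also explain the rating distribution: about a quarter of the 13909 ratings fall in each of the bins 1 to 3, about 15% in bin 4 and about 10% in bin 5.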

recommenderlab - UBCF and IBCF Recommender Methods

Recommender methods registered for realRatingMatrix data:

## [1] "ALS_realRatingMatrix"          "ALS_implicit_realRatingMatrix"
## [3] "IBCF_realRatingMatrix"         "POPULAR_realRatingMatrix"     
## [5] "RANDOM_realRatingMatrix"       "RERECOMMEND_realRatingMatrix" 
## [7] "SVD_realRatingMatrix"          "SVDF_realRatingMatrix"        
## [9] "UBCF_realRatingMatrix"

Descriptions of the registered methods:

## $ALS_realRatingMatrix
## [1] "Recommender for explicit ratings based on latent factors, calculated by alternating least squares algorithm."
## 
## $ALS_implicit_realRatingMatrix
## [1] "Recommender for implicit data based on latent factors, calculated by alternating least squares algorithm."
## 
## $IBCF_realRatingMatrix
## [1] "Recommender based on item-based collaborative filtering."
## 
## $POPULAR_realRatingMatrix
## [1] "Recommender based on item popularity."
## 
## $RANDOM_realRatingMatrix
## [1] "Produce random recommendations (real ratings)."
## 
## $RERECOMMEND_realRatingMatrix
## [1] "Re-recommends highly rated items (real ratings)."
## 
## $SVD_realRatingMatrix
## [1] "Recommender based on SVD approximation with column-mean imputation."
## 
## $SVDF_realRatingMatrix
## [1] "Recommender based on Funk SVD with gradient descend."
## 
## $UBCF_realRatingMatrix
## [1] "Recommender based on user-based collaborative filtering."

UBCF: top 5 recommended artist IDs for five users:

## $`1`
## [1] "289" "292" "67"  "229" "154"
## 
## $`3`
## [1] "227"  "1412" "233"  "198"  "707" 
## 
## $`17`
## [1] "498" "289" "154" "207" "292"
## 
## $`24`
## [1] "707" "917" "706" "181" "503"
## 
## $`25`
## [1] "707" "830" "706" "233" "813"

UBCF: recommended artist names for the same users:

## [[1]]
## [1] Madonna            Radiohead          The Killers       
## [4] Britney Spears     Christina Aguilera
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[2]]
## [1] System of a Down The Beatles      Nine Inch Nails  Metallica       
## [5] Led Zeppelin    
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[3]]
## [1] Radiohead          Arctic Monkeys     Britney Spears    
## [4] Christina Aguilera Paramore          
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[4]]
## [1] Paradise Lost In Flames     AC/DC         Metallica     Iron Maiden  
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[5]]
## [1] Nine Inch Nails AC/DC           Metallica       As I Lay Dying 
## [5] Atreyu         
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top

UBCF prediction accuracy:

##      RMSE       MSE       MAE 
## 1.0030380 1.0060851 0.7508933
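User-based collaborative filtering, as named in the method registry above, scores artists a user has not rated using the ratings of similar users. The sketch below illustrates the general idea on made-up data with plain cosine similarity; it is not recommenderlab's implementation, and every name and value in it is hypothetical.

```python
from math import sqrt

# Toy ratings (user -> {artist: rating}); all names and values are made up.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 5, "B": 5, "D": 4},
    "u3": {"A": 1, "C": 5, "E": 4},
    "u4": {"B": 4, "D": 5},
}

def cosine(r1, r2):
    """Cosine similarity computed over the artists both users rated."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[a] * r2[a] for a in shared)
    norm1 = sqrt(sum(r1[a] ** 2 for a in shared))
    norm2 = sqrt(sum(r2[a] ** 2 for a in shared))
    return dot / (norm1 * norm2)

def recommend_ubcf(user, ratings, k=3, n=2):
    """Top-n unseen artists, scored by the k most similar users' ratings."""
    neighbours = sorted(
        ((cosine(ratings[user], r), other)
         for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore users with no overlap
        for artist, rating in ratings[other].items():
            if artist not in ratings[user]:
                scores[artist] = scores.get(artist, 0.0) + sim * rating
                weights[artist] = weights.get(artist, 0.0) + sim
    # Similarity-weighted average rating per candidate artist.
    predicted = {a: scores[a] / weights[a] for a in scores}
    return sorted(predicted, key=lambda a: (-predicted[a], a))[:n]

print(recommend_ubcf("u1", ratings))
```

Item-based collaborative filtering follows the same pattern with the roles swapped: similarities are computed between artists, and a user's own ratings are used to score artists similar to the ones they already rated.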

IBCF: top 5 recommended artist IDs for the same users:

## $`1`
## [1] "66"  "287" "510" "162" "474"
## 
## $`3`
## [1] "1933" "278"  "812"  "1374" "1338"
## 
## $`17`
## [1] "66"   "332"  "202"  "1411" "232" 
## 
## $`24`
## [1] "226" "228" "917" "181" "503"
## 
## $`25`
## [1] "376"  "3944" "396"  "1922" "2427"

IBCF: recommended artist names for the same users:

## [[1]]
## [1] Faithless           God Is an Astronaut Monica             
## [4] Craig David         P.O.D.             
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[2]]
## [1] 2Pac              Dark Tranquillity HammerFall        Demons & Wizards 
## [5] ムック        
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[3]]
## [1] Faithless      CAKE           Sunset Rubdown Kelly Rowland 
## [5] King Crimson  
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[4]]
## [1] Paradise Lost           Queens of the Stone Age Kings of Leon          
## [4] In Flames               Iron Maiden            
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top
## 
## [[5]]
## [1] UVERworld     L'Arc~en~Ciel Rie fu        Teddy Geiger 
## 14035 Levels: -123 min. -OZ- -t de sangre !!! !deladap !DISTAIN ... ZZ Top

IBCF prediction accuracy:

##      RMSE       MSE       MAE 
## 1.1212407 1.2571807 0.7040816
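The three accuracy figures compare predicted ratings with held-out actual ratings. MSE is the square of RMSE, which matches the printed values (1.0030380 squared is about 1.0060851, and 1.1212407 squared is about 1.2571807). The sketch below computes the metrics on made-up numbers; the function name and the data are illustrative only.

```python
from math import sqrt

def accuracy(predicted, actual):
    """Return RMSE, MSE and MAE for paired predicted and actual ratings."""
    errors = [p - a for p, a in zip(predicted, actual)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"RMSE": sqrt(mse), "MSE": mse, "MAE": mae}

# Made-up held-out ratings and predictions, for illustration only.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
print(accuracy(predicted, actual))
```

Because RMSE squares each error before averaging, it penalizes a few large misses more heavily than MAE does. This is how one model can have the higher RMSE and still the lower MAE.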

Observations & Conclusion

The UBCF recommender proved to be the best model for music artist recommendations, although IBCF came fairly close on rating-prediction accuracy (RMSE 1.00 for UBCF vs 1.12 for IBCF) and had a slightly lower MAE (0.75 for UBCF vs 0.70 for IBCF).

The ROC curve shows a substantial AUC for the UBCF model, with 10 recommended artists as the best trade-off between TPR and FPR. The precision/recall curve agrees with the ROC curve for the UBCF model, showing the best balance between precision and recall at 10 to 15 recommended artists.
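Each point on these curves corresponds to one list length N. As a minimal sketch of how such a point could be computed for a single user, the function below derives precision, recall (the same quantity as TPR) and FPR from a top-N list. The function name, the toy data and the catalog size are hypothetical, and the catalog is assumed to contain only artists the user has not already rated.

```python
def top_n_metrics(recommended, relevant, catalog_size):
    """Confusion-matrix rates for one user's top-N recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and liked
    fp = len(recommended - relevant)   # recommended but not liked
    fn = len(relevant - recommended)   # liked but not recommended
    tn = catalog_size - tp - fp - fn   # neither recommended nor liked
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # same as TPR
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical ranked list and held-out liked artists for one user.
ranked = ["A", "B", "C", "D", "E", "F"]
liked = {"A", "C", "F", "Z"}
for n in (2, 4, 6):
    print(n, top_n_metrics(ranked[:n], liked, catalog_size=20))
```

Longer lists can only raise recall and FPR, while precision may fall, which is why the curves are read for the list length that balances the two.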

In both curves, the IBCF model shows very poor performance.