In this project we develop recommender systems for digital music on Amazon based on user ratings, and compare them on accuracy and other metrics to choose the best engine.
First, we load the necessary libraries for this project.
library(recommenderlab)
library(tidyr)
library(caTools)
library(ggplot2)
library(jsonlite)
library(purrr)
library(data.table)
library(dplyr)
We sourced our data from Amazon product reviews spanning May 1996 to July 2014, available at the following link: http://jmcauley.ucsd.edu/data/amazon/links.html
Here we load our dataset of 64,706 song reviews from a 5-core dataset, meaning every reviewer and every item has at least 5 reviews; this keeps our user-item matrix from being overly sparse. Users are identified by unique reviewer IDs, and songs/albums are coded with Amazon Standard Identification Numbers (ASINs). Each review carries several columns, including the rating, the number of “helpful” votes it received, the review time, and the unstructured review text.
# Alternative ratings-only CSV version of the data (not used here):
# hc <- read.csv("ratings_Digital_Music.csv")
# colnames(hc) <- c('user', 'song', 'rating', 'time') # add column names
# head(hc)
am <- readLines("Digital_Music_5.json") %>%  # one JSON review per line
  map(fromJSON) %>%                          # parse each review
  map(as.data.table) %>%                     # the two-element 'helpful' field yields two rows per review
  rbindlist(fill = TRUE)
am2 <- subset(am, select = -c(helpful))      # drop the helpful-votes column
am3 <- am2 %>% distinct()                    # collapse the duplicated rows
head(am)
## reviewerID asin reviewerName helpful
## 1: A3EBHHCZO6V2A4 5555991584 Amaranth "music fan" 3
## 2: A3EBHHCZO6V2A4 5555991584 Amaranth "music fan" 3
## 3: AZPWAXJG9OJXV 5555991584 bethtexas 0
## 4: AZPWAXJG9OJXV 5555991584 bethtexas 0
## 5: A38IRL0X2T4DPF 5555991584 bob turnley 2
## 6: A38IRL0X2T4DPF 5555991584 bob turnley 2
## reviewText
## 1: It's hard to believe "Memory of Trees" came out 11 years ago;it has held up well over the passage of time.It's Enya's last great album before the New Age/pop of "Amarantine" and "Day without rain." Back in 1995,Enya still had her creative spark,her own voice.I agree with the reviewer who said that this is her saddest album;it is melancholy,bittersweet,from the opening title song."Memory of Trees" is elegaic&majestic.;"Pax Deorum" sounds like it is from a Requiem Mass,it is a dark threnody.Unlike the reviewer who said that this has a "disconcerting" blend of spirituality&sensuality;,I don't find it disconcerting at all."Anywhere is" is a hopeful song,looking to possibilities."Hope has a place" is about love,but it is up to the listener to decide if it is romantic,platonic,etc.I've always had a soft spot for this song."On my way home" is a triumphant ending about return.This is truly a masterpiece of New Age music,a must for any Enya fan!
## 2: It's hard to believe "Memory of Trees" came out 11 years ago;it has held up well over the passage of time.It's Enya's last great album before the New Age/pop of "Amarantine" and "Day without rain." Back in 1995,Enya still had her creative spark,her own voice.I agree with the reviewer who said that this is her saddest album;it is melancholy,bittersweet,from the opening title song."Memory of Trees" is elegaic&majestic.;"Pax Deorum" sounds like it is from a Requiem Mass,it is a dark threnody.Unlike the reviewer who said that this has a "disconcerting" blend of spirituality&sensuality;,I don't find it disconcerting at all."Anywhere is" is a hopeful song,looking to possibilities."Hope has a place" is about love,but it is up to the listener to decide if it is romantic,platonic,etc.I've always had a soft spot for this song."On my way home" is a triumphant ending about return.This is truly a masterpiece of New Age music,a must for any Enya fan!
## 3: A clasically-styled and introverted album, Memory of Trees is a masterpiece of subtlety. Many of the songs have an endearing shyness to them - soft piano and a lovely, quiet voice. But within every introvert is an inferno, and Enya lets that fire explode on a couple of songs that absolutely burst with an expected raw power.If you've never heard Enya before, you might want to start with one of her more popularized works, like Watermark, just to play it safe. But if you're already a fan, then your collection is not complete without this beautiful work of musical art.
## 4: A clasically-styled and introverted album, Memory of Trees is a masterpiece of subtlety. Many of the songs have an endearing shyness to them - soft piano and a lovely, quiet voice. But within every introvert is an inferno, and Enya lets that fire explode on a couple of songs that absolutely burst with an expected raw power.If you've never heard Enya before, you might want to start with one of her more popularized works, like Watermark, just to play it safe. But if you're already a fan, then your collection is not complete without this beautiful work of musical art.
## 5: I never thought Enya would reach the sublime heights of Evacuee or Marble Halls from 'Shepherd Moons.' 'The Celts, Watermark and Day...' were all pleasant and admirable throughout, but are less ambitious both lyrically and musically. But Hope Has a Place from 'Memory...' reaches those heights and beyond. It is Enya at her most inspirational and comforting. I'm actually glad that this song didn't get overexposed the way Only Time did. It makes it that much more special to all who own this album.
## 6: I never thought Enya would reach the sublime heights of Evacuee or Marble Halls from 'Shepherd Moons.' 'The Celts, Watermark and Day...' were all pleasant and admirable throughout, but are less ambitious both lyrically and musically. But Hope Has a Place from 'Memory...' reaches those heights and beyond. It is Enya at her most inspirational and comforting. I'm actually glad that this song didn't get overexposed the way Only Time did. It makes it that much more special to all who own this album.
## overall summary unixReviewTime reviewTime
## 1: 5 Enya's last great album 1158019200 09 12, 2006
## 2: 5 Enya's last great album 1158019200 09 12, 2006
## 3: 5 Enya at her most elegant 991526400 06 3, 2001
## 4: 5 Enya at her most elegant 991526400 06 3, 2001
## 5: 5 The best so far 1058140800 07 14, 2003
## 6: 5 The best so far 1058140800 07 14, 2003
We focus on the reviewer, song, and rating columns and convert the table from long to wide, so that there is a row for each reviewer and a column for each song/album, producing a user-item matrix. The resulting data set has 5,541 users and 3,568 songs/albums (the wide table’s 3,569 columns include the user ID column).
am4 <- subset(am3, select = c(reviewerID, asin, overall))
colnames(am4) <- c('user', 'song', 'rating')
head(am4)
## user song rating
## 1: A3EBHHCZO6V2A4 5555991584 5
## 2: AZPWAXJG9OJXV 5555991584 5
## 3: A38IRL0X2T4DPF 5555991584 5
## 4: A22IK3I6U76GX0 5555991584 5
## 5: A1AISPOIIHTHXX 5555991584 4
## 6: A2P49WD75WHAG5 5555991584 5
am5 <- spread(am4, song, rating) #convert table from long to wide
dim(am5)
## [1] 5541 3569
To use the recommenderlab library, we first have to convert our data frame into a realRatingMatrix. The first column of the wide table holds the user IDs, so we move it into the row names rather than coercing it to numeric (which would only produce a column of NAs).
hc_matrix <- as.matrix(subset(am5, select = -user)) # drop the user ID column before coercing
rownames(hc_matrix) <- am5$user                     # keep the IDs as row names
hc_RRM <- as(hc_matrix, "realRatingMatrix")
dim(hc_RRM)
## [1] 5541 3568
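As a quick sanity check on the sparse matrix (a sketch; the row and column ranges are arbitrary):
getRatingMatrix(hc_RRM)[1:5, 1:5] # sparse view; empty cells are missing ratings
summary(rowCounts(hc_RRM))        # distribution of ratings per user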
We then explore the ratings data. Below is the distribution of user ratings: the most frequent rating is a 5, with nearly 31,000 of them in our matrix, followed by 4, 3, 2, and 1 ratings in descending order.
vector_ratings <- as.vector(hc_RRM@data)              # pull every cell of the sparse matrix
vector_ratings <- vector_ratings[vector_ratings != 0] # 0 encodes "no rating", so drop it
vector_ratings <- factor(vector_ratings)
qplot(vector_ratings) + ggtitle("Distribution of ratings for songs/albums on Amazon")
We also plotted the distribution of the average rating that the 3,568 songs/albums received. Most music received high ratings.
average_ratings <- colMeans(hc_RRM)
qplot(average_ratings, binwidth = 0.1) + ggtitle("Distribution of the average music rating")
Next, we start to build our recommender systems by splitting the data into training and test sets with 4-fold cross-validation. On the 1-5 scale, we set the threshold between good and bad ratings at 3 (ratings of 3 and above count as good). We set the “given” parameter to 5 because the minimum number of ratings per user in our dataset is 6, and we want every test user to keep at least one held-out rating for evaluating the model.
min(rowCounts(hc_RRM)) # every user has rated at least this many items
## [1] 6
items_to_keep <- 5     # ratings handed to the model per test user
rating_threshold <- 3  # ratings >= 3 count as "good"
eval_sets <- evaluationScheme(data=hc_RRM, method="cross-validation", k=4, given=items_to_keep, goodRating=rating_threshold)
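Each fold of the scheme exposes three views that we rely on throughout the rest of this report; as a quick check of the training slice:
# "train"   = users the models learn from
# "known"   = the 'given' ratings of test users (fed to the model)
# "unknown" = the held-out ratings of test users (used for scoring)
dim(getData(eval_sets, "train"))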
Singular Value Decomposition (SVD) is a dimensionality-reduction matrix factorization technique: it decomposes the rating matrix into the product of three matrices (left singular vectors, a diagonal matrix of singular values, and right singular vectors), and keeping only the largest singular values yields a low-rank approximation that captures latent user and item factors. SVD requires a complete matrix, so missing entries must be filled in; recommenderlab’s SVD method handles this with column-mean imputation. We also normalize the ratings to remove bias from users who tend to rate high or low: centering makes each user’s average rating 0. We train this model with normalization as a pre-processing step.
svd_rec <- Recommender(data=getData(eval_sets, "train"), method="svd")
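As an aside, the low-rank idea behind SVD can be illustrated with base R’s svd() on a small, fully observed toy matrix (the values below are illustrative only, not from our data):
set.seed(1)
toy <- matrix(sample(1:5, 20, replace = TRUE), nrow = 4)  # 4 users x 5 items
s <- svd(toy)                                             # toy = U %*% diag(d) %*% t(V)
k <- 2                                                    # keep the top 2 singular values
toy_hat <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
round(toy_hat, 2)                                         # rank-2 approximation of toy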
We then use the trained model to predict music ratings for our users and evaluate those predictions. The RMSE for this recommender system was 1.12.
svd_pred <- predict(object=svd_rec, newdata=getData(eval_sets, "known"), n=5, type="ratings")
svd_accuracy <- calcPredictionAccuracy(x=svd_pred, data=getData(eval_sets, "unknown"), byUser=FALSE)
svd_accuracy
## RMSE MSE MAE
## 1.1155659 1.2444872 0.8496135
We compare this SVD recommender engine with one that uses item-based collaborative filtering (IBCF).
IBCF recommends items whose ratings are similar to those of the items a user has already rated. In this model we again center-normalize the data (so each user’s average rating is 0, removing individual rating bias), consider the 5 most similar items to each item, and measure item-item similarity with the cosine of the angle between every pair of item rating vectors.
ib_rec <- Recommender(data=getData(eval_sets, "train"), method = "IBCF", parameter = list(k = 5, normalize = "center", method = "cosine"))
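For intuition, cosine similarity between two item rating vectors can be computed directly; a toy sketch (the vectors below are made up, not from our data):
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
item_a <- c(5, 3, 4, 4)  # ratings of item A by four users
item_b <- c(4, 2, 5, 5)  # ratings of item B by the same users
cosine_sim(item_a, item_b)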
This model’s predictions on the test set had an RMSE of 1.17, so it does not perform as well as the SVD engine.
ib_pred <- predict(object=ib_rec, newdata=getData(eval_sets, "known"), n=5, type="ratings")
ib_accuracy <- calcPredictionAccuracy(x=ib_pred, data=getData(eval_sets, "unknown"), byUser=FALSE)
ib_accuracy
## RMSE MSE MAE
## 1.1719270 1.3734129 0.7502689
We also train a UBCF model, which finds similarities between users based on their ratings. We set the nn parameter to 5 to identify the 5 users most similar to each user, again center-normalizing the data and using cosine similarity.
ub_rec <- Recommender(data=getData(eval_sets, "train"), method = "UBCF", parameter = list(nn = 5, normalize = "center", method = "cosine"))
This model’s predictions on the test set had an RMSE of 1.22, worse than both the IBCF engine and the SVD engine.
ub_pred <- predict(object=ub_rec, newdata=getData(eval_sets, "known"), n=5, type="ratings")
ub_accuracy <- calcPredictionAccuracy(x=ub_pred, data=getData(eval_sets, "unknown"), byUser=FALSE)
ub_accuracy
## RMSE MSE MAE
## 1.2170437 1.4811953 0.9330049
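Beyond error metrics, the same fitted models can also produce top-N lists directly; for example (the user index is arbitrary, and the output varies by fold):
topN <- predict(object = ub_rec, newdata = getData(eval_sets, "known")[1], n = 5)
as(topN, "list") # ASINs of the 5 recommended songs/albums for that user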
We can also adapt some of the code from “Building a Recommendation System with R” to plot the ROC curve for our different recommender engines. In addition to the models discussed above, we consider a “random” model, which introduces some novelty into the recommendations rather than only offering users the same type of music as the songs they rated highly. I appreciate this when streaming through Spotify’s recommender system, where the recommendations often begin to “stagnate”. This fourth model chooses items at random for comparison.
models_to_evaluate <- list(
IBCF_cos = list(name = "IBCF", param = list(method = "cosine", k = 5, normalize = "center")),
UBCF_cos = list(name = "UBCF", param = list(method = "cosine", nn = 5, normalize = "center")),
SVD = list(name = "svd"),
random = list(name = "RANDOM", param=NULL)
)
n_recommendations <- c(1, 5, seq(1, 10, 1)) # rows 3:12 of the averaged results below correspond to n = 1..10
list_results <- evaluate(x = eval_sets, method = models_to_evaluate, n = n_recommendations)
## IBCF run fold/sample [model time/prediction time]
## 1 [250.61sec/0.61sec]
## 2 [241.59sec/0.44sec]
## 3 [250.94sec/0.4sec]
## 4 [251.76sec/0.39sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/101.01sec]
## 2 [0sec/100.8sec]
## 3 [0sec/93.08sec]
## 4 [0sec/101.2sec]
## svd run fold/sample [model time/prediction time]
## 1 [4.06sec/3.99sec]
## 2 [5.07sec/4.12sec]
## 3 [4.61sec/4.05sec]
## 4 [4.59sec/3.83sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/4sec]
## 2 [0sec/3.67sec]
## 3 [0sec/3.75sec]
## 4 [0sec/4.14sec]
class(list_results)
## [1] "evaluationResultList"
## attr(,"package")
## [1] "recommenderlab"
avg_matrices <- lapply(list_results, avg)
We can see the precision, recall, TPR, and FPR for each model at different values of n (the number of recommendations produced by the models). Precision is TP/(TP+FP), the share of recommended items that were actually good; recall, which equals TPR here, is TP/(TP+FN), the share of good items that were recommended; FPR is FP/(FP+TN).
avg_matrices$IBCF_cos[3:12, 5:8]
## precision recall TPR FPR
## 1 0.008266211 0.0009490941 0.0009490941 0.0002553449
## 2 0.009047739 0.0024303574 0.0024303574 0.0005078080
## 3 0.008690590 0.0034338090 0.0034338090 0.0007569200
## 4 0.008792142 0.0044699193 0.0044699193 0.0010023789
## 5 0.008794554 0.0054690352 0.0054690352 0.0012433861
## 6 0.008806394 0.0062565851 0.0062565851 0.0014791765
## 7 0.008752353 0.0074827813 0.0074827813 0.0017091287
## 8 0.008755511 0.0079778186 0.0079778186 0.0019338060
## 9 0.008872069 0.0088148444 0.0088148444 0.0021518488
## 10 0.009053166 0.0098129827 0.0098129827 0.0023642113
avg_matrices$UBCF_cos[3:12, 5:8]
## precision recall TPR FPR
## 1 0.01892484 0.003246766 0.003246766 0.0002706158
## 2 0.02039667 0.007028523 0.007028523 0.0005404180
## 3 0.02063706 0.010925384 0.010925384 0.0008104337
## 4 0.01997971 0.014434854 0.014434854 0.0010813086
## 5 0.01985324 0.018398725 0.018398725 0.0013507062
## 6 0.01967714 0.021549761 0.021549761 0.0016194948
## 7 0.01942841 0.024812607 0.024812607 0.0018881399
## 8 0.01927062 0.027880049 0.027880049 0.0021562731
## 9 0.01907401 0.031432250 0.031432250 0.0024237527
## 10 0.01893392 0.034353817 0.034353817 0.0026906787
avg_matrices$SVD[3:12, 5:8]
## precision recall TPR FPR
## 1 0.10209235 0.03052170 0.03052170 0.0002524150
## 2 0.06060606 0.03559591 0.03559591 0.0005281623
## 3 0.04527417 0.03836310 0.03836310 0.0008051648
## 4 0.03702201 0.04024744 0.04024744 0.0010828414
## 5 0.03217893 0.04213476 0.04213476 0.0013603597
## 6 0.02861953 0.04425007 0.04425007 0.0016384450
## 7 0.02602556 0.04636817 0.04636817 0.0019166362
## 8 0.02396735 0.04872800 0.04872800 0.0021950789
## 9 0.02244669 0.05140214 0.05140214 0.0024733230
## 10 0.02110390 0.05268280 0.05268280 0.0027519146
avg_matrices$random[3:12, 5:8]
## precision recall TPR FPR
## 1 0.001803752 0.0003462086 0.0003462086 0.0002806341
## 2 0.002164502 0.0005792215 0.0005792215 0.0005610659
## 3 0.001984127 0.0008650036 0.0008650036 0.0008417513
## 4 0.001984127 0.0010283377 0.0010283377 0.0011223258
## 5 0.001948052 0.0012285533 0.0012285533 0.0014029599
## 6 0.001924002 0.0014203363 0.0014203363 0.0016835922
## 7 0.002061431 0.0016054390 0.0016054390 0.0019639170
## 8 0.002119408 0.0018288292 0.0018288292 0.0022443488
## 9 0.002044252 0.0020360952 0.0020360952 0.0025250830
## 10 0.002056277 0.0022536484 0.0022536484 0.0028056153
Below are the ROC curves for each model. Again, the SVD model performs best, as it has the largest AUC (area under the curve). In comparison, our new random model performs the worst.
plot(list_results, annotate = 1, legend = "topleft") # base graphics, so title() is a separate call
title("ROC curve")
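recommenderlab can also plot precision-recall curves from the same evaluation results:
plot(list_results, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-recall curves")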
After comparing the RMSE and AUC of our recommender engines, we found that the SVD engine outperforms the engines that implement IBCF, UBCF, or random music recommendations. We may also want to diversify recommendations by, for example, changing a random 15% of our 1-star rating predictions to 3-star ratings (a sketch of this tweak follows). In future projects, we could also use the unstructured free-text review data to build a hybrid model that accounts for content.
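A minimal sketch of that diversification tweak, assuming we operate on the dense matrix of predicted ratings (hypothetical; not applied anywhere else in this report):
preds <- as(svd_pred, "matrix")          # predicted ratings; NA = no prediction
low <- which(!is.na(preds) & preds <= 1) # treat predictions of 1 or below as 1-star
set.seed(42)
bump <- sample(low, size = round(0.15 * length(low)))
preds[bump] <- 3                         # promote a random 15% of them to 3 stars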
In an online evaluation we would run an A/B test, with one set of users interacting with one recommender system and a different set interacting with another. In this scenario, we would benefit from continuously refining our recommender engine using click-through rate as a performance metric.
(Note that some of the metric values may differ slightly when these results are re-rendered and published to RPubs.)