As with the other models, this starts with reading in the data and getting a feel for the basic structure of the profiles and reviews.
require(dplyr)
# read the ratings and gender files, then join gender onto the ratings by user
rec_data <- read.table('data/ratings.dat',
                       sep = ',',
                       col.names = c('user_id', 'profile_id', 'rating'))
gen_data <- read.table('data/gender.dat',
                       sep = ',',
                       col.names = c('user_id', 'gender'))
data <- merge(rec_data, gen_data, by = 'user_id', all = TRUE)
# Number of reviews made by each user
user_data <- data %>%
  dplyr::group_by(user_id) %>%
  summarize(reviews = n())

# Number of unique users
unique_users <- length(unique(user_data$user_id))

# Number of reviews per profile
profile_data <- data %>%
  dplyr::group_by(profile_id) %>%
  summarize(reviews = n()) %>%
  arrange(desc(reviews))
Below are a few samples of the most reviewed profiles with their review counts, as well as a sample of users along with the number of reviews each has made.
Number of Unique Users: 220970
Number of Reviews Per Profile:
profile_id | reviews |
---|---|
NA | 85611 |
156148 | 33389 |
31116 | 28398 |
193687 | 23649 |
121859 | 23639 |
83773 | 23113 |
Number of Reviews per User:
user_id | reviews |
---|---|
1 | 345 |
2 | 97 |
3 | 20 |
4 | 101 |
5 | 105 |
6 | 96 |
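The summaries above can be rendered with knitr::kable, as the later chunks do; a minimal sketch using the objects built so far:

```r
# Headline count plus the top rows of each summary table
paste("Number of Unique Users:", unique_users)
knitr::kable(head(profile_data, 6))   # most reviewed profiles (NA profile included)
knitr::kable(head(user_data, 6))      # reviews for the first few user ids
```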
In order to make the data more manageable for local evaluation, I reduced it to the 100 most reviewed profiles.
# top 100 most-reviewed profiles (row 1 is the NA profile, so take rows 2:101)
most_reviewed <- profile_data[2:101,]
final_data <- merge(most_reviewed,
                    data, by = 'profile_id',
                    all.x = TRUE)
## Summary of Most Reviewed Profiles
A quick review of the most reviewed profiles in the smaller data set.
total_reviews <- length(unique(final_data$user_id)) * length(unique(final_data$profile_id))
knitr::kable(head(most_reviewed, 12), col.names = c("Profiles", "User Reviews"))
Profiles | User Reviews |
---|---|
156148 | 33389 |
31116 | 28398 |
193687 | 23649 |
121859 | 23639 |
83773 | 23113 |
22319 | 21387 |
71636 | 21284 |
89855 | 20634 |
20737 | 18550 |
162707 | 18224 |
68989 | 16591 |
60983 | 16253 |
To build the recommenderlab realRatingMatrix, I reduced the data to user_id, profile_id, and rating, then took the data wide to turn it into a user-by-profile matrix and converted that to a realRatingMatrix.
# Reducing to necessary fields
final_data <- final_data[, c("user_id", "profile_id", "rating")]

# create wide user-by-profile matrix of ratings
data_matrix <- tidyr::spread(final_data,
                             key = 'profile_id',
                             value = 'rating')
data_names <- data_matrix$user_id
data_matrix$user_id <- NULL
data_matrix <- as.matrix(data_matrix)
rownames(data_matrix) <- data_names  # keep the user ids as the matrix row names
dating <- as(data_matrix, 'realRatingMatrix')
To get an understanding of the data, it is important to evaluate the distribution of rating values in the final ratings matrix.
The NA values were removed so as not to skew the data. The most common rating is 10, with a solid showing for 6 and 8 as well.
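A quick way to look at that distribution is to pull the stored ratings out of the realRatingMatrix; a minimal sketch:

```r
# Non-missing ratings only; unrated user/profile pairs are not stored in the matrix
rating_values <- getRatings(dating)
table(rating_values)
hist(rating_values, main = "Distribution of Ratings", xlab = "Rating")
```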
As previously, I set up a schema to evaluate models prior to building and testing the diversity component.
Two models are compared: Item-Based Collaborative Filtering and Singular Value Decomposition.
# set the seed before the random train/test split so the scheme is reproducible
set.seed(1492)
scheme <- evaluationScheme(dating, method = "split",
                           train = 0.8,
                           given = 1,
                           goodRating = 7)
# candidate models: item-based CF with cosine similarity, and SVD with centering
evaluation_models <- list(
  IBCF_COS = list(name = 'IBCF',
                  param = list(method = 'cosine',
                               normalize = 'z-score',
                               k = 40)),
  SVD_CENT = list(name = 'SVD',
                  param = list(normalize = 'center')))
# evaluate top-N lists for n = 10 through 20 under the scheme
list_results <- evaluate(x = scheme,
                         method = evaluation_models,
                         n = seq(10, 20))
average_matrices <- lapply(list_results, avg)
An evaluation results list is created and plotted. Clearly neither model is particularly good.
Both the ROC curve and the precision-recall curve show that the item-based recommender is of little use, with all of its settings producing the same results.
The SVD model is not much better, but it does show a stronger baseline and improves slightly as n increases.
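For reference, these curves can be drawn directly from the evaluation results list with recommenderlab's plot method; a minimal sketch:

```r
# ROC curves (TPR vs FPR) for both models across n = 10..20
plot(list_results, annotate = TRUE, legend = "topleft")
# precision-recall curves for the same runs
plot(list_results, "prec/rec", annotate = TRUE, legend = "bottomright")
```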
Based on these graphs I will be using the Singular Value Decomposition model for the diversity test.
rec_svd <- Recommender(getData(scheme,
"known"),
method = "SVD",
list(k =40,
normalize ='center',
type='ratings'))
p <- predict(rec_svd, getData(scheme, "known"), type="ratings")
err <-calcPredictionAccuracy(p, getData(scheme, "unknown"))
err2 <- calcPredictionAccuracy(p,
getData(scheme, "unknown"),
given = 5,
goodRating = 7)
Accuracy on the held-out (unknown) ratings, first for err and then for err2:

RMSE | MSE | MAE |
---|---|---|
3.411992 | 11.64169 | 2.572059 |

RMSE | MSE | MAE |
---|---|---|
3.411992 | 11.64169 | 2.572059 |

Both calls return identical values because the given and goodRating arguments only come into play when evaluating top-N lists, not ratings predictions.
The next step generates the top 60 recommendations for each user in the hold-out set and extracts them into two lists, predictions and ratings.
pred_svd <- predict(rec_svd,
getData(scheme, "unknown"),
n = 60)
predictions <- getList(pred_svd)
ratings <- getRatings(pred_svd)
getList(pred_svd)[[35]][1:5]
[1] "102328" "179192" "61157" "34328" "54929"
getRatings(pred_svd)[[35]][1:5]
[1] 9.012046 8.890652 8.565071 8.538497 8.535509
In order to add some diversity to the recommender system, the top three profiles are presented along with two more that are randomly selected from the remaining 57 predictions. The hope is that by reaching a bit further down the list, the odds improve of finding potential dates who are appealing in ways the site's questionnaire does not capture.
## Adding Serendipity to the List of Predictions
div_pred <- list()
div_rate <- list()
for (i in seq_along(predictions)){
  p_top <- predictions[[i]][1:3]   # first three predictions
  r_top <- ratings[[i]][1:3]       # first three ratings
  # sample the same two positions from the tail so profiles and ratings stay paired
  idx <- sample(4:60, size = 2, replace = FALSE)
  p_diversity <- predictions[[i]][idx]
  r_diversity <- ratings[[i]][idx]
  div_pred[[i]] <- c(p_top, p_diversity)
  div_rate[[i]] <- c(r_top, r_diversity)
}
Both profiles and ratings are generated, and judging by the ratings, the diversified predictions are in the ballpark of the originals (although slightly lower).
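The listing for observation 41 below can be produced by indexing into the two lists built above; a minimal sketch:

```r
# Inspect one held-out user's diversified recommendations (observation 41)
div_pred[[41]]
div_rate[[41]]
```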
Test Observation 41 Profiles:
75169, 93891, 54929, 61157, 45992
Test Observation 41 Ratings:
8.372086, 8.3496249, 8.3268807, 8.1493106, 8.2385773
In order to estimate the change in variation due to the new recommendation selection algorithm, the mean of each user's predicted ratings and of each user's diversity ratings is calculated. Row-wise differences are then taken, and the mean difference is computed by summing them and dividing by the number of rows.
# mean predicted rating per user for the original lists and for the diversified lists
prediction_variance <- unlist(lapply(ratings, mean))
diversity_variance <- unlist(lapply(div_rate, mean))
# average row-wise difference between the two sets of means
difference <- sum(diversity_variance - prediction_variance, na.rm = TRUE) /
  length(prediction_variance)
Prediction Variance Increase Due to Diversity 0.1426675
Based on the method used to select recommendations in the diversity algorithm, the average rating of the recommended list changes by about 0.14.
There is no guarantee that such a small shift will make a pragmatic difference in outcomes, but it does show that the new selection model adds a measurable amount of variety.
In the grander scheme of things, all of this offline work is a decent starting place; however, the real value of a recommender system is in applied performance.
Whether these recommendations are useful can only be ascertained through extensive monitoring and performance testing live, online.
Were I to use a recommender system like this one live, I would build a number of features into the site to gather data and estimate the utility of the diversity we believe we added. Three features would be particularly useful:

- A post-viewing survey asking whether the recommendation was good (or whether the profile was desirable)
- A timer that tracks how long users spend on each profile, comparing the organic recommendations to the diversity-enhanced recommendations
- A counter that tracks how many times a member views a profile, again comparing the organic recommendations to the diversity recommendations
I think with these sorts of measurements, we could use A/B testing to figure out whether or not the new algorithm provides reasonably meaningful suggestions relative to the standard model.
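As a rough illustration of what that A/B comparison might look like, the share of positive post-viewing survey responses in each arm could be compared with a simple two-proportion test. The counts below are placeholders for illustration, not real results:

```r
# Hypothetical counts: positive survey responses out of surveyed views, per arm
organic_positive <- 420; organic_n <- 1000   # placeholder values
diverse_positive <- 455; diverse_n <- 1000   # placeholder values
prop.test(x = c(organic_positive, diverse_positive),
          n = c(organic_n, diverse_n))
```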