For this simple exploration of recommender systems using `recommenderlab` in R, I focused on getting the techniques down properly and on tuning the algorithm parameters via `params = list(normalize = "some-method", method = "some-method")`, so it was neither advantageous nor necessary to bring in more than a single ratings predictor.
With that in mind I chose to use `rating`, which is the overall rating for each restaurant in the data set. I hope to revisit this model in the next assignment with more variables to see how that improves predictions for such a small set.
To see whether different dissimilarity metrics and normalization methods enhanced performance, I tested across both sets as follows.
Dissimilarity methods: Pearson, Cosine, Jaccard
Normalization methods: center, z-score
Three separate error metrics were calculated for each dissimilarity/normalization pair to see whether they varied in magnitude or direction of change as the different models were explored.
Error metrics: RMSE, MSE, MAE
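As a quick reference, the three metrics can be computed by hand from a vector of predicted and actual ratings. The short sketch below uses made-up numbers purely to show how RMSE, MSE, and MAE relate to one another; `calcPredictionAccuracy()` in `recommenderlab` reports the same three values.

# Hand-computed error metrics on hypothetical predicted vs. actual ratings
actual    <- c(2, 1, 2, 0, 1)           # made-up true ratings
predicted <- c(1.8, 1.2, 1.5, 0.4, 1.0) # made-up predictions
err  <- predicted - actual
mae  <- mean(abs(err))   # Mean Absolute Error
mse  <- mean(err^2)      # Mean Squared Error
rmse <- sqrt(mse)        # Root Mean Squared Error
c(RMSE = rmse, MSE = mse, MAE = mae)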
Observations: 1161
Features: 5

Feature | Type | Levels | Possible Values |
---|---|---|---|
userID | character | 138 | |
placeID | character | 130 | |
rating | numeric | 3 | 0, 1, 2 |
food_rating | numeric | 3 | 0, 1, 2 |
service_rating | numeric | 3 | 0, 1, 2 |
# Loading packages and reading data
library(recommenderlab)  # recommender algorithms and evaluation
library(dplyr)           # %>% pipe
library(tidyr)           # spread()
food <- read.csv('data/rating_final.csv')
### Subsetting for Items, Users and Overall Rating
food_for_rec <- food[,c('userID', 'placeID', 'rating')]
### Renaming for simplicity (and agreement with `recommenderlab` conventions)
colnames(food_for_rec) <- c( 'user','item', 'rating')
## Going Wide
food_for_rec <- food_for_rec%>%
spread(key = item, value = rating)
# Extracting user IDs for indexing the matrix
mat_names <- food_for_rec$user
# Dropping the user column before converting, so the matrix stays numeric
restaurant_matrix <- as.matrix(food_for_rec[, -1])
# Adding index to matrix (user IDs)
rownames(restaurant_matrix) <- mat_names
## Making `recommenderlab` matrix
restaurants <- as(restaurant_matrix, 'realRatingMatrix')
This is a very sparse matrix with users spread across different geographic areas, so very few of our reviewers have reviewed more than 5-8 restaurants.
Accordingly, the matrix is sparse: only about 5% of the cells hold a review of 1 or 2. There may be challenges developing a useful predictor with so few reviewers and reviews for 130 locations, so it will be important to test the minimum review count to see whether it significantly affects the results.
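A quick way to gauge this sparsity, and to see how many reviewers clear each minimum-review threshold, is sketched below. It relies on `recommenderlab`'s `nratings()` and `rowCounts()`; the exact counts will depend on the data as loaded.

# Sparsity of the rating matrix and reviewers above each minimum-review cutoff
dim(restaurants)                                    # reviewers x restaurants
1 - nratings(restaurants) / prod(dim(restaurants))  # proportion of empty cells
sapply(c(5, 6, 7), function(k) sum(rowCounts(restaurants) > k))  # reviewers kept at each cutoff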
For the following comparisons of item-based and user-based recommenders using `recommenderlab`, I tested effectiveness with 5, 6, and 7 as the minimum reviews per reviewer to see whether the density of reviews or the number of reviewers was more influential on the results.

Rating | Percent of Matrix |
---|---|
0 - No Review | 94.9 |
1 | 2.36 |
2 | 2.73 |
# Distribution plot, zeros removed
plot_ratings <- spread(food[, c("placeID", "userID", "rating")],
                       key = placeID, value = rating, fill = 0)
plot_ratings <- as.numeric(as.matrix(plot_ratings[, 2:ncol(plot_ratings)]))
props <- prop.table(table(plot_ratings))                             # proportion of each rating value
zero_ratings <- length(as.vector(plot_ratings[plot_ratings == 0]))   # count of empty/zero cells
hist(as.vector(plot_ratings[plot_ratings > 0]),
     breaks = c(.75, 1.25, 1.75, 2.25),
     main = "Distribution of Ratings",
     col = 'maroon', xlab = "Ratings",
     xlim = c(0, 3))
# Heatmap of first 50
image(normalize(restaurants)[1:50, 1:50],
main = 'First 50 Reviewers & First 50 Restaurants',
xlab = "Restaurants",
ylab = "Reviewers")
# Function which takes the ratings matrix, train ratio, minimum reviews per user,
# normalization method, and similarity method, then compares UBCF and IBCF errors
test_ibcf_ubcf_params <- function(matrix, train_ratio, min_reviews, normal = 'center', method = 'pearson'){
  # Keep only users with more than min_reviews ratings
  matrix = matrix[rowCounts(matrix) > min_reviews, ]
  # 'split' scheme: train_ratio of users for training, min_reviews items given per test user
  eval <- evaluationScheme(matrix,
                           method = 'split',
                           train = train_ratio,
                           given = min_reviews,
                           goodRating = 2)
  params = list(normalize = normal, method = method)
  item_eval <- Recommender(getData(eval, 'train'), 'IBCF', param = params)
  user_eval <- Recommender(getData(eval, 'train'), 'UBCF', param = params)
  item_pred <- predict(item_eval, getData(eval, 'known'), type = 'ratings')
  user_pred <- predict(user_eval, getData(eval, 'known'), type = 'ratings')
  error <- rbind(
    UBCF = calcPredictionAccuracy(user_pred, getData(eval, "unknown")),
    IBCF = calcPredictionAccuracy(item_pred, getData(eval, "unknown"))
  )
  print(error)
}
# Calling the function with:
# - Center normalization, Pearson distance metric
# - 80% train / 20% test, 8 'given' minimum observations per user
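The call itself is not reproduced above; assuming the parameters listed in the comment, it would look like this (with 8 as the minimum/`given` value):

test_ibcf_ubcf_params(restaurants, .80, 8, 'center', 'pearson')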

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4087671 | 0.1670905 | 0.3682016 |
IBCF | 0.3482584 | 0.1212839 | 0.2181978 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.3157068 | 0.0996708 | 0.2991736 |
IBCF | 0.5724105 | 0.3276537 | 0.3973006 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.6084335 | 0.3701913 | 0.5490075 |
IBCF | 0.4347148 | 0.1889769 | 0.3239961 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5740619 | 0.3295471 | 0.5487022 |
IBCF | 0.5364686 | 0.2877985 | 0.4130695 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.6628161 | 0.4393252 | 0.5095242 |
IBCF | 0.4923255 | 0.2423844 | 0.3142065 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.2392441 | 0.0572378 | 0.2392441 |
IBCF | 0.6306225 | 0.3976847 | 0.4690007 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4000711 | 0.1600569 | 0.3280428 |
IBCF | 0.5579159 | 0.3112702 | 0.3815400 |
Using the full 80% training and 20% testing split on the matrix subset with a 5-, 6-, or 7-review minimum per reviewer (also used as `given` in the evaluation scheme), and comparing them across the `center` and `z-score` scaling methods, there is a wide range of possible recommenders.

The Best of the Center-Scaled Models
When comparing user-based to item-based recommenders across all three minimum-review sizes, all three error metrics favored the same model type (either UBCF or IBCF) for a given number of reviews, but which type had the lowest error varied with the minimum number of reviews.
The absolute lowest set of scores for center scaling with Pearson similarity came from the item-based model with a 6-review minimum.
Pearson, 6-Reviewer, Center Scaled

Model | RMSE | MSE | MAE |
---|---|---|---|
IBCF | 0.4984133 | 0.2484158 | 0.3492304 |
The Best of the z-score-Scaled Models

The models were computed in exactly the same configurations for Pearson, changing only the scaling to z-score. In this case, for the 5- and 6-review minimums, all three errors in each comparison favored the same model (either user- or item-based). For the 7-review minimum, however, the user-based model had lower RMSE and MSE but a higher MAE, and the disparity was fairly large.

The lowest overall scores came with just the 5-review minimum; this model outperformed the best center-scaled model by approximately 0.1 across the board.
Pearson, 5-Reviewer, z-score Scaled

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.3458464 | 0.1196097 | 0.2295455 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5233471 | 0.2738921 | 0.4745330 |
IBCF | 0.5705011 | 0.3254715 | 0.3957425 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5568457 | 0.3100771 | 0.4556102 |
IBCF | 0.5888832 | 0.3467835 | 0.4363687 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4996145 | 0.2496146 | 0.4056109 |
IBCF | 0.6150877 | 0.3783329 | 0.4969680 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4941246 | 0.2441591 | 0.4250010 |
IBCF | 0.5654305 | 0.3197117 | 0.4300994 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5399805 | 0.291579 | 0.4711546 |
IBCF | 0.5530642 | 0.305880 | 0.4055877 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4591887 | 0.2108542 | 0.3990037 |
IBCF | 0.5018022 | 0.2518054 | 0.3742733 |
For the cosine method, the best performers under both center and z-score scaling had lower errors in all three metrics than all other models using that scaling.
Center Scaled Data
With this method, once again all the best error rates for a given combination of minimum reviews and scaling were associated with the same model type, and in each case the user-based recommenders outperformed the item-based ones.
Cosine, 6-Reviewer, Center Scaled User-Based

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4202448 | 0.1766057 | 0.3862834 |
z-score Scaled Data
Using z-scores, the lowest errors for a given combination were again either all associated with one model type (item- or user-based) or very nearly the same. Unlike with center scaling, though, the z-score-scaled models were not consistently the same model type. With the smaller reviewer set (5-review minimum) the user-based method performed better, but as the minimum number of reviews grew, the item-based methods' errors improved considerably.
Cosine, 7-Reviewer, Z-Score Scaled Item-Based

Model | RMSE | MSE | MAE |
---|---|---|---|
IBCF | 0.4250244 | 0.1806457 | 0.2705459 |
Again, z-score scaling outperformed center scaling at all but the 6-review configuration, and the error was considerably lower at 7 reviews with z-scoring.

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5108811 | 0.2609995 | 0.3966992 |
IBCF | 0.5920140 | 0.3504806 | 0.4473545 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5080773 | 0.2581426 | 0.4334914 |
IBCF | 0.5017087 | 0.2517116 | 0.3477941 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4687841 | 0.2197585 | 0.3786334 |
IBCF | 0.4592163 | 0.2108796 | 0.3361111 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5263456 | 0.2770397 | 0.4381720 |
IBCF | 0.5801607 | 0.3365864 | 0.4651974 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4507926 | 0.2032140 | 0.3441676 |
IBCF | 0.5926991 | 0.3512923 | 0.4717391 |

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.5026210 | 0.2526279 | 0.4252860 |
IBCF | 0.5579156 | 0.3112698 | 0.4314286 |
The Jaccard methods are universally the worst. With center scaling, error was lowest for the user-based model at a 6-review minimum and lowest for the item-based model at a 7-review minimum. Z-score scaling was uniformly best with 6 reviews, with the user-based model slightly outperforming the item-based one in RMSE and MSE but slightly worse, though not far off, in MAE.
Jaccard, 6-Reviewer, Center-Scaled User-Based

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.434365 | 0.1886729 | 0.3585991 |
Jaccard, 7-Reviewer, Z-Score Scaled

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.4750463 | 0.2256690 | 0.4168385 |
This comparison was interesting. Comparing the Pearson, Cosine, and Jaccard metrics, it is clear that you could build a decent recommender with sufficient data using any of the three. The errors ranged from about 0.1 to 0.65 points on a 2-point scale.
I would not say that 0.65 is a trivial error when predicting between 1 and 2, but errors in the 0.2 range seem reasonable given 138 reviewers and 130 restaurants with 907 ratings and 16897 empty slots. This seems like an honorable start that could be improved upon with more reviews.
For making a final selection, choosing for a low mean absolute error seems to make the most sense, as it is the average error without regard for over- or under-shooting. This is more interpretable than trying to back-calculate from the mean squared error or root mean squared error.
It also makes sense, if several models are very close, to select the one which requires the fewest reviews to train, as this means more observations are preserved and used in training and testing, which should make the model more generalizable.
In this case, the Pearson, user-based, 5-review, z-score-scaled model met all of the above criteria. It had the lowest error on all three measures (RMSE, MSE & MAE), and it required only 5 reviews to accomplish this.

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.345 | 0.119 | 0.229 |
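To actually use the selected configuration, a recommender can be fit on the full subset matrix and asked for top-N suggestions. The sketch below is illustrative only; it assumes the same 5-review minimum used in the evaluation, and the `n = 3` and first-reviewer choices are arbitrary.

# Fit the selected configuration (UBCF, Pearson, z-score) on the 5-review subset
final_data <- restaurants[rowCounts(restaurants) > 5, ]
final_rec  <- Recommender(final_data, 'UBCF',
                          param = list(normalize = 'z-score', method = 'pearson'))
# Top-3 restaurant suggestions for the first reviewer (illustrative)
top3 <- predict(final_rec, final_data[1, ], n = 3)
as(top3, 'list')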
The next closest was the Cosine, z-score, item-based model, but it required more reviews, 7 in fact, to accomplish similar results. This makes some sense, as it should require more reviews of a restaurant, and of many restaurants, to estimate how alike they are in such a sparse matrix. I would guess that as the data grows, this model might surpass the user-based model.

Model | RMSE | MSE | MAE |
---|---|---|---|
IBCF | 0.425 | 0.180 | 0.270 |
Pearson, 6-Reviewer, Center Scaled

Model | RMSE | MSE | MAE |
---|---|---|---|
IBCF | 0.498 | 0.248 | 0.349 |

Pearson, 5-Reviewer, z-score Scaled

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.345 | 0.119 | 0.229 |

Cosine, 6-Reviewer, Center Scaled User-Based

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.420 | 0.177 | 0.386 |

Cosine, 7-Reviewer, Z-Score Scaled Item-Based

Model | RMSE | MSE | MAE |
---|---|---|---|
IBCF | 0.425 | 0.180 | 0.270 |

Jaccard, 6-Reviewer, Center-Scaled User-Based

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.434 | 0.188 | 0.358 |

Jaccard, 7-Reviewer, Z-Score Scaled

Model | RMSE | MSE | MAE |
---|---|---|---|
UBCF | 0.475 | 0.225 | 0.416 |
Having created a function that takes all the parameters up front, it was just a matter of passing parameters into it. I could have automated this (a sketch of how appears after the calls below), but I wanted the ability to evaluate each configuration one at a time and check my results repeatedly, without running a whole list and having to parse out just one set of errors. So, this is just a long set of function calls; everything else was incorporated into the function.
test_ibcf_ubcf_params(restaurants, .80, 6, 'center', 'pearson')
test_ibcf_ubcf_params(restaurants, .80, 6, 'z-score', 'pearson')
test_ibcf_ubcf_params(restaurants, .80, 6, 'center', 'cosine')
test_ibcf_ubcf_params(restaurants, .80, 6, 'z-score', 'cosine')
test_ibcf_ubcf_params(restaurants, .80, 6, 'center', 'jaccard')
test_ibcf_ubcf_params(restaurants, .80, 6, 'z-score', 'jaccard')
test_ibcf_ubcf_params(restaurants, .80, 5, 'center', 'pearson')
test_ibcf_ubcf_params(restaurants, .80, 5, 'z-score', 'pearson')
test_ibcf_ubcf_params(restaurants, .80, 5, 'center', 'cosine')
test_ibcf_ubcf_params(restaurants, .80, 5, 'z-score', 'cosine')
test_ibcf_ubcf_params(restaurants, .80, 5, 'center', 'jaccard')
test_ibcf_ubcf_params(restaurants, .80, 5, 'z-score', 'jaccard')
test_ibcf_ubcf_params(restaurants, .80, 7, 'center', 'pearson')
test_ibcf_ubcf_params(restaurants, .80, 7, 'z-score', 'pearson')
test_ibcf_ubcf_params(restaurants, .80, 7, 'center', 'cosine')
test_ibcf_ubcf_params(restaurants, .80, 7, 'z-score', 'cosine')
test_ibcf_ubcf_params(restaurants, .80, 7, 'center', 'jaccard')
test_ibcf_ubcf_params(restaurants, .80, 7, 'z-score', 'jaccard')
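For reference, here is one way the same grid of calls could have been automated; this is a sketch, not how the results above were produced.

# Sketch: loop over every minimum-review / normalization / similarity combination
configs <- expand.grid(min_reviews = c(5, 6, 7),
                       normal      = c('center', 'z-score'),
                       method      = c('pearson', 'cosine', 'jaccard'),
                       stringsAsFactors = FALSE)
for (i in seq_len(nrow(configs))) {
  with(configs[i, ],
       test_ibcf_ubcf_params(restaurants, .80, min_reviews, normal, method))
}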
The full code is available with this .Rmd file in the GitHub repository, should you choose to repeat any of this. I did change my seed at some point and do not remember the original one, so if I did not guess it correctly you will likely get similar but not exactly the same results in your run, as I worked from the R Markdown cache for most of this site's development.
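If you do re-run it, setting a seed immediately before each evaluation call should make a run repeatable; the value below is an arbitrary example, not the original seed.

set.seed(123)  # arbitrary example value, not the original seed
test_ibcf_ubcf_params(restaurants, .80, 5, 'z-score', 'pearson')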
This dataset was originally downloaded from the UCI ML Repository: Restaurant & Consumer Data Sets
Creators: Rafael Ponce Medellín and Juan Gabriel González Serna (rafaponce@cenidet.edu.mx, gabriel@cenidet.edu.mx), Department of Computer Science, National Center for Research and Technological Development (CENIDET), México.
Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys'11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.