Project Overview

Overall Approach

For this simple exploration of recommender systems using recommenderlab in R, I focused on getting the techniques down properly and on optimizing the parameters within the algorithms via `params = list(normalize = "some-method", method = "some-method")`, so that it was neither advantageous nor necessary to bring in more than a single ratings predictor.
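
For example, a user-based recommender with mean-centering and Pearson similarity is configured roughly as below (a sketch: `train_data` is a placeholder, and `parameter` is the argument name recommenderlab uses for this list):

```r
library(recommenderlab)

# Placeholder training set (a realRatingMatrix); see the preprocessing
# sketch further down for how one can be built.
rec <- Recommender(train_data, method = "UBCF",
                   parameter = list(normalize = "center", method = "pearson"))
```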

Single Rating

With that in mind, I chose to use `rating`, which is the overall rating for each restaurant in the data set. But I hope to revisit this model in the next assignment using more variables to see how that improves predictions for such a small set.

Algorithms

To see whether different dissimilarity metrics and normalization methods enhanced performance, I tested across both sets as follows.

Dissimilarity Methods

  • Cosine
  • Pearson
  • Jaccard

Normalization Methods

  • z-score
  • mean centering

Evaluation - Error Estimation

Three separate error metrics were calculated for each dissimilarity/normalization pair to see whether they varied at all in magnitude or direction of change as the different models were explored.

Error Metrics:

  • RMSE (Root Mean Squared Error)
  • MSE (Mean Squared Error)
  • MAE (Mean Absolute Error)
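
For reference, with $r_i$ an actual held-out rating, $\hat{r}_i$ its prediction, and $n$ the number of held-out ratings, these are:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{r}_i - r_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert \hat{r}_i - r_i \right\rvert
$$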

The Data

Column

Sample of Center Normalized Matrix

Column

Data Dictionary - Main Ratings Table

Observations: 1161
Features: 5

Feature         Type       Levels   Possible Values
UserId          character  138      —
placeID         character  130      —
rating          numeric    3        0, 1, 2
food_rating     numeric    3        0, 1, 2
service_rating  numeric    3        0, 1, 2


Preprocessing Code

Column
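
A minimal sketch of the reshaping step, assuming the UCI file `rating_final.csv` and recommenderlab's standard data.frame coercion (the full code is in the .Rmd in the repository):

```r
library(recommenderlab)

# Read the UCI ratings file (userID, placeID, rating, food_rating,
# service_rating) and keep only the single overall rating used here.
raw        <- read.csv("rating_final.csv", stringsAsFactors = FALSE)
ratings_df <- raw[, c("userID", "placeID", "rating")]

# Coerce the user/item/rating data.frame into a realRatingMatrix
# (users as rows, restaurants as columns) for recommenderlab.
ratings <- as(ratings_df, "realRatingMatrix")
dim(ratings)  # roughly 138 users x 130 restaurants
```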

Understanding Ratings Matrix

This is a very sparse matrix: the users are spread across different geographic areas, so very few of our reviewers have reviewed more than 5-8 restaurants.

Accordingly, the matrix is sparse: only about 5% of the cells hold a review of 1 or 2. There may be challenges developing a useful predictor with so few reviewers and reviews across 130 locations. It will be important to test the minimum review count to see if it significantly affects the results.

For the following comparisons of item-based and user-based recommenders using recommenderlab, I tested effectiveness using 5, 6, & 7 as the minimum reviews per reviewer, to see whether the density of reviews or the number of reviewers was more influential on the results.

Proportions of Restaurant Reviews

Rating          Percent of Matrix
0 - No Review   94.9
1               2.36
2               2.73
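
These proportions can be recovered from the rating matrix in a few lines (a sketch, assuming the `ratings` realRatingMatrix from the preprocessing sketch, with unreviewed cells stored as missing):

```r
m <- as(ratings, "matrix")              # NA marks cells with no review
round(100 * mean(is.na(m)), 1)          # percent of the matrix unreviewed
round(100 * table(m) / length(m), 2)    # percent at each rating level
```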

Recommender Function
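
A minimal sketch of roughly what this function does, per the description in this report (the name `evaluate_pair` and the row filter are illustrative, not the exact code):

```r
library(recommenderlab)

# Fit UBCF and IBCF on an 80/20 split of a realRatingMatrix and return
# RMSE/MSE/MAE for each. `sim_method` is the (dis)similarity measure and
# `norm_method` the normalization, passed via recommenderlab's `parameter`.
evaluate_pair <- function(ratings, sim_method = "pearson",
                          norm_method = "center", given = 5) {
  # Keep reviewers with at least `given` ratings (an assumption about how
  # the 5/6/7 minimum-review threshold was enforced).
  ratings <- ratings[rowCounts(ratings) >= given, ]
  scheme <- evaluationScheme(ratings, method = "split",
                             train = 0.8, given = given)
  errs <- sapply(c("UBCF", "IBCF"), function(m) {
    rec  <- Recommender(getData(scheme, "train"), method = m,
                        parameter = list(method = sim_method,
                                         normalize = norm_method))
    pred <- predict(rec, getData(scheme, "known"), type = "ratings")
    calcPredictionAccuracy(pred, getData(scheme, "unknown"))
  })
  t(errs)  # one row per model type; columns RMSE, MSE, MAE
}
```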

Sample Output from Function:

RMSE MSE MAE
UBCF 0.4087671 0.1670905 0.3682016
IBCF 0.3482584 0.1212839 0.2181978

Pearson

Column {data-width=450}

Pearson's Method, Center Scaling, 80% Training / 20% Test

Five Reviewers
RMSE MSE MAE
UBCF 0.3157068 0.0996708 0.2991736
IBCF 0.5724105 0.3276537 0.3973006
Six Reviewers
RMSE MSE MAE
UBCF 0.6084335 0.3701913 0.5490075
IBCF 0.4347148 0.1889769 0.3239961
Seven Reviewers
RMSE MSE MAE
UBCF 0.5740619 0.3295471 0.5487022
IBCF 0.5364686 0.2877985 0.4130695

Column {data-width=450}

Pearson's Method with z-score Scaling, 80% Training / 20% Test

Five Reviewers
RMSE MSE MAE
UBCF 0.6628161 0.4393252 0.5095242
IBCF 0.4923255 0.2423844 0.3142065
Six Reviewers
RMSE MSE MAE
UBCF 0.2392441 0.0572378 0.2392441
IBCF 0.6306225 0.3976847 0.4690007
Seven Reviewers
RMSE MSE MAE
UBCF 0.4000711 0.1600569 0.3280428
IBCF 0.5579159 0.3112702 0.3815400

Column {data-width=250}

Comparing Pearson’s Method

Using a full 80% training and 20% testing split on the matrix subset with 5, 6, or 7 minimum reviews per reviewer (set via `given` in the params), and comparing them across the `center` and `z-score` scaling methods, there is a wide range of possible recommenders.

The Best of the Center-Scaled

When comparing user-based to item-based recommenders across all three minimum-reviewer sizes, all three error metrics favored the same model type (either UBCF or IBCF) for a given number of reviewers, but which type had the lowest error varied with the number of reviewers.

The absolute lowest set of scores for center scaling with Pearson's similarity came from the 6-reviewer model using the item-based recommender.

Pearson, 6-Reviewer, Center Scaled

RMSE MSE MAE
IBCF 0.4984133 0.2484158 0.3492304

The Best of the z-score Scaled

The models were computed in exactly the same configurations for Pearson, changing only the scaling to z-score. In this case, for 5 & 6 reviewers, all three errors in each comparison were aligned with the same model (either user- or item-based). For 7 reviewers, however, the user-based model had lower RMSE and MSE but a higher MAE, and the disparity was fairly significant.

The lowest overall scores came with just 5 reviewers; this model outperformed the center-scaled best model by approximately 0.1 across the board.

Pearson, 5-Reviewer, z-score Scaled

RMSE MSE MAE
UBCF 0.3458464 0.1196097 0.2295455

Cosine

Column {data-width=450}

Cosine Method, Center Scaling, 80% Training / 20% Test

Five Reviewers
RMSE MSE MAE
UBCF 0.5233471 0.2738921 0.4745330
IBCF 0.5705011 0.3254715 0.3957425
Six Reviewers
RMSE MSE MAE
UBCF 0.5568457 0.3100771 0.4556102
IBCF 0.5888832 0.3467835 0.4363687
Seven Reviewers
RMSE MSE MAE
UBCF 0.4996145 0.2496146 0.4056109
IBCF 0.6150877 0.3783329 0.4969680

Column {data-width=450}

Cosine Method with z-score Scaling, 80% Training / 20% Test

Five Reviewers
RMSE MSE MAE
UBCF 0.4941246 0.2441591 0.4250010
IBCF 0.5654305 0.3197117 0.4300994
Six Reviewers
RMSE MSE MAE
UBCF 0.5399805 0.291579 0.4711546
IBCF 0.5530642 0.305880 0.4055877
Seven Reviewers
RMSE MSE MAE
UBCF 0.4591887 0.2108542 0.3990037
IBCF 0.5018022 0.2518054 0.3742733

Column {data-width=250}

Comparing Cosine Method

For the cosine method, the best performers under both center and z-score scaling had lower values in all three error metrics than every other model using that scaling.

Center Scaled Data

In this method, once again all the best error rates for a given combination of reviewer minimum and scaling were associated with the same model type, and in each case the user-based recommenders outperformed the item-based ones.

Cosine, 6-Reviewer, Center Scaled User-Based

RMSE MSE MAE
UBCF 0.4202448 0.1766057 0.3862834

z-score Scaled Data

Using z-scores, the lowest errors again fell to one model type (item-based or user-based) across all three error measures for each combination, or were very nearly tied. Unlike with center scaling, however, the z-score scaled models were not consistently the same model type. With a smaller reviewer minimum (5), the user-based method performed better, but as the minimum grew, the item-based methods performed considerably better.

Cosine, 7-Reviewer, Z-Score Scaled Item-Based

RMSE MSE MAE
IBCF 0.4250244 0.1806457 0.2705459

Again, z-score scaling outperformed center scaling at all but the 6-reviewer configuration, and the error at 7 reviewers was considerably lower with z-scoring.

Jaccard

Column {data-width=450}

Jaccard Method, Center Scaling, 80% Training / 20% Test

Five Reviewers
RMSE MSE MAE
UBCF 0.5108811 0.2609995 0.3966992
IBCF 0.5920140 0.3504806 0.4473545
Six Reviewers
RMSE MSE MAE
UBCF 0.5080773 0.2581426 0.4334914
IBCF 0.5017087 0.2517116 0.3477941
Seven Reviewers
RMSE MSE MAE
UBCF 0.4687841 0.2197585 0.3786334
IBCF 0.4592163 0.2108796 0.3361111

Column {data-width=450}

Jaccard Method with z-score Scaling, 80% Training / 20% Test

Five Reviewers
RMSE MSE MAE
UBCF 0.5263456 0.2770397 0.4381720
IBCF 0.5801607 0.3365864 0.4651974
Six Reviewers
RMSE MSE MAE
UBCF 0.4507926 0.2032140 0.3441676
IBCF 0.5926991 0.3512923 0.4717391
Seven Reviewers
RMSE MSE MAE
UBCF 0.5026210 0.2526279 0.4252860
IBCF 0.5579156 0.3112698 0.4314286

Column {data-width=250}

Comparing Jaccard Method

The Jaccard methods are universally the worst, with center-scaling error lowest for the user-based model at 6 reviewers and lowest for the item-based model at 7 reviewers.

Z-score scaling was uniformly best with 6 reviewers, the user-based model slightly outperforming the item-based in RMSE and MSE and coming in slightly worse, but not far off, in MAE.

Jaccard, 6-Reviewer, Center-Scaled User-Based

RMSE MSE MAE
UBCF 0.434365 0.1886729 0.3585991

Jaccard, 7-Reviewer, Z-Score Scaled

RMSE MSE MAE
UBCF 0.4750463 0.2256690 0.4168385

Summary

Column

Interpreting the Models

This comparison was interesting. Comparing the Pearson, Cosine, and Jaccard metrics, it is clear that you could build a decent recommender with sufficient data using any of the three. The errors ranged from 0.1 to 0.65 points on a 2-point scale.

I would not say that 0.65 is a trivial error when predicting between 1 and 2, but the 0.2 range seems reasonable given 138 reviewers and 130 restaurants with 907 ratings and 16,897 empty slots. Still, this seems like an honorable start which could be improved upon with more reviews.

For making a final selection, choosing for a low mean absolute error seems to make the most sense, as it represents the average error without regard to over- or under-shooting. This is more interpretable than trying to back-calculate from the mean squared error or root mean squared error.

It also makes sense, if several of the models are very close, to select the one which requires the fewest reviews to train, as this means more observations are preserved and used in training and testing, which should make the model more generalizable.

The Winner: Pearson * Z-score * 5-Reviewer * User-Based

In this case, the Pearson, user-based, 5-review, z-score scaled model met all of the above criteria. It had the lowest error on all three measures (RMSE, MSE & MAE), and it required only 5 reviews per reviewer to accomplish this.

RMSE MSE MAE
UBCF 0.345 0.119 0.229

The Runner-Up: Cosine * Z-score * 7-Reviewer * Item-Based

The next closest was the Cosine, z-score, item-based model, but it required more reviews, 7 in fact, to accomplish similar results. This makes some sense, as it should take more reviews of a restaurant, and of many restaurants, to estimate how alike they are in such a sparse matrix. I would guess that as the data grows, this model might surpass the user-based model.

RMSE MSE MAE
IBCF 0.425 0.180 0.270

Column

Comparing Across the Models

Pearson, 6-Reviewer, Center Scaled

RMSE MSE MAE
IBCF 0.498 0.248 0.349

Pearson, 5-Reviewer, z-score Scaled

RMSE MSE MAE
UBCF 0.345 0.119 0.229

Cosine, 6-Reviewer, Center Scaled User-Based

RMSE MSE MAE
UBCF 0.420 0.177 0.386

Cosine, 7-Reviewer, Z-Score Scaled Item-Based

RMSE MSE MAE
IBCF 0.425 0.180 0.270

Jaccard, 6-Reviewer, Center-Scaled User-Based

RMSE MSE MAE
UBCF 0.434 0.188 0.358

Jaccard, 7-Reviewer, Z-Score Scaled

RMSE MSE MAE
UBCF 0.475 0.225 0.416

Column

Model Code

Having created a function early on that took all the parameters, it was just a matter of passing parameters into it. I could have automated this, but I wanted the ability to evaluate the models one at a time and check my results repeatedly without running a whole list and parsing out just the one set of errors. So, this is just a long series of function calls; everything else used here was incorporated into that function up front.
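
As a sketch, those calls looked something like the following, using the hypothetical `evaluate_pair()` helper sketched under Recommender Function (names and argument spellings are illustrative):

```r
# One call per similarity / scaling / minimum-reviewer combination,
# run and inspected individually rather than looped.
evaluate_pair(ratings, sim_method = "pearson", norm_method = "center",  given = 5)
evaluate_pair(ratings, sim_method = "pearson", norm_method = "center",  given = 6)
evaluate_pair(ratings, sim_method = "pearson", norm_method = "center",  given = 7)
evaluate_pair(ratings, sim_method = "pearson", norm_method = "Z-score", given = 5)
# ...and likewise for "cosine" and "jaccard" under each scaling and minimum.
```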

The full code is available with this .Rmd file in the GitHub repository should you choose to repeat any of this. I did change my seed at some point and do not remember the original one, so if I did not guess it correctly you will likely get similar, but not exactly the same, results in your run, as I worked from the R Markdown cache for most of this site's development.

Citations

Citation

This dataset was originally downloaded from the UCI ML Repository: Restaurant & Consumer Data Sets

Creators: Rafael Ponce Medellín and Juan Gabriel González Serna, Department of Computer Science, National Center for Research and Technological Development (CENIDET), México

Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In Workshop on Context-Aware Recommender Systems (CARS-2011), at RecSys 2011, Chicago, IL, USA, October 23, 2011.