Accuracy and Beyond

The goal of this assignment is to give you practice working with accuracy and other recommender system metrics.

Source: For this project I’ll use the Jester joke dataset, which I haven’t worked with before. It has the highest density of the example datasets I was able to find online: http://eigentaste.berkeley.edu/dataset/

Model Building

The recommenderlab package will normalize the ratings so that the average for each user is 0, removing the bias from those users who give exceptionally high or low ratings.
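To make the normalization step concrete, here is a minimal sketch of per-user mean centering in plain Python. It is not recommenderlab's implementation; the function name center_ratings and the toy ratings are assumptions made up for illustration.

```python
def center_ratings(ratings):
    """Subtract each user's mean rating from that user's ratings.

    `ratings` maps user -> {item: rating}. Hypothetical helper for
    illustration only; recommenderlab does this internally in R.
    """
    centered = {}
    for user, items in ratings.items():
        if not items:  # a user with no ratings has no mean to remove
            centered[user] = {}
            continue
        mean = sum(items.values()) / len(items)
        centered[user] = {item: r - mean for item, r in items.items()}
    return centered


# Toy data (made up): one user rates everything low, one rates everything high.
ratings = {
    "harsh_user": {"joke1": -8.0, "joke2": -6.0, "joke3": -4.0},
    "generous_user": {"joke1": 6.0, "joke2": 8.0, "joke3": 10.0},
}
centered = center_ratings(ratings)
```

After centering, both toy users have the same relative preferences over the three jokes, even though their raw ratings sit at opposite ends of the scale. That is the bias the normalization is meant to remove.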

Novelty

At this stage I’ll introduce a random model.
My expectation is that it will score worse on accuracy, given the nature of the model and its reduced dependency on historical rating patterns, but users may find it preferable to a more accurate yet tired model.
Alternatives for accomplishing this goal include using more neighbors in the model training process, to ensure some selections from the tail, or using recommenderlab’s popular model.
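As a rough illustration of what a random model does, the sketch below picks items uniformly at random from those a user has not yet rated. It is a plain-Python stand-in, not recommenderlab's random recommender; the function name, item IDs, and seed are assumptions for illustration.

```python
import random


def random_recommendations(all_items, rated_items, n, rng):
    """Return up to n items the user has not rated, chosen uniformly at random.

    Hypothetical helper: it ignores rating history entirely, which is why
    no real "training" is needed for this kind of model.
    """
    # Sort so the candidate order (and so the seeded sample) is reproducible.
    candidates = sorted(set(all_items) - set(rated_items))
    return rng.sample(candidates, min(n, len(candidates)))


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
all_items = [f"joke{i}" for i in range(1, 11)]  # made-up item IDs
rated = {"joke1", "joke2", "joke3"}
recs = random_recommendations(all_items, rated, 3, rng)
```

Because every unrated item is equally likely to be chosen, items from the long tail appear as often as popular ones, which is the source of both the novelty and the poor accuracy.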

Summary

As expected, the random model performs very poorly in terms of accuracy.
Another approach to novelty/serendipity could be a weighted/hybrid solution of the top performing UBCF model, along with some randomness.
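One way such a hybrid could look is sketched below: keep the top of an accuracy-ranked list and fill the remaining slots with random picks. This is an assumption about how the blend might be done, not something built in this project; hybrid_top_n, random_share, and the toy lists are hypothetical names and values.

```python
import random


def hybrid_top_n(ranked_items, candidate_pool, n, random_share, rng):
    """Blend an accuracy-ranked list with random picks.

    ranked_items: items ordered best-first by some model (e.g. UBCF output).
    candidate_pool: all items eligible for recommendation.
    random_share: fraction of the n slots given to random items (0.0 to 1.0).
    """
    n_random = int(round(n * random_share))
    n_ranked = n - n_random
    picks = list(ranked_items[:n_ranked])
    # Random slots draw only from items not already picked, so the final
    # list contains no duplicates.
    leftovers = [item for item in candidate_pool if item not in picks]
    picks += rng.sample(leftovers, min(n_random, len(leftovers)))
    return picks


rng = random.Random(7)  # fixed seed only so the sketch is repeatable
ranked = ["joke7", "joke2", "joke9", "joke4", "joke1"]  # made-up ranking
pool = [f"joke{i}" for i in range(1, 11)]
blended = hybrid_top_n(ranked, pool, n=5, random_share=0.4, rng=rng)
```

With n=5 and random_share=0.4, three slots come from the ranked list and two are random. Raising random_share trades accuracy for novelty, so it would be a natural parameter to tune in an online evaluation.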
It’s interesting to note, though not a surprise, that the model “training” time for the random model is 0 seconds for each fold.
If accuracy were the entire goal of this model, then at this stage I would start tuning the UBCF model with varying correlation methodologies or neighborhood sizes.

An online evaluation would help tune the model further by gauging user reaction to the recommendations given, in particular the actual ratings users assign to items that UBCF wouldn’t otherwise have recommended.

With more time, I’d have attempted to use the full dataset in a Spark environment, and to confirm that the generated recommendations are indeed unique.