Project Description

Deliverables
  1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.

  2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.

  3. Compare and report on any change in accuracy before and after you’ve made the change in #2.

  4. As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation was possible. Also, briefly propose how you would design a reasonable online evaluation environment.

Dataset

I am going to use the Jester5k dataset, which ships with the recommenderlab package. First, let's install the packages and take a look at the data: it is a sparse ratings matrix of joke ratings. I am going to compare the accuracy of three recommender system algorithms (SVD, UBCF, and IBCF) and evaluate their performance on this data.

Installing packages:

options(warn = -1)  # suppress warnings for a cleaner rendered document
if (!"recommenderlab" %in% rownames(installed.packages())) {
  install.packages("recommenderlab")
}
suppressMessages(library("ggplot2"))
suppressMessages(library("recommenderlab"))
suppressMessages(library("kableExtra"))

Loading dataset

Let’s visually inspect the data by viewing a 10x10 sample. Note that the underlying sparse matrix stores unrated entries as 0, so the 0.00 values below generally mean “not rated” rather than an actual rating of zero.

data_package <- data(package = "recommenderlab")  # datasets shipped with recommenderlab
data("Jester5k")

y <- as.matrix(Jester5k@data[1:10, 1:10])  # @data is the sparse rating matrix; 0 = not rated
y %>% kable(caption = "Data") %>% kable_styling("striped", full_width = TRUE)
Data

            j1      j2      j3      j4      j5      j6      j7      j8      j9     j10
u2841     7.91    9.17    5.34    8.16   -8.74    7.14    8.88   -8.25    5.87    6.21
u15547   -3.20   -3.50   -9.56   -8.74   -6.36   -3.30    0.78    2.18   -8.40   -8.79
u15221   -1.70    1.21    1.55    2.77    5.58    3.06    2.72   -4.66    4.51   -3.06
u15573   -7.38   -8.93   -3.88   -7.23   -4.90    4.13    2.57    3.83    4.37    3.16
u21505    0.10    4.17    4.90    1.55    5.53    1.50   -3.79    1.94    3.59    4.81
u15994    0.83   -4.90    0.68   -7.18    0.34   -4.32   -6.17    6.12   -5.58    5.44
u238      2.91    4.76    0.00    0.00    0.00    0.00   -9.47    3.69    0.00   -2.38
u5809    -2.77   -6.31    2.23    0.00    0.19    1.26    2.43   -5.00    0.00   -5.49
u16636   -3.35   -5.92   -5.63   -3.01    1.70   -2.52    1.60   -0.53    1.65   -7.28
u12843   -1.99   -6.89    2.09   -4.42   -4.90    2.43   -3.06    3.98   -1.46    0.68

Inspecting the data

Let’s take a look at the number of ratings, the possible rating values, and their distribution. This is a 5000 x 100 ratings matrix containing 362106 ratings. On average, users have rated about 72 jokes out of 100, with a minimum of 36.

Jester5k
## 5000 x 100 rating matrix of class 'realRatingMatrix' with 362106 ratings.
summary(rowCounts(Jester5k)) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   36.00   53.00   72.00   72.42  100.00  100.00
hist(getRatings(Jester5k),main="Joke ratings")
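
As a quick sanity check on sparsity, the share of rated entries can be computed directly. A one-line sketch using recommenderlab's nratings():

# Fraction of the 5000 x 100 matrix that actually contains ratings
nratings(Jester5k) / (nrow(Jester5k) * ncol(Jester5k))  # 362106 / 500000 = 0.724212

About 72% of the matrix is filled, which matches the mean of 72 rated jokes per user above.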

The next step is to take a look at the number of ratings per user.

jratings <- Jester5k
rating_cnt_per_member <- rowCounts(jratings)
# Rating counts are integers between 36 and 100, so a binwidth of 1 is appropriate;
# passing binwidth to qplot directly avoids stacking a second stat_bin layer
qplot(rating_cnt_per_member, binwidth = 1) +
  ggtitle("Number of ratings per user")

Now I am going to visualize the distribution of the average rating per user; it turns out to be close to normally distributed.

average_ratings_per_user <- rowMeans(jratings)

qplot(average_ratings_per_user, binwidth = 0.1) +
  ggtitle("Distribution of the average rating per user")

Splitting the data into train/test sets

Let’s split the data into train and test sets using an 80/20 split, giving the recommender 25 ratings per test user (given = 25). After some consideration, I am going to treat 3 as a good rating. It can be argued that on a scale of -10 to 10 anything above 0 should be considered good, but I want to set a higher bar than that.

set.seed(11)
eval <- evaluationScheme(jratings, method = "split", train = 0.8, given=25, goodRating = 3)
train <- getData(eval, "train")
known <- getData(eval, "known")
unknown <- getData(eval, "unknown")

Using SVD, IBCF, and UBCF for recommendations

I am going to compare three recommendation algorithms: SVD, Item-Based Collaborative Filtering (IBCF), and User-Based Collaborative Filtering (UBCF).

set.seed(44)

# For each algorithm: train on the training set, predict ratings from the test
# users' 'known' ratings, and score against their held-out 'unknown' ratings
recom <- Recommender(train, method = "SVD")
pred <- predict(object = recom, newdata = known, type = "ratings")
eval_accuracy_SVD <- calcPredictionAccuracy(pred, unknown, byUser = FALSE)

recom <- Recommender(train, method = "IBCF")
pred <- predict(object = recom, newdata = known, type = "ratings")
eval_accuracy_IBCF <- calcPredictionAccuracy(pred, unknown, byUser = FALSE)

recom <- Recommender(train, method = "UBCF")
pred <- predict(object = recom, newdata = known, type = "ratings")
eval_accuracy_UBCF <- calcPredictionAccuracy(pred, unknown, byUser = FALSE)

rbind(eval_accuracy_SVD, eval_accuracy_IBCF, eval_accuracy_UBCF)
##                        RMSE      MSE      MAE
## eval_accuracy_SVD  4.487716 20.13959 3.562392
## eval_accuracy_IBCF 5.030148 25.30239 3.990891
## eval_accuracy_UBCF 4.507066 20.31364 3.564556

The most effective method, with the lowest Root Mean Squared Error (RMSE), is SVD; the method with the highest RMSE is IBCF.
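
RMSE only measures pointwise rating error. recommenderlab's evaluate() can also compare the same algorithms on top-N recommendation quality; here is a minimal sketch reusing the eval scheme above (the list lengths n are arbitrary choices):

# Compare the three algorithms on top-N lists instead of predicted ratings
algorithms <- list(
  SVD  = list(name = "SVD",  param = NULL),
  IBCF = list(name = "IBCF", param = NULL),
  UBCF = list(name = "UBCF", param = NULL)
)
topn_results <- evaluate(eval, algorithms, type = "topNList", n = c(1, 3, 5, 10))
plot(topn_results, annotate = TRUE, legend = "topleft")  # ROC curves (TPR vs FPR)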

Implementing support for at least one business or user experience goal

To improve the quality of my recommendations, I will narrow the dataset down to users who rated more than 80 jokes and jokes that have more than 1000 ratings.

jratings <- Jester5k[rowCounts(Jester5k) > 80, colCounts(Jester5k) > 1000] 
jratings
## 1701 x 100 rating matrix of class 'realRatingMatrix' with 167062 ratings.
set.seed(11)
# goodRating is raised to 7 here; this threshold only matters for top-N
# (confusion-matrix) metrics and does not affect the RMSE comparison below
eval <- evaluationScheme(jratings, method = "split", train = 0.8, given = 25, goodRating = 7)
train <- getData(eval, "train")
known <- getData(eval, "known")
unknown <- getData(eval, "unknown")

set.seed(44)
# Re-run the same three algorithms on the filtered data
recom <- Recommender(train, method = "SVD")
pred <- predict(object = recom, newdata = known, type = "ratings")
eval_accuracy_SVD_2 <- calcPredictionAccuracy(pred, unknown, byUser = FALSE)

recom <- Recommender(train, method = "IBCF")
pred <- predict(object = recom, newdata = known, type = "ratings")
eval_accuracy_IBCF_2 <- calcPredictionAccuracy(pred, unknown, byUser = FALSE)

recom <- Recommender(train, method = "UBCF")
pred <- predict(object = recom, newdata = known, type = "ratings")
eval_accuracy_UBCF_2 <- calcPredictionAccuracy(pred, unknown, byUser = FALSE)

# Results with the filtered data
rbind(eval_accuracy_SVD_2, eval_accuracy_IBCF_2, eval_accuracy_UBCF_2)
##                          RMSE      MSE      MAE
## eval_accuracy_SVD_2  4.363654 19.04148 3.449774
## eval_accuracy_IBCF_2 4.840423 23.42969 3.803264
## eval_accuracy_UBCF_2 4.260597 18.15269 3.328305
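
To make the before/after comparison explicit, the relative change in RMSE can be computed from the two result matrices. A small sketch reusing the objects above (note the two runs use different data subsets and splits, so this is indicative rather than a strict apples-to-apples comparison):

before <- rbind(eval_accuracy_SVD, eval_accuracy_IBCF, eval_accuracy_UBCF)
after  <- rbind(eval_accuracy_SVD_2, eval_accuracy_IBCF_2, eval_accuracy_UBCF_2)
# Percent reduction in RMSE after restricting to active users and popular jokes
round(100 * (before[, "RMSE"] - after[, "RMSE"]) / before[, "RMSE"], 2)

From the tables above this works out to roughly 2.8% for SVD, 3.8% for IBCF, and 5.5% for UBCF.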

After this change, User-Based Collaborative Filtering became the most effective algorithm, edging out SVD.
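
Filtering the data is an indirect route to the user experience goal. A more direct lever would be re-ranking for novelty, one of the goals named in the deliverables. The sketch below is hypothetical (lambda and rerank_novel are my own illustrative names, not recommenderlab functions): request a longer candidate list, then penalize jokes that many users have already rated.

# Novelty-oriented re-ranking sketch: prefer less frequently rated jokes
lambda <- 0.5                                        # hypothetical popularity penalty
popularity <- colCounts(jratings) / nrow(jratings)   # share of users who rated each joke

candidates <- predict(recom, newdata = unknown, type = "topNList", n = 10)

rerank_novel <- function(item_idx, labels, pop, lambda, n = 3) {
  ids <- labels[item_idx]                             # candidate jokes, best first
  score <- -seq_along(ids) - lambda * rank(pop[ids])  # keep rank order, penalize popular items
  ids[order(score, decreasing = TRUE)][1:n]
}

# Novelty-adjusted top 3 for the first test user
rerank_novel(candidates@items[[1]], candidates@itemLabels, popularity, lambda)

The usual trade-off applies: novelty tends to cost a little accuracy, which could be quantified by re-scoring the re-ranked lists with the same evaluation scheme.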

Sample top recommendations for a user

Here are the top 3 recommendations for one of the test users:

set.seed(444)
pred2 <- predict(object = recom, newdata = unknown, type = "topNList", n = 3)

recc_user_1 <- pred2@items[[1]]
jokes_user_1 <- pred2@itemLabels[recc_user_1]   # joke ids in the first test user's top-3 list
jokes_user_1 %>% kable(caption = "User1 Predictions") %>% kable_styling("striped", full_width = TRUE)
User1 Predictions
x
j62
j68
j94
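
Out of curiosity, the predicted ratings behind this top-3 list can also be pulled out. A small sketch (unknown[1, ] is the same first test user):

# Predicted ratings for the first test user's top-3 jokes (UBCF model)
rating_pred <- predict(recom, newdata = unknown[1, ], type = "ratings")
as(rating_pred, "matrix")[1, jokes_user_1]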

Let’s look at the recommended jokes. By simple observation, all three share a theme of professional humor and, though this is subjective, a sarcastic tone, so they look like good recommendations.

cat(JesterJokes[jokes_user_1[1]])
## A group of managers were given the assignment to measure the height of a flagpole. So they go out to the flagpole with ladders and tape measures, and they're falling off the ladders, dropping the tape measures - the whole thing is just a mess. An engineer comes along and sees what they're trying to do, walks over, pulls the flagpole out of the ground, lays it flat, measures it from end to end, gives the measurement to one of the managers and walks away. After the engineer has gone, one manager turns to another and laughs. "Isn't that just like an engineer, we're looking for the height and he gives us the length."
cat(JesterJokes[jokes_user_1[2]])
## A man piloting a hot air balloon discovers he has wandered off course and is hopelessly lost. He descends to a lower altitude and locates a man down on the ground. He lowers the balloon further and shouts "Excuse me, can you tell me where I am?" The man below says: "Yes, you're in a hot air balloon, about 30 feet above this field." "You must work in Information Technology," says the balloonist. "Yes I do," replies the man. "And how did you know that?" "Well," says the balloonist, "what you told me is technically correct, but of no use to anyone." The man below says, "You must work in management." "I do," replies the balloonist, "how did you know?" "Well," says the man, "you don't know where you are, or where you're going, but you expect my immediate help. You're in the same position you were before we met, but now it's my fault!"
cat(JesterJokes[jokes_user_1[3]])
## Two atoms are walking down the street when one atom says to the other "Oh, my! I've lost an electron!" The second atom says"Are you sure" The first replies "I'm positive!"

Summary

In conclusion, SVD and User-Based Collaborative Filtering appear to be the most effective techniques, with better RMSE than Item-Based Collaborative Filtering, and the recommendations appear consistent and reasonable. Many of the jokes in the dataset are pretty bad, so the fact that these recommendations are, in my opinion, genuinely funny is a good sign. Sense of humor varies dramatically and is not an easy thing to predict, so I am impressed with the results.

If online evaluation were possible, I imagine the results could improve significantly. Since the dataset was gathered a while ago, what people find humorous may well have shifted from one decade to the next. Also, judging by the number of science-related jokes, I assume the joke selection was done by the scientists running the experiment… Allowing users to contribute jokes might produce a more diverse dataset and more diverse recommendations. For the online environment itself, a reasonable design would be an A/B test: randomly assign users to the competing algorithms, serve each group its own recommendations, and compare metrics that offline data cannot capture, such as click-through rate, the ratings users give the recommended jokes, and how long they keep reading.
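
As a toy illustration of that design, user-to-arm assignment should be stable so a user always sees the same algorithm. A hypothetical sketch (assign_arm is my own illustrative helper, not a library function):

# Hash each user id into one of two arms; the same user always lands in the same arm
assign_arm <- function(user_id, arms = c("SVD", "UBCF")) {
  idx <- sum(utf8ToInt(user_id)) %% length(arms) + 1
  arms[idx]
}
assign_arm("u2841")   # returns "SVD"; this user would always get the SVD arm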