Instructions
The goal of this assignment is to give you practice working with accuracy and other recommender system metrics. In this assignment you’re asked to do at least one or (if you like) both of the following:
- Work in a small group, and/or
- Choose a different dataset to work with from your previous projects.

Deliverables

1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.
2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.
3. Compare and report on any change in accuracy before and after you’ve made the change in #2.
4. As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation were possible. Also, briefly propose how you would design a reasonable online evaluation environment.
The project at hand is to build a recommender system that provides users with personalized joke recommendations. In addition to optimizing for accurate recommendations, the recommender focuses on improving user experience by adding some level of serendipity, meaning users will receive unexpected items with the potential to be enjoyable within their most highly recommended items.
Initially, we investigate the performance of five recommendation algorithms:

- User-Based Collaborative Filtering (UBCF) using cosine and Pearson similarity
- Item-Based Collaborative Filtering (IBCF) using cosine and Pearson similarity
- A random recommender algorithm (as a baseline)
For each method, we conduct 3-fold cross-validation to evaluate precision and overall performance. After establishing the best-performing algorithm, we promote novel items in the recommendation list by replacing a selection of the most highly recommended items with more novel ones. In this approach we try to balance accurate recommendations with user engagement and discovery.
We use the Jester dataset, which includes joke ratings from 24,983 users. Each user has rated at least 36 jokes, with rating values ranging from -10.00 (least funny) to +10.00 (most funny). A rating of 99 indicates that a joke was not rated by the user. In the dataset, each row corresponds to a single user: the first column records the total number of jokes they rated, while the next 100 columns contain their individual ratings for each joke.
# Load the Jester dataset
library(readxl)
# Read the Excel file (first sheet)
jester <- data.frame(read_xls("jester-data-1.xls", col_names = FALSE))
# Display the first few rows: the rating count plus the first 5 joke ratings
head(jester[, 1:6])
...1 ...2 ...3 ...4 ...5 ...6
1 74 -7.82 8.79 -9.66 -8.16 -7.52
2 100 4.08 -0.29 6.36 4.37 -2.38
3 49 99.00 99.00 99.00 99.00 9.03
4 48 99.00 8.35 99.00 99.00 1.80
5 91 8.50 4.61 -4.17 -5.39 1.36
6 100 -6.17 -3.54 0.44 -8.50 -7.09
summary(jester[, 2:6])
...2 ...3 ...4 ...5
Min. :-9.95 Min. :-9.95 Min. :-9.950 Min. :-9.95
1st Qu.:-0.15 1st Qu.:-2.18 1st Qu.:-1.145 1st Qu.:-3.35
Median : 5.05 Median : 3.69 Median : 4.660 Median : 3.45
Mean :34.40 Mean :27.09 Mean :33.264 Mean :36.24
3rd Qu.:99.00 3rd Qu.:99.00 3rd Qu.:99.000 3rd Qu.:99.00
Max. :99.00 Max. :99.00 Max. :99.000 Max. :99.00
...6
Min. :-9.9500
1st Qu.:-3.6900
Median : 1.1200
Mean : 0.4272
3rd Qu.: 4.4700
Max. :99.0000
This gives a quick sense of how the data is structured and how ratings are distributed across the first few jokes. It also confirms that the dataset has the correct dimensions and types before moving on to data preprocessing.
Next, we prepare the Jester data for modeling. We will remove the column with the number of rated jokes because it is not used in the recommendation system. Additionally, the raw data represents non-rated jokes with the value 99, so we will replace these values with NA. Finally, we will subset the data to the first 5,000 users to speed up computation.
# Prepare the rating matrix
library(recommenderlab)
colnames(jester) <- c("ratingCount", seq(100))
row.names(jester) <- 1:nrow(jester)
# Remove the rating-count column
ratings <- jester[-1]
# Replace 99 (no rating) with NA
ratings[ratings == 99] <- NA
# Subset to the first 5,000 users
ratings <- ratings[1:5000, ]
ratings <- as.matrix(ratings)
# Convert to a sparse realRatingMatrix for recommenderlab
finalRatings <- as(ratings, 'realRatingMatrix')
Let’s dive a little deeper into our data.
First, let’s take a look at the number of jokes that each user has rated. Every user in this dataset has rated at least 36 jokes, and most individuals have rated either around 70 or all 100 jokes.
jokeCount <- rowCounts(finalRatings)
hist(jokeCount,
main = 'Number of Jokes Rated per User',
xlab = 'Number of Jokes Rated',
ylab = 'Number of Users')
Next, we can look at the number of ratings that each joke has received. Many of the jokes were rated by all 5,000 users in our subset.
ratingCount <- colCounts(finalRatings)
hist(ratingCount,
main = 'Number of Individuals Rating each Joke',
xlab = 'Number of Users that Rated Joke',
ylab = 'Number of Jokes')
Now we can take a look at the average rating across all jokes. The median average rating is a little under 1 (slightly above neutral), which means that jokes tend to be rated mildly positively.
mean_rating <- colMeans(finalRatings)
quantile(mean_rating)
0% 25% 50% 75% 100%
-3.8296705 -0.3050142 0.8847812 1.8412310 4.0056984
We can also plot the average ratings. A look at the distribution shows that most jokes have an average rating between -2 and 3. We have a few outliers that are rated more positively (+4) and more negatively (-4).
library(ggplot2)
goodrating <- quantile(mean_rating, .5)
qplot(mean_rating) +
  ggtitle("Distribution of Average Joke Rating") +
  geom_vline(xintercept = goodrating, col = 'red')
From this data exploration, we can see that most users rated most of the jokes and, conversely, most jokes were rated by nearly all users. The median average rating is a little under 1, so we will use 1 as our threshold for a good joke.
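As a quick check of how selective that cutoff is, we can compute the share of observed ratings that clear it. A one-line sketch using recommenderlab’s getRatings, which returns all non-missing ratings as a vector:

# Share of observed ratings at or above the "good joke" threshold of 1
mean(getRatings(finalRatings) >= 1)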
### Full-Dataset Rating Exploration
library(readxl)
library(ggplot2)
# Load the full Jester dataset (all 24,983 users) for exploration,
# keeping it separate from the 5,000-user subset used for modeling
jester_full <- read_excel("jester-data-1.xls", col_names = FALSE)
# Remove first column (rating count) and replace 99s with NA
ratingsAll <- jester_full[ , -1]
ratingsAll[ratingsAll == 99] <- NA
# Summary of ratings
summary(ratingsAll)
...2 ...3 ...4 ...5
Min. :-9.9500 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-2.6700 1st Qu.:-4.4200 1st Qu.:-4.0300 1st Qu.:-5.870
Median : 1.6000 Median : 0.8300 Median : 0.7800 Median :-1.460
Mean : 0.9046 Mean : 0.2085 Mean : 0.3166 Mean :-1.449
3rd Qu.: 4.9000 3rd Qu.: 4.8200 3rd Qu.: 4.6600 3rd Qu.: 2.520
Max. : 9.3700 Max. : 9.3700 Max. : 9.3700 Max. : 9.370
NA's :8531 NA's :6799 NA's :8341 NA's :9375
...6 ...7 ...8 ...9
Min. :-9.9500 Min. :-9.950 Min. :-9.9500 Min. :-9.9500
1st Qu.:-3.6900 1st Qu.:-1.310 1st Qu.:-5.2900 1st Qu.:-4.6100
Median : 1.1200 Median : 2.230 Median :-0.0500 Median :-0.2900
Mean : 0.4036 Mean : 1.594 Mean :-0.4297 Mean :-0.6241
3rd Qu.: 4.4700 3rd Qu.: 5.440 3rd Qu.: 3.8800 3rd Qu.: 3.0600
Max. : 9.3700 Max. : 9.470 Max. : 9.6100 Max. : 9.7600
NA's :6 NA's :4552 NA's :7 NA's :2
...10 ...11 ...12 ...13
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.950
1st Qu.:-4.760 1st Qu.:-2.090 1st Qu.:-1.210 1st Qu.:-1.650
Median :-0.190 Median : 1.940 Median : 2.430 Median : 2.040
Mean :-0.557 Mean : 1.301 Mean : 1.806 Mean : 1.449
3rd Qu.: 3.300 3rd Qu.: 5.390 3rd Qu.: 5.780 3rd Qu.: 5.290
Max. : 9.900 Max. : 9.370 Max. : 9.610 Max. : 9.850
NA's :9231 NA's :4659 NA's :3162 NA's :2560
...14 ...15 ...16 ...17
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.950
1st Qu.:-6.260 1st Qu.:-2.090 1st Qu.:-6.210 1st Qu.:-7.430
Median :-1.700 Median : 2.090 Median :-1.750 Median :-3.790
Mean :-1.763 Mean : 1.396 Mean :-1.706 Mean :-3.105
3rd Qu.: 2.280 3rd Qu.: 5.555 3rd Qu.: 2.330 3rd Qu.: 0.580
Max. : 9.370 Max. : 9.560 Max. : 9.900 Max. : 9.370
NA's :2 NA's :2468 NA's :6 NA's :8
...18 ...19 ...20 ...21
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-4.420 1st Qu.:-4.4200 1st Qu.:-3.4500 1st Qu.:-5.050
Median :-0.630 Median :-0.2900 Median : 0.5300 Median :-0.630
Mean :-1.115 Mean :-0.6351 Mean : 0.1542 Mean :-0.934
3rd Qu.: 1.940 3rd Qu.: 3.0100 3rd Qu.: 3.7400 3rd Qu.: 2.910
Max. : 9.370 Max. : 9.8100 Max. : 9.5600 Max. : 9.370
NA's :3 NA's :4 NA's :5 NA's :3
...22 ...23 ...24 ...25
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-0.780 1st Qu.:-2.8200 1st Qu.:-3.7900 1st Qu.:-6.210
Median : 2.820 Median : 1.4100 Median : 0.3400 Median :-1.750
Mean : 2.096 Mean : 0.8687 Mean : 0.1092 Mean :-1.687
3rd Qu.: 6.020 3rd Qu.: 4.9000 3rd Qu.: 3.9800 3rd Qu.: 2.230
Max. : 9.560 Max. : 9.8500 Max. : 9.8500 Max. : 9.810
NA's :62 NA's :3550 NA's :4950 NA's :9104
...26 ...27 ...28 ...29
Min. :-9.9500 Min. :-9.950 Min. :-9.95 Min. :-9.950
1st Qu.:-3.3500 1st Qu.:-1.840 1st Qu.: 0.68 1st Qu.:-1.650
Median : 0.7800 Median : 1.840 Median : 4.13 Median : 2.065
Mean : 0.3824 Mean : 1.286 Mean : 3.19 Mean : 1.519
3rd Qu.: 4.2700 3rd Qu.: 5.050 3rd Qu.: 6.75 3rd Qu.: 5.530
Max. : 9.3700 Max. : 9.370 Max. : 9.42 Max. : 9.420
NA's :4167 NA's :1233 NA's :79 NA's :993
...30 ...31 ...32 ...33
Min. :-9.950 Min. :-9.9500 Min. :-9.950 Min. :-9.950
1st Qu.: 0.340 1st Qu.:-4.1700 1st Qu.:-0.630 1st Qu.: 0.630
Median : 3.980 Median :-0.2900 Median : 3.010 Median : 3.930
Mean : 2.972 Mean :-0.4365 Mean : 2.199 Mean : 3.156
3rd Qu.: 6.750 3rd Qu.: 3.1600 3rd Qu.: 6.120 3rd Qu.: 6.600
Max. : 9.370 Max. : 9.3700 Max. : 9.370 Max. : 9.900
NA's :49 NA's :7006 NA's :327 NA's :35
...34 ...35 ...36 ...37
Min. :-9.950 Min. :-9.9500 Min. :-9.950 Min. :-9.950
1st Qu.:-5.920 1st Qu.:-2.7700 1st Qu.: 0.390 1st Qu.: 0.970
Median :-1.260 Median : 1.2100 Median : 3.830 Median : 4.030
Mean :-1.369 Mean : 0.8595 Mean : 3.006 Mean : 3.313
3rd Qu.: 2.620 3rd Qu.: 4.9000 3rd Qu.: 6.600 3rd Qu.: 6.600
Max. : 9.370 Max. : 9.3700 Max. : 9.900 Max. : 9.850
NA's :8279 NA's :3420 NA's :31 NA's :18
...38 ...39 ...40 ...41
Min. :-9.950 Min. :-9.95 Min. :-9.950 Min. :-9.9500
1st Qu.:-5.870 1st Qu.:-2.14 1st Qu.:-2.140 1st Qu.:-2.5200
Median :-1.260 Median : 1.89 Median : 1.460 Median : 1.4600
Mean :-1.364 Mean : 1.30 Mean : 1.067 Mean : 0.9986
3rd Qu.: 2.620 3rd Qu.: 5.29 3rd Qu.: 4.810 3rd Qu.: 5.0000
Max. : 9.370 Max. : 9.56 Max. : 9.760 Max. : 9.3700
NA's :8102 NA's :2125 NA's :1903 NA's :2585
...42 ...43 ...44 ...45
Min. :-9.9500 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.:-4.9000 1st Qu.:-0.780 1st Qu.:-5.3150 1st Qu.:-6.800
Median :-0.0500 Median : 2.620 Median :-0.5800 Median :-2.430
Mean :-0.3205 Mean : 1.967 Mean :-0.9249 Mean :-2.112
3rd Qu.: 4.0800 3rd Qu.: 5.830 3rd Qu.: 3.0600 3rd Qu.: 1.890
Max. : 9.3700 Max. : 9.850 Max. : 9.4200 Max. : 9.370
NA's :6216 NA's :414 NA's :7472 NA's :8806
...46 ...47 ...48 ...49
Min. :-9.950 Min. :-9.95 Min. :-9.95 Min. :-9.950
1st Qu.:-2.040 1st Qu.:-1.70 1st Qu.:-1.36 1st Qu.:-1.070
Median : 1.650 Median : 2.28 Median : 2.14 Median : 2.520
Mean : 1.109 Mean : 1.50 Mean : 1.56 Mean : 1.826
3rd Qu.: 4.810 3rd Qu.: 5.53 3rd Qu.: 5.49 3rd Qu.: 5.780
Max. :10.000 Max. : 9.66 Max. : 9.81 Max. : 9.370
NA's :3658 NA's :1358 NA's :2742 NA's :216
...50 ...51 ...52 ...53
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.9500
1st Qu.: 0.100 1st Qu.: 1.550 1st Qu.:-5.2900 1st Qu.:-4.1300
Median : 3.450 Median : 4.470 Median :-0.5300 Median : 0.1000
Mean : 2.771 Mean : 3.665 Mean :-0.7317 Mean :-0.1331
3rd Qu.: 6.360 3rd Qu.: 6.840 3rd Qu.: 3.5400 3rd Qu.: 3.8300
Max. : 9.900 Max. : 9.950 Max. : 9.3700 Max. : 9.6600
NA's :26 NA's :11 NA's :6091 NA's :4919
...54 ...55 ...56 ...57
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.: 0.390 1st Qu.:-0.150 1st Qu.:-3.3500 1st Qu.:-1.120
Median : 3.830 Median : 3.540 Median : 1.0200 Median : 2.430
Mean : 2.939 Mean : 2.736 Mean : 0.5057 Mean : 1.765
3rd Qu.: 6.550 3rd Qu.: 6.600 3rd Qu.: 4.5600 3rd Qu.: 5.580
Max. : 9.710 Max. : 9.510 Max. : 9.9500 Max. : 9.370
NA's :12 NA's :292 NA's :5256 NA's :243
...58 ...59 ...60 ...61
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.9500
1st Qu.:-6.800 1st Qu.:-8.160 1st Qu.:-4.320 1st Qu.:-4.8500
Median :-2.230 Median :-4.950 Median :-0.290 Median :-0.1000
Mean :-1.991 Mean :-3.834 Mean :-0.578 Mean :-0.3254
3rd Qu.: 2.180 3rd Qu.:-0.340 3rd Qu.: 3.010 3rd Qu.: 3.9800
Max. : 9.370 Max. : 9.420 Max. : 9.370 Max. : 9.3700
NA's :9010 NA's :9430 NA's :6848 NA's :7022
...62 ...63 ...64 ...65
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.9500
1st Qu.:-0.290 1st Qu.: 0.440 1st Qu.:-3.640 1st Qu.:-4.8100
Median : 3.200 Median : 3.740 Median : 0.530 Median :-0.3900
Mean : 2.459 Mean : 2.976 Mean : 0.247 Mean :-0.6679
3rd Qu.: 6.170 3rd Qu.: 6.500 3rd Qu.: 4.220 3rd Qu.: 3.2500
Max. : 9.370 Max. : 9.850 Max. : 9.470 Max. : 9.3700
NA's :178 NA's :47 NA's :4819 NA's :7707
...66 ...67 ...68 ...69
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.:-0.780 1st Qu.:-0.240 1st Qu.:-5.1000 1st Qu.:-0.100
Median : 3.250 Median : 3.250 Median :-0.5300 Median : 3.350
Mean : 2.274 Mean : 2.542 Mean :-0.8639 Mean : 2.639
3rd Qu.: 6.460 3rd Qu.: 6.260 3rd Qu.: 3.0600 3rd Qu.: 6.260
Max. : 9.850 Max. :10.000 Max. : 9.3700 Max. : 9.710
NA's :255 NA's :60 NA's :7490 NA's :43
...70 ...71 ...72 ...73
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-0.100 1st Qu.:-2.9100 1st Qu.:-6.0200 1st Qu.: 0.190
Median : 3.350 Median : 0.7300 Median :-0.5300 Median : 3.400
Mean : 2.559 Mean : 0.4184 Mean :-0.6875 Mean : 2.757
3rd Qu.: 6.170 3rd Qu.: 3.8800 3rd Qu.: 4.2200 3rd Qu.: 6.310
Max. : 9.760 Max. : 9.3700 Max. : 9.3700 Max. : 9.370
NA's :67 NA's :4761 NA's :16451 NA's :16290
...74 ...75 ...76 ...77
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.95
1st Qu.:-2.520 1st Qu.:-5.780 1st Qu.:-4.7100 1st Qu.:-0.19
Median : 1.500 Median :-1.650 Median :-0.0500 Median : 3.25
Mean : 1.083 Mean :-1.554 Mean :-0.2707 Mean : 2.52
3rd Qu.: 5.390 3rd Qu.: 2.140 3rd Qu.: 4.0300 3rd Qu.: 6.21
Max. : 9.370 Max. : 9.370 Max. : 9.3700 Max. : 9.37
NA's :16367 NA's :16222 NA's :16190 NA's :16123
...78 ...79 ...80 ...81
Min. :-9.9500 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.:-3.0600 1st Qu.:-1.170 1st Qu.:-4.1700 1st Qu.:-3.060
Median : 1.0700 Median : 2.430 Median : 0.3400 Median : 1.890
Mean : 0.7154 Mean : 1.752 Mean : 0.0917 Mean : 1.122
3rd Qu.: 4.8100 3rd Qu.: 5.730 3rd Qu.: 4.3200 3rd Qu.: 5.830
Max. : 9.3700 Max. : 9.370 Max. : 9.3700 Max. : 9.810
NA's :16039 NA's :15962 NA's :15909 NA's :15959
...82 ...83 ...84 ...85
Min. :-9.950 Min. :-9.9500 Min. :-9.950 Min. :-9.9500
1st Qu.:-1.840 1st Qu.:-2.7700 1st Qu.:-1.120 1st Qu.:-2.9600
Median : 2.820 Median : 1.3600 Median : 2.820 Median : 1.0700
Mean : 1.837 Mean : 0.9027 Mean : 2.028 Mean : 0.6944
3rd Qu.: 6.260 3rd Qu.: 4.9500 3rd Qu.: 6.210 3rd Qu.: 4.7600
Max. : 9.370 Max. : 9.3700 Max. : 9.370 Max. : 9.3700
NA's :15760 NA's :15859 NA's :15697 NA's :15686
...86 ...87 ...88 ...89
Min. :-9.9500 Min. :-9.9500 Min. :-9.950 Min. :-9.95
1st Qu.:-2.9600 1st Qu.:-3.5000 1st Qu.:-0.920 1st Qu.:-0.49
Median : 1.5000 Median : 0.5800 Median : 2.480 Median : 2.72
Mean : 0.9189 Mean : 0.3447 Mean : 1.901 Mean : 2.11
3rd Qu.: 5.2400 3rd Qu.: 4.2700 3rd Qu.: 5.680 3rd Qu.: 5.78
Max. : 9.3700 Max. : 9.3700 Max. : 9.370 Max. : 9.47
NA's :15615 NA's :15514 NA's :15531 NA's :15379
...90 ...91 ...92 ...93
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.95
1st Qu.: 0.970 1st Qu.:-3.450 1st Qu.:-0.9325 1st Qu.:-2.38
Median : 4.760 Median : 1.070 Median : 2.8600 Median : 1.80
Mean : 3.575 Mean : 0.612 Mean : 2.0385 Mean : 1.18
3rd Qu.: 7.430 3rd Qu.: 4.810 3rd Qu.: 6.1200 3rd Qu.: 5.29
Max. : 9.420 Max. : 9.370 Max. : 9.5600 Max. : 9.42
NA's :15419 NA's :15255 NA's :15311 NA's :15094
...94 ...95 ...96 ...97
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.950
1st Qu.:-0.100 1st Qu.:-2.280 1st Qu.:-2.570 1st Qu.:-1.347
Median : 3.160 Median : 1.750 Median : 1.650 Median : 2.140
Mean : 2.504 Mean : 1.173 Mean : 1.039 Mean : 1.528
3rd Qu.: 6.070 3rd Qu.: 5.290 3rd Qu.: 5.100 3rd Qu.: 5.277
Max. : 9.370 Max. : 9.420 Max. : 9.370 Max. : 9.370
NA's :14995 NA's :14783 NA's :14718 NA's :14613
...98 ...99 ...100 ...101
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-1.170 1st Qu.:-3.4000 1st Qu.:-4.1700 1st Qu.:-2.330
Median : 2.480 Median : 1.4600 Median : 0.2900 Median : 2.140
Mean : 1.674 Mean : 0.7677 Mean :-0.0312 Mean : 1.355
3rd Qu.: 5.630 3rd Qu.: 5.1500 3rd Qu.: 3.9800 3rd Qu.: 5.730
Max. : 9.370 Max. : 9.3700 Max. : 9.3700 Max. : 9.370
NA's :14463 NA's :14317 NA's :14073 NA's :15040
# Histogram: how many jokes each user rated
hist(rowSums(!is.na(ratingsAll)),
     main = "Number of Jokes Rated per User",
     xlab = "Jokes Rated", ylab = "Users")
# Histogram: how many users rated each joke
hist(colSums(!is.na(ratingsAll)),
     main = "Number of Users per Joke",
     xlab = "Users per Joke", ylab = "Jokes")
# Histogram: average joke ratings
avg_joke_rating <- colMeans(ratingsAll, na.rm = TRUE)
hist(avg_joke_rating,
     main = "Average Rating per Joke",
     xlab = "Avg Rating", ylab = "Jokes")
We explored the distribution of ratings across the full dataset: how many jokes each user rated, how many users rated each joke, and each joke’s average rating. This helps evaluate whether engagement is concentrated on a handful of popular jokes or spread across the whole catalog.
We’ll define the following:

- Training Percent: the percent of the data used for training; the remaining data is used for testing.
- Items To Keep: the number of items given to the recommender to generate recommendations; the remaining items are withheld to test model accuracy. We identify the minimum number of jokes that any individual has rated and use a few fewer than that.
- Rating Threshold: the threshold for a positive rating. Since our data is on a scale of -10 to 10, we use 1 as the threshold for a good joke.
- Number of Folds: the number of folds used for k-fold cross-validation.
Finally, we’ll define our evaluation scheme for the models.
trainPct <- 0.8   # only used if method = "split"; with k-fold CV the folds define the split
toKeep <- min(rowCounts(finalRatings)) - 5
ratingThreshold <- 1
nFold <- 3
# Define the evaluation scheme (3-fold cross-validation)
evalScheme <- evaluationScheme(finalRatings,
                               method = "cross-validation",
                               k = nFold,
                               given = toKeep,
                               goodRating = ratingThreshold)
Now that we’ve set up the evaluation scheme for our recommender systems, we can compare different models. We will evaluate the output of two IBCF models (using cosine and Pearson similarity), two UBCF models (again using cosine and Pearson similarity), and one random model as a baseline. We will also vary the number of recommendations from 1 to 20.
# Models to compare
evalModels <- list(
  IBCF_cos = list(name = "IBCF", param = list(method = "cosine")),
  IBCF_cor = list(name = "IBCF", param = list(method = "pearson")),
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
  UBCF_cor = list(name = "UBCF", param = list(method = "pearson")),
  RANDOM   = list(name = "RANDOM", param = NULL)
)
# Numbers of recommendations to evaluate
nRecs <- c(1, seq(5, 20, 5))
finalResults <- evaluate(x = evalScheme, method = evalModels, n = nRecs)
IBCF run fold/sample [model time/prediction time]
1 [0.42sec/0.38sec]
2 [0.28sec/0.26sec]
3 [0.22sec/0.26sec]
IBCF run fold/sample [model time/prediction time]
1 [0.23sec/0.29sec]
2 [0.27sec/0.27sec]
3 [0.26sec/0.27sec]
UBCF run fold/sample [model time/prediction time]
1 [0.01sec/8.95sec]
2 [0.03sec/9.08sec]
3 [0.02sec/9.03sec]
UBCF run fold/sample [model time/prediction time]
1 [0.03sec/7.64sec]
2 [0.02sec/7.59sec]
3 [0.02sec/7.51sec]
RANDOM run fold/sample [model time/prediction time]
1 [0sec/0.25sec]
2 [0.01sec/0.23sec]
3 [0sec/0.22sec]
We can look at the average results across all folds for each algorithm. Each row represents a different number of recommendations. On average, as the number of recommendations increases, recall rises steadily, and precision also improves for the collaborative filtering models (though not for the random baseline).
avg(finalResults)
$IBCF_cos
TP FP FN TN N precision recall
[1,] 0.3271383 0.6728617 21.59373 46.40907 69.0028 0.3271383 0.01318181
[2,] 1.8848921 3.1151079 20.03597 43.96683 69.0028 0.3769784 0.08060305
[3,] 4.1486811 5.8513189 17.77218 41.23062 69.0028 0.4148681 0.18438920
[4,] 6.6227018 8.3772982 15.29816 38.70464 69.0028 0.4415135 0.30244992
[5,] 9.0189848 10.9810152 12.90188 36.10092 69.0028 0.4509492 0.41644494
TPR FPR n
[1,] 0.01318181 0.01349611 1
[2,] 0.08060305 0.06261160 5
[3,] 0.18438920 0.11819070 10
[4,] 0.30244992 0.17088881 15
[5,] 0.41644494 0.22700535 20
$IBCF_cor
TP FP FN TN N precision recall
[1,] 0.3531175 0.6468825 21.56775 46.43505 69.0028 0.3531175 0.01485068
[2,] 2.0119904 2.9880096 19.90887 44.09392 69.0028 0.4023981 0.08916292
[3,] 4.3111511 5.6888489 17.60971 41.39309 69.0028 0.4311151 0.19677503
[4,] 6.6145084 8.3854916 15.30635 38.69644 69.0028 0.4409672 0.30444612
[5,] 8.8307354 11.1692646 13.09013 35.91267 69.0028 0.4415368 0.40581214
TPR FPR n
[1,] 0.01485068 0.01276791 1
[2,] 0.08916292 0.05989511 5
[3,] 0.19677503 0.11578899 10
[4,] 0.30444612 0.17241219 15
[5,] 0.40581214 0.23221022 20
$UBCF_cos
TP FP FN TN N precision recall
[1,] 0.3377298 0.6622702 21.58313 46.41966 69.0028 0.3377298 0.01523079
[2,] 2.1658673 2.8341327 19.75500 44.24780 69.0028 0.4331735 0.10552667
[3,] 4.6552758 5.3447242 17.26559 41.73721 69.0028 0.4655276 0.22621144
[4,] 7.0531575 7.9468425 14.86771 39.13509 69.0028 0.4702105 0.34037986
[5,] 9.2729816 10.7270184 12.64788 36.35492 69.0028 0.4636491 0.44420610
TPR FPR n
[1,] 0.01523079 0.01394790 1
[2,] 0.10552667 0.05961588 5
[3,] 0.22621144 0.11236298 10
[4,] 0.34037986 0.16784196 15
[5,] 0.44420610 0.22805323 20
$UBCF_cor
TP FP FN TN N precision recall
[1,] 0.3485212 0.6514788 21.57234 46.43046 69.0028 0.3485212 0.01625357
[2,] 2.2064349 2.7935651 19.71443 44.28837 69.0028 0.4412870 0.10815016
[3,] 4.7340128 5.2659872 17.18685 41.81595 69.0028 0.4734013 0.23239351
[4,] 7.1518785 7.8481215 14.76898 39.23381 69.0028 0.4767919 0.34711634
[5,] 9.3699041 10.6300959 12.55096 36.45184 69.0028 0.4684952 0.44972118
TPR FPR n
[1,] 0.01625357 0.01393754 1
[2,] 0.10815016 0.05935082 5
[3,] 0.23239351 0.11158681 10
[4,] 0.34711634 0.16634573 15
[5,] 0.44972118 0.22633686 20
$RANDOM
TP FP FN TN N precision recall
[1,] 0.3379297 0.6620703 21.58293 46.41986 69.0028 0.3379297 0.01500125
[2,] 1.6876499 3.3123501 20.23321 43.76958 69.0028 0.3375300 0.07423988
[3,] 3.3529177 6.6470823 18.56795 40.43485 69.0028 0.3352918 0.14762206
[4,] 5.0197842 9.9802158 16.90108 37.10172 69.0028 0.3346523 0.22161902
[5,] 6.7158273 13.2841727 15.20504 33.79776 69.0028 0.3357914 0.29627752
TPR FPR n
[1,] 0.01500125 0.01464540 1
[2,] 0.07423988 0.07395019 5
[3,] 0.14762206 0.14877515 10
[4,] 0.22161902 0.22355126 15
[5,] 0.29627752 0.29750875 20
We can also visualize the ROC curves for each of the algorithms we’ve run. Each marker on a curve represents the true positive rate versus the false positive rate for a given number of recommendations n. The plot shows higher performance from the UBCF models.
The goal of our recommender system is to provide jokes that are funny to a user, so we want to minimize the number of false positives (recommendations the user would not enjoy). We will therefore choose the algorithm with the highest precision, i.e., the largest share of recommended items that are actually relevant. Once again, the UBCF models outperform the others.
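The ROC and precision-recall charts referenced here can be drawn with recommenderlab’s plot method for evaluation results; a minimal sketch:

# ROC curves (TPR vs. FPR); annotate labels the n values on the curves
plot(finalResults, annotate = TRUE, legend = "topleft")
# Precision-recall curves
plot(finalResults, "prec/rec", annotate = TRUE, legend = "bottomright")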
Based on this analysis, we will choose the UBCF model with Pearson similarity and 5 recommendations. We can further tune the model’s nn parameter (the number of nearest neighbors).
numNeighbors <- seq(100, 200, 50)
ubcfModels <- lapply(numNeighbors, function(n) {
  list(name = "UBCF", param = list(method = "pearson", nn = n))
})
names(ubcfModels) <- paste0("UBCF_", numNeighbors)
ubcfSchemes <- evaluate(x = evalScheme, method = ubcfModels, n = 5)
UBCF run fold/sample [model time/prediction time]
1 [0.02sec/8.93sec]
2 [0.03sec/8.27sec]
3 [0.03sec/8.61sec]
UBCF run fold/sample [model time/prediction time]
1 [0.03sec/9.84sec]
2 [0.21sec/11.27sec]
3 [0.01sec/11.24sec]
UBCF run fold/sample [model time/prediction time]
1 [0.03sec/9.98sec]
2 [0.03sec/9.45sec]
3 [0.04sec/9.26sec]
We’ll pick the model with the best precision, which is the one with 200 neighbors:
avg(ubcfSchemes)
$UBCF_100
TP FP FN TN N precision recall TPR
[1,] 2.992006 2.007994 18.92886 45.07394 69.0028 0.5984013 0.1555668 0.1555668
FPR n
[1,] 0.041739 5
$UBCF_150
TP FP FN TN N precision recall TPR
[1,] 3.110512 1.889488 18.81035 45.19245 69.0028 0.6221023 0.1637228 0.1637228
FPR n
[1,] 0.03935922 5
$UBCF_200
TP FP FN TN N precision recall TPR
[1,] 3.176259 1.823741 18.7446 45.25819 69.0028 0.6352518 0.1680843 0.1680843
FPR n
[1,] 0.03790321 5
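Rather than reading the precision off the printout, the best neighborhood size can also be selected programmatically. A minimal sketch, assuming the ubcfSchemes and numNeighbors objects from above:

# Average confusion matrices per model, then pick the nn with max precision
avgResults <- avg(ubcfSchemes)
precisions <- sapply(avgResults, function(m) m[, "precision"])
numNeighbors[which.max(precisions)]  # 200 in this run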
Now, we can define our final model and calculate the precision and RMSE:
set.seed(200)
# Final UBCF model: Pearson similarity, 200 neighbors, centered ratings
ubcfRec <- Recommender(getData(evalScheme, "train"), 'UBCF',
                       parameter = list(method = 'pearson', nn = 200, normalize = 'center'))
# Top-5 recommendations and full rating predictions for the known test data
ubcfPredN <- predict(ubcfRec, getData(evalScheme, "known"), n = 5)
ubcfPredR <- predict(ubcfRec, getData(evalScheme, "known"), type = 'ratings')
# calc accuracy on test set
ubcfAccN <- calcPredictionAccuracy(ubcfPredN,
getData(evalScheme, "unknown"),
given = toKeep,
goodRating = ratingThreshold)
ubcfAccR <- calcPredictionAccuracy(ubcfPredR, getData(evalScheme, "unknown"))
ubcfAccN
TP FP FN TN N precision
3.13309353 1.86690647 18.81055156 45.18944844 69.00000000 0.62661871
recall TPR FPR
0.16670923 0.16670923 0.03730136
ubcfAccR
RMSE MSE MAE
4.220717 17.814451 3.329481
In order to introduce novelty into the recommendations, we’ll take a percentage of the top recommendations from the UBCF model and swap them out for randomly selected jokes. To do this, we’ll define a recommender using the RANDOM method and then create a hybrid recommender that combines it with the UBCF model.
# Random model and weighted hybrid (80% UBCF, 20% random)
randRec <- Recommender(getData(evalScheme, "train"), 'RANDOM')
hybridRec <- HybridRecommender(ubcfRec, randRec, weights = c(0.8, 0.2))
hybridPredN <- predict(hybridRec, getData(evalScheme, "known"), n = 5)
hybridPredR <- predict(hybridRec, getData(evalScheme, "known"), type = 'ratings')
# calc accuracy on test set
hybridAccN <- calcPredictionAccuracy(hybridPredN,
getData(evalScheme, "unknown"),
given = toKeep,
goodRating = ratingThreshold)
hybridAccR <- calcPredictionAccuracy(hybridPredR, getData(evalScheme, "unknown"))
hybridAccN
TP FP FN TN N precision
2.75599520 2.24400480 19.18764988 44.81235012 69.00000000 0.55119904
recall TPR FPR
0.13903076 0.13903076 0.04504358
hybridAccR
RMSE MSE MAE
4.387298 19.248386 3.537368
We can compare the results of the UBCF-only model and the hybrid model. As expected, the hybrid model’s precision is lower and its RMSE is higher; this is the accuracy cost of injecting random (novel) items.
data.frame(METHOD = c('UBCF','HYBRID'),
PRECISION = c(ubcfAccN['precision'], hybridAccN['precision']),
RMSE = c(ubcfAccR['RMSE'], hybridAccR['RMSE']))
METHOD PRECISION RMSE
1 UBCF 0.6266187 4.220717
2 HYBRID 0.5511990 4.387298
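Precision and RMSE capture accuracy but not the novelty we set out to add. One way to quantify novelty offline is to compare the average popularity of the recommended jokes under each model, using the number of users who rated each joke as a popularity proxy. A sketch, where avgPopularity is an illustrative helper (not part of recommenderlab):

# Popularity of each joke = number of users in our subset who rated it
itemPopularity <- colCounts(finalRatings)
# Average popularity of the items in a set of top-N lists;
# lower values mean less commonly rated (more novel) recommendations
avgPopularity <- function(topN) {
  recLists <- as(topN, "list")  # item labels per user
  mean(sapply(recLists, function(items) mean(itemPopularity[items])), na.rm = TRUE)
}
avgPopularity(ubcfPredN)    # pure UBCF
avgPopularity(hybridPredN)  # hybrid with random injections

If the hybrid’s value is lower, it is surfacing less commonly rated jokes, which is the intended serendipity effect.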
We can also take a look at a comparison of the top 5 suggestions for one of the users (display code sketched below). From this, we can see two things:

1. The ordering of the recommendations has changed.
2. There are new items in the hybrid list, which represent the random recommendations.
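The two lists that follow (UBCF first, then hybrid) were presumably produced by coercing each topNList to a plain list for user 1, along these lines:

# Hypothetical display code for user 1's top-5 items under each model
as(ubcfPredN, "list")[1]    # pure UBCF
as(hybridPredN, "list")[1]  # hybrid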
$`1`
[1] 29 21 89 31 3
$`1`
[1] 27 21 72 61 49
In this project, we evaluated several recommender system algorithms using offline metrics such as precision and RMSE to assess accuracy and effectiveness. These metrics guided our selection of User-Based Collaborative Filtering (UBCF) with Pearson similarity as a strong baseline model. To support user experience goals like novelty and diversity, we introduced a Hybrid Recommender that blends UBCF with a Random model.
In the final phase, we narrowed our focus to these two models—UBCF and the Hybrid—and concentrated on precision as our primary evaluation metric, as it most directly reflects the relevance of top-N recommendations. This focused comparison allowed us to assess the trade-off between pure accuracy and the added value of introducing novelty.
While offline evaluation provides meaningful insights, it cannot fully capture how users engage with recommendations in real-world applications. For a more complete understanding, online metrics such as click-through rate (CTR), dwell time, and user retention are crucial. An ideal future evaluation environment would use A/B testing with tools like Google Optimize or Firebase, and user behavior tracking with platforms like Mixpanel or Amplitude. This would enable a more realistic, data-driven refinement of recommendation strategies tailored to user satisfaction and business impact.
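As a concrete illustration of such an environment, the sketch below shows one hypothetical way to bucket users into a control arm (pure UBCF) and a treatment arm (the hybrid); assignArm is an illustrative helper, and online metrics such as CTR would then be compared between arms:

# Randomly but reproducibly assign each user to an experiment arm
set.seed(612)
assignArm <- function(userIds, treatmentShare = 0.5) {
  arms <- ifelse(runif(length(userIds)) < treatmentShare, "hybrid", "ubcf")
  setNames(arms, userIds)
}
arms <- assignArm(rownames(finalRatings))
table(arms)  # roughly even split across the two arms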