Project4_Accuracy-and-Beyond.knit

Column

Instruction

The goal of this assignment is give you practice working with accuracy and other recommender system metrics.In this assignment you’re asked to do at least one or (if you like) both of the following:

Work in a small group, and/or Choose a different dataset to work with from your previous projects. Deliverables As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.

Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.

Compare and report on any change in accuracy before and after you’ve made the change in #2.

As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation was possible. Also, briefly propose how you would design a reasonable online evaluation environment.

Introduction

The project at hand is to build a recommender system that can provide users with personalized joke recommendations. In addition to optimizing for recommendations that are accurate, the recommender also focuses on maximizing user experience by adding some level of serendipity to the recommendations - meaning users will receive unexpected items with the potential to be enjoyable within their most highly recommended items.

Initially, we investigate the performance of five recommendation algorithms:

User-Based Collaborative Filtering (UBCF) using cosine and Pearson similarity
Item-Based Collaborative Filtering (IBCF) using cosine and Pearson similarity
A Random recommender algorithm (a baseline)

For each method, we conduct a 3-fold cross-validation to evaluate precision and performance. After we establish the best performing algorithm, we promote novel items in the recommendation list by replacing a selection of the most recommended items with more novel items. In this approach we try to balance accurate recommendations with user engagement and discovery.

Load Data

We use the Jester dataset, which includes joke ratings from 24,983 users. Each user has rated at least 36 jokes, with rating values ranging from -10.00 (least funny) to +10.00 (most funny). A rating of 99 indicates that a joke was not rated by the user. In the dataset, each row corresponds to a single user: the first column records the total number of jokes they rated, while the next 100 columns contain their individual ratings for each joke.

# Load the jester dataset
library(readxl)

# Read the Excel file (first sheet)
jester <- data.frame(read_xls("jester-data-1.xls", col_names = FALSE))

# Assign column names: first column is rating count, followed by 100 joke ratings
##colnames(jester) <- c("ratingCount", paste0("Joke_", 1:100))
##row.names(jester) <- 1:nrow(jester)

# Display the structure and first few rows of the data
# str(jester)
head(jester[, 1:6])  # Show the first 5 columns (rating count + 5 jokes)

  ...1  ...2  ...3  ...4  ...5  ...6
1   74 -7.82  8.79 -9.66 -8.16 -7.52
2  100  4.08 -0.29  6.36  4.37 -2.38
3   49 99.00 99.00 99.00 99.00  9.03
4   48 99.00  8.35 99.00 99.00  1.80
5   91  8.50  4.61 -4.17 -5.39  1.36
6  100 -6.17 -3.54  0.44 -8.50 -7.09

# Basic summary statistics for joke ratings
summary(jester[, 2:6])

      ...2            ...3            ...4             ...5      
 Min.   :-9.95   Min.   :-9.95   Min.   :-9.950   Min.   :-9.95  
 1st Qu.:-0.15   1st Qu.:-2.18   1st Qu.:-1.145   1st Qu.:-3.35  
 Median : 5.05   Median : 3.69   Median : 4.660   Median : 3.45  
 Mean   :34.40   Mean   :27.09   Mean   :33.264   Mean   :36.24  
 3rd Qu.:99.00   3rd Qu.:99.00   3rd Qu.:99.000   3rd Qu.:99.00  
 Max.   :99.00   Max.   :99.00   Max.   :99.000   Max.   :99.00  
      ...6        
 Min.   :-9.9500  
 1st Qu.:-3.6900  
 Median : 1.1200  
 Mean   : 0.4272  
 3rd Qu.: 4.4700  
 Max.   :99.0000

This gives a quick sense of how the data is structured and how ratings are distributed across the first few jokes. It also confirms that the dataset has the correct dimensions and types before moving on to data preprocessing.

Data Cleansing

First, we will load in the jester data set. We will remove the column with the number of rated jokes because this will not be used in the recommendation system. Additionally, the raw data represents non-rated jokes as the number 99, so we will replace these values with nulls. Finally, we will subset the data to 5,000 users to speed up computation time.

# Read jester data
##jester <- data.frame(read_xls("jester-data-1.xls", col_names = FALSE))
colnames(jester) <- c("ratingCount", seq(100))
row.names(jester) <- 1:nrow(jester)

# remove num jokes column
ratings <- jester[-1]

# replace 0 (no rating) with NULL
ratings[ratings == 99] <- NA
ratings <- ratings[1:5000,]
ratings <- as.matrix(ratings)

# Create large dgCMatrix
finalRatings <- as(ratings, 'realRatingMatrix')

Data Exploration

Let’s dive a little deeper into our data.

First, let’s take a look at the number of jokes that each user has rated. We set the threshold at 36 jokes, and it appears that most individuals have rated either around 70 or 100 jokes.

jokeCount <- rowCounts(finalRatings)
hist(jokeCount,
     main = 'Number of Jokes Rated per User',
     xlab = 'Number of Jokes Rated',
     ylab = 'Number of Users')

Next, we can look at the number of ratings that each joke has. We can see that many of the jokes were rated by all 5000 users.

ratingCount <- colCounts(finalRatings)
hist(ratingCount,
     main = 'Number of Individuals Rating each Joke',
     xlab = 'Number of Users that Rated Joke',
     ylab = 'Number of Jokes')

Now we can take a look at the average rating across all jokes. The median average rating is a little over 0 (neutral), which means that jokes are typically rated with a small positive skew.

# average rating
mean_rating <- colMeans(finalRatings, na.rm = T)
quantile(mean_rating)

        0%        25%        50%        75%       100% 
-3.8296705 -0.3050142  0.8847812  1.8412310  4.0056984

We can also plot the average ratings. A look at the distribution shows that most jokes have an average rating between -2 and 3. We have a few outliers that are rated more positively (+4) and more negatively (-4).

goodrating <- quantile(mean_rating, .5)
qplot(mean_rating) + ggtitle("Distribution of Average Joke Rating") + geom_vline(xintercept = goodrating, col='red')

From this data exploration, we can see that most users rated all jokes and conversely, most jokes were rated by all users. The median average rating for the users is a little over 0, so we will use 1 as our threshold for a good joke.

Genre Exploration

### Genre Exploration 

library(readxl)
library(ggplot2)

# Load the Jester dataset
jester <- read_excel("jester-data-1.xls", col_names = FALSE)

# Remove first column (rating count) and replace 99s with NA
ratings <- jester[ , -1]
ratings[ratings == 99] <- NA

# Summary of ratings
summary(ratings)

      ...2              ...3              ...4              ...5       
 Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-2.6700   1st Qu.:-4.4200   1st Qu.:-4.0300   1st Qu.:-5.870  
 Median : 1.6000   Median : 0.8300   Median : 0.7800   Median :-1.460  
 Mean   : 0.9046   Mean   : 0.2085   Mean   : 0.3166   Mean   :-1.449  
 3rd Qu.: 4.9000   3rd Qu.: 4.8200   3rd Qu.: 4.6600   3rd Qu.: 2.520  
 Max.   : 9.3700   Max.   : 9.3700   Max.   : 9.3700   Max.   : 9.370  
 NA's   :8531      NA's   :6799      NA's   :8341      NA's   :9375    
      ...6              ...7             ...8              ...9        
 Min.   :-9.9500   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.9500  
 1st Qu.:-3.6900   1st Qu.:-1.310   1st Qu.:-5.2900   1st Qu.:-4.6100  
 Median : 1.1200   Median : 2.230   Median :-0.0500   Median :-0.2900  
 Mean   : 0.4036   Mean   : 1.594   Mean   :-0.4297   Mean   :-0.6241  
 3rd Qu.: 4.4700   3rd Qu.: 5.440   3rd Qu.: 3.8800   3rd Qu.: 3.0600  
 Max.   : 9.3700   Max.   : 9.470   Max.   : 9.6100   Max.   : 9.7600  
 NA's   :6         NA's   :4552     NA's   :7         NA's   :2        
     ...10            ...11            ...12            ...13       
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.950   Min.   :-9.950  
 1st Qu.:-4.760   1st Qu.:-2.090   1st Qu.:-1.210   1st Qu.:-1.650  
 Median :-0.190   Median : 1.940   Median : 2.430   Median : 2.040  
 Mean   :-0.557   Mean   : 1.301   Mean   : 1.806   Mean   : 1.449  
 3rd Qu.: 3.300   3rd Qu.: 5.390   3rd Qu.: 5.780   3rd Qu.: 5.290  
 Max.   : 9.900   Max.   : 9.370   Max.   : 9.610   Max.   : 9.850  
 NA's   :9231     NA's   :4659     NA's   :3162     NA's   :2560    
     ...14            ...15            ...16            ...17       
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.950   Min.   :-9.950  
 1st Qu.:-6.260   1st Qu.:-2.090   1st Qu.:-6.210   1st Qu.:-7.430  
 Median :-1.700   Median : 2.090   Median :-1.750   Median :-3.790  
 Mean   :-1.763   Mean   : 1.396   Mean   :-1.706   Mean   :-3.105  
 3rd Qu.: 2.280   3rd Qu.: 5.555   3rd Qu.: 2.330   3rd Qu.: 0.580  
 Max.   : 9.370   Max.   : 9.560   Max.   : 9.900   Max.   : 9.370  
 NA's   :2        NA's   :2468     NA's   :6        NA's   :8       
     ...18            ...19             ...20             ...21       
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-4.420   1st Qu.:-4.4200   1st Qu.:-3.4500   1st Qu.:-5.050  
 Median :-0.630   Median :-0.2900   Median : 0.5300   Median :-0.630  
 Mean   :-1.115   Mean   :-0.6351   Mean   : 0.1542   Mean   :-0.934  
 3rd Qu.: 1.940   3rd Qu.: 3.0100   3rd Qu.: 3.7400   3rd Qu.: 2.910  
 Max.   : 9.370   Max.   : 9.8100   Max.   : 9.5600   Max.   : 9.370  
 NA's   :3        NA's   :4         NA's   :5         NA's   :3       
     ...22            ...23             ...24             ...25       
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-0.780   1st Qu.:-2.8200   1st Qu.:-3.7900   1st Qu.:-6.210  
 Median : 2.820   Median : 1.4100   Median : 0.3400   Median :-1.750  
 Mean   : 2.096   Mean   : 0.8687   Mean   : 0.1092   Mean   :-1.687  
 3rd Qu.: 6.020   3rd Qu.: 4.9000   3rd Qu.: 3.9800   3rd Qu.: 2.230  
 Max.   : 9.560   Max.   : 9.8500   Max.   : 9.8500   Max.   : 9.810  
 NA's   :62       NA's   :3550      NA's   :4950      NA's   :9104    
     ...26             ...27            ...28           ...29       
 Min.   :-9.9500   Min.   :-9.950   Min.   :-9.95   Min.   :-9.950  
 1st Qu.:-3.3500   1st Qu.:-1.840   1st Qu.: 0.68   1st Qu.:-1.650  
 Median : 0.7800   Median : 1.840   Median : 4.13   Median : 2.065  
 Mean   : 0.3824   Mean   : 1.286   Mean   : 3.19   Mean   : 1.519  
 3rd Qu.: 4.2700   3rd Qu.: 5.050   3rd Qu.: 6.75   3rd Qu.: 5.530  
 Max.   : 9.3700   Max.   : 9.370   Max.   : 9.42   Max.   : 9.420  
 NA's   :4167      NA's   :1233     NA's   :79      NA's   :993     
     ...30            ...31             ...32            ...33       
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950   Min.   :-9.950  
 1st Qu.: 0.340   1st Qu.:-4.1700   1st Qu.:-0.630   1st Qu.: 0.630  
 Median : 3.980   Median :-0.2900   Median : 3.010   Median : 3.930  
 Mean   : 2.972   Mean   :-0.4365   Mean   : 2.199   Mean   : 3.156  
 3rd Qu.: 6.750   3rd Qu.: 3.1600   3rd Qu.: 6.120   3rd Qu.: 6.600  
 Max.   : 9.370   Max.   : 9.3700   Max.   : 9.370   Max.   : 9.900  
 NA's   :49       NA's   :7006      NA's   :327      NA's   :35      
     ...34            ...35             ...36            ...37       
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950   Min.   :-9.950  
 1st Qu.:-5.920   1st Qu.:-2.7700   1st Qu.: 0.390   1st Qu.: 0.970  
 Median :-1.260   Median : 1.2100   Median : 3.830   Median : 4.030  
 Mean   :-1.369   Mean   : 0.8595   Mean   : 3.006   Mean   : 3.313  
 3rd Qu.: 2.620   3rd Qu.: 4.9000   3rd Qu.: 6.600   3rd Qu.: 6.600  
 Max.   : 9.370   Max.   : 9.3700   Max.   : 9.900   Max.   : 9.850  
 NA's   :8279     NA's   :3420      NA's   :31       NA's   :18      
     ...38            ...39           ...40            ...41        
 Min.   :-9.950   Min.   :-9.95   Min.   :-9.950   Min.   :-9.9500  
 1st Qu.:-5.870   1st Qu.:-2.14   1st Qu.:-2.140   1st Qu.:-2.5200  
 Median :-1.260   Median : 1.89   Median : 1.460   Median : 1.4600  
 Mean   :-1.364   Mean   : 1.30   Mean   : 1.067   Mean   : 0.9986  
 3rd Qu.: 2.620   3rd Qu.: 5.29   3rd Qu.: 4.810   3rd Qu.: 5.0000  
 Max.   : 9.370   Max.   : 9.56   Max.   : 9.760   Max.   : 9.3700  
 NA's   :8102     NA's   :2125    NA's   :1903     NA's   :2585     
     ...42             ...43            ...44             ...45       
 Min.   :-9.9500   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-4.9000   1st Qu.:-0.780   1st Qu.:-5.3150   1st Qu.:-6.800  
 Median :-0.0500   Median : 2.620   Median :-0.5800   Median :-2.430  
 Mean   :-0.3205   Mean   : 1.967   Mean   :-0.9249   Mean   :-2.112  
 3rd Qu.: 4.0800   3rd Qu.: 5.830   3rd Qu.: 3.0600   3rd Qu.: 1.890  
 Max.   : 9.3700   Max.   : 9.850   Max.   : 9.4200   Max.   : 9.370  
 NA's   :6216      NA's   :414      NA's   :7472      NA's   :8806    
     ...46            ...47           ...48           ...49       
 Min.   :-9.950   Min.   :-9.95   Min.   :-9.95   Min.   :-9.950  
 1st Qu.:-2.040   1st Qu.:-1.70   1st Qu.:-1.36   1st Qu.:-1.070  
 Median : 1.650   Median : 2.28   Median : 2.14   Median : 2.520  
 Mean   : 1.109   Mean   : 1.50   Mean   : 1.56   Mean   : 1.826  
 3rd Qu.: 4.810   3rd Qu.: 5.53   3rd Qu.: 5.49   3rd Qu.: 5.780  
 Max.   :10.000   Max.   : 9.66   Max.   : 9.81   Max.   : 9.370  
 NA's   :3658     NA's   :1358    NA's   :2742    NA's   :216     
     ...50            ...51            ...52             ...53        
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.9500  
 1st Qu.: 0.100   1st Qu.: 1.550   1st Qu.:-5.2900   1st Qu.:-4.1300  
 Median : 3.450   Median : 4.470   Median :-0.5300   Median : 0.1000  
 Mean   : 2.771   Mean   : 3.665   Mean   :-0.7317   Mean   :-0.1331  
 3rd Qu.: 6.360   3rd Qu.: 6.840   3rd Qu.: 3.5400   3rd Qu.: 3.8300  
 Max.   : 9.900   Max.   : 9.950   Max.   : 9.3700   Max.   : 9.6600  
 NA's   :26       NA's   :11       NA's   :6091      NA's   :4919     
     ...54            ...55            ...56             ...57       
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.: 0.390   1st Qu.:-0.150   1st Qu.:-3.3500   1st Qu.:-1.120  
 Median : 3.830   Median : 3.540   Median : 1.0200   Median : 2.430  
 Mean   : 2.939   Mean   : 2.736   Mean   : 0.5057   Mean   : 1.765  
 3rd Qu.: 6.550   3rd Qu.: 6.600   3rd Qu.: 4.5600   3rd Qu.: 5.580  
 Max.   : 9.710   Max.   : 9.510   Max.   : 9.9500   Max.   : 9.370  
 NA's   :12       NA's   :292      NA's   :5256      NA's   :243     
     ...58            ...59            ...60            ...61        
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500  
 1st Qu.:-6.800   1st Qu.:-8.160   1st Qu.:-4.320   1st Qu.:-4.8500  
 Median :-2.230   Median :-4.950   Median :-0.290   Median :-0.1000  
 Mean   :-1.991   Mean   :-3.834   Mean   :-0.578   Mean   :-0.3254  
 3rd Qu.: 2.180   3rd Qu.:-0.340   3rd Qu.: 3.010   3rd Qu.: 3.9800  
 Max.   : 9.370   Max.   : 9.420   Max.   : 9.370   Max.   : 9.3700  
 NA's   :9010     NA's   :9430     NA's   :6848     NA's   :7022     
     ...62            ...63            ...64            ...65        
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500  
 1st Qu.:-0.290   1st Qu.: 0.440   1st Qu.:-3.640   1st Qu.:-4.8100  
 Median : 3.200   Median : 3.740   Median : 0.530   Median :-0.3900  
 Mean   : 2.459   Mean   : 2.976   Mean   : 0.247   Mean   :-0.6679  
 3rd Qu.: 6.170   3rd Qu.: 6.500   3rd Qu.: 4.220   3rd Qu.: 3.2500  
 Max.   : 9.370   Max.   : 9.850   Max.   : 9.470   Max.   : 9.3700  
 NA's   :178      NA's   :47       NA's   :4819     NA's   :7707     
     ...66            ...67            ...68             ...69       
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-0.780   1st Qu.:-0.240   1st Qu.:-5.1000   1st Qu.:-0.100  
 Median : 3.250   Median : 3.250   Median :-0.5300   Median : 3.350  
 Mean   : 2.274   Mean   : 2.542   Mean   :-0.8639   Mean   : 2.639  
 3rd Qu.: 6.460   3rd Qu.: 6.260   3rd Qu.: 3.0600   3rd Qu.: 6.260  
 Max.   : 9.850   Max.   :10.000   Max.   : 9.3700   Max.   : 9.710  
 NA's   :255      NA's   :60       NA's   :7490      NA's   :43      
     ...70            ...71             ...72             ...73       
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-0.100   1st Qu.:-2.9100   1st Qu.:-6.0200   1st Qu.: 0.190  
 Median : 3.350   Median : 0.7300   Median :-0.5300   Median : 3.400  
 Mean   : 2.559   Mean   : 0.4184   Mean   :-0.6875   Mean   : 2.757  
 3rd Qu.: 6.170   3rd Qu.: 3.8800   3rd Qu.: 4.2200   3rd Qu.: 6.310  
 Max.   : 9.760   Max.   : 9.3700   Max.   : 9.3700   Max.   : 9.370  
 NA's   :67       NA's   :4761      NA's   :16451     NA's   :16290   
     ...74            ...75            ...76             ...77      
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.95  
 1st Qu.:-2.520   1st Qu.:-5.780   1st Qu.:-4.7100   1st Qu.:-0.19  
 Median : 1.500   Median :-1.650   Median :-0.0500   Median : 3.25  
 Mean   : 1.083   Mean   :-1.554   Mean   :-0.2707   Mean   : 2.52  
 3rd Qu.: 5.390   3rd Qu.: 2.140   3rd Qu.: 4.0300   3rd Qu.: 6.21  
 Max.   : 9.370   Max.   : 9.370   Max.   : 9.3700   Max.   : 9.37  
 NA's   :16367    NA's   :16222    NA's   :16190     NA's   :16123  
     ...78             ...79            ...80             ...81       
 Min.   :-9.9500   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-3.0600   1st Qu.:-1.170   1st Qu.:-4.1700   1st Qu.:-3.060  
 Median : 1.0700   Median : 2.430   Median : 0.3400   Median : 1.890  
 Mean   : 0.7154   Mean   : 1.752   Mean   : 0.0917   Mean   : 1.122  
 3rd Qu.: 4.8100   3rd Qu.: 5.730   3rd Qu.: 4.3200   3rd Qu.: 5.830  
 Max.   : 9.3700   Max.   : 9.370   Max.   : 9.3700   Max.   : 9.810  
 NA's   :16039     NA's   :15962    NA's   :15909     NA's   :15959   
     ...82            ...83             ...84            ...85        
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.950   Min.   :-9.9500  
 1st Qu.:-1.840   1st Qu.:-2.7700   1st Qu.:-1.120   1st Qu.:-2.9600  
 Median : 2.820   Median : 1.3600   Median : 2.820   Median : 1.0700  
 Mean   : 1.837   Mean   : 0.9027   Mean   : 2.028   Mean   : 0.6944  
 3rd Qu.: 6.260   3rd Qu.: 4.9500   3rd Qu.: 6.210   3rd Qu.: 4.7600  
 Max.   : 9.370   Max.   : 9.3700   Max.   : 9.370   Max.   : 9.3700  
 NA's   :15760    NA's   :15859     NA's   :15697    NA's   :15686    
     ...86             ...87             ...88            ...89      
 Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.950   Min.   :-9.95  
 1st Qu.:-2.9600   1st Qu.:-3.5000   1st Qu.:-0.920   1st Qu.:-0.49  
 Median : 1.5000   Median : 0.5800   Median : 2.480   Median : 2.72  
 Mean   : 0.9189   Mean   : 0.3447   Mean   : 1.901   Mean   : 2.11  
 3rd Qu.: 5.2400   3rd Qu.: 4.2700   3rd Qu.: 5.680   3rd Qu.: 5.78  
 Max.   : 9.3700   Max.   : 9.3700   Max.   : 9.370   Max.   : 9.47  
 NA's   :15615     NA's   :15514     NA's   :15531    NA's   :15379  
     ...90            ...91            ...92             ...93      
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.9500   Min.   :-9.95  
 1st Qu.: 0.970   1st Qu.:-3.450   1st Qu.:-0.9325   1st Qu.:-2.38  
 Median : 4.760   Median : 1.070   Median : 2.8600   Median : 1.80  
 Mean   : 3.575   Mean   : 0.612   Mean   : 2.0385   Mean   : 1.18  
 3rd Qu.: 7.430   3rd Qu.: 4.810   3rd Qu.: 6.1200   3rd Qu.: 5.29  
 Max.   : 9.420   Max.   : 9.370   Max.   : 9.5600   Max.   : 9.42  
 NA's   :15419    NA's   :15255    NA's   :15311     NA's   :15094  
     ...94            ...95            ...96            ...97       
 Min.   :-9.950   Min.   :-9.950   Min.   :-9.950   Min.   :-9.950  
 1st Qu.:-0.100   1st Qu.:-2.280   1st Qu.:-2.570   1st Qu.:-1.347  
 Median : 3.160   Median : 1.750   Median : 1.650   Median : 2.140  
 Mean   : 2.504   Mean   : 1.173   Mean   : 1.039   Mean   : 1.528  
 3rd Qu.: 6.070   3rd Qu.: 5.290   3rd Qu.: 5.100   3rd Qu.: 5.277  
 Max.   : 9.370   Max.   : 9.420   Max.   : 9.370   Max.   : 9.370  
 NA's   :14995    NA's   :14783    NA's   :14718    NA's   :14613   
     ...98            ...99             ...100            ...101      
 Min.   :-9.950   Min.   :-9.9500   Min.   :-9.9500   Min.   :-9.950  
 1st Qu.:-1.170   1st Qu.:-3.4000   1st Qu.:-4.1700   1st Qu.:-2.330  
 Median : 2.480   Median : 1.4600   Median : 0.2900   Median : 2.140  
 Mean   : 1.674   Mean   : 0.7677   Mean   :-0.0312   Mean   : 1.355  
 3rd Qu.: 5.630   3rd Qu.: 5.1500   3rd Qu.: 3.9800   3rd Qu.: 5.730  
 Max.   : 9.370   Max.   : 9.3700   Max.   : 9.3700   Max.   : 9.370  
 NA's   :14463    NA's   :14317     NA's   :14073     NA's   :15040

# Histogram: how many jokes each user rated
hist(rowSums(!is.na(ratings)),
     main = "Number of Jokes Rated per User",
     xlab = "Jokes Rated", ylab = "Users")

# Histogram: how many users rated each joke
hist(colSums(!is.na(ratings)),
     main = "Number of Users per Joke",
     xlab = "Users per Joke", ylab = "Jokes")

# Histogram: average joke ratings
avg_joke_rating <- colMeans(ratings, na.rm = TRUE)
hist(avg_joke_rating,
     main = "Average Rating per Joke",
     xlab = "Avg Rating", ylab = "Jokes")

We explored the distribution of movie genres in the dataset. This helps evaluate whether the recommender covers a wide range of genres or mostly sticks to popular ones.

Initial Recommender Systems

Parameters

We’ll define the following:

Training Percent: The percent of the data that should be used in training. The remaining data will be used for testing.
Items To Keep: The total number of items that will be used to generate the recommendations. The remaining items will be used to test the model accuracy. We’ll identify the min number of jokes that an individual has rated and use a few less than that.
Rating Threshold: The threshold to be used for positive ratings. Since our data is on a scale of -10 to 10, we will use 1 as the threshold for a good joke.
Number of Folds: This is the number of folds that will be used for k-fold validation.

Finally, we’ll define our evaluation scheme for the models.

trainPct <- 0.8
toKeep <- min(rowCounts(finalRatings)) - 5
ratingThreshold <- 1
nFold <- 3

# define evaluation scheme
evalScheme <- evaluationScheme(finalRatings, 
                               method = "cross-validation", 
                               k = nFold, 
                               given = toKeep, 
                               goodRating = ratingThreshold)

Creation of Systems

Now that we’ve set up the evaluation scheme for our recommender systems, we can compare different models. We will evaluate the output of 2 IBCF models (using cosine and pearson similarities), 2 UBCF models (once again, using cosine and pearson similarities), and 1 random model for a baseline. We will also vary the number of recommendations from 5 to 20.

# models to compare
evalModels <- list(
 IBCF_cos = list(name = "IBCF", param = list(method =
 "cosine")),
 IBCF_cor = list(name = "IBCF", param = list(method =
 "pearson")),
 UBCF_cos = list(name = "UBCF", param = list(method =
 "cosine")),
 UBCF_cor = list(name = "UBCF", param = list(method =
 "pearson")),
 RANDOM = list(name = "RANDOM", param = NULL)
)

# number of recommendations
nRecs <- c(1, seq(5, 20, 5))

finalResults <- evaluate(x = evalScheme, method = evalModels, n = nRecs)

IBCF run fold/sample [model time/prediction time]
     1  [0.322sec/0.234sec] 
     2  [0.171sec/0.227sec] 
     3  [0.141sec/0.18sec] 
IBCF run fold/sample [model time/prediction time]
     1  [0.208sec/0.273sec] 
     2  [0.209sec/0.197sec] 
     3  [0.215sec/0.213sec] 
UBCF run fold/sample [model time/prediction time]
     1  [0.019sec/6.751sec] 
     2  [0.019sec/6.424sec] 
     3  [0.016sec/6.936sec] 
UBCF run fold/sample [model time/prediction time]
     1  [0.02sec/5.944sec] 
     2  [0.016sec/6.426sec] 
     3  [0.017sec/6.08sec] 
RANDOM run fold/sample [model time/prediction time]
     1  [0.016sec/0.18sec] 
     2  [0.016sec/0.157sec] 
     3  [0.014sec/0.156sec]

Comparisons

We can look at the average results across all folds for each algorithm. Each row represents a different number of recommendations. We can see that on average, as the number of recommendations increases, so does our accuracy.

avgs <- lapply(finalResults, avg)
avgs

$IBCF_cos
           TP        FP       FN       TN       N precision     recall
[1,] 0.332534  0.667466 21.57974 46.41707 68.9968 0.3325340 0.01335278
[2,] 1.900280  3.099720 20.01199 43.98481 68.9968 0.3800560 0.08177498
[3,] 4.199241  5.800759 17.71303 41.28377 68.9968 0.4199241 0.18881329
[4,] 6.669864  8.330136 15.24241 38.75440 68.9968 0.4446576 0.30605087
[5,] 9.068745 10.931255 12.84353 36.15328 68.9968 0.4534373 0.42111493
            TPR        FPR  n
[1,] 0.01335278 0.01320333  1
[2,] 0.08177498 0.06230248  5
[3,] 0.18881329 0.11748514 10
[4,] 0.30605087 0.17050830 15
[5,] 0.42111493 0.22629252 20

$IBCF_cor
            TP         FP       FN       TN       N precision     recall
[1,] 0.3585132  0.6414868 21.55376 46.44305 68.9968 0.3585132 0.01500556
[2,] 2.0403677  2.9596323 19.87190 44.12490 68.9968 0.4080735 0.09164950
[3,] 4.3573141  5.6426859 17.55496 41.44185 68.9968 0.4357314 0.20010862
[4,] 6.6530775  8.3469225 15.25919 38.73761 68.9968 0.4435385 0.30905396
[5,] 8.8355316 11.1644684 13.07674 35.92006 68.9968 0.4417766 0.41074201
            TPR        FPR  n
[1,] 0.01500556 0.01273883  1
[2,] 0.09164950 0.05965660  5
[3,] 0.20010862 0.11459192 10
[4,] 0.30905396 0.17202538 15
[5,] 0.41074201 0.23254645 20

$UBCF_cos
            TP         FP       FN       TN       N precision     recall
[1,] 0.3429257  0.6570743 21.56934 46.42746 68.9968 0.3429257 0.01612124
[2,] 2.1780576  2.8219424 19.73421 44.26259 68.9968 0.4356115 0.10694014
[3,] 4.6982414  5.3017586 17.21403 41.78277 68.9968 0.4698241 0.22961310
[4,] 7.0943245  7.9056755 14.81795 39.17886 68.9968 0.4729550 0.34400266
[5,] 9.3167466 10.6832534 12.59552 36.40128 68.9968 0.4658373 0.44663493
            TPR        FPR  n
[1,] 0.01612124 0.01407244  1
[2,] 0.10694014 0.05972910  5
[3,] 0.22961310 0.11205072 10
[4,] 0.34400266 0.16750348 15
[5,] 0.44663493 0.22716928 20

$UBCF_cor
           TP        FP       FN       TN       N precision    recall       TPR
[1,] 0.348721  0.651279 21.56355 46.43325 68.9968 0.3487210 0.0163412 0.0163412
[2,] 2.224221  2.775779 19.68805 44.30875 68.9968 0.4448441 0.1099663 0.1099663
[3,] 4.753597  5.246403 17.15867 41.83813 68.9968 0.4753597 0.2342428 0.2342428
[4,] 7.146483  7.853517 14.76579 39.23102 68.9968 0.4764322 0.3487017 0.3487017
[5,] 9.366707 10.633293 12.54556 36.45124 68.9968 0.4683353 0.4511211 0.4511211
            FPR  n
[1,] 0.01389553  1
[2,] 0.05917393  5
[3,] 0.11161485 10
[4,] 0.16705535 15
[5,] 0.22662958 20

$RANDOM
            TP         FP       FN       TN       N precision     recall
[1,] 0.3357314  0.6642686 21.57654 46.42026 68.9968 0.3357314 0.01515819
[2,] 1.6754596  3.3245404 20.23681 43.75999 68.9968 0.3350919 0.07445555
[3,] 3.3637090  6.6362910 18.54856 40.44824 68.9968 0.3363709 0.14764691
[4,] 5.0551559  9.9448441 16.85711 37.13969 68.9968 0.3370104 0.22200189
[5,] 6.7412070 13.2587930 15.17106 33.82574 68.9968 0.3370604 0.29652940
            TPR        FPR  n
[1,] 0.01515819 0.01495774  1
[2,] 0.07445555 0.07426318  5
[3,] 0.14764691 0.14765225 10
[4,] 0.22200189 0.22170309 15
[5,] 0.29652940 0.29551847 20

We can also visualize the ROC curves for each of the algorithms we’ve run. Each marker on the graph represents the TP/FP ratio for n recommendations. The plot shows higher performance from the UBCF models.

plot(finalResults, annotate = 1, legend = "topleft")
title("ROC curve")

The goal of our recommender system is to provide jokes that are funny to a user, so we want to minimize the number of false positives (recommendations that are wrong). We will therefore choose the algorithm with the highest precision (true positive rate). Once again, the UBCF models outperform the others.

plot(finalResults, "prec/rec", annotate = 1, legend = "bottomright")
title("Precision-recall")

Model Tuning

Based on this analysis, we will choose the UBCF model with Pearson similarity and 5 recommendations. We can further tune nn parameter for the the model.

numNeighbors <- seq(100,200,50)

ubcfModels <- lapply(numNeighbors, function(n){
 list(name = "UBCF", param = list(method = "pearson", nn = n))
})
names(ubcfModels) <- paste0("UBCF_", numNeighbors)

ubcfSchemes <- evaluate(x = evalScheme, method = ubcfModels, n = 5)

UBCF run fold/sample [model time/prediction time]
     1  [0.061sec/7.001sec] 
     2  [0.017sec/7.267sec] 
     3  [0.018sec/6.785sec] 
UBCF run fold/sample [model time/prediction time]
     1  [0.016sec/7.293sec] 
     2  [0.026sec/7.476sec] 
     3  [0.033sec/7.325sec] 
UBCF run fold/sample [model time/prediction time]
     1  [0.019sec/7.494sec] 
     2  [0.016sec/7.872sec] 
     3  [0.017sec/7.888sec]

We’ll pick the model with the best precision, which is 200 neighbors:

avg(ubcfSchemes)

$UBCF_100
           TP       FP       FN       TN       N precision    recall       TPR
[1,] 2.996403 2.003597 18.91587 45.08094 68.9968 0.5992806 0.1568248 0.1568248
           FPR n
[1,] 0.0417198 5

$UBCF_150
           TP       FP       FN       TN       N precision    recall       TPR
[1,] 3.118106 1.881894 18.79416 45.20264 68.9968 0.6236211 0.1643766 0.1643766
            FPR n
[1,] 0.03928962 5

$UBCF_200
           TP       FP       FN      TN       N precision    recall       TPR
[1,] 3.169065 1.830935 18.74321 45.2536 68.9968 0.6338129 0.1674701 0.1674701
           FPR n
[1,] 0.0381027 5

Final Model

Now, we can define our final model and calculate the precision and RMSE:

set.seed(200)

# UBCF Model
ubcfRec <- Recommender(getData(evalScheme, "train"), 'UBCF', parameter = list(method = 'pearson', nn = 200, normalize = 'center'))
ubcfPredN <- predict(ubcfRec, getData(evalScheme, "known"), n = 5)
ubcfPredR <- predict(ubcfRec,getData(evalScheme,'known'), type = 'ratings')

# calc accuracy on test set
ubcfAccN <- calcPredictionAccuracy(ubcfPredN, 
                                   getData(evalScheme, "unknown"), 
                                   given = toKeep, 
                                   goodRating = ratingThreshold)

ubcfAccR <- calcPredictionAccuracy(ubcfPredR, getData(evalScheme, "unknown"))

ubcfAccN

         TP          FP          FN          TN           N   precision 
 3.16007194  1.83992806 18.27937650 45.72062350 69.00000000  0.63201439 
     recall         TPR         FPR 
 0.17208091  0.17208091  0.03605055

ubcfAccR

     RMSE       MSE       MAE 
 4.258168 18.131991  3.357915

Introduce Novelty

In order to introduce novelty to the recommendations, we’ll take a percentage of the top recommendations from the UBCF model and switch the recommendations out with randomly selected jokes. To do this, we’ll define a recommendation system using the RANDOM methodology and then create a hybrid recommender that combines it with the UBCF model.

# Random Model
randRec <- Recommender(getData(evalScheme, "train"), 'RANDOM')

hybridRec <- HybridRecommender(ubcfRec,randRec, weights = c(0.8,0.2))
hybridPredN <- predict(hybridRec, getData(evalScheme, "known"), n = 5)
hybridPredR <- predict(hybridRec, getData(evalScheme,'known'), type = 'ratings')

# calc accuracy on test set
hybridAccN <- calcPredictionAccuracy(hybridPredN, 
                                     getData(evalScheme, "unknown"), 
                                     given = toKeep, 
                                     goodRating = ratingThreshold)
hybridAccR <- calcPredictionAccuracy(hybridPredR, getData(evalScheme, "unknown"))

hybridAccN

        TP         FP         FN         TN          N  precision     recall 
 2.7386091  2.2613909 18.7008393 45.2991607 69.0000000  0.5477218  0.1394927 
       TPR        FPR 
 0.1394927  0.0448812

hybridAccR

     RMSE       MSE       MAE 
 4.433347 19.654569  3.572955

Compare Results

We can compare the results of the UBCF-only model vs the hybrid model. The precision is worse in the hybrid model and the RMSE is higher.

data.frame(METHOD = c('UBCF','HYBRID'), 
           PRECISION = c(ubcfAccN['precision'], hybridAccN['precision']),
           RMSE = c(ubcfAccR['RMSE'], hybridAccR['RMSE']))

  METHOD PRECISION     RMSE
1   UBCF 0.6320144 4.258168
2 HYBRID 0.5477218 4.433347

We can also take a look at a comparison of the top 5 suggestions for one of the users. From this, we can see two things:

The ordering has changed in the recommendations.

There are new items in the hybrid system, which represent the random recommendations.

hybridPredN@items[2]

$`1`
[1] 89 21 11 31 91

ubcfPredN@items[2]

$`1`
[1] 89 65 54 21 66

Discussion

In this project, we evaluated several recommender system algorithms using offline metrics such as precision and RMSE to assess accuracy and effectiveness. These metrics guided our selection of User-Based Collaborative Filtering (UBCF) with Pearson similarity as a strong baseline model. To support user experience goals like novelty and diversity, we introduced a Hybrid Recommender that blends UBCF with a Random model.

In the final phase, we narrowed our focus to these two models—UBCF and the Hybrid—and concentrated on precision as our primary evaluation metric, as it most directly reflects the relevance of top-N recommendations. This focused comparison allowed us to assess the trade-off between pure accuracy and the added value of introducing novelty.

While offline evaluation provides meaningful insights, it cannot fully capture how users engage with recommendations in real-world applications. For a more complete understanding, online metrics such as click-through rate (CTR), dwell time, and user retention are crucial. An ideal future evaluation environment would use A/B testing with tools like Google Optimize or Firebase, and user behavior tracking with platforms like Mixpanel or Amplitude. This would enable a more realistic, data-driven refinement of recommendation strategies tailored to user satisfaction and business impact.

DATA 612 - Project 4 : Accuracy and Beyond

Bikash Bhowmik, Rupendra Shrestha

29 Jun 2025

Column

Column