Instructions
The goal of this assignment is to give you practice working with accuracy and other recommender system metrics. In this assignment you’re asked to do at least one or (if you like) both of the following:
- Work in a small group, and/or
- Choose a different dataset to work with from your previous projects.

Deliverables

1. As in your previous assignments, compare the accuracy of at least two recommender system algorithms against your offline data.
2. Implement support for at least one business or user experience goal such as increased serendipity, novelty, or diversity.
3. Compare and report on any change in accuracy before and after you’ve made the change in #2.
4. As part of your textual conclusion, discuss one or more additional experiments that could be performed and/or metrics that could be evaluated only if online evaluation were possible. Also, briefly propose how you would design a reasonable online evaluation environment.
The project at hand is to build a recommender system that provides users with personalized joke recommendations. In addition to optimizing for accurate recommendations, the recommender focuses on improving user experience by adding some level of serendipity, meaning users will receive unexpected items with the potential to be enjoyable within their most highly recommended items.
Initially, we investigate the performance of five recommendation algorithms:

- User-Based Collaborative Filtering (UBCF) using cosine and Pearson similarity
- Item-Based Collaborative Filtering (IBCF) using cosine and Pearson similarity
- A random recommender algorithm (as a baseline)
For each method, we conduct 3-fold cross-validation to evaluate precision and overall performance. After establishing the best-performing algorithm, we promote novel items in the recommendation list by replacing a selection of the most highly recommended items with more novel ones. In this approach we try to balance accurate recommendations with user engagement and discovery.
We use the Jester dataset, which includes joke ratings from 24,983 users. Each user has rated at least 36 jokes, with rating values ranging from -10.00 (least funny) to +10.00 (most funny). A rating of 99 indicates that a joke was not rated by the user. In the dataset, each row corresponds to a single user: the first column records the total number of jokes they rated, while the next 100 columns contain their individual ratings for each joke.
# Load the Jester dataset
library(readxl)
# Read the Excel file (first sheet)
jester <- data.frame(read_xls("jester-data-1.xls", col_names = FALSE))
# Display the first few rows: the rating count plus the first 5 joke ratings
head(jester[, 1:6])
...1 ...2 ...3 ...4 ...5 ...6
1 74 -7.82 8.79 -9.66 -8.16 -7.52
2 100 4.08 -0.29 6.36 4.37 -2.38
3 49 99.00 99.00 99.00 99.00 9.03
4 48 99.00 8.35 99.00 99.00 1.80
5 91 8.50 4.61 -4.17 -5.39 1.36
6 100 -6.17 -3.54 0.44 -8.50 -7.09
summary(jester[, 2:6])
...2 ...3 ...4 ...5
Min. :-9.95 Min. :-9.95 Min. :-9.950 Min. :-9.95
1st Qu.:-0.15 1st Qu.:-2.18 1st Qu.:-1.145 1st Qu.:-3.35
Median : 5.05 Median : 3.69 Median : 4.660 Median : 3.45
Mean :34.40 Mean :27.09 Mean :33.264 Mean :36.24
3rd Qu.:99.00 3rd Qu.:99.00 3rd Qu.:99.000 3rd Qu.:99.00
Max. :99.00 Max. :99.00 Max. :99.000 Max. :99.00
...6
Min. :-9.9500
1st Qu.:-3.6900
Median : 1.1200
Mean : 0.4272
3rd Qu.: 4.4700
Max. :99.0000
This gives a quick sense of how the data is structured and how ratings are distributed across the first few jokes. It also confirms that the dataset has the correct dimensions and types before moving on to data preprocessing.
Next, we prepare the Jester data for modeling. We will remove the column with the number of rated jokes because it is not used in the recommendation system. Additionally, the raw data represents non-rated jokes with the value 99, so we will replace these values with NA. Finally, we will subset the data to the first 5,000 users to speed up computation.
# Prepare the rating matrix
library(recommenderlab)
colnames(jester) <- c("ratingCount", seq(100))
row.names(jester) <- 1:nrow(jester)
# Remove the rating-count column
ratings <- jester[-1]
# Replace 99 (no rating) with NA
ratings[ratings == 99] <- NA
# Subset to the first 5,000 users
ratings <- ratings[1:5000, ]
ratings <- as.matrix(ratings)
# Convert to a sparse realRatingMatrix for recommenderlab
finalRatings <- as(ratings, 'realRatingMatrix')
Let’s dive a little deeper into our data.
First, let’s take a look at the number of jokes that each user has rated. Every user in this dataset has rated at least 36 jokes, and most individuals have rated either around 70 or all 100 jokes.
jokeCount <- rowCounts(finalRatings)
hist(jokeCount,
main = 'Number of Jokes Rated per User',
xlab = 'Number of Jokes Rated',
ylab = 'Number of Users')
Next, we can look at the number of ratings that each joke has received. Many of the jokes were rated by all 5,000 users in our subset.
ratingCount <- colCounts(finalRatings)
hist(ratingCount,
main = 'Number of Individuals Rating each Joke',
xlab = 'Number of Users that Rated Joke',
ylab = 'Number of Jokes')
Now we can take a look at the average rating across all jokes. The median average rating is a little under 1 (slightly above neutral), which means that jokes tend to be rated mildly positively.
mean_rating <- colMeans(finalRatings)
quantile(mean_rating)
0% 25% 50% 75% 100%
-3.8296705 -0.3050142 0.8847812 1.8412310 4.0056984
We can also plot the average ratings. A look at the distribution shows that most jokes have an average rating between -2 and 3. We have a few outliers that are rated more positively (+4) and more negatively (-4).
library(ggplot2)
goodrating <- quantile(mean_rating, .5)
qplot(mean_rating) +
  ggtitle("Distribution of Average Joke Rating") +
  geom_vline(xintercept = goodrating, col = 'red')
From this data exploration, we can see that most users rated most of the jokes and, conversely, most jokes were rated by nearly all users. The median average rating is a little under 1, so we will use 1 as our threshold for a good joke.
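As a quick check of how selective that cutoff is, we can compute the share of observed ratings that clear it. A one-line sketch using recommenderlab’s getRatings, which returns all non-missing ratings as a vector:

# Share of observed ratings at or above the "good joke" threshold of 1
mean(getRatings(finalRatings) >= 1)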
### Full-Dataset Rating Exploration
library(readxl)
library(ggplot2)
# Load the full Jester dataset (all 24,983 users) for exploration,
# keeping it separate from the 5,000-user subset used for modeling
jester_full <- read_excel("jester-data-1.xls", col_names = FALSE)
# Remove first column (rating count) and replace 99s with NA
ratingsAll <- jester_full[ , -1]
ratingsAll[ratingsAll == 99] <- NA
# Summary of ratings
summary(ratingsAll)
...2 ...3 ...4 ...5
Min. :-9.9500 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-2.6700 1st Qu.:-4.4200 1st Qu.:-4.0300 1st Qu.:-5.870
Median : 1.6000 Median : 0.8300 Median : 0.7800 Median :-1.460
Mean : 0.9046 Mean : 0.2085 Mean : 0.3166 Mean :-1.449
3rd Qu.: 4.9000 3rd Qu.: 4.8200 3rd Qu.: 4.6600 3rd Qu.: 2.520
Max. : 9.3700 Max. : 9.3700 Max. : 9.3700 Max. : 9.370
NA's :8531 NA's :6799 NA's :8341 NA's :9375
...6 ...7 ...8 ...9
Min. :-9.9500 Min. :-9.950 Min. :-9.9500 Min. :-9.9500
1st Qu.:-3.6900 1st Qu.:-1.310 1st Qu.:-5.2900 1st Qu.:-4.6100
Median : 1.1200 Median : 2.230 Median :-0.0500 Median :-0.2900
Mean : 0.4036 Mean : 1.594 Mean :-0.4297 Mean :-0.6241
3rd Qu.: 4.4700 3rd Qu.: 5.440 3rd Qu.: 3.8800 3rd Qu.: 3.0600
Max. : 9.3700 Max. : 9.470 Max. : 9.6100 Max. : 9.7600
NA's :6 NA's :4552 NA's :7 NA's :2
...10 ...11 ...12 ...13
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.950
1st Qu.:-4.760 1st Qu.:-2.090 1st Qu.:-1.210 1st Qu.:-1.650
Median :-0.190 Median : 1.940 Median : 2.430 Median : 2.040
Mean :-0.557 Mean : 1.301 Mean : 1.806 Mean : 1.449
3rd Qu.: 3.300 3rd Qu.: 5.390 3rd Qu.: 5.780 3rd Qu.: 5.290
Max. : 9.900 Max. : 9.370 Max. : 9.610 Max. : 9.850
NA's :9231 NA's :4659 NA's :3162 NA's :2560
...14 ...15 ...16 ...17
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.950
1st Qu.:-6.260 1st Qu.:-2.090 1st Qu.:-6.210 1st Qu.:-7.430
Median :-1.700 Median : 2.090 Median :-1.750 Median :-3.790
Mean :-1.763 Mean : 1.396 Mean :-1.706 Mean :-3.105
3rd Qu.: 2.280 3rd Qu.: 5.555 3rd Qu.: 2.330 3rd Qu.: 0.580
Max. : 9.370 Max. : 9.560 Max. : 9.900 Max. : 9.370
NA's :2 NA's :2468 NA's :6 NA's :8
...18 ...19 ...20 ...21
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-4.420 1st Qu.:-4.4200 1st Qu.:-3.4500 1st Qu.:-5.050
Median :-0.630 Median :-0.2900 Median : 0.5300 Median :-0.630
Mean :-1.115 Mean :-0.6351 Mean : 0.1542 Mean :-0.934
3rd Qu.: 1.940 3rd Qu.: 3.0100 3rd Qu.: 3.7400 3rd Qu.: 2.910
Max. : 9.370 Max. : 9.8100 Max. : 9.5600 Max. : 9.370
NA's :3 NA's :4 NA's :5 NA's :3
...22 ...23 ...24 ...25
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-0.780 1st Qu.:-2.8200 1st Qu.:-3.7900 1st Qu.:-6.210
Median : 2.820 Median : 1.4100 Median : 0.3400 Median :-1.750
Mean : 2.096 Mean : 0.8687 Mean : 0.1092 Mean :-1.687
3rd Qu.: 6.020 3rd Qu.: 4.9000 3rd Qu.: 3.9800 3rd Qu.: 2.230
Max. : 9.560 Max. : 9.8500 Max. : 9.8500 Max. : 9.810
NA's :62 NA's :3550 NA's :4950 NA's :9104
...26 ...27 ...28 ...29
Min. :-9.9500 Min. :-9.950 Min. :-9.95 Min. :-9.950
1st Qu.:-3.3500 1st Qu.:-1.840 1st Qu.: 0.68 1st Qu.:-1.650
Median : 0.7800 Median : 1.840 Median : 4.13 Median : 2.065
Mean : 0.3824 Mean : 1.286 Mean : 3.19 Mean : 1.519
3rd Qu.: 4.2700 3rd Qu.: 5.050 3rd Qu.: 6.75 3rd Qu.: 5.530
Max. : 9.3700 Max. : 9.370 Max. : 9.42 Max. : 9.420
NA's :4167 NA's :1233 NA's :79 NA's :993
...30 ...31 ...32 ...33
Min. :-9.950 Min. :-9.9500 Min. :-9.950 Min. :-9.950
1st Qu.: 0.340 1st Qu.:-4.1700 1st Qu.:-0.630 1st Qu.: 0.630
Median : 3.980 Median :-0.2900 Median : 3.010 Median : 3.930
Mean : 2.972 Mean :-0.4365 Mean : 2.199 Mean : 3.156
3rd Qu.: 6.750 3rd Qu.: 3.1600 3rd Qu.: 6.120 3rd Qu.: 6.600
Max. : 9.370 Max. : 9.3700 Max. : 9.370 Max. : 9.900
NA's :49 NA's :7006 NA's :327 NA's :35
...34 ...35 ...36 ...37
Min. :-9.950 Min. :-9.9500 Min. :-9.950 Min. :-9.950
1st Qu.:-5.920 1st Qu.:-2.7700 1st Qu.: 0.390 1st Qu.: 0.970
Median :-1.260 Median : 1.2100 Median : 3.830 Median : 4.030
Mean :-1.369 Mean : 0.8595 Mean : 3.006 Mean : 3.313
3rd Qu.: 2.620 3rd Qu.: 4.9000 3rd Qu.: 6.600 3rd Qu.: 6.600
Max. : 9.370 Max. : 9.3700 Max. : 9.900 Max. : 9.850
NA's :8279 NA's :3420 NA's :31 NA's :18
...38 ...39 ...40 ...41
Min. :-9.950 Min. :-9.95 Min. :-9.950 Min. :-9.9500
1st Qu.:-5.870 1st Qu.:-2.14 1st Qu.:-2.140 1st Qu.:-2.5200
Median :-1.260 Median : 1.89 Median : 1.460 Median : 1.4600
Mean :-1.364 Mean : 1.30 Mean : 1.067 Mean : 0.9986
3rd Qu.: 2.620 3rd Qu.: 5.29 3rd Qu.: 4.810 3rd Qu.: 5.0000
Max. : 9.370 Max. : 9.56 Max. : 9.760 Max. : 9.3700
NA's :8102 NA's :2125 NA's :1903 NA's :2585
...42 ...43 ...44 ...45
Min. :-9.9500 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.:-4.9000 1st Qu.:-0.780 1st Qu.:-5.3150 1st Qu.:-6.800
Median :-0.0500 Median : 2.620 Median :-0.5800 Median :-2.430
Mean :-0.3205 Mean : 1.967 Mean :-0.9249 Mean :-2.112
3rd Qu.: 4.0800 3rd Qu.: 5.830 3rd Qu.: 3.0600 3rd Qu.: 1.890
Max. : 9.3700 Max. : 9.850 Max. : 9.4200 Max. : 9.370
NA's :6216 NA's :414 NA's :7472 NA's :8806
...46 ...47 ...48 ...49
Min. :-9.950 Min. :-9.95 Min. :-9.95 Min. :-9.950
1st Qu.:-2.040 1st Qu.:-1.70 1st Qu.:-1.36 1st Qu.:-1.070
Median : 1.650 Median : 2.28 Median : 2.14 Median : 2.520
Mean : 1.109 Mean : 1.50 Mean : 1.56 Mean : 1.826
3rd Qu.: 4.810 3rd Qu.: 5.53 3rd Qu.: 5.49 3rd Qu.: 5.780
Max. :10.000 Max. : 9.66 Max. : 9.81 Max. : 9.370
NA's :3658 NA's :1358 NA's :2742 NA's :216
...50 ...51 ...52 ...53
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.9500
1st Qu.: 0.100 1st Qu.: 1.550 1st Qu.:-5.2900 1st Qu.:-4.1300
Median : 3.450 Median : 4.470 Median :-0.5300 Median : 0.1000
Mean : 2.771 Mean : 3.665 Mean :-0.7317 Mean :-0.1331
3rd Qu.: 6.360 3rd Qu.: 6.840 3rd Qu.: 3.5400 3rd Qu.: 3.8300
Max. : 9.900 Max. : 9.950 Max. : 9.3700 Max. : 9.6600
NA's :26 NA's :11 NA's :6091 NA's :4919
...54 ...55 ...56 ...57
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.: 0.390 1st Qu.:-0.150 1st Qu.:-3.3500 1st Qu.:-1.120
Median : 3.830 Median : 3.540 Median : 1.0200 Median : 2.430
Mean : 2.939 Mean : 2.736 Mean : 0.5057 Mean : 1.765
3rd Qu.: 6.550 3rd Qu.: 6.600 3rd Qu.: 4.5600 3rd Qu.: 5.580
Max. : 9.710 Max. : 9.510 Max. : 9.9500 Max. : 9.370
NA's :12 NA's :292 NA's :5256 NA's :243
...58 ...59 ...60 ...61
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.9500
1st Qu.:-6.800 1st Qu.:-8.160 1st Qu.:-4.320 1st Qu.:-4.8500
Median :-2.230 Median :-4.950 Median :-0.290 Median :-0.1000
Mean :-1.991 Mean :-3.834 Mean :-0.578 Mean :-0.3254
3rd Qu.: 2.180 3rd Qu.:-0.340 3rd Qu.: 3.010 3rd Qu.: 3.9800
Max. : 9.370 Max. : 9.420 Max. : 9.370 Max. : 9.3700
NA's :9010 NA's :9430 NA's :6848 NA's :7022
...62 ...63 ...64 ...65
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.9500
1st Qu.:-0.290 1st Qu.: 0.440 1st Qu.:-3.640 1st Qu.:-4.8100
Median : 3.200 Median : 3.740 Median : 0.530 Median :-0.3900
Mean : 2.459 Mean : 2.976 Mean : 0.247 Mean :-0.6679
3rd Qu.: 6.170 3rd Qu.: 6.500 3rd Qu.: 4.220 3rd Qu.: 3.2500
Max. : 9.370 Max. : 9.850 Max. : 9.470 Max. : 9.3700
NA's :178 NA's :47 NA's :4819 NA's :7707
...66 ...67 ...68 ...69
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.:-0.780 1st Qu.:-0.240 1st Qu.:-5.1000 1st Qu.:-0.100
Median : 3.250 Median : 3.250 Median :-0.5300 Median : 3.350
Mean : 2.274 Mean : 2.542 Mean :-0.8639 Mean : 2.639
3rd Qu.: 6.460 3rd Qu.: 6.260 3rd Qu.: 3.0600 3rd Qu.: 6.260
Max. : 9.850 Max. :10.000 Max. : 9.3700 Max. : 9.710
NA's :255 NA's :60 NA's :7490 NA's :43
...70 ...71 ...72 ...73
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-0.100 1st Qu.:-2.9100 1st Qu.:-6.0200 1st Qu.: 0.190
Median : 3.350 Median : 0.7300 Median :-0.5300 Median : 3.400
Mean : 2.559 Mean : 0.4184 Mean :-0.6875 Mean : 2.757
3rd Qu.: 6.170 3rd Qu.: 3.8800 3rd Qu.: 4.2200 3rd Qu.: 6.310
Max. : 9.760 Max. : 9.3700 Max. : 9.3700 Max. : 9.370
NA's :67 NA's :4761 NA's :16451 NA's :16290
...74 ...75 ...76 ...77
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.95
1st Qu.:-2.520 1st Qu.:-5.780 1st Qu.:-4.7100 1st Qu.:-0.19
Median : 1.500 Median :-1.650 Median :-0.0500 Median : 3.25
Mean : 1.083 Mean :-1.554 Mean :-0.2707 Mean : 2.52
3rd Qu.: 5.390 3rd Qu.: 2.140 3rd Qu.: 4.0300 3rd Qu.: 6.21
Max. : 9.370 Max. : 9.370 Max. : 9.3700 Max. : 9.37
NA's :16367 NA's :16222 NA's :16190 NA's :16123
...78 ...79 ...80 ...81
Min. :-9.9500 Min. :-9.950 Min. :-9.9500 Min. :-9.950
1st Qu.:-3.0600 1st Qu.:-1.170 1st Qu.:-4.1700 1st Qu.:-3.060
Median : 1.0700 Median : 2.430 Median : 0.3400 Median : 1.890
Mean : 0.7154 Mean : 1.752 Mean : 0.0917 Mean : 1.122
3rd Qu.: 4.8100 3rd Qu.: 5.730 3rd Qu.: 4.3200 3rd Qu.: 5.830
Max. : 9.3700 Max. : 9.370 Max. : 9.3700 Max. : 9.810
NA's :16039 NA's :15962 NA's :15909 NA's :15959
...82 ...83 ...84 ...85
Min. :-9.950 Min. :-9.9500 Min. :-9.950 Min. :-9.9500
1st Qu.:-1.840 1st Qu.:-2.7700 1st Qu.:-1.120 1st Qu.:-2.9600
Median : 2.820 Median : 1.3600 Median : 2.820 Median : 1.0700
Mean : 1.837 Mean : 0.9027 Mean : 2.028 Mean : 0.6944
3rd Qu.: 6.260 3rd Qu.: 4.9500 3rd Qu.: 6.210 3rd Qu.: 4.7600
Max. : 9.370 Max. : 9.3700 Max. : 9.370 Max. : 9.3700
NA's :15760 NA's :15859 NA's :15697 NA's :15686
...86 ...87 ...88 ...89
Min. :-9.9500 Min. :-9.9500 Min. :-9.950 Min. :-9.95
1st Qu.:-2.9600 1st Qu.:-3.5000 1st Qu.:-0.920 1st Qu.:-0.49
Median : 1.5000 Median : 0.5800 Median : 2.480 Median : 2.72
Mean : 0.9189 Mean : 0.3447 Mean : 1.901 Mean : 2.11
3rd Qu.: 5.2400 3rd Qu.: 4.2700 3rd Qu.: 5.680 3rd Qu.: 5.78
Max. : 9.3700 Max. : 9.3700 Max. : 9.370 Max. : 9.47
NA's :15615 NA's :15514 NA's :15531 NA's :15379
...90 ...91 ...92 ...93
Min. :-9.950 Min. :-9.950 Min. :-9.9500 Min. :-9.95
1st Qu.: 0.970 1st Qu.:-3.450 1st Qu.:-0.9325 1st Qu.:-2.38
Median : 4.760 Median : 1.070 Median : 2.8600 Median : 1.80
Mean : 3.575 Mean : 0.612 Mean : 2.0385 Mean : 1.18
3rd Qu.: 7.430 3rd Qu.: 4.810 3rd Qu.: 6.1200 3rd Qu.: 5.29
Max. : 9.420 Max. : 9.370 Max. : 9.5600 Max. : 9.42
NA's :15419 NA's :15255 NA's :15311 NA's :15094
...94 ...95 ...96 ...97
Min. :-9.950 Min. :-9.950 Min. :-9.950 Min. :-9.950
1st Qu.:-0.100 1st Qu.:-2.280 1st Qu.:-2.570 1st Qu.:-1.347
Median : 3.160 Median : 1.750 Median : 1.650 Median : 2.140
Mean : 2.504 Mean : 1.173 Mean : 1.039 Mean : 1.528
3rd Qu.: 6.070 3rd Qu.: 5.290 3rd Qu.: 5.100 3rd Qu.: 5.277
Max. : 9.370 Max. : 9.420 Max. : 9.370 Max. : 9.370
NA's :14995 NA's :14783 NA's :14718 NA's :14613
...98 ...99 ...100 ...101
Min. :-9.950 Min. :-9.9500 Min. :-9.9500 Min. :-9.950
1st Qu.:-1.170 1st Qu.:-3.4000 1st Qu.:-4.1700 1st Qu.:-2.330
Median : 2.480 Median : 1.4600 Median : 0.2900 Median : 2.140
Mean : 1.674 Mean : 0.7677 Mean :-0.0312 Mean : 1.355
3rd Qu.: 5.630 3rd Qu.: 5.1500 3rd Qu.: 3.9800 3rd Qu.: 5.730
Max. : 9.370 Max. : 9.3700 Max. : 9.3700 Max. : 9.370
NA's :14463 NA's :14317 NA's :14073 NA's :15040
# Histogram: how many jokes each user rated
hist(rowSums(!is.na(ratingsAll)),
     main = "Number of Jokes Rated per User",
     xlab = "Jokes Rated", ylab = "Users")
# Histogram: how many users rated each joke
hist(colSums(!is.na(ratingsAll)),
     main = "Number of Users per Joke",
     xlab = "Users per Joke", ylab = "Jokes")
# Histogram: average joke ratings
avg_joke_rating <- colMeans(ratingsAll, na.rm = TRUE)
hist(avg_joke_rating,
     main = "Average Rating per Joke",
     xlab = "Avg Rating", ylab = "Jokes")
We explored the distribution of ratings across the full dataset: how many jokes each user rated, how many users rated each joke, and each joke’s average rating. This helps evaluate whether engagement is concentrated on a handful of popular jokes or spread across the whole catalog.
We’ll define the following:

- Training Percent: the percent of the data used for training; the remaining data is used for testing.
- Items To Keep: the number of items given to the recommender to generate recommendations; the remaining items are withheld to test model accuracy. We identify the minimum number of jokes that any individual has rated and use a few fewer than that.
- Rating Threshold: the threshold for a positive rating. Since our data is on a scale of -10 to 10, we use 1 as the threshold for a good joke.
- Number of Folds: the number of folds used for k-fold cross-validation.
Finally, we’ll define our evaluation scheme for the models.
trainPct <- 0.8   # only used if method = "split"; with k-fold CV the folds define the split
toKeep <- min(rowCounts(finalRatings)) - 5
ratingThreshold <- 1
nFold <- 3
# Define the evaluation scheme (3-fold cross-validation)
evalScheme <- evaluationScheme(finalRatings,
                               method = "cross-validation",
                               k = nFold,
                               given = toKeep,
                               goodRating = ratingThreshold)
Now that we’ve set up the evaluation scheme for our recommender systems, we can compare different models. We will evaluate the output of two IBCF models (using cosine and Pearson similarity), two UBCF models (again using cosine and Pearson similarity), and one random model as a baseline. We will also vary the number of recommendations from 1 to 20.
# Models to compare
evalModels <- list(
  IBCF_cos = list(name = "IBCF", param = list(method = "cosine")),
  IBCF_cor = list(name = "IBCF", param = list(method = "pearson")),
  UBCF_cos = list(name = "UBCF", param = list(method = "cosine")),
  UBCF_cor = list(name = "UBCF", param = list(method = "pearson")),
  RANDOM   = list(name = "RANDOM", param = NULL)
)
# Numbers of recommendations to evaluate
nRecs <- c(1, seq(5, 20, 5))
finalResults <- evaluate(x = evalScheme, method = evalModels, n = nRecs)
IBCF run fold/sample [model time/prediction time]
1 [0.42sec/0.38sec]
2 [0.28sec/0.26sec]
3 [0.22sec/0.26sec]
IBCF run fold/sample [model time/prediction time]
1 [0.23sec/0.29sec]
2 [0.27sec/0.27sec]
3 [0.26sec/0.27sec]
UBCF run fold/sample [model time/prediction time]
1 [0.01sec/8.95sec]
2 [0.03sec/9.08sec]
3 [0.02sec/9.03sec]
UBCF run fold/sample [model time/prediction time]
1 [0.03sec/7.64sec]
2 [0.02sec/7.59sec]
3 [0.02sec/7.51sec]
RANDOM run fold/sample [model time/prediction time]
1 [0sec/0.25sec]
2 [0.01sec/0.23sec]
3 [0sec/0.22sec]
We can look at the average results across all folds for each algorithm. Each row represents a different number of recommendations. On average, as the number of recommendations increases, recall rises steadily, and precision also improves for the collaborative filtering models (though not for the random baseline).
avg(finalResults)
$IBCF_cos
TP FP FN TN N precision recall
[1,] 0.3271383 0.6728617 21.59373 46.40907 69.0028 0.3271383 0.01318181
[2,] 1.8848921 3.1151079 20.03597 43.96683 69.0028 0.3769784 0.08060305
[3,] 4.1486811 5.8513189 17.77218 41.23062 69.0028 0.4148681 0.18438920
[4,] 6.6227018 8.3772982 15.29816 38.70464 69.0028 0.4415135 0.30244992
[5,] 9.0189848 10.9810152 12.90188 36.10092 69.0028 0.4509492 0.41644494
TPR FPR n
[1,] 0.01318181 0.01349611 1
[2,] 0.08060305 0.06261160 5
[3,] 0.18438920 0.11819070 10
[4,] 0.30244992 0.17088881 15
[5,] 0.41644494 0.22700535 20
$IBCF_cor
TP FP FN TN N precision recall
[1,] 0.3531175 0.6468825 21.56775 46.43505 69.0028 0.3531175 0.01485068
[2,] 2.0119904 2.9880096 19.90887 44.09392 69.0028 0.4023981 0.08916292
[3,] 4.3111511 5.6888489 17.60971 41.39309 69.0028 0.4311151 0.19677503
[4,] 6.6145084 8.3854916 15.30635 38.69644 69.0028 0.4409672 0.30444612
[5,] 8.8307354 11.1692646 13.09013 35.91267 69.0028 0.4415368 0.40581214
TPR FPR n
[1,] 0.01485068 0.01276791 1
[2,] 0.08916292 0.05989511 5
[3,] 0.19677503 0.11578899 10
[4,] 0.30444612 0.17241219 15
[5,] 0.40581214 0.23221022 20
$UBCF_cos
TP FP FN TN N precision recall
[1,] 0.3377298 0.6622702 21.58313 46.41966 69.0028 0.3377298 0.01523079
[2,] 2.1658673 2.8341327 19.75500 44.24780 69.0028 0.4331735 0.10552667
[3,] 4.6552758 5.3447242 17.26559 41.73721 69.0028 0.4655276 0.22621144
[4,] 7.0531575 7.9468425 14.86771 39.13509 69.0028 0.4702105 0.34037986
[5,] 9.2729816 10.7270184 12.64788 36.35492 69.0028 0.4636491 0.44420610
TPR FPR n
[1,] 0.01523079 0.01394790 1
[2,] 0.10552667 0.05961588 5
[3,] 0.22621144 0.11236298 10
[4,] 0.34037986 0.16784196 15
[5,] 0.44420610 0.22805323 20
$UBCF_cor
TP FP FN TN N precision recall
[1,] 0.3485212 0.6514788 21.57234 46.43046 69.0028 0.3485212 0.01625357
[2,] 2.2064349 2.7935651 19.71443 44.28837 69.0028 0.4412870 0.10815016
[3,] 4.7340128 5.2659872 17.18685 41.81595 69.0028 0.4734013 0.23239351
[4,] 7.1518785 7.8481215 14.76898 39.23381 69.0028 0.4767919 0.34711634
[5,] 9.3699041 10.6300959 12.55096 36.45184 69.0028 0.4684952 0.44972118
TPR FPR n
[1,] 0.01625357 0.01393754 1
[2,] 0.10815016 0.05935082 5
[3,] 0.23239351 0.11158681 10
[4,] 0.34711634 0.16634573 15
[5,] 0.44972118 0.22633686 20
$RANDOM
TP FP FN TN N precision recall
[1,] 0.3379297 0.6620703 21.58293 46.41986 69.0028 0.3379297 0.01500125
[2,] 1.6876499 3.3123501 20.23321 43.76958 69.0028 0.3375300 0.07423988
[3,] 3.3529177 6.6470823 18.56795 40.43485 69.0028 0.3352918 0.14762206
[4,] 5.0197842 9.9802158 16.90108 37.10172 69.0028 0.3346523 0.22161902
[5,] 6.7158273 13.2841727 15.20504 33.79776 69.0028 0.3357914 0.29627752
TPR FPR n
[1,] 0.01500125 0.01464540 1
[2,] 0.07423988 0.07395019 5
[3,] 0.14762206 0.14877515 10
[4,] 0.22161902 0.22355126 15
[5,] 0.29627752 0.29750875 20
We can also visualize the ROC curves for each of the algorithms we’ve run. Each marker on a curve represents the true positive rate versus the false positive rate for a given number of recommendations n. The plot shows higher performance from the UBCF models.
The goal of our recommender system is to provide jokes that are funny to a user, so we want to minimize the number of false positives (recommendations the user would not enjoy). We will therefore choose the algorithm with the highest precision, i.e., the largest share of recommended items that are actually relevant. Once again, the UBCF models outperform the others.
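The ROC and precision-recall charts referenced here can be drawn with recommenderlab’s plot method for evaluation results; a minimal sketch:

# ROC curves (TPR vs. FPR); annotate labels the n values on the curves
plot(finalResults, annotate = TRUE, legend = "topleft")
# Precision-recall curves
plot(finalResults, "prec/rec", annotate = TRUE, legend = "bottomright")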
Based on this analysis, we will choose the UBCF model with Pearson similarity and 5 recommendations. We can further tune the model’s nn parameter (the number of nearest neighbors).
numNeighbors <- seq(100, 200, 50)
ubcfModels <- lapply(numNeighbors, function(n) {
  list(name = "UBCF", param = list(method = "pearson", nn = n))
})
names(ubcfModels) <- paste0("UBCF_", numNeighbors)
ubcfSchemes <- evaluate(x = evalScheme, method = ubcfModels, n = 5)
UBCF run fold/sample [model time/prediction time]
1 [0.02sec/8.93sec]
2 [0.03sec/8.27sec]
3 [0.03sec/8.61sec]
UBCF run fold/sample [model time/prediction time]
1 [0.03sec/9.84sec]
2 [0.21sec/11.27sec]
3 [0.01sec/11.24sec]
UBCF run fold/sample [model time/prediction time]
1 [0.03sec/9.98sec]
2 [0.03sec/9.45sec]
3 [0.04sec/9.26sec]
We’ll pick the model with the best precision, which is the one with 200 neighbors:
avg(ubcfSchemes)
$UBCF_100
TP FP FN TN N precision recall TPR
[1,] 2.992006 2.007994 18.92886 45.07394 69.0028 0.5984013 0.1555668 0.1555668
FPR n
[1,] 0.041739 5
$UBCF_150
TP FP FN TN N precision recall TPR
[1,] 3.110512 1.889488 18.81035 45.19245 69.0028 0.6221023 0.1637228 0.1637228
FPR n
[1,] 0.03935922 5
$UBCF_200
TP FP FN TN N precision recall TPR
[1,] 3.176259 1.823741 18.7446 45.25819 69.0028 0.6352518 0.1680843 0.1680843
FPR n
[1,] 0.03790321 5
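Rather than reading the precision off the printout, the best neighborhood size can also be selected programmatically. A minimal sketch, assuming the ubcfSchemes and numNeighbors objects from above:

# Average confusion matrices per model, then pick the nn with max precision
avgResults <- avg(ubcfSchemes)
precisions <- sapply(avgResults, function(m) m[, "precision"])
numNeighbors[which.max(precisions)]  # 200 in this run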
Now, we can define our final model and calculate the precision and RMSE:
set.seed(200)
# Final UBCF model: Pearson similarity, 200 neighbors, centered ratings
ubcfRec <- Recommender(getData(evalScheme, "train"), 'UBCF',
                       parameter = list(method = 'pearson', nn = 200, normalize = 'center'))
# Top-5 recommendations and full rating predictions for the known test data
ubcfPredN <- predict(ubcfRec, getData(evalScheme, "known"), n = 5)
ubcfPredR <- predict(ubcfRec, getData(evalScheme, "known"), type = 'ratings')
# calc accuracy on test set
ubcfAccN <- calcPredictionAccuracy(ubcfPredN,
getData(evalScheme, "unknown"),
given = toKeep,
goodRating = ratingThreshold)
ubcfAccR <- calcPredictionAccuracy(ubcfPredR, getData(evalScheme, "unknown"))
ubcfAccN
TP FP FN TN N precision
3.13309353 1.86690647 18.81055156 45.18944844 69.00000000 0.62661871
recall TPR FPR
0.16670923 0.16670923 0.03730136
ubcfAccR
RMSE MSE MAE
4.220717 17.814451 3.329481
In order to introduce novelty into the recommendations, we’ll take a percentage of the top recommendations from the UBCF model and swap them out for randomly selected jokes. To do this, we’ll define a recommender using the RANDOM method and then create a hybrid recommender that combines it with the UBCF model.
# Random model and weighted hybrid (80% UBCF, 20% random)
randRec <- Recommender(getData(evalScheme, "train"), 'RANDOM')
hybridRec <- HybridRecommender(ubcfRec, randRec, weights = c(0.8, 0.2))
hybridPredN <- predict(hybridRec, getData(evalScheme, "known"), n = 5)
hybridPredR <- predict(hybridRec, getData(evalScheme, "known"), type = 'ratings')
# calc accuracy on test set
hybridAccN <- calcPredictionAccuracy(hybridPredN,
getData(evalScheme, "unknown"),
given = toKeep,
goodRating = ratingThreshold)
hybridAccR <- calcPredictionAccuracy(hybridPredR, getData(evalScheme, "unknown"))
hybridAccN
TP FP FN TN N precision
2.75599520 2.24400480 19.18764988 44.81235012 69.00000000 0.55119904
recall TPR FPR
0.13903076 0.13903076 0.04504358
hybridAccR
RMSE MSE MAE
4.387298 19.248386 3.537368
We can compare the results of the UBCF-only model and the hybrid model. As expected, the hybrid model’s precision is lower and its RMSE is higher; this is the accuracy cost of injecting random (novel) items.
data.frame(METHOD = c('UBCF','HYBRID'),
PRECISION = c(ubcfAccN['precision'], hybridAccN['precision']),
RMSE = c(ubcfAccR['RMSE'], hybridAccR['RMSE']))
METHOD PRECISION RMSE
1 UBCF 0.6266187 4.220717
2 HYBRID 0.5511990 4.387298
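Precision and RMSE capture accuracy but not the novelty we set out to add. One way to quantify novelty offline is to compare the average popularity of the recommended jokes under each model, using the number of users who rated each joke as a popularity proxy. A sketch, where avgPopularity is an illustrative helper (not part of recommenderlab):

# Popularity of each joke = number of users in our subset who rated it
itemPopularity <- colCounts(finalRatings)
# Average popularity of the items in a set of top-N lists;
# lower values mean less commonly rated (more novel) recommendations
avgPopularity <- function(topN) {
  recLists <- as(topN, "list")  # item labels per user
  mean(sapply(recLists, function(items) mean(itemPopularity[items])), na.rm = TRUE)
}
avgPopularity(ubcfPredN)    # pure UBCF
avgPopularity(hybridPredN)  # hybrid with random injections

If the hybrid’s value is lower, it is surfacing less commonly rated jokes, which is the intended serendipity effect.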
We can also take a look at a comparison of the top 5 suggestions for one of the users (display code sketched below). From this, we can see two things:

1. The ordering of the recommendations has changed.
2. There are new items in the hybrid list, which represent the random recommendations.
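The two lists that follow (UBCF first, then hybrid) were presumably produced by coercing each topNList to a plain list for user 1, along these lines:

# Hypothetical display code for user 1's top-5 items under each model
as(ubcfPredN, "list")[1]    # pure UBCF
as(hybridPredN, "list")[1]  # hybrid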
$`1`
[1] 29 21 89 31 3
$`1`
[1] 27 21 72 61 49
In this project, we evaluated several recommender system algorithms using offline metrics such as precision and RMSE to assess accuracy and effectiveness. These metrics guided our selection of User-Based Collaborative Filtering (UBCF) with Pearson similarity as a strong baseline model. To support user experience goals like novelty and diversity, we introduced a Hybrid Recommender that blends UBCF with a Random model.
In the final phase, we narrowed our focus to these two models—UBCF and the Hybrid—and concentrated on precision as our primary evaluation metric, as it most directly reflects the relevance of top-N recommendations. This focused comparison allowed us to assess the trade-off between pure accuracy and the added value of introducing novelty.
While offline evaluation provides meaningful insights, it cannot fully capture how users engage with recommendations in real-world applications. For a more complete understanding, online metrics such as click-through rate (CTR), dwell time, and user retention are crucial. An ideal future evaluation environment would use A/B testing with tools like Google Optimize or Firebase, and user behavior tracking with platforms like Mixpanel or Amplitude. This would enable a more realistic, data-driven refinement of recommendation strategies tailored to user satisfaction and business impact.
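As a concrete illustration of such an environment, the sketch below shows one hypothetical way to bucket users into a control arm (pure UBCF) and a treatment arm (the hybrid); assignArm is an illustrative helper, and online metrics such as CTR would then be compared between arms:

# Randomly but reproducibly assign each user to an experiment arm
set.seed(612)
assignArm <- function(userIds, treatmentShare = 0.5) {
  arms <- ifelse(runif(length(userIds)) < treatmentShare, "hybrid", "ubcf")
  setNames(arms, userIds)
}
arms <- assignArm(rownames(finalRatings))
table(arms)  # roughly even split across the two arms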