The goal of this assignment is to implement, configure, and evaluate recommender systems using different approaches. To achieve this, the assignment calls for taking a user-item ratings dataset (from MovieLens in this case) and implementing at least two recommendation algorithms.
The evaluation should then compare the different approaches: different algorithms, normalization techniques, similarity measures, neighborhood sizes, etc.
The MovieLens dataset was retrieved from the https://grouplens.org/datasets/movielens/ website, using the small version with 100,000 ratings and 1,300 tag applications for 9,000 movies by 700 users. This is the most recent release (10/2016) with the most manageable size for R; the real challenge was creating a rating matrix of a size that R could handle.
library(dplyr)
library(tidyr)
library(ggplot2)
library(recommenderlab)
library(knitr)
The MovieLens matrix was created by combining two datasets (movies and ratings). The movies dataset includes the movieId, title, and genres, while the ratings dataset includes the userId, movieId, rating, and a timestamp.
movies = read.csv("https://raw.githubusercontent.com/Galanopoulog/DATA643/master/Project%202/movies.csv",
header = TRUE, sep = ",", stringsAsFactors = FALSE)
ratings = read.csv("https://raw.githubusercontent.com/Galanopoulog/DATA643/master/Project%202/ratings.csv",
header = TRUE, sep =",", stringsAsFactors = FALSE)
kable(head(movies))
movieId | title | genres |
---|---|---|
1 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy |
2 | Jumanji (1995) | Adventure|Children|Fantasy |
3 | Grumpier Old Men (1995) | Comedy|Romance |
4 | Waiting to Exhale (1995) | Comedy|Drama|Romance |
5 | Father of the Bride Part II (1995) | Comedy |
6 | Heat (1995) | Action|Crime|Thriller |
kable(head(ratings))
userId | movieId | rating | timestamp |
---|---|---|---|
1 | 31 | 2.5 | 1260759144 |
1 | 1029 | 3.0 | 1260759179 |
1 | 1061 | 3.0 | 1260759182 |
1 | 1129 | 2.0 | 1260759185 |
1 | 1172 | 4.0 | 1260759205 |
1 | 1263 | 2.0 | 1260759151 |
In order to create the matrix, both datasets first had to be downsized (so that R could handle them) and then combined. The most-reviewed movies and the most active users were preferable, so the data was downsized with that in mind: only users with at least 20 reviews and movies with at least 46 reviews were kept. Before being shaped into a matrix, the dataset was formatted as follows:
# Rank users by number of ratings (most active first) and keep up to the top 10,000
x = head(data.frame(table(ratings$userId) %>% sort(decreasing = T)), 10000)
colnames(x) = c("userId", "count")
ratex = merge(ratings, x, by = "userId")
# Attach movie titles and keep only the columns needed for the rating matrix
submovie = subset(movies, select = c("movieId", "title"))
subrating = subset(ratex, select = c("userId", "movieId", "rating"))
prematrix = subset(merge(subrating, submovie, by = "movieId"), select = c("userId", "rating", "title"))
# Keep the 500 most-reviewed movies
y = head(data.frame(table(prematrix$title) %>% sort(decreasing = T)), 500)
colnames(y) = c("title", "count2")
prematrix2 = subset(merge(prematrix, y, by = "title"), select = c("title", "userId", "rating"))
kable(head(prematrix2))
title | userId | rating |
---|---|---|
10 Things I Hate About You (1999) | 599 | 5.0 |
10 Things I Hate About You (1999) | 207 | 0.5 |
10 Things I Hate About You (1999) | 638 | 4.5 |
10 Things I Hate About You (1999) | 128 | 5.0 |
10 Things I Hate About You (1999) | 596 | 3.0 |
10 Things I Hate About You (1999) | 95 | 5.0 |
The matrix built from this dataset was originally intended to be somewhat larger, but due to formatting issues with the "realRatingMatrix" class it was reduced to roughly 500x500. More formatting issues appeared later in the assignment (in the evaluation section), and the matrix was reduced in size further.
# Pivot to wide format: one row per user, one column per movie title
matrix = spread(prematrix2, title, rating)
# Remove rows 501-670 and the userId column, bringing the matrix down to roughly 500 x 500
ex = as.matrix(matrix[-c(501:670), -1])
data = as(ex, "realRatingMatrix")
# Keep only users with more than 30 ratings and movies with more than 50 ratings
data = data[rowCounts(data) > 30, colCounts(data) > 50]
data
## 306 x 281 rating matrix of class 'realRatingMatrix' with 22003 ratings.
Despite the matrix having dimensions of 306 x 281, it still includes 22,003 ratings, which should be enough to create and evaluate the differences between recommender systems with varying algorithms.
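As a quick sanity check of how sparse the matrix is, its density can be computed directly from the number of stored ratings and the matrix dimensions (a small sketch using recommenderlab accessors):

# Density of the rating matrix: stored ratings divided by total cells
length(getRatings(data)) / prod(dim(data))
# 22003 / (306 * 281) is roughly 0.26, so about a quarter of the cells are filled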
The heatmap below shows roughly the top 30 users and movies (those with more than 150 ratings each) and illustrates how sparse the matrix is. Darker cells indicate higher ratings, so rows that are consistently dark correspond to users who rated highly across the movies they saw.
image(data[rowCounts(data) > 150, colCounts(data) > 150],
      main = "Heatmap of the Top Users and Movies")
When looking at the ratings themselves, without taking into consideration the users or the movies, the majority of the ratings are 3-stars and above, with 4 stars being the most common grade.
# Ratings
qplot(getRatings(data), geom="histogram", main = "Histogram of Ratings", xlab = "Rating Scores", binwidth = 0.5, fill=I("cornflower blue"))
It is interesting to note that some of the results from the earlier histogram (particularly the frequency of 5-star scores) change when the ratings are aggregated per movie. The average score per movie falls primarily within the 3.5-4 star range. The other scores occur far less often; in fact, even combined into one category they would not rival the 3.5-star count, and average scores in the 0-2 star range are nonexistent in this matrix.
# Ratings per Movie
qplot(colMeans(data), geom="histogram", main = "Histogram of Movie Ratings", xlab = "Rating Scores Per Movie", binwidth = 0.5, fill=I("FireBrick"))
Aggregating by user instead, the average rating ranges from as low as 2 stars to as high as 4.5. Even though, once again, the ratings fall primarily in the 3.5-4 star range, users are more willing to give out 4.5 stars than the "Rating Scores Per Movie" histogram would suggest.
# Ratings per User
qplot(rowMeans(data), geom="histogram", main = "Histogram of User Ratings", xlab = "Rating Scores Per User", binwidth = 0.5, fill=I("Plum"))
Though some users may give out low scores, it is unusual for one user to consistently give low ratings (as a tangent, this fact may help detect accounts created with the intention of sabotaging a recommender system). Overall, users typically rate movies around 3.5 or 4 stars, and it can be assumed that the users who rate movies very highly (4.5 and above) do so for movies that most other users rate lower, since very few movies average 4.5 overall.
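These observations can be checked numerically by summarizing the per-user average ratings (a small sketch; the 3 and 4.5 star thresholds are simply the ones discussed above):

# Distribution of per-user average ratings
summary(rowMeans(data))
# Users whose average rating falls below 3 stars
sum(rowMeans(data) < 3)
# Users whose average rating is 4.5 stars or higher
sum(rowMeans(data) >= 4.5)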
# Splitting training and testing
values = sample(x = c(TRUE, FALSE), size = nrow(data),
replace = TRUE, prob = c(0.8, 0.2))
train = data[values, ]
test = data[!values, ]
Item-to-Item Collaborative Filtering uses the ratings of many users: by examining how individuals rated particular items, it calculates the similarity between items and recommends items similar to those a user already rated highly. The more similar two items' rating patterns, the more likely one is to be recommended to users who liked the other.
Because this recommender system is Item-to-Item, the "IBCF" method (Item-Based Collaborative Filtering) was used, with the closeness between items measured using cosine similarity.
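To illustrate the distance measure the IBCF model is built on, the item-to-item cosine similarities can be computed directly with recommenderlab's similarity() helper. The sketch below does this for the first five movies only; the fitted model keeps the k = 30 most similar items per movie.

# Item-item cosine similarity for the first five movies (illustrative only)
item_sim = similarity(data[, 1:5], method = "Cosine", which = "items")
as.matrix(item_sim)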
item_item = Recommender(data = train, method = "IBCF",
parameter = list(k = 30, method = "Cosine"))
item_item
## Recommender of type 'IBCF' for 'realRatingMatrix'
## learned using 226 users.
A function was then created that, given a user's Id number, returns a list of six recommended movies. Taking the first user (User 1), the six recommendations are shown below:
# recommending
n_recommended = 6
predictions = predict(object = item_item, newdata = test,
n = n_recommended)
predictions
## Recommendations as 'topNList' with n = 6 for 80 users.
item_pred = function(idNum){
user_x = predictions@items[[idNum]]
user_x = predictions@itemLabels[user_x]
return(data.frame(user_x))
}
item_pred(1)
## user_x
## 1 Client, The (1994)
## 2 Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)
## 3 Finding Nemo (2003)
## 4 Game, The (1997)
## 5 In the Line of Fire (1993)
## 6 Piano, The (1993)
The ten movies recommended most often by this recommender system are listed below, in both the ordered table and the bar chart:
recc_matrix = sapply(predictions@items, function(x){
colnames(data)[x]
})
number_of_items = data.frame(table(unlist(recc_matrix, use.names=FALSE)))
sorted = number_of_items[order(number_of_items$Freq, decreasing = T),][1:10,]
kable(sorted)
 | Var1 | Freq |
---|---|---|
2 | Abyss, The (1989) | 7 |
5 | Addams Family Values (1993) | 7 |
13 | American Beauty (1999) | 7 |
67 | Dead Poets Society (1989) | 7 |
82 | Fifth Element, The (1997) | 7 |
14 | American History X (1998) | 6 |
16 | American President, The (1995) | 6 |
27 | Batman (1989) | 6 |
38 | Bourne Identity, The (2002) | 6 |
39 | Bourne Supremacy, The (2004) | 6 |
ggplot(data=sorted, aes(x=Var1, y=Freq)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=Freq), color="red", size=3)+
theme_minimal()+
xlab("Movie Titles")+
coord_flip()
User-to-User Collaborative Filtering also uses the information of many users, but it examines the similarity between the users themselves and recommends items based on what similar users have rated. The closer two users' rating histories, the more likely an item rated by one user (but not yet by the other) is to be recommended.
Because this recommender system is User-to-User, the "UBCF" method (User-Based Collaborative Filtering) was used, with the closeness between users measured using cosine similarity. Note that the UBCF recommender controls its neighborhood size through the nn parameter rather than k, which is why the call below produces a warning and falls back to the default of 25 neighbors.
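Analogously to the item case, the user-to-user cosine similarities that UBCF relies on can be inspected directly. The sketch below computes them for the first five users on center-normalized ratings, since (as the warning output below shows) UBCF normalizes by centering by default:

# User-user cosine similarity for the first five users, after centering each
# user's ratings (the default normalization for UBCF)
user_sim = similarity(normalize(data[1:5, ]), method = "Cosine", which = "users")
as.matrix(user_sim)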
user_user = Recommender(data = train, method = "UBCF", parameter = list(k = 30, method = "Cosine"))
## Warning: Unknown parameters: k
## Available parameter (with default values):
## method = cosine
## nn = 25
## sample = FALSE
## normalize = center
## verbose = FALSE
user_user
## Recommender of type 'UBCF' for 'realRatingMatrix'
## learned using 226 users.
As with the Item-to-Item system, a function was created that, given a user's Id number, returns a list of six recommended movies. Again taking the first user (User 1) for comparison purposes, the six recommendations are shown below:
# recommending
n_recommended = 6
predictions2 = predict(object = user_user, newdata = test,
n = n_recommended)
predictions2
## Recommendations as 'topNList' with n = 6 for 80 users.
user_pred = function(idNum){
user_x = predictions2@items[[idNum]]
user_x = predictions2@itemLabels[user_x]
return(data.frame(user_x))
}
user_pred(1)
## user_x
## 1 Shawshank Redemption, The (1994)
## 2 Memento (2000)
## 3 Pianist, The (2002)
## 4 American Beauty (1999)
## 5 Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)
## 6 Schindler's List (1993)
The ten movies recommended most often by this recommender system are listed below, in both the ordered table and the bar chart:
recc_matrix2 = sapply(predictions2@items, function(x){
colnames(data)[x]
})
number_of_items = data.frame(table(unlist(recc_matrix2, use.names=FALSE)))
sorted = number_of_items[order(number_of_items$Freq, decreasing = T),][1:10,]
kable(sorted)
 | Var1 | Freq |
---|---|---|
23 | Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964) | 28 |
63 | Schindler’s List (1993) | 26 |
13 | Braveheart (1995) | 21 |
76 | Terminator 2: Judgment Day (1991) | 21 |
66 | Shawshank Redemption, The (1994) | 20 |
32 | Godfather, The (1972) | 19 |
44 | Lord of the Rings: The Fellowship of the Ring, The (2001) | 16 |
26 | Fight Club (1999) | 15 |
83 | Usual Suspects, The (1995) | 15 |
20 | Dances with Wolves (1990) | 14 |
ggplot(data=sorted, aes(x=Var1, y=Freq)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=Freq), color="red", size=3)+
theme_minimal()+
xlab("Movie Titles")+
coord_flip()
Just by looking at the results of the User-to-User system, it is interesting to note that the recommendations for the same user differ between this approach and the Item-to-Item method. In addition, the User-to-User approach recommends the same movies far more frequently. This may reflect a popularity effect: widely watched movies keep being recommended simply because many similar users have rated them.
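A quick way to quantify this difference for User 1 is to check the overlap between the two top-6 lists, reusing the helper functions defined above (a small sketch; for User 1 the two lists printed earlier share no titles):

# Movies recommended to User 1 by both the item-based and user-based systems
intersect(as.character(item_pred(1)$user_x), as.character(user_pred(1)$user_x))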
In order to test the models' accuracy, the first step was to determine the minimum number of items rated by any user. The number of items given to the recommender for each test user (the "given" parameter below) must be smaller than this value.
set.seed(101)
minimum = min(rowCounts(data))
minimum
## [1] 18
An evaluation scheme was then created containing a training set and a test set (with the test set split into known and unknown items). The k-fold cross-validation approach was used, where the users are split into k chunks, each chunk is held out in turn as the test set, and accuracy is evaluated over the k runs.
evaluation = evaluationScheme(data = data, method = "cross-validation", k = 10, given = 10, goodRating = 3.5)
evaluation
## Evaluation scheme with 10 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=3.500000
## Data set: 306 x 281 rating matrix of class 'realRatingMatrix' with 22003 ratings.
ev_train = getData(evaluation, "train")
ev_known = getData(evaluation, "known")
ev_unknown = getData(evaluation, "unknown")
Using this evaluation scheme, the Item-Based and User-Based Collaborative Filtering methods were compared. The table below lists the Root Mean Square Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE) to make it easier to compare each method's performance.
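For reference, the three error measures relate to each other as shown in the sketch below, which uses hypothetical predicted and actual ratings rather than the models' output:

# Illustration of MAE, MSE and RMSE on made-up values (not model output)
predicted = c(3.5, 4.0, 2.5)
actual = c(4.0, 3.5, 3.0)
err = predicted - actual
MAE = mean(abs(err))   # mean absolute error
MSE = mean(err^2)      # mean squared error
RMSE = sqrt(MSE)       # square root of the MSE, on the same scale as the ratings
c(RMSE = RMSE, MSE = MSE, MAE = MAE)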
# Item
item_model = Recommender(data = ev_train, method = "IBCF", parameter = list(method = "Cosine"))
item_model_pred = predict(object = item_model, newdata = ev_known, n = 10, type = "ratings")
item = calcPredictionAccuracy(x = item_model_pred, data = ev_unknown, byUser = FALSE)
# User
user_model = Recommender(data = ev_train, method = "UBCF", parameter = list(method = "Cosine"))
user_model_pred = predict(object = user_model, newdata = ev_known, n = 10, type = "ratings")
user = calcPredictionAccuracy(x = user_model_pred, data = ev_unknown, byUser = FALSE)
# Comparison
kable(rbind(item, user))
 | RMSE | MSE | MAE |
---|---|---|---|
item | 1.3651117 | 1.8635299 | 1.0252760 |
user | 0.9378846 | 0.8796276 | 0.7180434 |
Across all three measures (RMSE, MSE, and MAE), the user-based method scored lower and therefore performed better.
To get a closer look at the performance between the IBCF and UBCF methods, confusion matrices of the k-folds were created and are shown below.
eval_sets = evaluationScheme(data = data, method = "cross-validation", k = 4, given = 10, goodRating = 3.5)
I_results = evaluate(x = eval_sets, method = "IBCF", n = seq(10, 100, 10))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.16sec/0.02sec]
## 2 [0.14sec/0.03sec]
## 3 [0.11sec/0.03sec]
## 4 [0.14sec/0.03sec]
kable(head(getConfusionMatrix(I_results)[[1]]))
n | TP | FP | FN | TN | precision | recall | TPR | FPR |
---|---|---|---|---|---|---|---|---|
10 | 1.551282 | 8.448718 | 43.79487 | 217.2051 | 0.1551282 | 0.0350662 | 0.0350662 | 0.0373916 |
20 | 3.282051 | 16.717949 | 42.06410 | 208.9359 | 0.1641026 | 0.0733280 | 0.0733280 | 0.0739931 |
30 | 5.025641 | 24.974359 | 40.32051 | 200.6795 | 0.1675214 | 0.1120181 | 0.1120181 | 0.1107101 |
40 | 6.820513 | 33.179487 | 38.52564 | 192.4744 | 0.1705128 | 0.1497783 | 0.1497783 | 0.1468460 |
50 | 8.589744 | 41.410256 | 36.75641 | 184.2436 | 0.1717949 | 0.1854553 | 0.1854553 | 0.1828947 |
60 | 10.205128 | 49.794872 | 35.14103 | 175.8590 | 0.1700855 | 0.2250039 | 0.2250039 | 0.2202029 |
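As a quick check of how the precision column relates to the averaged counts, the n = 10 row above can be recomputed by hand (a sketch; since TP + FP always equals n, the average precision is exactly mean(TP) / n, whereas recall is averaged per user, so the same ratio of averaged counts only approximates the reported value):

# Recomputing precision and recall from the averaged counts in the n = 10 row
TP = 1.551282; FP = 8.448718; FN = 43.79487
TP / (TP + FP)  # 0.1551282, matching the precision column exactly
TP / (TP + FN)  # roughly 0.034, close to the reported recall of 0.0350662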
U_results = evaluate(x = eval_sets, method = "UBCF", n = seq(10, 100, 10))
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.08sec]
## 2 [0sec/0.1sec]
## 3 [0.02sec/0.08sec]
## 4 [0sec/0.11sec]
kable(head(getConfusionMatrix(U_results)[[1]]))
n | TP | FP | FN | TN | precision | recall | TPR | FPR |
---|---|---|---|---|---|---|---|---|
10 | 3.192308 | 6.807692 | 42.15385 | 218.8462 | 0.3192308 | 0.0773929 | 0.0773929 | 0.0293254 |
20 | 5.666667 | 14.333333 | 39.67949 | 211.3205 | 0.2833333 | 0.1322142 | 0.1322142 | 0.0619863 |
30 | 7.858974 | 22.141026 | 37.48718 | 203.5128 | 0.2619658 | 0.1787146 | 0.1787146 | 0.0960339 |
40 | 9.615385 | 30.384615 | 35.73077 | 195.2692 | 0.2403846 | 0.2147292 | 0.2147292 | 0.1325856 |
50 | 11.294872 | 38.705128 | 34.05128 | 186.9487 | 0.2258974 | 0.2507996 | 0.2507996 | 0.1695192 |
60 | 13.038462 | 46.961539 | 32.30769 | 178.6923 | 0.2173077 | 0.2898020 | 0.2898020 | 0.2059630 |
It appears that, overall, the UBCF method identifies more true positives, while the IBCF method leaves more relevant items unrecommended (higher false negative counts). The UBCF method also has higher precision and recall than the IBCF method at every list length.
The ROC plots of the IBCF and UBCF performance are shown below for comparison.
plot(I_results, annotate = TRUE, main = "ROC curve of IBCF")
plot(U_results, annotate = TRUE, main = "ROC curve of UBCF")
Just from looking at the ROC curves, the two methods appear to perform similarly. However, the UBCF approach reaches a higher TPR and curves upward between the n = 10 and n = 40 marks, which gives it an edge over the IBCF method.
Beyond the IBCF-by-cosine and UBCF-by-cosine approaches, other algorithms can be used to measure the distance between users or items and to make recommendations. The Pearson correlation can be used as the distance measure instead of cosine similarity; recommendations can be made at random; and recommendations can be made based on which items are most popular. Each of these approaches (IBCF by Pearson, UBCF by Pearson, Random, and Popular) can be evaluated and compared against the IBCF-by-cosine and UBCF-by-cosine results already measured.
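As a small illustration of the alternative distance measure, the Pearson correlations between the first five users can be computed the same way as the cosine similarities earlier (a sketch only; the fitted models below use the full evaluation data):

# User-user Pearson correlation for the first five users (illustrative only)
as.matrix(similarity(data[1:5, ], method = "pearson", which = "users"))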
mult_models = list(
IBCF_cos = list(name = "IBCF", param = list(method = "Cosine")),
IBCF_pearson = list(name = "IBCF", param = list(method = "pearson")),
UBCF_cos = list(name = "UBCF", param = list(method = "Cosine")),
UBCF_pearson = list(name = "UBCF", param = list(method = "pearson")),
Random = list(name = "RANDOM", param = NULL),
Popular = list(name = "POPULAR", param = NULL)
)
# Testing models
models = evaluate(eval_sets, mult_models, n= c(1, 5, seq(10, 100, 10)))
## IBCF run fold/sample [model time/prediction time]
## 1 [0.15sec/0.02sec]
## 2 [0.19sec/0.04sec]
## 3 [0.16sec/0.03sec]
## 4 [0.15sec/0.01sec]
## IBCF run fold/sample [model time/prediction time]
## 1 [0.29sec/0.02sec]
## 2 [0.16sec/0.03sec]
## 3 [0.15sec/0.03sec]
## 4 [0.16sec/0.03sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.08sec]
## 2 [0sec/0.09sec]
## 3 [0sec/0.1sec]
## 4 [0sec/0.09sec]
## UBCF run fold/sample [model time/prediction time]
## 1 [0sec/0.11sec]
## 2 [0sec/0.21sec]
## 3 [0sec/0.09sec]
## 4 [0sec/0.08sec]
## RANDOM run fold/sample [model time/prediction time]
## 1 [0sec/0.03sec]
## 2 [0.01sec/0.05sec]
## 3 [0sec/0.04sec]
## 4 [0sec/0.05sec]
## POPULAR run fold/sample [model time/prediction time]
## 1 [0sec/0.09sec]
## 2 [0.03sec/0.14sec]
## 3 [0sec/0.12sec]
## 4 [0.01sec/0.1sec]
The ROC and Precision/Recall plots below show the performance of each of the models.
# Plotting models
plot(models, annotate = T, legend="topleft")
plot(models, "prec/rec", annotate = F, main="Precision/Recall", legend="topright")
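To back up the plots with numbers, the averaged confusion matrix behind any of the curves can be extracted with avg(); the sketch below does this for the best-performing 'Popular' model:

# Averaged (across folds) confusion matrix for the 'Popular' model
kable(head(avg(models[["Popular"]])))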
In both plots, the "Popular" model was, surprisingly, the one with the best performance. For both the IBCF and UBCF methods, the Pearson correlation appeared to be the better choice of distance measure. In addition, the UBCF approach performed better than the IBCF approach regardless of the measure (Pearson or cosine). The random approach did not perform well, which was expected; what is interesting, seeing the results side by side, is that IBCF with cosine scored just as low.
The best approach for this downsized MovieLens dataset was recommending Popular movies. That said, this may be a by-product of the downsizing method, since the most-reviewed movies were the ones retained for evaluation. If that is indeed the case, the next best approach to consider is the User-to-User Collaborative Filtering algorithm (using the Pearson correlation as the distance measure between users). Comparing RMSE, MSE, and MAE values, the UBCF method performed better on all counts, and this was reinforced by the ROC plots in the "Comparing Different Model Approaches" section, where the IBCF results were almost as low as selecting movies at random. The downsizing was done the way it was primarily to guarantee a large enough pool of entries in a rather sparse matrix; it will be interesting to see whether these results hold up in a setting where users and movies were not specifically selected for their high rating activity.