A recommender system is an algorithm designed to suggest items to users based on their past preferences or behaviors. One major category of recommender systems is collaborative filtering, which relies on information about similar users or similar items rather than item content. Collaborative filtering has two main types. In item-item collaborative filtering, the system recommends to a user the items that are most similar to the ones they have previously interacted with or purchased. In user-user collaborative filtering, the system recommends to a user the items that are most preferred by users with similar tastes or behaviors.
For Part 1 of my project, I develop and evaluate a User-User Collaborative Filtering (UBCF) recommender system using the MovieLense dataset from the recommenderlab package in R. The goal is to compare the performance of several UBCF configurations by varying key parameters: the similarity metric (cosine vs. Pearson), the neighborhood size (10, 30, or 50), and the normalization technique (centering). We assess the predictive accuracy of each model using Root Mean Squared Error (RMSE) and provide visualizations to support the analysis. Based on the results, we identify the most effective configuration and provide a recommendation for building an accurate user-based recommender system.
Load the packages and data
library("ggplot2")
library("recommenderlab")
## Loading required package: Matrix
## Loading required package: arules
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
## Loading required package: proxy
##
## Attaching package: 'proxy'
## The following object is masked from 'package:Matrix':
##
## as.matrix
## The following objects are masked from 'package:stats':
##
## as.dist, dist
## The following object is masked from 'package:base':
##
## as.matrix
## Registered S3 methods overwritten by 'registry':
## method from
## print.registry_field proxy
## print.registry_entry proxy
help(package = "recommenderlab")
data(MovieLense)
MovieLense
## 943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.
Filter users with >50 ratings
MovieLense <- MovieLense[rowCounts(MovieLense) > 50, ]
Split the data into 80% training and 20% testing, where each test user is given only 10 known ratings and a rating of 4 or higher counts as a good rating.
set.seed(123)
eval_scheme <- evaluationScheme(
MovieLense, method = "split", train = 0.8, given = 10, goodRating = 4)
Here are five UBCF models with different configurations, varying the similarity metric (cosine or Pearson) and the neighborhood size (10, 30, or 50), with centered normalization applied to user ratings in every case. Cosine similarity measures the angle between two users' raw rating vectors, while the Pearson correlation coefficient measures the linear relationship between two users' ratings after adjusting for each user's own rating scale. The neighborhood size (nn) determines how many of the most similar users are considered when predicting the target user's preferences.
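To make the distinction concrete, here is a minimal sketch (toy vectors, not part of the evaluation) comparing the two metrics for a hypothetical harsher rater.
# Toy example: user 2 ranks movies exactly like user 1 but rates
# every movie one star lower (a harsher rater)
u1 <- c(5, 4, 2, 3)
u2 <- c(4, 3, 1, 2)
# Cosine similarity on the raw vectors: high, but below 1
sum(u1 * u2) / (sqrt(sum(u1^2)) * sqrt(sum(u2^2)))  # ~0.994
# Pearson correlation centers each user on their own mean first,
# so the identical preference pattern gives exactly 1
cor(u1, u2)  # 1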
algorithms <- list(
"UBCF_Cosine_10" = list(name = "UBCF", param = list(method = "cosine", nn = 10, normalize = "center")),
"UBCF_Cosine_30" = list(name = "UBCF", param = list(method = "cosine", nn = 30, normalize = "center")),
"UBCF_Cosine_50" = list(name = "UBCF", param = list(method = "cosine", nn = 50, normalize = "center")),
"UBCF_Pearson_30" = list(name = "UBCF", param = list(method = "pearson", nn = 30, normalize = "center")),
"UBCF_Pearson_50" = list(name = "UBCF", param = list(method = "pearson", nn = 50, normalize = "center"))
)
To evaluate model performance, we use Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Both measure the difference between predicted and actual ratings: RMSE penalizes larger errors more heavily, while MAE provides a straightforward average of the absolute prediction errors. I use RMSE as the primary evaluation metric because its sensitivity to large errors makes it better suited for identifying models that minimize large deviations in rating predictions.
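As a quick illustration of what these two metrics compute (toy numbers, not from the models), both can be reproduced by hand:
# Toy example: predicted vs. actual ratings for five user-item pairs
actual    <- c(4, 3, 5, 2, 4)
predicted <- c(3.5, 3.0, 4.0, 3.0, 4.5)
# RMSE squares the errors before averaging, so the two 1-star misses dominate
sqrt(mean((predicted - actual)^2))  # ~0.707
# MAE is the plain average of the absolute errors
mean(abs(predicted - actual))       # 0.6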
# Create an empty data frame to store results
results_df <- data.frame(Model = character(), RMSE = numeric(), MAE = numeric(), stringsAsFactors = FALSE)
# Loop through models and compute RMSE/MAE
for (model_name in names(algorithms)) {
# Train model
rec_model <- Recommender(getData(eval_scheme, "train"), method = algorithms[[model_name]]$name,
parameter = algorithms[[model_name]]$param)
# Predict ratings
pred <- predict(rec_model, getData(eval_scheme, "known"), type = "ratings")
# Calculate prediction accuracy
acc <- calcPredictionAccuracy(pred, getData(eval_scheme, "unknown"))
# Store results
results_df <- rbind(results_df, data.frame(Model = model_name, RMSE = acc["RMSE"], MAE = acc["MAE"]))
}
# View results
results_df
## Model RMSE MAE
## RMSE UBCF_Cosine_10 1.227006 0.9706082
## RMSE1 UBCF_Cosine_30 1.139635 0.9049643
## RMSE2 UBCF_Cosine_50 1.092094 0.8699586
## RMSE3 UBCF_Pearson_30 1.104850 0.8756521
## RMSE4 UBCF_Pearson_50 1.058874 0.8373880
# Plot RMSE for each model
ggplot(results_df, aes(x = reorder(Model, RMSE), y = RMSE)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_text(aes(label = round(RMSE, 3)), vjust = -0.5, size = 2) +
theme_minimal() +
labs(
title = "User-User CF: RMSE Comparison",
x = "Model",
y = "RMSE"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the results, the recommended configuration for User-User Collaborative Filtering on the MovieLense dataset is:
- Method: UBCF
- Similarity: Pearson
- Neighborhood size (nn): 50
- Normalization: Centered ratings
This setup achieved the lowest RMSE (1.059) of all configurations tested, offering a strong balance between prediction accuracy and robustness to noisy individual neighbors.
The superior performance of the UBCF model using Pearson similarity with a neighborhood size of 50 can be attributed to both the nature of the dataset and the algorithm’s behavior. The MovieLense dataset contains user ratings on a standardized scale (1–5), and Pearson similarity is effective in this context because it accounts for differences in individual user rating tendencies (e.g., some users rate more harshly or leniently). This normalization makes it easier to identify truly similar users. A larger neighborhood size (50) allows the model to aggregate preferences from a broader set of users, reducing the risk of overfitting to a small, potentially noisy sample. At the same time, centered normalization removes user-specific biases, which further improves consistency. These factors combined lead to more accurate and stable predictions. In contrast, cosine similarity does not account for rating scale differences and performed worse across all neighborhood sizes, suggesting that accounting for user bias is particularly important in this dataset.
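For reference, here is a minimal sketch of how this recommended configuration could be trained on the full filtered dataset and used to produce top-N recommendations (the choice of n = 5 and the first two users is arbitrary):
# Train the recommended UBCF configuration on the full filtered dataset
final_ubcf <- Recommender(MovieLense, method = "UBCF",
  parameter = list(method = "pearson", nn = 50, normalize = "center"))
# Top-5 recommendations for the first two users
top5 <- predict(final_ubcf, MovieLense[1:2], n = 5)
as(top5, "list")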
We now evaluate Item-Item Collaborative Filtering (IBCF) models using the same dataset and evaluation scheme as before. The key parameters tested include:
- Similarity metrics: cosine vs. Pearson
- Neighborhood sizes: 30 and 50
- Normalization: centered ratings
Here are four IBCF models with different configurations, varying the similarity metric (cosine or Pearson) and the neighborhood size (k = 30 or 50), again with centered normalization applied to user ratings. As before, cosine similarity measures the angle between two rating vectors (here, item vectors), while the Pearson correlation coefficient measures the linear relationship between them after adjusting for rating scale. For IBCF, k determines how many of the most similar items are retained per item and used for prediction; the sketch after the model list below illustrates what k controls.
algorithms_ibcf <- list(
"IBCF_Cosine_30" = list(name = "IBCF", param = list(method = "cosine", k = 30, normalize = "center")),
"IBCF_Cosine_50" = list(name = "IBCF", param = list(method = "cosine", k = 50, normalize = "center")),
"IBCF_Pearson_30" = list(name = "IBCF", param = list(method = "pearson", k = 30, normalize = "center")),
"IBCF_Pearson_50" = list(name = "IBCF", param = list(method = "pearson", k = 50, normalize = "center"))
)
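A trained IBCF model stores a pruned item-item similarity matrix. The sketch below assumes the model slot is named sim, as in recommenderlab's IBCF documentation; with k = 50, each item keeps only its 50 most similar neighbors.
# Sketch: inspect the similarity matrix kept by one IBCF configuration
ibcf_demo <- Recommender(getData(eval_scheme, "train"), method = "IBCF",
  parameter = list(method = "cosine", k = 50, normalize = "center"))
sim <- getModel(ibcf_demo)$sim  # items x items, at most k nonzero entries per row
dim(sim)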
As in the UBCF evaluation, we compute RMSE and MAE for each configuration and again use RMSE as the primary metric, since it penalizes large prediction errors more heavily.
results_ibcf <- data.frame(Model = character(), RMSE = numeric(), MAE = numeric(), stringsAsFactors = FALSE)
for (model_name in names(algorithms_ibcf)) {
rec_model <- Recommender(getData(eval_scheme, "train"), method = algorithms_ibcf[[model_name]]$name,
parameter = algorithms_ibcf[[model_name]]$param)
pred <- predict(rec_model, getData(eval_scheme, "known"), type = "ratings")
acc <- calcPredictionAccuracy(pred, getData(eval_scheme, "unknown"))
results_ibcf <- rbind(results_ibcf, data.frame(Model = model_name, RMSE = acc["RMSE"], MAE = acc["MAE"]))
}
results_ibcf
## Model RMSE MAE
## RMSE IBCF_Cosine_30 1.501602 1.096154
## RMSE1 IBCF_Cosine_50 1.461836 1.084526
## RMSE2 IBCF_Pearson_30 1.472364 1.078385
## RMSE3 IBCF_Pearson_50 1.543455 1.160724
ggplot(results_ibcf, aes(x = reorder(Model, RMSE), y = RMSE)) +
geom_bar(stat = "identity", fill = "darkgreen") +
geom_text(aes(label = round(RMSE, 3)), vjust = 0.1) +
theme_minimal() +
labs(title = "Item-Item CF: RMSE Comparison", x = "Model", y = "RMSE") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
For item-based collaborative filtering using the MovieLense dataset, we recommend:
- Method: IBCF
- Similarity: Cosine
- Neighborhood size (k): 50
- Normalization: Centered ratings
This delivered the most accurate rating predictions among the IBCF models tested.
For IBCF, the best-performing model used cosine similarity with a neighborhood size of 50, achieving the lowest RMSE among the IBCF configurations. This result may be influenced by the structure of the MovieLense dataset, where certain items (movies) tend to have more consistent co-rating patterns across users. Cosine similarity works well in this setting because it captures co-occurrence patterns between items, and with a larger neighborhood size, the model is able to leverage more item-item relationships for prediction. However, IBCF overall performed worse than UBCF, likely because it relies on item similarity alone and does not account for variability in user rating behavior. Pearson similarity underperformed in the item-based context, possibly due to the sparsity of overlapping ratings between items, making it less reliable for measuring item similarity. These findings suggest that while IBCF can be useful, its effectiveness is more sensitive to sparsity and item coverage compared to UBCF.
Collaborative filtering predicts a user’s preferences based on past behavior. In User-User Collaborative Filtering (UBCF), the system recommends items to a user by finding other users with similar rating patterns and suggesting items those similar users liked. In contrast, Item-Item Collaborative Filtering (IBCF) recommends items based on their similarity to items the user has already rated highly—essentially comparing items instead of users. UBCF can capture diverse user preferences and is more dynamic for rapidly changing user behavior, but it may suffer from scalability issues in large datasets. IBCF tends to be more stable and scalable, as item similarities change less frequently than user preferences.
However, both approaches face common challenges. The cold start problem arises when new users or items have insufficient data, making it difficult to generate accurate recommendations. Data sparsity, where the majority of users have interacted with only a small subset of items, can further reduce accuracy. Additionally, collaborative filtering models may inherit and even amplify biases in the data, such as disproportionately recommending popular items while ignoring niche interests.
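The sparsity point is easy to quantify for our data: even after keeping only users with more than 50 ratings, most user-item cells in the filtered MovieLense matrix are still empty.
# Fraction of user-item cells with no rating in the filtered matrix
n_ratings <- sum(rowCounts(MovieLense))
1 - n_ratings / (nrow(MovieLense) * ncol(MovieLense))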
Modern recommender systems increasingly use hybrid approaches that combine collaborative filtering with content-based methods, machine learning, and even deep learning models. Advanced techniques such as matrix factorization, gradient boosting, and neural collaborative filtering are now commonly used to capture complex user-item relationships. As data grows in volume and diversity, future systems are expected to become even more personalized and adaptive.
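As a pointer for future work, recommenderlab itself registers several matrix-factorization methods (for example "SVD" and "SVDF"). Here is a hedged sketch of evaluating one such model under the same scheme, assuming the method's default-style latent-factor parameter k:
# List the methods registered for real-valued rating matrices
recommenderRegistry$get_entries(dataType = "realRatingMatrix")
# Sketch: evaluate an SVD-based model under the same evaluation scheme
svd_model <- Recommender(getData(eval_scheme, "train"), method = "SVD",
  parameter = list(k = 10))
svd_pred <- predict(svd_model, getData(eval_scheme, "known"), type = "ratings")
calcPredictionAccuracy(svd_pred, getData(eval_scheme, "unknown"))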