The objective of this project is to develop a personalized recommendation system using the provided survey dataset. Unlike the previous assignment, which relied on a global baseline estimate and produced the same recommendations for all users, this project focuses on generating user-specific recommendations based on individual preferences.
## Recommendation Method
For this analysis, I implement an Item-to-Item Collaborative Filtering approach. This method recommends items to a user based on the similarity between items, rather than similarities between users. The core idea is that if a user has shown interest in a particular item, they are likely to prefer other items that are similar to it.
To achieve this, a user-item interaction matrix is constructed from the survey data, where rows represent users and columns represent items (e.g., movies, products, or survey responses). Each cell contains the user’s rating or interaction with the item.
Next, similarity scores between items are computed using a similarity metric such as cosine similarity or Pearson correlation. These similarity scores are then used to identify items that are most closely related.
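As a minimal sketch of the similarity step, the base R code below computes cosine similarity between item columns of a small, made-up rating matrix. The toy data and the helper name `cosine_sim` are assumptions for illustration. Restricting each comparison to users who rated both items is one common convention, not necessarily what recommenderlab does internally.

```r
# Toy user-item matrix (hypothetical data): rows are users, columns are items,
# NA means the user has not rated the item
ratings_toy <- matrix(
  c( 5,  4, NA,
     4,  5,  1,
     1, NA,  5,
    NA,  2,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C"))
)

# Cosine similarity between two item columns, using only users who rated both
cosine_sim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Build the full item-item similarity matrix
items <- colnames(ratings_toy)
item_sim <- sapply(items, function(j) {
  sapply(items, function(i) cosine_sim(ratings_toy[, i], ratings_toy[, j]))
})

round(item_sim, 3)
```

Items A and B are rated similarly by the users they share, so their similarity comes out close to 1, while A and C score much lower.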
## Generating Recommendations
For each user, the system:
1. Identifies items the user has already interacted with
2. Finds similar items based on the similarity matrix
3. Ranks these items according to their similarity scores
4. Outputs a Top-N list of recommended items
This results in a personalized ranked list of recommendations tailored to each user’s preferences.
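The steps above can be sketched in base R as follows. The similarity matrix, the user's ratings, and the function name `recommend_top_n` are all hypothetical. Scoring each unrated item by a similarity-weighted average of the user's existing ratings is one common variant of item-based ranking, assumed here for illustration.

```r
# Hypothetical item-item similarity matrix (symmetric, 1 on the diagonal)
items <- c("A", "B", "C", "D")
item_sim <- matrix(
  c(1.0, 0.9, 0.2, 0.4,
    0.9, 1.0, 0.3, 0.5,
    0.2, 0.3, 1.0, 0.8,
    0.4, 0.5, 0.8, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(items, items)
)

# One user's ratings; NA marks items the user has not rated yet
user_ratings <- c(A = 5, B = NA, C = 1, D = NA)

recommend_top_n <- function(user_ratings, item_sim, n = 2) {
  rated   <- names(user_ratings)[!is.na(user_ratings)]
  unrated <- names(user_ratings)[is.na(user_ratings)]
  # Score each unrated item by the similarity-weighted average of known ratings
  scores <- sapply(unrated, function(i) {
    w <- item_sim[i, rated]
    sum(w * user_ratings[rated]) / sum(abs(w))
  })
  # Rank candidates and keep the top n
  head(sort(scores, decreasing = TRUE), n)
}

top_items <- recommend_top_n(user_ratings, item_sim, n = 2)
top_items
```

Here item B ranks ahead of D because it is most similar to item A, which the user rated highly.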
## Evaluation Strategy
To evaluate the performance of the recommender system, the dataset is split into training and testing sets. The model is trained on the training data and evaluated on unseen test data.
Performance is assessed using appropriate metrics such as:
- Root Mean Squared Error (RMSE) for predicted ratings
- Precision and Recall for evaluating the quality of Top-N recommendations
This evaluation checks that the recommender system not only produces personalized results but also remains accurate and relevant on unseen data.
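To make the metrics concrete, here is a small base R sketch of how they are computed. The helper names `rmse` and `precision_recall` and the example values are hypothetical; recommenderlab reports the same quantities through its own evaluation functions.

```r
# RMSE between predicted and actual ratings
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual)^2))
}

# Precision and recall of a Top-N list against the set of relevant items
precision_recall <- function(recommended, relevant) {
  hits <- length(intersect(recommended, relevant))
  c(precision = hits / length(recommended),
    recall    = hits / length(relevant))
}

# Made-up example values
err <- rmse(predicted = c(4, 3, 5), actual = c(5, 3, 4))
pr  <- precision_recall(recommended = c("A", "B", "C", "D"),
                        relevant    = c("B", "D", "E"))
err
pr
```

In this example two of the four recommended items are relevant, so precision is 0.5, and two of the three relevant items were found, so recall is 2/3.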
## Tools and Implementation
The system is implemented in R using the recommenderlab package, which provides efficient data structures and functions for building collaborative filtering models. Data preprocessing and transformation are performed using the tidyverse suite of packages.
Overall, this approach enables the development of a scalable and effective personalized recommendation system that improves upon non-personalized baseline methods by leveraging patterns in user behavior.
library(dplyr)
library(tidyr)
library(readr)
library(recommenderlab)
# Load the dataset from GitHub
url <- "https://raw.githubusercontent.com/japhet125/global-baseline-assign/refs/heads/main/u.data"
ratings <- read_delim(
  url,
  delim = "\t",
  col_names = c("userId", "movieId", "rating", "timestamp"),
  show_col_types = FALSE
)

# Keep only the relevant columns
ratings <- ratings |>
  select("userId", "movieId", "rating")

# Preview the dataset
head(ratings)
First, I convert the rating data into a user-item matrix. This format is required for collaborative filtering because it allows the model to compare item rating patterns across users.
943 x 1682 rating matrix of class 'realRatingMatrix' with 100000 ratings.
The rating matrix contains 943 users and 1682 movies. Each row represents a user, each column represents a movie, and each cell contains a rating when available.
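The conversion step can be sketched as follows, assuming the recommenderlab package is installed. The toy data frame and the name `ratings_rrm_toy` are hypothetical stand-ins for the real data. recommenderlab can coerce a data frame of (user, item, rating) triplets to a `realRatingMatrix`; a tibble returned by `read_delim()` would likely need `as.data.frame()` first.

```r
library(recommenderlab)

# Toy long-format ratings with the same column layout as the dataset
ratings_toy <- data.frame(
  userId  = c(1, 1, 2, 2, 3),
  movieId = c(10, 20, 10, 30, 20),
  rating  = c(5, 3, 4, 2, 1)
)

# Coerce (user, item, rating) triplets into a sparse rating matrix
ratings_rrm_toy <- as(ratings_toy, "realRatingMatrix")
ratings_rrm_toy

# View it as an ordinary matrix; unrated cells appear as NA
as(ratings_rrm_toy, "matrix")
```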
## Explore the Rating Matrix
This step helps summarize the structure of the data and confirms that the recommendation matrix was created successfully.
dim(ratings_rrm)
[1] 943 1682
image(ratings_rrm[1:100, 1:100], main = "Sample of User-Item Rating Matrix")
## Split the Data
To evaluate the recommender fairly, I split the data into training and testing sets. The model is trained on one portion of the data and tested on unseen ratings.
189 x 1682 rating matrix of class 'realRatingMatrix' with 5140 ratings.
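A split of this kind could be set up with recommenderlab's `evaluationScheme()`. The sketch below uses synthetic ratings and assumed settings: an 80/20 split, 10 given ratings per test user, and a good-rating threshold of 4. None of these values are confirmed by the output above.

```r
library(recommenderlab)
set.seed(123)

# Synthetic data (assumption): 60 users x 30 items, 15 ratings per user
m <- matrix(NA_real_, nrow = 60, ncol = 30,
            dimnames = list(paste0("u", 1:60), paste0("i", 1:30)))
for (u in seq_len(nrow(m))) {
  m[u, sample(ncol(m), 15)] <- sample(1:5, 15, replace = TRUE)
}
rrm <- as(m, "realRatingMatrix")

# 80% of users for training; for each test user, 10 ratings are shown to the
# model ("known") and the remaining ones are held out ("unknown")
scheme <- evaluationScheme(rrm, method = "split", train = 0.8,
                           given = 10, goodRating = 4)

train_data   <- getData(scheme, "train")
known_data   <- getData(scheme, "known")
unknown_data <- getData(scheme, "unknown")

train_data
known_data
```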
## Evaluate the Recommender
To evaluate performance, I use Top-N recommendation accuracy. This measures how well the recommended items match the relevant items in the held-out test data.
IBCF run fold/sample [model time/prediction time]
1 [1.227sec/0.103sec]
results
Evaluation results for 1 folds/samples using method 'IBCF'.
The evaluation plot summarizes the recommender’s performance for different recommendation list sizes. As the number of recommended items increases, recall may improve because the model has more chances to include relevant movies, while precision may decrease because the recommendation list becomes broader.
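As a hedged sketch of how such Top-N results can be produced, the code below runs recommenderlab's `evaluate()` on synthetic data. The list sizes in `n`, the neighborhood size `k`, and the split settings are assumptions for illustration, not the values used for the results above.

```r
library(recommenderlab)
set.seed(123)

# Synthetic data (assumption): 60 users x 30 items, 15 ratings per user
m <- matrix(NA_real_, nrow = 60, ncol = 30,
            dimnames = list(paste0("u", 1:60), paste0("i", 1:30)))
for (u in seq_len(nrow(m))) {
  m[u, sample(ncol(m), 15)] <- sample(1:5, 15, replace = TRUE)
}
rrm <- as(m, "realRatingMatrix")

scheme <- evaluationScheme(rrm, method = "split", train = 0.8,
                           given = 10, goodRating = 4)

# Evaluate item-based CF for Top-1, Top-3 and Top-5 lists
results_toy <- evaluate(scheme, method = "IBCF", type = "topNList",
                        n = c(1, 3, 5), parameter = list(k = 10))

# Averaged confusion-matrix statistics, one row per list size
avg_results <- avg(results_toy)
avg_results[, c("precision", "recall")]
```

Because the ratings here are random, the precision and recall values carry no meaning beyond showing the shape of the output.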
## Accuracy Metrics
The following plot summarizes model performance using evaluation metrics such as precision and recall.
# Precision-recall curve, annotated with the recommendation list sizes
plot(results, "prec/rec", annotate = TRUE)
first_user <- names(top5_list)[1]
top5_list[[1]]
[1] "1616"
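A list like `top5_list` can be obtained by training an IBCF model and converting its Top-5 predictions to a list. The sketch below shows one plausible way to do this on synthetic data. The train/test row ranges and `k = 10` are assumptions, and the name `top5_list_toy` is used to keep it distinct from the object above.

```r
library(recommenderlab)
set.seed(123)

# Synthetic data (assumption): 60 users x 30 items, 15 ratings per user
m <- matrix(NA_real_, nrow = 60, ncol = 30,
            dimnames = list(paste0("u", 1:60), paste0("i", 1:30)))
for (u in seq_len(nrow(m))) {
  m[u, sample(ncol(m), 15)] <- sample(1:5, 15, replace = TRUE)
}
rrm <- as(m, "realRatingMatrix")

# Train item-based CF on the first 50 users
model <- Recommender(rrm[1:50, ], method = "IBCF", parameter = list(k = 10))

# Predict up to five unseen items for each of the remaining 10 users
pred <- predict(model, rrm[51:60, ], n = 5)

# One character vector of item IDs per user
top5_list_toy <- as(pred, "list")
top5_list_toy[[1]]
```

A user may receive fewer than five items when the model cannot score enough candidates, which is consistent with the single recommendation shown for the first user above.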
## Get the Movie Titles
# Load movie titles
movies_url <- "https://files.grouplens.org/datasets/movielens/ml-100k/u.item"
movies <- read_delim(
  movies_url,
  delim = "|",
  col_names = FALSE,
  show_col_types = FALSE,
  locale = locale(encoding = "latin1")
)

# Keep only movieId and title
movies <- movies |>
  select(X1, X2) |>
  rename(movieId = X1, title = X2)

head(movies)
# A tibble: 6 × 2
movieId title
<dbl> <chr>
1 1 Toy Story (1995)
2 2 GoldenEye (1995)
3 3 Four Rooms (1995)
4 4 Get Shorty (1995)
5 5 Copycat (1995)
6 6 Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)
# Convert list to data frame
recommendations_df <- stack(top5_list)

# Rename columns
colnames(recommendations_df) <- c("movieId", "user")

# Convert movieId to numeric
recommendations_df$movieId <- as.numeric(recommendations_df$movieId)

head(recommendations_df)
recommendations_with_titles <- recommendations_df |>
  left_join(movies, by = "movieId")

head(recommendations_with_titles)
movieId user title
1 1616 0 Desert Winds (1995)
2 1523 1 Good Man in Africa, A (1994)
3 1618 1 King of New York (1990)
4 1122 1 They Made Me a Criminal (1939)
5 1431 1 Legal Deceit (1997)
6 1661 1 New Age, The (1994)
## Recommendations for One User
# Example: first user
recommendations_with_titles |>
  filter(user == names(top5_list)[1])
movieId user title
1 1616 0 Desert Winds (1995)
To improve interpretability, movie IDs were mapped to their corresponding titles using the MovieLens metadata. This allows the recommender system to output human-readable recommendations rather than numeric identifiers.
## Conclusion
This project developed a personalized recommendation system using Item-to-Item Collaborative Filtering. Unlike a global baseline model, this recommender produces user-specific movie suggestions by identifying relationships among items based on user rating patterns. The model was evaluated using a train/test split and Top-N recommendation metrics. The results illustrate that collaborative filtering can tailor recommendations to individual users, which a non-personalized approach cannot do. Overall, this method demonstrates how recommender systems can use past user behavior to predict future preferences and improve the quality of recommendations.