Personalized Recommendation System

Author

Ramde Guibril

Approach

The objective of this project is to develop a personalized recommendation system using the provided survey dataset. Unlike the previous assignment, which relied on a global baseline estimate and produced the same recommendations for all users, this project focuses on generating user-specific recommendations based on individual preferences.

Recommendation Method

For this analysis, I implement an Item-to-Item Collaborative Filtering approach. This method recommends items to a user based on the similarity between items, rather than similarities between users. The core idea is that if a user has shown interest in a particular item, they are likely to prefer other items that are similar to it.

To achieve this, a user-item interaction matrix is constructed from the survey data, where rows represent users and columns represent items (e.g., movies, products, or survey responses). Each cell contains the user’s rating or interaction with the item.

Next, similarity scores between items are computed using a similarity metric such as cosine similarity or Pearson correlation. These similarity scores are then used to identify items that are most closely related.
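
As a rough illustration of the similarity step, the sketch below computes cosine similarity between item columns of a small made-up matrix in base R. The matrix `ratings_toy`, the helper `cosine_items`, and the choice to use only users who rated both items are assumptions for illustration; recommenderlab may handle missing ratings differently.

```r
# Toy user-item matrix: rows are users, columns are items, NA = not rated.
ratings_toy <- matrix(
  c(5,  4,  NA,
    4,  5,  1,
    1,  NA, 5,
    NA, 2,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C"))
)

# Cosine similarity between two item columns, using only co-rating users
# (an assumption of this sketch, not necessarily the package's behavior).
cosine_items <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

# Build the full item-item similarity matrix.
items <- colnames(ratings_toy)
item_sim <- sapply(items, function(i) {
  sapply(items, function(j) cosine_items(ratings_toy[, i], ratings_toy[, j]))
})

round(item_sim, 3)
```

In this toy data, items A and B are rated alike by the users who rated both, so their similarity is much higher than the similarity between A and C.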

Generating Recommendations

For each user, the system:

1. Identifies items the user has already interacted with
2. Finds similar items based on the similarity matrix
3. Ranks these items according to their similarity scores
4. Outputs a Top-N list of recommended items

This results in a personalized ranked list of recommendations tailored to each user’s preferences.
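
The steps above can be sketched in base R roughly as follows. The similarity values, the user's ratings, and the helper `recommend_top_n` are hypothetical, and scoring candidates by a similarity-weighted average of the user's ratings is one common choice rather than a description of what recommenderlab does internally.

```r
# Hypothetical item-item similarity matrix (symmetric, values made up).
items <- c("A", "B", "C", "D")
item_sim <- matrix(
  c(1.0, 0.9, 0.2, 0.4,
    0.9, 1.0, 0.1, 0.5,
    0.2, 0.1, 1.0, 0.8,
    0.4, 0.5, 0.8, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(items, items)
)

# One user's ratings; NA marks items the user has not interacted with.
user_ratings <- c(A = 5, B = NA, C = 2, D = NA)

recommend_top_n <- function(user_ratings, item_sim, n = 2) {
  # Items the user has already rated, and the remaining candidates.
  rated <- names(user_ratings)[!is.na(user_ratings)]
  candidates <- setdiff(names(user_ratings), rated)
  # Score each candidate by a similarity-weighted average of the user's ratings.
  scores <- sapply(candidates, function(i) {
    w <- item_sim[i, rated]
    sum(w * user_ratings[rated]) / sum(abs(w))
  })
  # Keep the n highest-scoring candidates.
  head(sort(scores, decreasing = TRUE), n)
}

top_items <- recommend_top_n(user_ratings, item_sim, n = 2)
top_items
```

Here item B ranks first because it is most similar to item A, which this user rated highly.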

Evaluation Strategy

To evaluate the performance of the recommender system, the dataset is split into training and testing sets. The model is trained on the training data and evaluated on unseen test data.

Performance is assessed using appropriate metrics such as:

- Root Mean Squared Error (RMSE) for predicted ratings
- Precision and Recall for evaluating the quality of Top-N recommendations

This evaluation ensures that the recommender system not only produces personalized results but also maintains accuracy and relevance.
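
For reference, these metrics are commonly defined as below. The notation is an assumption introduced here: $T$ is the set of held-out (user, item) pairs, $r_{ui}$ and $\hat{r}_{ui}$ are the actual and predicted ratings, $\mathrm{Rec}_N(u)$ is the Top-N list for user $u$, and $\mathrm{Rel}(u)$ is the set of held-out items that user $u$ rated as relevant.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}_{ui} - r_{ui}\right)^2}

\mathrm{Precision@}N = \frac{|\mathrm{Rec}_N(u) \cap \mathrm{Rel}(u)|}{N}
\qquad
\mathrm{Recall@}N = \frac{|\mathrm{Rec}_N(u) \cap \mathrm{Rel}(u)|}{|\mathrm{Rel}(u)|}
```

Precision and recall are typically averaged over the test users to give a single value per list size.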

Tools and Implementation

The system is implemented in R using the recommenderlab package, which provides efficient data structures and functions for building collaborative filtering models. Data preprocessing and transformation are performed using the tidyverse suite of packages.

Overall, this approach enables the development of a scalable and effective personalized recommendation system that improves upon non-personalized baseline methods by leveraging patterns in user behavior.

library(dplyr)
library(tidyr)
library(readr)
library(recommenderlab)

# Load the dataset from GitHub
url <- "https://raw.githubusercontent.com/japhet125/global-baseline-assign/refs/heads/main/u.data"

ratings <- read_delim(
  url,
  delim = "\t",
  col_names = c("userId", "movieId", "rating", "timestamp"),
  show_col_types = FALSE
)

# Keep only the relevant columns
ratings <- ratings |>
  select(userId, movieId, rating)

# Preview the dataset
head(ratings)

# A tibble: 6 × 3
  userId movieId rating
   <dbl>   <dbl>  <dbl>
1    196     242      3
2    186     302      3
3     22     377      1
4    244      51      2
5    166     346      1
6    298     474      4

dim(ratings)

[1] 100000      3

Data Preparation

First, I convert the rating data into a user-item matrix. This format is required for collaborative filtering because it allows the model to compare item rating patterns across users.

ratings_wide <- ratings |>
  pivot_wider(names_from = movieId, values_from = rating)

ratings_mat <- as.matrix(ratings_wide[, -1])
rownames(ratings_mat) <- ratings_wide$userId

ratings_rrm <- as(ratings_mat, "realRatingMatrix")

ratings_rrm

943 x 1682 rating matrix of class 'realRatingMatrix' with 100000 ratings.

The rating matrix contains 943 users and 1682 movies. Each row represents a user, each column represents a movie, and each cell contains a rating when available.
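
A quick back-of-the-envelope check, using only the dimensions and rating count reported above, shows how sparse this matrix is. The variable names are illustrative.

```r
# Dimensions and rating count reported for the rating matrix above.
n_users   <- 943
n_movies  <- 1682
n_ratings <- 100000

# Share of user-movie cells that hold a rating, and the share left empty.
density  <- n_ratings / (n_users * n_movies)
sparsity <- 1 - density

round(100 * density, 1)   # percent of cells filled
round(100 * sparsity, 1)  # percent of cells empty
```

Only about 6.3% of the cells contain a rating, which is why the data is stored in a sparse `realRatingMatrix` rather than used as a dense matrix.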

Explore the Rating Matrix

This step helps summarize the structure of the data and confirms that the recommendation matrix was created successfully.

dim(ratings_rrm)

[1]  943 1682

image(ratings_rrm[1:100, 1:100], main = "Sample of User-Item Rating Matrix")

Split the Data

To evaluate the recommender fairly, I split the data into training and testing sets. The model is trained on one portion of the data and tested on unseen ratings.

set.seed(123)

scheme <- evaluationScheme(
  data = ratings_rrm,
  method = "split",
  train = 0.8,
  given = 10,
  goodRating = 4
)

getData(scheme, "train")

754 x 1682 rating matrix of class 'realRatingMatrix' with 80191 ratings.

getData(scheme, "known")

189 x 1682 rating matrix of class 'realRatingMatrix' with 1891 ratings.

getData(scheme, "unknown")

189 x 1682 rating matrix of class 'realRatingMatrix' with 17918 ratings.
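
In this scheme, roughly 80% of users go to the training set, and each remaining test user has a fixed number of ratings (`given`) exposed as "known" while the rest are withheld as "unknown" for scoring. The base R sketch below mimics that idea for a single made-up user; `user_row` and `split_given` are hypothetical and are not part of recommenderlab.

```r
set.seed(1)  # arbitrary seed so the toy split is reproducible

# Toy ratings for one hypothetical test user: 8 rated movies, 4 unrated.
user_row <- c(m1 = 5, m2 = 3, m3 = NA, m4 = 4, m5 = NA, m6 = 2,
              m7 = 5, m8 = NA, m9 = 1, m10 = 4, m11 = NA, m12 = 3)

split_given <- function(user_row, given) {
  rated <- which(!is.na(user_row))
  stopifnot(length(rated) > given)   # something must be left to hide
  keep <- sample(rated, given)       # ratings the model is allowed to see
  known <- user_row
  known[setdiff(rated, keep)] <- NA  # hide everything except the kept ratings
  unknown <- user_row
  unknown[keep] <- NA                # hide the kept ratings from the scoring set
  list(known = known, unknown = unknown)
}

parts <- split_given(user_row, given = 3)
sum(!is.na(parts$known))    # ratings shown to the model
sum(!is.na(parts$unknown))  # ratings held out for scoring
```

The known and unknown parts never overlap, so the model is always scored on ratings it did not see.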

Build the Recommender Model

I use an Item-Based Collaborative Filtering model. This method recommends movies by identifying items with similar rating behavior across users.

recommender_model <- Recommender(
  data = getData(scheme, "train"),
  method = "IBCF",
  parameter = list(method = "Cosine", k = 30)
)
recommender_model

Recommender of type 'IBCF' for 'realRatingMatrix' 
learned using 754 users.

Generating Top-N Recommendations

The recommender outputs a ranked list of movies for each user. Below, I generate the top 5 recommendations for users in the test set.

top5_predictions <- predict(
  object = recommender_model,
  newdata = getData(scheme, "known"),
  n = 5,
  type = "topNList"
)

top5_list <- as(top5_predictions, "list")

head(top5_list, 5)

$`0`
[1] "1616"

$`1`
[1] "1523" "1618" "1122" "1431" "1661"

$`2`
[1] "711"  "1616" "1604"

$`3`
[1] "1398" "1533" "1523" "1616" "1593"

$`4`
[1] "392" "4"   "416" "625" "901"

Predicted Ratings

In addition to top-N recommendations, the system can estimate predicted ratings for missing items.

predicted_ratings <- predict(
  object = recommender_model,
  newdata = getData(scheme, "known"),
  type = "ratings"
)

predicted_ratings

189 x 1682 rating matrix of class 'realRatingMatrix' with 5140 ratings.
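
Predicted ratings like these can be compared with the held-out ratings to obtain an RMSE. The sketch below shows the calculation in base R on made-up values, scoring only the cells where both a prediction and a held-out rating exist; the vectors and the `rmse` helper are hypothetical.

```r
# Hypothetical predicted and held-out ratings for one user (NA = no value).
predicted <- c(m1 = 4.2, m2 = NA, m3 = 3.1, m4 = 2.0, m5 = NA)
held_out  <- c(m1 = 5,   m2 = 3,  m3 = 3,   m4 = NA,  m5 = NA)

rmse <- function(predicted, actual) {
  ok <- !is.na(predicted) & !is.na(actual)  # score only comparable cells
  sqrt(mean((predicted[ok] - actual[ok])^2))
}

rmse_value <- rmse(predicted, held_out)
rmse_value
```

recommenderlab also provides `calcPredictionAccuracy()` for this kind of comparison; its exact arguments are worth checking in the package documentation before use.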

Evaluate the Recommender

To evaluate performance, I use Top-N recommendation accuracy. This measures how well the recommended items match the relevant items in the held-out test data.

results <- evaluate(
  x = scheme,
  method = "IBCF",
  type = "topNList",
  n = c(1, 3, 5, 10)
)

IBCF run fold/sample [model time/prediction time]
     1  [1.227sec/0.103sec]

results

Evaluation results for 1 folds/samples using method 'IBCF'.

The evaluation measures the recommender’s performance at several recommendation list sizes (1, 3, 5, and 10). As the number of recommended items increases, recall tends to improve because the model has more chances to include relevant movies, while precision tends to decrease because the recommendation list becomes broader.
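
The trade-off between precision and recall can be illustrated with a small base R sketch. The ranked list, the set of relevant movies, and the helper `precision_recall_at` are all made up for illustration and do not come from the model above.

```r
# Hypothetical ranked recommendations for one user (best first).
ranked <- c("m2", "m7", "m4", "m9", "m1", "m8", "m3", "m6", "m5", "m10")
# Hypothetical held-out movies this user rated 4 or higher.
relevant <- c("m2", "m4", "m5", "m11")

precision_recall_at <- function(ranked, relevant, n) {
  hits <- sum(head(ranked, n) %in% relevant)  # relevant items in the top n
  c(n = n, precision = hits / n, recall = hits / length(relevant))
}

# One row per list size, matching the sizes used in the evaluation.
pr <- t(sapply(c(1, 3, 5, 10), function(n) {
  precision_recall_at(ranked, relevant, n)
}))
pr
```

In this toy example recall never falls as the list grows, while precision drops once the extra slots are filled with non-relevant movies.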

Accuracy Metrics

The following plot summarizes model performance using evaluation metrics such as precision and recall.

plot(results, annotate = TRUE)

first_user <- names(top5_list)[1]
top5_list[[1]]

[1] "1616"

Get the Movie Titles

# Load movie titles
movies_url <- "https://files.grouplens.org/datasets/movielens/ml-100k/u.item"

movies <- read_delim(
  movies_url,
  delim = "|",
  col_names = FALSE,
  show_col_types = FALSE,
  locale = locale(encoding = "latin1")
)

# Keep only movieId and title
movies <- movies |>
  select(X1, X2) |>
  rename(
    movieId = X1,
    title = X2
  )

head(movies)

# A tibble: 6 × 2
  movieId title                                               
    <dbl> <chr>                                               
1       1 Toy Story (1995)                                    
2       2 GoldenEye (1995)                                    
3       3 Four Rooms (1995)                                   
4       4 Get Shorty (1995)                                   
5       5 Copycat (1995)                                      
6       6 Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)

# Convert list to dataframe
recommendations_df <- stack(top5_list)

# Rename columns
colnames(recommendations_df) <- c("movieId", "user")

# Convert movieId to numeric
recommendations_df$movieId <- as.numeric(recommendations_df$movieId)

head(recommendations_df)

  movieId user
1    1616    0
2    1523    1
3    1618    1
4    1122    1
5    1431    1
6    1661    1

recommendations_with_titles <- recommendations_df |>
  left_join(movies, by = "movieId")

head(recommendations_with_titles)

  movieId user                          title
1    1616    0            Desert Winds (1995)
2    1523    1   Good Man in Africa, A (1994)
3    1618    1        King of New York (1990)
4    1122    1 They Made Me a Criminal (1939)
5    1431    1            Legal Deceit (1997)
6    1661    1            New Age, The (1994)

Recommendations for One User

# Example: first user
recommendations_with_titles |>
  filter(user == names(top5_list)[1])

  movieId user               title
1    1616    0 Desert Winds (1995)

To improve interpretability, movie IDs are mapped to their corresponding titles using the MovieLens metadata. This allows the recommender system to output human-readable recommendations rather than numeric identifiers.

Conclusion

This project developed a personalized recommendation system using Item-to-Item Collaborative Filtering. Unlike a global baseline model, this recommender produces user-specific movie suggestions by identifying relationships among items based on user rating patterns. The model was evaluated using a train/test split and Top-N recommendation metrics. The output illustrates that item-based collaborative filtering yields recommendation lists that differ from user to user, which a non-personalized baseline cannot do. Overall, this method demonstrates how recommender systems can use past user behavior to predict future preferences.