Assignment: Build a Personalized Recommendation System
In a previous assignment, you implemented a Global Baseline Estimate, which produced non-personalized recommendations. In this assignment, you will use the same survey data to build a personalized recommender system.
Your task is to:
Choose and implement one personalized recommendation algorithm, such as:
Content-based filtering
Item-to-item collaborative filtering
User-to-user collaborative filtering
Matrix factorization
Define what your recommender will output (e.g., top-N items per user, predicted ratings, ranked lists).
Evaluate the performance of your recommender using an appropriate method (e.g., hold-out data, cross-validation, ranking metrics).
You may either:
Use an existing recommender package, or
Implement the algorithm from scratch.
Your submission should include the code, the recommendation output, and a brief explanation of how the model was built and evaluated.
Approach
Due to the small size of the dataset, I chose user-to-user collaborative filtering as the recommendation approach. This method is suitable because it can compute similarities directly from user ratings, is easy to implement, and is straightforward to explain.
For each user, I identify the most similar users based on their rating patterns and recommend movies that those similar users rated highly but the target user has not yet seen.
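As a minimal sketch of this similarity step, the example below computes cosine similarity between users on a small toy matrix. The report does not state which similarity measure was used, so cosine similarity is an assumption here; the matrix values and the function name `cosine_similarity` are hypothetical.

```r
# Toy user-by-movie rating matrix (hypothetical values).
# 0 marks "not rated", in line with Option 1 later in this report.
ratings <- matrix(
  c(5, 4, 0,
    4, 5, 1,
    1, 0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Alice", "Bob", "Charlie"),
                  c("Movie A", "Movie B", "Movie C"))
)

# Cosine similarity between every pair of rows (users):
# dot product of the two rating vectors divided by the product of their lengths.
cosine_similarity <- function(m) {
  norms <- sqrt(rowSums(m^2))
  sim <- (m %*% t(m)) / (norms %o% norms)
  sim[is.nan(sim)] <- 0  # a user with no ratings gets similarity 0
  sim
}

user_sim <- cosine_similarity(ratings)
print(round(user_sim, 3))
```

In this toy data Alice and Bob rate the same movies highly, so their similarity is much higher than the similarity between Alice and Charlie.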
To make the system more practical and realistic, I extend the approach to produce Top-N recommendations rather than a single item: unseen movies are ranked by a similarity-weighted score.
The implementation consists of the following steps:
Load the dataset
Compute user-to-user similarity
Apply a Top-N recommendation function
Output the Top-N recommended movies for each user
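The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the report's exact code: the helper name `recommend_top_n`, the toy ratings, the use of cosine similarity, and the choice to score a movie as the similarity-weighted sum of other users' ratings are all hypothetical.

```r
# Toy user-by-movie matrix (hypothetical values; 0 means "not rated").
ratings <- matrix(
  c(5, 4, 0, 0,
    4, 5, 3, 1,
    5, 0, 4, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Alice", "Bob", "Charlie"),
                  c("Movie A", "Movie B", "Movie C", "Movie D"))
)

# User-to-user cosine similarity (assumed measure).
norms <- sqrt(rowSums(ratings^2))
user_sim <- (ratings %*% t(ratings)) / (norms %o% norms)

# Hypothetical helper: score each movie the target user has not rated by
# summing the other users' ratings, each weighted by similarity to the target.
recommend_top_n <- function(ratings, user_sim, user, n = 3) {
  others <- setdiff(rownames(ratings), user)
  weights <- user_sim[user, others]
  # Each row of the matrix is multiplied by that user's similarity weight.
  scores <- colSums(ratings[others, , drop = FALSE] * weights)
  unseen <- ratings[user, ] == 0
  head(sort(scores[unseen], decreasing = TRUE), n)
}

top <- recommend_top_n(ratings, user_sim, "Alice", n = 2)
print(top)
```

Because the score is an unnormalized weighted sum, it is a ranking score rather than a predicted rating, and `head()` returns the best available items even when their scores are small.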
Code Base
library(DBI)
library(RPostgres)

# Check for an existing cached CSV file
if (file.exists("w11_rating.csv")) {
  # Load the cached file
  print("Cache file exists")
  w11_data <- read.csv("w11_rating.csv")
} else {
  con <- dbConnect(
    RPostgres::Postgres(),
    dbname = "chatgpt_c",      # your database name
    host = "192.168.100.61",   # or server IP
    port = 5432,               # default PostgreSQL port
    user = "postgres",         # your DB username
    password = "ubuntu"        # your DB password
  )
  dbListTables(con)
  query <- "SELECT m.title, r.rater_name, r.rating FROM movies m JOIN ratings r ON m.movie_id = r.movie_id ORDER BY m.movie_id, r.rater_name;"
  w11_data <- dbGetQuery(con, query)
  # Cache the result so later runs do not need the database
  write.csv(w11_data, "w11_rating.csv", row.names = FALSE)
  dbDisconnect(con)
}
[1] "Cache file exists"
head(w11_data)
                     title rater_name rating
1 Avatar: The Way of Water      Alice      5
2 Avatar: The Way of Water        Bob      4
3 Avatar: The Way of Water    Charlie      3
4 Avatar: The Way of Water      David      4
5 Avatar: The Way of Water        Eve     NA
6              Oppenheimer      Alice      4
Option 1: Replacing NA with 0
library(tidyr)
library(dplyr)
                     title rater_name rating
1 Avatar: The Way of Water      Alice      5
2 Avatar: The Way of Water        Bob      4
3 Avatar: The Way of Water    Charlie      3
4 Avatar: The Way of Water      David      4
5 Avatar: The Way of Water        Eve      0
6              Oppenheimer      Alice      4
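The code that produced this output is not shown, so the following is only a sketch of how the replacement and the reshape into a user-by-movie matrix could be done. It uses base R to stay dependency-free (an assumption; the report loads tidyr and dplyr, whose `replace_na()` and `pivot_wider()` could do the same job), and `w11_sample` is a hypothetical copy of the six rows printed above.

```r
# Hypothetical sample: the six rows shown by head(w11_data).
w11_sample <- data.frame(
  title = c(rep("Avatar: The Way of Water", 5), "Oppenheimer"),
  rater_name = c("Alice", "Bob", "Charlie", "David", "Eve", "Alice"),
  rating = c(5, 4, 3, 4, NA, 4)
)

# Option 1: treat a missing rating as 0.
w11_sample$rating[is.na(w11_sample$rating)] <- 0

# Reshape the long table into a user-by-movie matrix.
# tapply() leaves NA for user/movie pairs that have no row at all.
rating_matrix <- with(
  w11_sample,
  tapply(rating, list(rater_name, title), sum)
)
rating_matrix[is.na(rating_matrix)] <- 0  # pairs with no row also become 0

print(rating_matrix)
```

One caveat with this option: a 0 cannot be told apart from a genuinely very low rating, so a missing rating pulls similarity scores in the same direction as a strong dislike would.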
Stranger Things S5      Dune Part Two      Black Panther      The Godfather
         4.7007046          2.0016862          1.1797700          0.4937011
Conclusion
In this project, I implemented a Top-N recommendation approach on a relatively small dataset. The method does generate personalized recommendations based on user similarity, but the limited size of the dataset introduces notable challenges. Specifically, some movies with very low ratings still appear in the recommendation list; the last item in the output above, for example, scores only about 0.49. This occurs because the model relies heavily on sparse user interactions, which makes the similarity calculations unreliable and the ranking of items less accurate. As a result, the lower-ranked recommendations in particular should be treated with caution.