In this assignment, I will build a personalized recommendation system using the same movie survey data from the Global Baseline Estimate assignment.
The earlier model was not personalized because it gave general predictions based on overall patterns in the data. In this assignment, the goal is different. Here, I need to recommend movies in a way that depends on the specific user, so that different users can receive different recommendations.
Data
I will use the same MovieRatings survey dataset. Each row represents one user and each movie has its own rating column. Some ratings are missing because not every user rated every movie.
This kind of data works well for recommender systems because it shows which users liked or did not like certain movies and it also gives enough structure to compare users with each other.
Recommendation Method
For this assignment, I will use user-to-user collaborative filtering. I chose this method because it is a personalized recommendation approach. The main idea is to compare users based on their ratings. If two users have similar rating behavior, then movies liked by one of them may also be good recommendations for the other. So instead of making the same recommendation for everyone, this method uses similar users to make more personal suggestions.
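To make "similar rating behavior" concrete, here is a minimal base R sketch of one common similarity measure, cosine similarity, computed only over the movies both users rated. The helper name `cosine_sim` is hypothetical, and the two rating vectors are copied from the survey rows for Matt and Param shown later in this report. A package implementation may differ in details such as normalization.

```r
# Cosine similarity restricted to co-rated movies (hypothetical helper).
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)      # movies rated by both users
  if (!any(both)) return(NA_real_)   # no overlap: similarity is undefined
  sum(a[both] * b[both]) /
    (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
matt  <- setNames(c(4, NA, 2, NA, 2, 5), movies)
param <- setNames(c(4, 4, 1, NA, NA, 5), movies)

# Only CaptainAmerica, Frozen, and StarWarsForce are rated by both users.
sim_matt_param <- cosine_sim(matt, param)
print(round(sim_matt_param, 3))
```

A value close to 1 means the two users rated their shared movies in a similar way.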
How I Plan to Build the Model
First, I will clean and reshape the movie ratings data into a format that is easier to work with in R.
Then I will create a user-item rating matrix, where:
rows represent users
columns represent movies
cells contain the rating values
After that, I will apply a user-to-user collaborative filtering method. This means I will measure similarity between users based on the movies they both rated. Then I will use those similarities to help predict ratings for movies a user has not rated yet.
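The prediction step can be sketched in base R as a similarity-weighted average over the nearest neighbors who rated the movie. This is an illustration under simplifying assumptions (no mean-centering, cosine similarity on co-rated movies), not necessarily what a package does internally. The helpers `cosine_sim` and `predict_rating` are hypothetical names, and the rows are a small slice of the survey data.

```r
movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
# User-item matrix: rows are users, columns are movies, NA means "not rated".
ratings <- rbind(
  Charley  = c(4, 5, 4, 3, 2, 3),
  Matt     = c(4, NA, 2, NA, 2, 5),
  Max      = c(4, 4, 4, 2, 2, 4),
  Param    = c(4, 4, 1, NA, NA, 5),
  Sreejaya = c(5, 5, 5, 4, 4, 5)
)
colnames(ratings) <- movies

# Cosine similarity over co-rated movies (hypothetical helper).
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict one rating as the similarity-weighted mean of the k most similar
# users who actually rated the movie (hypothetical helper).
predict_rating <- function(ratings, user, movie, k = 3) {
  others <- setdiff(rownames(ratings), user)
  others <- others[!is.na(ratings[others, movie])]
  if (length(others) == 0) return(NA_real_)
  sims <- vapply(others,
                 function(o) cosine_sim(ratings[user, ], ratings[o, ]),
                 numeric(1))
  nearest <- head(sort(sims, decreasing = TRUE), k)  # sort() drops NA values
  if (length(nearest) == 0 || sum(nearest) == 0) return(NA_real_)
  sum(nearest * ratings[names(nearest), movie]) / sum(nearest)
}

pred_matt_deadpool <- predict_rating(ratings, "Matt", "Deadpool", k = 3)
print(round(pred_matt_deadpool, 2))
```

Because the result is a weighted mean of the neighbors' ratings, it always falls between the lowest and highest rating those neighbors gave the movie.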
What the Recommender Will Output
The recommender will produce predicted ratings for movies a user has not already rated.
Then, based on those predicted ratings, I will generate a ranked recommendation list for each user. In other words, the system will identify which unseen movies are most likely to be liked by that specific user.
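Turning predicted ratings into a ranked list amounts to sorting and keeping the top entries. The base R sketch below assumes a named vector of predictions for one user, with NA for movies that have no prediction. The values are copied from the Dieudonne row of the predicted-ratings table later in this report, and `top_n` is a hypothetical helper name.

```r
# Predicted ratings for one user; NA marks movies without a prediction.
pred <- c(CaptainAmerica = NA, Deadpool = NA, Frozen = 3.8,
          JungleBook = 4.17, PitchPerfect2 = 4.31, StarWarsForce = NA)

# Keep the n highest predictions, best first (hypothetical helper).
top_n <- function(pred, n = 3) {
  ranked <- sort(pred, decreasing = TRUE)  # sort() removes NA by default
  head(ranked, n)
}

top2 <- top_n(pred, n = 2)
print(top2)
```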
Evaluation Plan
To evaluate the recommender, I will use a hold-out approach.
This means I will split the data into:
a training set, which is used to build the recommender
a test set, which is used to check how well the model predicts ratings that were not used during training
Then I will compare the predicted ratings to the actual ratings in the test data.
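One simple way to picture a hold-out split is to hide a share of the known ratings and keep their true values for later comparison. The base R sketch below does this cell by cell. The 20% share and the seed are arbitrary assumptions, and a recommender package may instead split by user rather than by cell.

```r
set.seed(42)  # arbitrary seed so the split is reproducible

movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- rbind(
  Charley  = c(4, 5, 4, 3, 2, 3),
  Matt     = c(4, NA, 2, NA, 2, 5),
  Max      = c(4, 4, 4, 2, 2, 4),
  Parshu   = c(4, 3, 5, 5, 2, 3),
  Sreejaya = c(5, 5, 5, 4, 4, 5)
)
colnames(ratings) <- movies

known    <- which(!is.na(ratings))        # positions of observed ratings
n_test   <- floor(0.2 * length(known))    # hold out roughly 20% (assumed share)
test_idx <- sample(known, n_test)

train <- ratings
train[test_idx] <- NA                     # hide held-out ratings from training
test_actual <- ratings[test_idx]          # true values to compare against later

print(sum(!is.na(train)))                 # ratings left for training
print(length(test_actual))                # ratings held out for testing
```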
For performance evaluation, I will use an appropriate prediction accuracy measure such as RMSE or MAE. This shows how close the predicted ratings are to the real ratings.
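Both measures are short to write out. In the sketch below, `rmse` and `mae` are hypothetical helper names, and the two vectors are made-up numbers used only to show the arithmetic, not results from the survey data.

```r
# Root mean squared error: penalizes large misses more heavily.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Mean absolute error: the average size of a miss, in rating points.
mae <- function(actual, predicted) {
  mean(abs(actual - predicted), na.rm = TRUE)
}

actual    <- c(4, 5, 3)      # made-up held-out ratings
predicted <- c(3.5, 4, 3)    # made-up model predictions

rmse_value <- rmse(actual, predicted)
mae_value  <- mae(actual, predicted)
print(c(RMSE = rmse_value, MAE = mae_value))
```

Here the errors are 0.5, 1, and 0, so MAE is 0.5, while RMSE is higher because the one-point miss is squared before averaging.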
Tools
I may use an existing recommender package in R rather than building the full algorithm from scratch. This is allowed by the assignment, and it will let me focus on the recommendation method, the output, and the evaluation.
Final Deliverable
My submission will include:
the code used to prepare the data and build the recommender
the recommendation output produced by the model
a brief explanation of how the personalized recommender was built
a brief explanation of how the model was evaluated
This way, the submission will contain a personalized algorithm, recommendation results, a model evaluation, and an explanation.
Code Base
Load Packages
library(readxl)
library(dplyr)
library(tidyr)
library(recommenderlab)
library(knitr)
Read the Data
I use the same Excel file from the earlier recommender assignment.
# A tibble: 16 × 7
Critic CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Burton NA NA NA 4 NA 4
2 Charley 4 5 4 3 2 3
3 Dan NA 5 NA NA NA 5
4 Dieudo… 5 4 NA NA NA 5
5 Matt 4 NA 2 NA 2 5
6 Mauric… 4 NA 3 3 4 NA
7 Max 4 4 4 2 2 4
8 Nathan NA NA NA NA NA 4
9 Param 4 4 1 NA NA 5
10 Parshu 4 3 5 5 2 3
11 Prasha… 5 5 5 5 NA 4
12 Shipra NA NA 4 5 NA 3
13 Sreeja… 5 5 5 4 4 5
14 Steve 4 NA NA NA NA 4
15 Vuthy 4 5 3 3 3 NA
16 Xingjia NA NA 5 5 NA NA
Prepare the Ratings Matrix
The first column is the user name, and the remaining columns are movie ratings.
CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
Burton NA NA NA 4 NA 4
Charley 4 5 4 3 2 3
Dan NA 5 NA NA NA 5
Dieudonne 5 4 NA NA NA 5
Matt 4 NA 2 NA 2 5
Mauricio 4 NA 3 3 4 NA
Max 4 4 4 2 2 4
Nathan NA NA NA NA NA 4
Param 4 4 1 NA NA 5
Parshu 4 3 5 5 2 3
Prashanth 5 5 5 5 NA 4
Shipra NA NA 4 5 NA 3
Sreejaya 5 5 5 4 4 5
Steve 4 NA NA NA NA 4
Vuthy 4 5 3 3 3 NA
Xingjia NA NA 5 5 NA NA
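A matrix like the one above can be built from the survey table by dropping the name column and using it as row names. The sketch below uses base R and types in the first three survey rows as a plain data frame. The object names `ratings_df` and `ratings_matrix` are assumptions for illustration.

```r
# First three survey rows, typed in as a plain data frame.
ratings_df <- data.frame(
  Critic         = c("Burton", "Charley", "Dan"),
  CaptainAmerica = c(NA, 4, NA),
  Deadpool       = c(NA, 5, 5),
  Frozen         = c(NA, 4, NA),
  JungleBook     = c(4, 3, NA),
  PitchPerfect2  = c(NA, 2, NA),
  StarWarsForce  = c(4, 3, 5)
)

# Drop the name column, convert to a numeric matrix, and label the rows.
ratings_matrix <- as.matrix(ratings_df[, -1])
rownames(ratings_matrix) <- ratings_df$Critic

print(ratings_matrix)
```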
Convert to Recommender Format
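The recommenderlab package stores ratings in its own class, realRatingMatrix, so the ordinary matrix is converted before modeling. The table below lists the ratings predicted for movies each user has not yet rated. NA marks a movie the user already rated or one for which no prediction could be made.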
CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
Burton NA 4.00 NA NA NA NA
Charley NA NA NA NA NA NA
Dan NA NA NA 5.00 NA NA
Dieudonne NA NA 3.8 4.17 4.31 NA
Matt NA 3.64 NA 2.65 NA NA
Mauricio NA 3.39 NA NA NA 4.39
Max NA NA NA NA NA NA
Nathan NA 4.00 NA 4.00 NA NA
Param NA NA NA 3.00 3.11 NA
Parshu NA NA NA NA NA NA
Prashanth NA NA NA NA 3.13 NA
Shipra 4.26 3.78 NA NA 2.33 NA
Sreejaya NA NA NA NA NA NA
Steve NA 4.00 NA 4.00 NA NA
Vuthy NA NA NA NA NA 4.29
Xingjia 5.50 6.50 NA NA 3.50 4.83
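A table like the one above could be produced along the following lines with recommenderlab. This is a sketch on a subset of eight survey users, so its numbers will not match the table. The object names (`rating_rrm`, `model`, `pred_matrix`) are assumptions, while the UBCF settings mirror the ones used elsewhere in this report.

```r
library(recommenderlab)

movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- rbind(
  Charley   = c(4, 5, 4, 3, 2, 3),
  Dieudonne = c(5, 4, NA, NA, NA, 5),
  Matt      = c(4, NA, 2, NA, 2, 5),
  Mauricio  = c(4, NA, 3, 3, 4, NA),
  Max       = c(4, 4, 4, 2, 2, 4),
  Param     = c(4, 4, 1, NA, NA, 5),
  Parshu    = c(4, 3, 5, 5, 2, 3),
  Sreejaya  = c(5, 5, 5, 4, 4, 5)
)
colnames(ratings) <- movies

# Convert the ordinary matrix to recommenderlab's rating class.
rating_rrm <- as(ratings, "realRatingMatrix")

# Fit user-based collaborative filtering on all users.
model <- Recommender(rating_rrm, method = "UBCF",
                     parameter = list(method = "Cosine", nn = 3))

# Predict ratings; movies a user already rated come back as NA.
pred <- predict(model, newdata = rating_rrm, type = "ratings")
pred_matrix <- round(as(pred, "matrix"), 2)
print(pred_matrix)
```

Predictions are not forced into the 1 to 5 range, which is why a value above 5 can appear in the output.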
Evaluate the Recommender
To evaluate the model, I use a hold-out method. I split the data into training and test sets, then compare predicted ratings with actual ratings.
At this step, I train the recommender using only the training data, so the model learns user rating patterns from one part of the dataset rather than from all of it. I still use user-to-user collaborative filtering, where the model looks for users with similar rating behavior.
ubcf_train_model <- Recommender(data = train_set, method = "UBCF", parameter = list(method = "Cosine", nn = 3))
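The report does not show how `train_set` was created. One way to obtain it, sketched here as an assumption, is recommenderlab's `evaluationScheme()` with a split. The 75/25 share, `given = 2`, `goodRating = 4`, the seed, and the eight-user subset are all illustrative choices, not the settings used for the results in this report.

```r
library(recommenderlab)
set.seed(123)  # arbitrary seed so the split is reproducible

movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- rbind(
  Charley   = c(4, 5, 4, 3, 2, 3),
  Dieudonne = c(5, 4, NA, NA, NA, 5),
  Matt      = c(4, NA, 2, NA, 2, 5),
  Mauricio  = c(4, NA, 3, 3, 4, NA),
  Max       = c(4, 4, 4, 2, 2, 4),
  Param     = c(4, 4, 1, NA, NA, 5),
  Parshu    = c(4, 3, 5, 5, 2, 3),
  Sreejaya  = c(5, 5, 5, 4, 4, 5)
)
colnames(ratings) <- movies
rating_rrm <- as(ratings, "realRatingMatrix")

# Split users into training and test groups. For each test user, 2 ratings
# stay visible ("known") and the remaining ratings are hidden ("unknown").
scheme <- evaluationScheme(rating_rrm, method = "split", train = 0.75,
                           given = 2, goodRating = 4)
train_set <- getData(scheme, "train")

# Train user-based collaborative filtering on the training users only.
ubcf_train_model <- Recommender(data = train_set, method = "UBCF",
                                parameter = list(method = "Cosine", nn = 3))
print(ubcf_train_model)
```

Every user in this subset has at least three ratings, which keeps `given = 2` valid for whichever users land in the test group.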
Make Predictions on the Test Data
To make rating predictions on the test portion of the data, the model uses the known ratings in the test set to estimate the ratings for movies that were left out. This helps me see how well the recommender can predict values it did not directly train on.
I compare the predicted ratings to the real ratings from the hold-out data. I use RMSE, MSE, and MAE because these are accuracy measures for rating prediction. They help show how close the recommender’s predicted values are to the actual values. Smaller error values mean the model is doing a better job.
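The prediction and comparison steps can be sketched with recommenderlab as follows. The split settings, seed, object names, and the eight-user subset are assumptions for illustration, so the printed errors are not the results of this report. With so few test users, the values are also very unstable.

```r
library(recommenderlab)
set.seed(123)  # arbitrary seed so the split is reproducible

movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- rbind(
  Charley   = c(4, 5, 4, 3, 2, 3),
  Dieudonne = c(5, 4, NA, NA, NA, 5),
  Matt      = c(4, NA, 2, NA, 2, 5),
  Mauricio  = c(4, NA, 3, 3, 4, NA),
  Max       = c(4, 4, 4, 2, 2, 4),
  Param     = c(4, 4, 1, NA, NA, 5),
  Parshu    = c(4, 3, 5, 5, 2, 3),
  Sreejaya  = c(5, 5, 5, 4, 4, 5)
)
colnames(ratings) <- movies
rating_rrm <- as(ratings, "realRatingMatrix")

scheme <- evaluationScheme(rating_rrm, method = "split", train = 0.75,
                           given = 2, goodRating = 4)
ubcf_train_model <- Recommender(getData(scheme, "train"), method = "UBCF",
                                parameter = list(method = "Cosine", nn = 3))

# Predict from the visible part of each test user's ratings ...
test_pred <- predict(ubcf_train_model, newdata = getData(scheme, "known"),
                     type = "ratings")

# ... and score the predictions against the hidden part.
accuracy <- calcPredictionAccuracy(test_pred, getData(scheme, "unknown"))
print(accuracy)
```

`calcPredictionAccuracy()` returns RMSE, MSE, and MAE together, which matches the three measures reported here.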
This model was built using user-to-user collaborative filtering. It compares users based on similar rating behavior and then uses those similarities to recommend movies.
The recommender output includes:
top 3 recommended movies for each user
predicted ratings for movies not yet rated
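A top 3 list per user can be requested directly from recommenderlab with `type = "topNList"`. The sketch below assumes a model fitted on a subset of eight survey users. The object names are illustrative, and its lists will not match the ones from the full data.

```r
library(recommenderlab)

movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
ratings <- rbind(
  Charley   = c(4, 5, 4, 3, 2, 3),
  Dieudonne = c(5, 4, NA, NA, NA, 5),
  Matt      = c(4, NA, 2, NA, 2, 5),
  Mauricio  = c(4, NA, 3, 3, 4, NA),
  Max       = c(4, 4, 4, 2, 2, 4),
  Param     = c(4, 4, 1, NA, NA, 5),
  Parshu    = c(4, 3, 5, 5, 2, 3),
  Sreejaya  = c(5, 5, 5, 4, 4, 5)
)
colnames(ratings) <- movies
rating_rrm <- as(ratings, "realRatingMatrix")

model <- Recommender(rating_rrm, method = "UBCF",
                     parameter = list(method = "Cosine", nn = 3))

# Ask for up to 3 recommendations per user among movies not yet rated.
top3 <- predict(model, newdata = rating_rrm, type = "topNList", n = 3)
top3_list <- as(top3, "list")   # one character vector of movie names per user
print(top3_list)
```

Users who have rated every movie, or for whom no prediction is available, get an empty list.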
To evaluate the model, I used a hold-out split with training and test data. Then I measured how close the predicted ratings were to the real ratings by using RMSE, MSE, and MAE.