I chose to use an existing package for content-based filtering: recommenderlab, authored by Michael Hahsler. I would like my algorithm to recommend the top 5 items based on each customer's purchase history. I will test the algorithm with rank-based metrics to see how accurate its recommendations are on the data. The rank-based metrics I will use include precision, recall, and F1 score.
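As a rough illustration of the rank-based metrics named above, here is a minimal base R sketch that scores one top-5 list. The helper name `precision_recall_f1` and the item names are made up for illustration; this is not recommenderlab's own evaluation code.

```r
# Rank-based metrics for a single user's top-k list.
# `recommended` is the ranked list a recommender returned;
# `relevant` is the set of held-out items the user actually liked.
# Both vectors below are made-up illustration data.
precision_recall_f1 <- function(recommended, relevant, k = 5) {
  top_k <- head(recommended, k)
  hits <- length(intersect(top_k, relevant))
  precision <- if (length(top_k) > 0) hits / length(top_k) else 0
  recall <- if (length(relevant) > 0) hits / length(relevant) else 0
  f1 <- if (precision + recall > 0) {
    2 * precision * recall / (precision + recall)
  } else {
    0
  }
  c(precision = precision, recall = recall, f1 = f1)
}

recommended <- c("item_a", "item_b", "item_c", "item_d", "item_e")
relevant <- c("item_b", "item_e", "item_x")
metrics <- precision_recall_f1(recommended, relevant, k = 5)
print(round(metrics, 3))
```

Two of the five recommended items are relevant, so precision is 2/5, recall is 2/3, and F1 is their harmonic mean. In practice these values would be averaged over all test users.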
Challenges
The main challenge I foresee is data that is missing the information needed to make accurate recommendations, and how to handle that. The first fix I can think of is to recommend items that are popular in the customer's area. I am also worried that the system might overspecialize, narrowing in on certain items based on a customer's data. I experience this problem as a user of applications like YouTube, when I am only recommended videos similar to ones I watched a lot recently but have since grown tired of. Using weights, or mixing some randomly selected items into the recommendations, may help diversify them.
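The popularity fallback described above could look something like the following base R sketch. The `purchases` table, its column names, and the helper `popular_in_region` are all hypothetical. The data set used later in this report has no region information.

```r
# Fallback for customers with little or no history: suggest the items most
# purchased in the customer's region. The table and helper are hypothetical.
purchases <- data.frame(
  customer = c("c1", "c2", "c3", "c4", "c5", "c6"),
  region   = c("north", "north", "north", "north", "south", "south"),
  item     = c("tea", "tea", "coffee", "milk", "juice", "tea"),
  stringsAsFactors = FALSE
)

popular_in_region <- function(purchases, region, already_bought = character(0), n = 5) {
  # Keep purchases from the region, dropping items the customer already has.
  in_region <- purchases$item[purchases$region == region]
  candidates <- in_region[!(in_region %in% already_bought)]
  # Rank the remaining items by purchase count.
  counts <- sort(table(candidates), decreasing = TRUE)
  head(names(counts), n)
}

fallback <- popular_in_region(purchases, "north", already_bought = "milk", n = 2)
print(fallback)
```

A similar helper could also mix a few of these popular items into a personalized list, which is one simple way to address the overspecialization concern.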
Libraries
library(dplyr)
library(recommenderlab)
library(tidyr)
library(readxl)
library(caret)
     Critic CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
1    Burton             NA       NA     NA          4            NA             4
2   Charley              4        5      4          3             2             3
3       Dan             NA        5     NA         NA            NA             5
4 Dieudonne              5        4     NA         NA            NA             5
5      Matt              4       NA      2         NA             2             5
6  Mauricio              4       NA      3          3             4            NA
After reviewing this data set, I believe the best personalized recommendation algorithm is user-to-user collaborative filtering, because the data consists entirely of ratings supplied by the users themselves.
          CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
Burton                NA       NA     NA          4            NA             4
Charley                4        5      4          3             2             3
Dan                   NA        5     NA         NA            NA             5
Dieudonne              5        4     NA         NA            NA             5
Matt                   4       NA      2         NA             2             5
Mauricio               4       NA      3          3             4            NA
Max                    4        4      4          2             2             4
Nathan                NA       NA     NA         NA            NA             4
Param                  4        4      1         NA            NA             5
Parshu                 4        3      5          5             2             3
Prashanth              5        5      5          5            NA             4
Shipra                NA       NA      4          5            NA             3
Sreejaya               5        5      5          4             4             5
Steve                  4       NA     NA         NA            NA             4
Vuthy                  4        5      3          3             3            NA
Xingjia               NA       NA      5          5            NA            NA
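To make the idea behind user-to-user collaborative filtering concrete, here is a minimal base R sketch using four rows copied from the matrix above. It is only an illustration. The choice of cosine similarity over co-rated movies and the plain similarity-weighted average are assumptions, and this is not a description of how recommenderlab computes its UBCF predictions internally.

```r
# Ratings copied from the matrix above for four critics (NA = not rated).
movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
r <- rbind(
  Charley  = c(4, 5, 4, 3, 2, 3),
  Max      = c(4, 4, 4, 2, 2, 4),
  Mauricio = c(4, NA, 3, 3, 4, NA),
  Vuthy    = c(4, 5, 3, 3, 3, NA)
)
colnames(r) <- movies

# Cosine similarity computed only over movies both critics rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Similarity between the target critic and every other critic.
target <- "Mauricio"
neighbours <- setdiff(rownames(r), target)
sims <- sapply(neighbours, function(u) cosine_sim(r[target, ], r[u, ]))

# Similarity-weighted average of the neighbours' Deadpool ratings.
neighbour_ratings <- r[neighbours, "Deadpool"]
rated <- !is.na(neighbour_ratings)
pred_deadpool <- sum(sims[rated] * neighbour_ratings[rated]) / sum(sims[rated])

print(round(sims, 3))
print(round(pred_deadpool, 2))
```

The predicted rating is pulled toward the ratings of the critics whose tastes most resemble Mauricio's, which is the intuition behind recommending a movie he has not rated yet.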
The accuracy metrics show that the model's predicted ratings are off by about one star on average. This is expected with such a small data set, since there is little data to learn from.
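For readers unfamiliar with what "off by one star" means numerically, the sketch below computes the two usual rating-error metrics, MAE and RMSE, in base R. The `actual` and `predicted` vectors are hypothetical and are not the results of this model. recommenderlab's `calcPredictionAccuracy()` reports the same kind of metrics for rating predictions.

```r
# Hypothetical held-out ratings and model predictions on a 1-5 star scale.
actual    <- c(4, 5, 3, 2, 5)
predicted <- c(3.2, 4.1, 4.0, 3.1, 4.0)

errors <- predicted - actual
mae  <- mean(abs(errors))     # average size of the error, in stars
rmse <- sqrt(mean(errors^2))  # penalises large errors more heavily

print(round(c(MAE = mae, RMSE = rmse), 3))
```

An MAE close to 1 means that a predicted rating typically lands about one star above or below the rating the critic actually gave.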
model <- Recommender(ratings, method = "UBCF")

# Print up to three recommendations for each critic who receives any.
for (i in seq_len(nrow(ratings))) {
  top3 <- predict(model, ratings[i], n = 3)
  rec_list <- as(top3, "list")
  if (length(rec_list[[1]]) > 0) {
    cat("Top 3 Recommendations for", rownames(ratings_matrix)[i], ":\n")
    print(unlist(rec_list))
  }
}
Top 3 Recommendations for Dieudonne :
01 02 03
"JungleBook" "Frozen" "PitchPerfect2"
Top 3 Recommendations for Matt :
01 02
"Deadpool" "JungleBook"
Top 3 Recommendations for Mauricio :
01 02
"StarWarsForce" "Deadpool"
Top 3 Recommendations for Param :
01 02
"JungleBook" "PitchPerfect2"
Top 3 Recommendations for Prashanth :
0
"PitchPerfect2"
Top 3 Recommendations for Shipra :
01 02 03
"CaptainAmerica" "Deadpool" "PitchPerfect2"
Top 3 Recommendations for Vuthy :
0
"StarWarsForce"
Critics with fewer than three recommendations have most likely already watched most of the movies. The recommender will not recommend a movie that a critic has already rated. Vuthy, for example, has rated every movie except Star Wars, so that was the only movie recommended.
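The explanation above can be checked with a short base R sketch that counts each critic's unrated movies, using three rows copied from the ratings matrix. The count is only an upper bound on the length of the list, since the model may not be able to score every unrated movie.

```r
# Ratings copied from the ratings matrix for three critics (NA = not rated).
movies <- c("CaptainAmerica", "Deadpool", "Frozen",
            "JungleBook", "PitchPerfect2", "StarWarsForce")
r <- rbind(
  Dieudonne = c(5, 4, NA, NA, NA, 5),
  Prashanth = c(5, 5, 5, 5, NA, 4),
  Vuthy     = c(4, 5, 3, 3, 3, NA)
)
colnames(r) <- movies

# Number of unrated movies per critic, capped at the requested list length.
n_requested <- 3
unrated <- rowSums(is.na(r))
max_recs <- pmin(unrated, n_requested)
print(max_recs)

# The unrated titles themselves, i.e. the only possible recommendations.
candidates <- lapply(rownames(r), function(u) movies[is.na(r[u, ])])
names(candidates) <- rownames(r)
print(candidates)
```

Dieudonne has three unrated movies and received a full list, while Prashanth and Vuthy each have only one unrated movie, which matches the single recommendation each of them received.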
Conclusions
Using the recommenderlab package, I built a user-based collaborative filter that recommends movies each critic would likely want to watch, based on their ratings. I used cross-validation to split the data into training and testing sets. I then used accuracy metrics to test the model and found that its predictions are usually off by about one star. Finally, I ran the model for each critic to recommend movies they have not watched.
Citations
Hahsler, M. (2022). recommenderlab: An R framework for developing and testing recommendation algorithms. CRAN. https://cran.r-project.org/package=recommenderlab