Assignment 11

Author

Michael Mayne

Coding Approach

The purpose of this assignment is to use our previous data in order to implement a recommendation system.

I will use the data from Assignment 3A, where the global baseline estimate was used to work out which movie to recommend.

Procedure:

  • I can compare the methods and results used and see how they compare to an improved recommendation system.

  • I plan to use the package recommenderlab to test user-based collaborative filtering on our data and find the movie recommended for each person.

  • Some data will be set aside as hold-out data, which will then be used to test the final model.
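The hold-out step in the last bullet could be sketched as follows. This is a minimal base R sketch under stated assumptions, not code from this assignment: the `ratings` data frame, the 20% hold-out fraction, and the seed are all hypothetical.

```r
# Hypothetical ratings in the same long format as the assignment data
ratings <- data.frame(
  watcher_name = rep(c("a", "b", "c"), each = 4),
  movie_title  = rep(c("m1", "m2", "m3", "m4"), times = 3),
  num_rating   = c(5, 3, NA, 4, 2, NA, 4, 5, 3, 3, 4, NA),
  stringsAsFactors = FALSE
)

set.seed(607)  # arbitrary seed so the split is reproducible

# Only observed (non-NA) ratings can be held out
rated_rows  <- which(!is.na(ratings$num_rating))
n_holdout   <- floor(0.2 * length(rated_rows))  # assumed 20% hold-out
holdout_idx <- sample(rated_rows, size = n_holdout)

# The test set keeps the true ratings; the training set hides them
test  <- ratings[holdout_idx, ]
train <- ratings
train$num_rating[holdout_idx] <- NA
```

A model built from `train` can then be judged by how close its predictions are to the ratings kept in `test`.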

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code Base

Loading Data Set

For this project we will use the dataset from the global recommendation code base in Assignment 3A.

FullRating <- read.csv("https://raw.githubusercontent.com/Mayneman000/DATA607Assignment/refs/heads/main/FullMovieRating.csv")
glimpse(FullRating)
Rows: 36
Columns: 3
$ watcher_name <chr> "michael", "michael", "michael", "michael", "michael", "c…
$ movie_title  <chr> "Sinners", "Zoo_2", "Captain_AmericaBNW", "Thunderbolts",…
$ num_rating   <int> 5, 3, 3, 4, NA, 4, 3, 4, 4, 4, 4, 5, 5, 3, 2, 2, 4, 3, 3,…

Creating an Item profile for our data

I initially wanted to use user-based collaborative filtering with recommenderlab, but after searching and reading more about the data, I realized that this would not be advisable. I confirmed this with Google Gemini, given the size of my data set. To get the best movie recommendations, I decided to focus on creating item profiles, using the criteria below.

Criteria: Animated, Live-Action, Mature, Family, Upbeat, Dramatic

itemProfile <- data.frame(
  movie_title = c("Sinners", "Zoo_2", "Captain_AmericaBNW", "Thunderbolts", "AvatarWOF", "KPDH"),
  animated = c(0,1,0,0,1,1),
  live_action= c(1,0,1,1,0,0),
  mature = c(1,0,0,1,0,0),
  family = c(0,1,1,0,1,1),
  upbeat = c(0,1,0,0,0,1),
  dramatic = c(1,0,0,1,1,0)
)

Creation of User Profiles

user_info <- merge(FullRating,itemProfile, by= "movie_title")
glimpse(user_info)
Rows: 36
Columns: 9
$ movie_title  <chr> "AvatarWOF", "AvatarWOF", "AvatarWOF", "AvatarWOF", "Avat…
$ watcher_name <chr> "channa", "dimitri", "chris", "kathy", "michael", "norm",…
$ num_rating   <int> 4, 4, 4, 3, NA, NA, 3, 4, 2, 3, 5, 5, NA, 4, 5, 4, 3, NA,…
$ animated     <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, …
$ live_action  <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, …
$ mature       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, …
$ family       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, …
$ upbeat       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, …
$ dramatic     <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, …

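The next step is to build a profile for each user by weighting each movie's attributes by the rating that user gave it. For each attribute, the weight is the sum of the user's ratings for movies with that attribute, divided by the sum of all of that user's ratings.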

user_profiles <- user_info %>%
  group_by(watcher_name)%>%
  summarise(across(animated:dramatic, ~ sum(.x * num_rating, na.rm = TRUE) / sum(num_rating, na.rm = TRUE)))

This gives a preference profile for each user:

print(user_profiles)
# A tibble: 6 × 7
  watcher_name animated live_action mature family upbeat dramatic
  <chr>           <dbl>       <dbl>  <dbl>  <dbl>  <dbl>    <dbl>
1 channa          0.522       0.478  0.304  0.696  0.348    0.478
2 chris           0.444       0.556  0.389  0.611  0.222    0.611
3 dimitri         0.571       0.429  0.333  0.667  0.381    0.524
4 kathy           0.3         0.7    0.45   0.55   0.15     0.6  
5 michael         0.368       0.632  0.474  0.526  0.368    0.474
6 norm            0           1      0.583  0.417  0        0.583

Based on my profile, I have a preference for live-action and family movies. Norm shows a strict preference for live-action movies, simply because he has not rated any animated movies.
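The weighting behind these profiles can be checked by hand on a small case. The sketch below is in base R and uses two item rows copied from itemProfile, but the ratings (4 and 2) belong to a hypothetical viewer and are not taken from the data set.

```r
attrs <- c("animated", "live_action", "mature", "family", "upbeat", "dramatic")

# Two item rows copied from itemProfile
items <- rbind(
  Sinners            = c(0, 1, 1, 0, 0, 1),
  Captain_AmericaBNW = c(0, 1, 0, 1, 0, 0)
)
colnames(items) <- attrs

# Hypothetical ratings from a viewer who rated only these two movies
ratings <- c(Sinners = 4, Captain_AmericaBNW = 2)

# Each row is multiplied by its rating, then each column is summed
# and divided by the total of the ratings
profile <- colSums(items * ratings) / sum(ratings)
round(profile, 3)
```

Because both rated movies are live action, `live_action` comes out as 1 and `animated` as 0, which is the same effect seen in Norm's profile.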

Testing a Personal Recommendation

As before, we will test the movie recommendation for “norm”, the sixth user, who was created for the group in Assignment 3A.

target1 <- "norm"

norm_vector <- user_profiles %>% 
  filter(watcher_name == target1) %>% 
  select(-watcher_name) %>% 
  as.matrix()

Scoring each movie against the listed criteria

item_matrix <- as.matrix(itemProfile[, c("animated", "live_action", "mature", "family", "upbeat", "dramatic")])
itemProfile$score <- as.vector(item_matrix %*% t(norm_vector))

Items that the user has already rated need to be removed

norm_watched <- c("Sinners", "Captain_AmericaBNW", "Thunderbolts")

recommendations_for_norm <- itemProfile %>%
  filter(!movie_title %in% norm_watched) %>%
  arrange(desc(score))
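The watched list above is typed by hand. As an alternative, it could be derived from the ratings themselves. The following is a minimal base R sketch on hypothetical data: `viewer_x`, `viewer_y`, and their ratings are made up, and only the movie titles come from this report.

```r
# Hypothetical ratings in the same long format as FullRating
ratings <- data.frame(
  watcher_name = c("viewer_x", "viewer_x", "viewer_x", "viewer_y"),
  movie_title  = c("Sinners", "Zoo_2", "Thunderbolts", "KPDH"),
  num_rating   = c(4, NA, 3, 5),
  stringsAsFactors = FALSE
)

all_titles <- c("Sinners", "Zoo_2", "Captain_AmericaBNW",
                "Thunderbolts", "AvatarWOF", "KPDH")

target <- "viewer_x"

# A movie counts as watched only if the target user gave it a rating
is_rated <- ratings$watcher_name == target & !is.na(ratings$num_rating)
watched  <- unique(ratings$movie_title[is_rated])

# Everything else is a candidate for recommendation
candidates <- setdiff(all_titles, watched)
```

Deriving the list this way keeps it in sync with the data, although a deliberate exception such as the Zoo_2 test later in this report would still need to be made by hand.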


head(recommendations_for_norm, 3)
  movie_title animated live_action mature family upbeat dramatic     score
1   AvatarWOF        1           0      0      1      0        1 1.0000000
2       Zoo_2        1           0      0      1      1        0 0.4166667
3        KPDH        1           0      0      1      1        0 0.4166667

The system recommended AvatarWOF to Norm because it is the only unwatched movie tagged as dramatic, and his user profile weights dramatic (0.583) well above upbeat (0).

print(norm_vector)
     animated live_action    mature    family upbeat  dramatic
[1,]        0           1 0.5833333 0.4166667      0 0.5833333
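The ranking can be checked by hand from the printed vector. The base R sketch below copies Norm's weights from the output above and the three unwatched item rows from itemProfile; the names `norm_weights` and `candidates` are mine and do not appear in the original code.

```r
attrs <- c("animated", "live_action", "mature", "family", "upbeat", "dramatic")

# Norm's weights, copied from the printed norm_vector
norm_weights <- c(0, 1, 0.5833333, 0.4166667, 0, 0.5833333)
names(norm_weights) <- attrs

# Item rows for the movies Norm has not rated, copied from itemProfile
candidates <- rbind(
  AvatarWOF = c(1, 0, 0, 1, 0, 1),
  Zoo_2     = c(1, 0, 0, 1, 1, 0),
  KPDH      = c(1, 0, 0, 1, 1, 0)
)
colnames(candidates) <- attrs

# The score is the dot product of each item row with the user's weights
scores <- drop(candidates %*% norm_weights)
sort(scores, decreasing = TRUE)
```

All three candidates pick up the family weight (about 0.417), and only AvatarWOF adds the dramatic weight (about 0.583), which is what brings its score to 1.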

Another test: “Michael”

As a further check, I will run another test using my own profile. In this test I will leave one of my least-liked movies, “Zootopia 2”, out of my watched list and see what the recommendation system advises for me.

target2 <- "michael"

michael_vector <- user_profiles %>% 
  filter(watcher_name == target2) %>% 
  select(-watcher_name) %>% 
  as.matrix()

# create a score list of items 

itemProfile$score <- as.vector(item_matrix %*% t(michael_vector))

# Filter out the movies that I watched (Zoo_2 is deliberately left out of the watched list to test how it scores)

michael_watched <- c("Sinners", "Captain_AmericaBNW", "Thunderbolts","KPDH")

recommendations_for_michael <- itemProfile %>%
  filter(!movie_title %in% michael_watched) %>%
  arrange(desc(score))


head(recommendations_for_michael, 2)
  movie_title animated live_action mature family upbeat dramatic    score
1   AvatarWOF        1           0      0      1      0        1 1.368421
2       Zoo_2        1           0      0      1      1        0 1.263158

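The recommender ranked AvatarWOF first (score 1.368), with Zoo_2 close behind (1.263). Both movies are tagged as animated and family, so the gap comes only from my dramatic weight (0.474) being higher than my upbeat weight (0.368). Zoo_2 still scores highly even though it is one of my least-liked movies. This could be because the six binary criteria give only a coarse description of each movie.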

Conclusion

The recommender system for this data set turned out to be fairly complex. Even with packages available, the math behind a personal recommendation system takes a lot of effort to understand. I wish I had a slightly larger data set, because I realized that even the item-profile approach is not a perfect way to recommend movies with a data set this small. I would also have been open to testing a more complex style of recommender.

Assistance with AI

Google Gemini was used on this project to review the logic of the recommendation system and to advise on changing my initial approach from a collaborative filtering system to an item-profile approach.

https://share.google/aimode/YaMZkin13zRYhpJc9