The goal for this assignment appears to be straightforward. Essentially, I plan to take the data and chart from the end of last week's assignment and create 3 working variables. The first is the average rating across all movies, which serves as the base. The second is the average score for each individual watcher. The third is the average score for each movie.
Once those are addressed, it is simply a matter of creating a formula that takes a movie and a watcher as input and returns a predicted score.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 4.0.0 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the ratings as before
Movies_Ratings <- read.csv("https://raw.githubusercontent.com/Mayneman000/DATA607Assignment/refs/heads/main/Movies.csv")
Code Base (3A)
Preparing the DataSet from Prior Assignment
To begin, I need to view the dataset that I mentioned in my prerecorded approach, which is the set of movie ratings from my friends.
Most of the people around me had seen a mix of movies, and I ran into a problem once I went past the original 4 recommended raters (excluding myself): the other people I interviewed either didn't watch movies often or never cared enough to give a strong score. So, to get a reasonable example of the Global estimate asked for by this problem, I decided to create a viewer with a reasonable mix of ratings. This viewer, “Norm”, has only seen the 2 Marvel movies, “Captain America: Brave New World” and “Thunderbolts”, plus “Sinners”. He has not seen the other 3 movies, so I used the global estimate method to see which movie he should watch. His ratings are created manually in the dataframe below.
# Let's call him "Norm" for Normal Taste
# Only watched 3 total movies
watcher_name <- c("norm", "norm", "norm", "norm", "norm", "norm")
movie_title <- c("Sinners", "Zoo_2", "Captain_AmericaBNW", "Thunderbolts", "AvatarWOF", "KPDH")
rating <- c("3", "NULL", "5", "4", "NULL", "NULL")
Norm_Review <- data.frame(watcher_name, movie_title, rating)
I decided to keep everything in R for simplicity.
So I added the new reviewer to our total chart by combining the two datasets.
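The combining step itself is not shown, so here is a minimal sketch of how it might look. The stand-in table `Movies_Ratings_demo`, its column names, and the use of the string "NULL" for missing ratings are assumptions made for illustration; the real CSV may be laid out differently. Only a few rows are used, so the result is named `Full_demo` to keep it apart from the real `Full_movieRating`.

```r
library(dplyr)

# ASSUMPTION: a tiny stand-in for Movies_Ratings with guessed column names
# (watcher_name, movie_title, rating); the real CSV may differ.
Movies_Ratings_demo <- data.frame(
  watcher_name = c("michael", "michael", "kathy"),
  movie_title  = c("Sinners", "AvatarWOF", "KPDH"),
  rating       = c("5", "NULL", "NULL")
)

# A shortened version of Norm's manually created reviews
Norm_Review_demo <- data.frame(
  watcher_name = rep("norm", 3),
  movie_title  = c("Sinners", "Zoo_2", "Thunderbolts"),
  rating       = c("3", "NULL", "4")
)

# Stack the two tables, then convert the text ratings to numbers.
# as.numeric("NULL") returns NA and raises an
# "NAs introduced by coercion" warning.
Full_demo <- bind_rows(Movies_Ratings_demo, Norm_Review_demo) %>%
  mutate(num_rating = as.numeric(rating)) %>%
  select(watcher_name, movie_title, num_rating)

print(Full_demo)
```

Keeping the unseen movies as NA rather than dropping the rows means each watcher still has one row per movie, and the averages later on can skip them with `na.rm = TRUE`.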
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `num_rating = as.numeric(rating)`.
Caused by warning:
! NAs introduced by coercion
Full_movieRating
watcher_name movie_title num_rating
1 michael Sinners 5
2 michael Zoo_2 3
3 michael Captain_AmericaBNW 3
4 michael Thunderbolts 4
5 michael AvatarWOF NA
6 channa KPDH 4
7 channa Sinners 3
8 channa Zoo_2 4
9 channa Captain_AmericaBNW 4
10 channa Thunderbolts 4
11 channa AvatarWOF 4
12 dimitri KPDH 5
13 dimitri Sinners 5
14 dimitri Zoo_2 3
15 dimitri Captain_AmericaBNW 2
16 dimitri Thunderbolts 2
17 dimitri AvatarWOF 4
18 chris KPDH 3
19 chris Sinners 3
20 chris Zoo_2 1
21 chris Captain_AmericaBNW 3
22 chris Thunderbolts 4
23 chris AvatarWOF 4
24 kathy KPDH NA
25 kathy Sinners 4
26 kathy Zoo_2 3
27 kathy Captain_AmericaBNW 5
28 kathy Thunderbolts 5
29 kathy AvatarWOF 3
30 michael KPDH 4
31 norm Sinners 3
32 norm Zoo_2 NA
33 norm Captain_AmericaBNW 5
34 norm Thunderbolts 4
35 norm AvatarWOF NA
36 norm KPDH NA
# We will keep the NA values, as they will be important later.
print(Full_movieRating)
Calculating Average Scores of Movies & Users
I calculated the average score for each movie and each user and stored them in data tables. Rather than combining them with bind_rows, I kept the dataframes for the movie averages and the user averages separate, because merging them could cause confusion up front.
# User average scores
watcher_average <- Full_movieRating %>%
  group_by(watcher_name) %>%
  summarize(avg_rating = mean(num_rating, na.rm = TRUE))
print(watcher_average)
# A tibble: 6 × 2
watcher_name avg_rating
<chr> <dbl>
1 channa 3.83
2 chris 3
3 dimitri 3.5
4 kathy 4
5 michael 3.8
6 norm 4
# The average (Global) score for each movie
movie_average <- Full_movieRating %>%
  group_by(movie_title) %>%
  summarize(avg_score = mean(num_rating, na.rm = TRUE))
print(movie_average)
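We can use the prior information to add a column to each of the 2 dataframes, giving the user's relative average and the movie's relative average (each average minus the overall mean).
# Note: mean_movie_average must already be defined; its definition does not appear in this write-up
# Creating a column which shows the movie's average score minus the mean movie average
movie_average <- movie_average %>%
  mutate(movie_Total = avg_score - mean_movie_average)

# Creating the matching column for each watcher: average rating minus the mean movie average
watcher_average <- watcher_average %>%
  mutate(person_bias = avg_rating - mean_movie_average)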
Creating a recommendation for Norm
Which of the remaining 3 movies would Norm want to watch, based on his personal scores and the scores of the people around him? Answering this requires taking Norm's bias from the watcher averages and then placing it into code for the equation. Although the equation could simply be mean(all movies) + Norm's bias + movie total, changes need to be made so we can see which movie matches which score in a table. This also allows us to scale the table up if more movies are added.
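Written out, the equation described above takes roughly the following form. The symbols are illustrative notation rather than names from the code: μ stands for the overall mean rating (mean_movie_average), b_u for the watcher's bias (person_bias), and b_m for the movie's relative average (movie_Total).

```latex
\hat{r}_{u,m} = \mu + b_u + b_m,
\qquad b_u = \bar{r}_u - \mu,
\qquad b_m = \bar{r}_m - \mu
```

Substituting the two bias terms gives \hat{r}_{u,m} = \bar{r}_u + \bar{r}_m - \mu, so for Norm, whose average rating is 4, each estimate is 4 plus the movie's average minus the overall mean.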
# Establishing Norm's bias
Norm_bias <- watcher_average %>%
  filter(watcher_name == "norm") %>%
  pull(person_bias)

# Getting Norm's estimated score based on the movies shown
Norm_Recommendation <- movie_average %>%
  filter(movie_title %in% c("KPDH", "Zoo_2", "AvatarWOF")) %>%
  mutate(Norm_est = mean_movie_average + Norm_bias + movie_Total) %>%
  arrange(desc(Norm_est))
print(Norm_Recommendation)
Using the code above, the best movie to recommend for Norm turned out to be K-Pop Demon Hunters. The same code can also be used to find the predicted score for another viewer, for example Kathy's predicted score for KPDH. If given another attempt, I would like to practice this code with a larger survey group, or at least a larger variety of movies.
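As a rough sketch of how the same estimate could be wrapped for any watcher and movie, the function below takes both as input. The name `predict_rating` is hypothetical, and the overall mean is assumed here to be the mean of all non-missing ratings, which may differ slightly from how mean_movie_average was actually computed. The ratings are copied from the Full_movieRating table.

```r
# Ratings copied from the Full_movieRating table (NA = not seen)
movies <- c("Sinners", "Zoo_2", "Captain_AmericaBNW",
            "Thunderbolts", "AvatarWOF", "KPDH")
ratings <- data.frame(
  watcher_name = rep(c("michael", "channa", "dimitri",
                       "chris", "kathy", "norm"), each = 6),
  movie_title  = rep(movies, times = 6),
  num_rating   = c(5, 3, 3, 4, NA, 4,    # michael
                   3, 4, 4, 4, 4, 4,     # channa
                   5, 3, 2, 2, 4, 5,     # dimitri
                   3, 1, 3, 4, 4, 3,     # chris
                   4, 3, 5, 5, 3, NA,    # kathy
                   3, NA, 5, 4, NA, NA)  # norm
)

# Hypothetical helper: overall mean + watcher bias + movie bias.
# ASSUMPTION: the overall mean is taken over all non-missing ratings.
predict_rating <- function(ratings, watcher, movie) {
  mu <- mean(ratings$num_rating, na.rm = TRUE)
  watcher_avg <- mean(ratings$num_rating[ratings$watcher_name == watcher],
                      na.rm = TRUE)
  movie_avg <- mean(ratings$num_rating[ratings$movie_title == movie],
                    na.rm = TRUE)
  mu + (watcher_avg - mu) + (movie_avg - mu)
}

# Kathy's predicted score for KPDH
kathy_kpdh <- predict_rating(ratings, "kathy", "KPDH")
print(kathy_kpdh)

# Norm's estimates for the 3 movies he has not seen, highest first
unseen <- c("KPDH", "Zoo_2", "AvatarWOF")
norm_est <- sapply(unseen, function(m) predict_rating(ratings, "norm", m))
print(sort(norm_est, decreasing = TRUE))
```

Under these assumptions KPDH still ranks first for Norm. Estimates are not capped at the top of the rating scale, so a generous watcher paired with a well-liked movie can be predicted above 5.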