Assignment #3A

Author

Michael Mayne

Pre-Coding Approach

The goal of this assignment appears to be straightforward. I plan to take the data and chart from the end of last week’s assignment and create 3 working values. The first is the average rating across all movies, which serves as the baseline. The second is the average score for each individual watcher. The third is the average score for each movie.

Once those are in place, it is simply a matter of creating a formula that takes a movie and a watcher as input and returns a predicted score.
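The planned formula can be sketched as a small function. This is a minimal illustration, not the code used later in the report: the name predict_rating and the input numbers are hypothetical, chosen only to show how the three averages combine.

```r
# Hypothetical helper: baseline prediction built from three averages
predict_rating <- function(global_mean, watcher_avg, movie_avg) {
  watcher_bias <- watcher_avg - global_mean  # how far the watcher rates above or below the baseline
  movie_bias   <- movie_avg - global_mean    # how far the movie scores above or below the baseline
  global_mean + watcher_bias + movie_bias
}

# Made-up inputs: a generous watcher (4.0) and a weaker movie (3.0) against a 3.5 baseline
example_prediction <- predict_rating(global_mean = 3.5, watcher_avg = 4.0, movie_avg = 3.0)
print(example_prediction)
```

With these made-up inputs the watcher’s +0.5 and the movie’s -0.5 cancel, so the prediction equals the 3.5 baseline.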

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#Load the Ratings as before
Movies_Ratings <- read.csv("https://raw.githubusercontent.com/Mayneman000/DATA607Assignment/refs/heads/main/Movies.csv")

Code Base (3A)

Preparing the DataSet from Prior Assignment

To begin, I need to view the dataset mentioned in my pre-coding approach: the movie ratings from my friends and me that were collected for the prior assignment.

glimpse(Movies_Ratings)

Rows: 30
Columns: 3
$ watcher_name <chr> "michael", "michael", "michael", "michael", "michael", "c…
$ movie_title  <chr> "Sinners", "Zoo_2", "Captain_AmericaBNW", "Thunderbolts",…
$ rating       <chr> "5", "3", "3", "4", "NULL", "4", "3", "4", "4", "4", "4",…

Adding a new Reviewer

Most of the people around me had seen a mix of the movies, and I had already gone past the 4 reviewers originally recommended (excluding myself). The other people I interviewed either didn’t watch movies often or didn’t care enough to give a strong score. So, to get a reasonable example of the global estimate asked for by this problem, I decided to create a viewer with a reasonable mix of ratings. This viewer, “Norm”, has seen only 3 movies: the 2 Marvel movies, “Captain America: Brave New World” and “Thunderbolts”, plus “Sinners”. He has not seen the other 3 movies, so I used the global estimate method to see which one he should watch. His ratings are created manually in the dataframe below.

#Lets call him "Norm" for Normal Taste
#Only watched 3 total movies

watcher_name <- c("norm","norm", "norm", "norm", "norm", "norm")
movie_title <- c("Sinners", "Zoo_2", "Captain_AmericaBNW", "Thunderbolts", "AvatarWOF", "KPDH")
rating <- c("3", "NULL", "5", "4","NULL", "NULL") 


Norm_Review <- data.frame(watcher_name, movie_title, rating)

I decided to keep everything in R for simplicity.

So I added the new reviewer to our full table by combining the two datasets.

Full_movieRating <- bind_rows(Movies_Ratings, Norm_Review)

Full_movieRating <- Full_movieRating %>%
  mutate(num_rating = as.numeric(rating))%>%
  select(-rating)

Warning: There was 1 warning in `mutate()`.
ℹ In argument: `num_rating = as.numeric(rating)`.
Caused by warning:
! NAs introduced by coercion
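The coercion warning appears because the text "NULL" cannot be parsed as a number. One way to make the conversion explicit, shown here as a sketch on a made-up vector rather than the report’s data, is to turn "NULL" into NA with dplyr’s na_if() before calling as.numeric(). The variable names raw_rating and clean_rating are illustrative.

```r
library(dplyr)

# Illustrative ratings stored as text, with "NULL" marking an unseen movie
raw_rating <- c("5", "3", "NULL", "4")

# Replace the "NULL" marker with NA first, so as.numeric() has nothing left to coerce
clean_rating <- as.numeric(na_if(raw_rating, "NULL"))

print(clean_rating)
```

The result is the same numeric column with NA for unseen movies, just without the warning.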

Full_movieRating

   watcher_name        movie_title num_rating
1       michael            Sinners          5
2       michael              Zoo_2          3
3       michael Captain_AmericaBNW          3
4       michael       Thunderbolts          4
5       michael          AvatarWOF         NA
6        channa               KPDH          4
7        channa            Sinners          3
8        channa              Zoo_2          4
9        channa Captain_AmericaBNW          4
10       channa       Thunderbolts          4
11       channa          AvatarWOF          4
12      dimitri               KPDH          5
13      dimitri            Sinners          5
14      dimitri              Zoo_2          3
15      dimitri Captain_AmericaBNW          2
16      dimitri       Thunderbolts          2
17      dimitri          AvatarWOF          4
18        chris               KPDH          3
19        chris            Sinners          3
20        chris              Zoo_2          1
21        chris Captain_AmericaBNW          3
22        chris       Thunderbolts          4
23        chris          AvatarWOF          4
24        kathy               KPDH         NA
25        kathy            Sinners          4
26        kathy              Zoo_2          3
27        kathy Captain_AmericaBNW          5
28        kathy       Thunderbolts          5
29        kathy          AvatarWOF          3
30      michael               KPDH          4
31         norm            Sinners          3
32         norm              Zoo_2         NA
33         norm Captain_AmericaBNW          5
34         norm       Thunderbolts          4
35         norm          AvatarWOF         NA
36         norm               KPDH         NA

# We will keep the NA values, as they will be important later.


Calculating Average Scores of Movies & Users

I calculated the average for each movie and each user and stored them in data tables. Instead of combining them with bind_rows, I decided to keep the dataframes for the movie averages and the user averages separate, because merging them could cause confusion up front.

# User Average Scores 

watcher_average <- Full_movieRating %>%
  group_by(watcher_name) %>%
  summarize(avg_rating = mean(num_rating, na.rm = TRUE))

print(watcher_average)

# A tibble: 6 × 2
  watcher_name avg_rating
  <chr>             <dbl>
1 channa             3.83
2 chris              3   
3 dimitri            3.5 
4 kathy              4   
5 michael            3.8 
6 norm               4

# The average (Global) score for each movie

movie_average <- Full_movieRating %>%
  group_by(movie_title) %>%
  summarize(avg_score = mean(num_rating, na.rm = TRUE))

print(movie_average)

# A tibble: 6 × 2
  movie_title        avg_score
  <chr>                  <dbl>
1 AvatarWOF               3.75
2 Captain_AmericaBNW      3.67
3 KPDH                    4   
4 Sinners                 3.83
5 Thunderbolts            3.83
6 Zoo_2                   2.8

mean_movie_average <- mean(movie_average$avg_score)
print(mean_movie_average)

[1] 3.647222
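Note that mean_movie_average is the mean of the six per-movie averages, which weights every movie equally regardless of how many people rated it. A common alternative baseline is the mean of all individual ratings. The sketch below compares the two, using per-movie rating sums and counts tallied by hand from the table printed earlier (the vector names are illustrative).

```r
# Per-movie rating sums and counts, tallied from the Full_movieRating table
# Order: AvatarWOF, Captain_AmericaBNW, KPDH, Sinners, Thunderbolts, Zoo_2
rating_sum   <- c(15, 22, 16, 23, 23, 14)
rating_count <- c(4, 6, 4, 6, 6, 5)

# Mean of the per-movie averages (the baseline used in this report)
mean_of_movie_averages <- mean(rating_sum / rating_count)

# Mean of all individual ratings (each rating weighted equally)
mean_of_all_ratings <- sum(rating_sum) / sum(rating_count)

print(c(mean_of_movie_averages, mean_of_all_ratings))
```

The two values come out to roughly 3.647 and 3.645, so the choice of baseline makes little difference for this dataset.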

We can use the prior information to add a column to each of the 2 dataframes, giving each user’s average relative to the baseline and each movie’s average relative to the baseline.

# Creating a column which shows each movie's average score minus the mean movie average

movie_average <- movie_average %>%
  mutate(movie_Total = avg_score - mean_movie_average)

# Creating a column which shows each watcher's average rating minus the mean movie average
watcher_average <- watcher_average %>%
  mutate(person_bias = avg_rating - mean_movie_average)

Creating a recommendation for Norm

Which of the remaining 3 movies would Norm want to watch, based on his personal scores and the scores of the people around him? Answering this requires taking Norm’s bias from the watcher averages and then coding the equation. Although the equation could be written directly as mean(all movies) + Norm’s bias + movie total, it is computed per row of a table so we can see which movie matches which score. This also allows the table to scale up if more movies are added.

#Establishing Norm bias

Norm_bias <- watcher_average %>%
  filter(watcher_name == "norm")%>%
  pull(person_bias)


# Getting Norm estimated score based on the movies shown 

Norm_Recommendation <- movie_average %>%
  filter(movie_title %in% c("KPDH", "Zoo_2", "AvatarWOF")) %>%
  mutate(Norm_est = mean_movie_average + Norm_bias + movie_Total) %>%
  arrange(desc(Norm_est))

print(Norm_Recommendation)

# A tibble: 3 × 4
  movie_title avg_score movie_Total Norm_est
  <chr>           <dbl>       <dbl>    <dbl>
1 KPDH             4          0.353     4.35
2 AvatarWOF        3.75       0.103     4.10
3 Zoo_2            2.8       -0.847     3.15
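As a check on the table above, the estimate can be reproduced by hand. Because the global mean is added once and subtracted twice, the formula reduces to watcher average + movie average - global mean. The sketch below uses the averages printed earlier in the report; the variable names are illustrative.

```r
global_mean <- 3.647222   # mean_movie_average as printed earlier
norm_avg    <- 4          # Norm's average from watcher_average

# Movie averages for the three films Norm has not seen
movie_avg <- c(KPDH = 4, AvatarWOF = 3.75, Zoo_2 = 2.8)

# Full form: baseline + watcher bias + movie bias
full_form <- global_mean + (norm_avg - global_mean) + (movie_avg - global_mean)

# Simplified form: algebraically identical to the full form
short_form <- norm_avg + movie_avg - global_mean

print(round(short_form, 2))
```

Rounded to two decimals, this gives 4.35 for KPDH, 4.10 for AvatarWOF, and 3.15 for Zoo_2, matching the Norm_est column.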

Conclusion

Using the code above, the best movie to recommend for Norm turned out to be K-Pop Demon Hunters (KPDH). The same code can also be used to find the predicted score for another viewer, for example Kathy’s predicted score for KPDH. If given another attempt, I would like to practice this code with a larger survey group, or at least a larger variety of movies.
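To illustrate the Kathy example, here is a minimal sketch that rebuilds the ratings as a matrix and wraps the same baseline formula in a function. The matrix layout and the name predict_score are assumptions made for this illustration; the report itself works with dataframes.

```r
movies   <- c("Sinners", "Zoo_2", "Captain_AmericaBNW", "Thunderbolts", "AvatarWOF", "KPDH")
watchers <- c("michael", "channa", "dimitri", "chris", "kathy", "norm")

# Ratings copied from the Full_movieRating table (rows = watchers, columns = movies)
ratings <- matrix(
  c(5,  3, 3, 4, NA,  4,
    3,  4, 4, 4,  4,  4,
    5,  3, 2, 2,  4,  5,
    3,  1, 3, 4,  4,  3,
    4,  3, 5, 5,  3, NA,
    3, NA, 5, 4, NA, NA),
  nrow = 6, byrow = TRUE, dimnames = list(watchers, movies)
)

movie_avg   <- colMeans(ratings, na.rm = TRUE)
watcher_avg <- rowMeans(ratings, na.rm = TRUE)
global_mean <- mean(movie_avg)   # same baseline as mean_movie_average

# Hypothetical helper: baseline + watcher bias + movie bias
predict_score <- function(watcher, movie) {
  global_mean + (watcher_avg[[watcher]] - global_mean) + (movie_avg[[movie]] - global_mean)
}

kathy_kpdh <- predict_score("kathy", "KPDH")
print(round(kathy_kpdh, 2))
```

Kathy’s average is 4, the same as Norm’s, so under this formula her estimate for KPDH comes out the same as his, about 4.35.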

-End of Report