Assignment 3A

Approach

My approach will be very similar to the confusion matrix assignment. I will first work out what is being asked, in particular what the “Global Baseline Estimate” is and what kind of value that number provides. I will have to pull in my movie data from SQLite using the same methods as before. The first step will be to load the data and determine the global mean rating from it. From the equation I can see that Global Baseline Estimate = global mean + (movie mean - global mean) + (user mean - global mean).

My goal with this data-set is to predict Burton’s rating for Captain America. I’m going to focus on a single cell for this assignment. I decided to use a different data-set, provided by the instructor.

Loading Data from GitHub Raw

My first step will be to load the provided data-set, a CSV file hosted on GitHub.

url <- "https://raw.githubusercontent.com/AslamF/DATA607-Assignment-3A-/refs/heads/main/finaldata.csv"

data <- read.csv(url)

str(data)
'data.frame':   16 obs. of  7 variables:
 $ Critic        : chr  "Burton" "Charley" "Dan" "Dieudonne" ...
 $ CaptainAmerica: int  NA 4 NA 5 4 4 4 NA 4 4 ...
 $ Deadpool      : int  NA 5 5 4 NA NA 4 NA 4 3 ...
 $ Frozen        : int  NA 4 NA NA 2 3 4 NA 1 5 ...
 $ JungleBook    : int  4 3 NA NA NA 3 2 NA NA 5 ...
 $ PitchPerfect2 : int  NA 2 NA NA 2 4 2 NA NA 2 ...
 $ StarWarsForce : int  4 3 5 5 5 NA 4 4 5 3 ...

Calculating the Global Mean

unlist() is needed here because mean() does not accept a data frame. It flattens the rating columns into a single numeric vector, so the mean can be computed over every rating at once. na.rm=TRUE drops the missing ratings.

global_mean <- mean(as.numeric(unlist(data[, 2:7])), na.rm=TRUE)

print(global_mean)
[1] 3.934426
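
To see why the flattening step matters, here is a minimal sketch on a made-up table (the `toy` data frame and its values are illustrative assumptions, not the assignment data). It pools every observed rating into one vector before averaging, and compares that with averaging the per-movie means, which is a different number whenever movies have different counts of ratings.

```r
# Hypothetical toy ratings; the values are made up for illustration only.
toy <- data.frame(
  Critic = c("A", "B", "C"),
  M1 = c(5, NA, 3),
  M2 = c(4, 2, 3)
)

# unlist() flattens the rating columns into one numeric vector,
# which is the kind of input mean() expects.
all_ratings <- unlist(toy[, c("M1", "M2")])
toy_global_mean <- mean(all_ratings, na.rm = TRUE)

# Averaging the column means weights each movie equally
# instead of weighting each individual rating equally.
mean_of_movie_means <- mean(sapply(toy[, c("M1", "M2")], mean, na.rm = TRUE))

print(toy_global_mean)      # pooled mean over the 5 observed ratings
print(mean_of_movie_means)  # differs because M1 has only 2 ratings
```

The global mean in the baseline estimate is the pooled version, which is what the unlist() call above computes.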

Movie Averages

The key function here is sapply, which applies a function to each column of a data frame. The general syntax is sapply(X, FUN, ...), where any extra arguments (here na.rm=TRUE) are passed on to FUN.

movie_avg <- sapply(data[, 2:7], mean, na.rm=TRUE)

print(movie_avg)
CaptainAmerica       Deadpool         Frozen     JungleBook  PitchPerfect2 
      4.272727       4.444444       3.727273       3.900000       2.714286 
 StarWarsForce 
      4.153846 
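
As a cross-check on the sapply call, base R's colMeans should give the same per-movie averages in one step. The sketch below uses a made-up `toy` data frame (an assumption for illustration, not the assignment data).

```r
# Hypothetical toy ratings; the values are made up for illustration only.
toy <- data.frame(
  Critic = c("A", "B", "C"),
  M1 = c(5, NA, 3),
  M2 = c(4, 2, 3)
)

# Column-wise mean via sapply, as in the report.
avg_sapply <- sapply(toy[, c("M1", "M2")], mean, na.rm = TRUE)

# Same calculation via colMeans, which handles numeric columns directly.
avg_colmeans <- colMeans(toy[, c("M1", "M2")], na.rm = TRUE)

print(avg_sapply)
print(avg_colmeans)
```

Either form works here; sapply is more general because it accepts any function, while colMeans is limited to means.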

Movie bias

This is the item bias term in the equation; in our case it is the movie bias (movie average - global mean).

movie_bias <- movie_avg - global_mean

print(movie_bias)
CaptainAmerica       Deadpool         Frozen     JungleBook  PitchPerfect2 
    0.33830104     0.51001821    -0.20715350    -0.03442623    -1.22014052 
 StarWarsForce 
    0.21941992 

User Average

This is the same approach as the movie averages, except that apply with MARGIN = 1 runs the function over each row (each critic) of the data-set. The general syntax is apply(X, MARGIN, FUN, ...).

user_avgs <- apply(data[, 2:7], 1, mean, na.rm=TRUE)
print(user_avgs)
 [1] 4.000000 3.500000 5.000000 4.666667 3.250000 3.500000 3.333333 4.000000
 [9] 3.500000 3.666667 4.800000 4.000000 4.666667 4.000000 3.600000 5.000000
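
The row-wise version can be sketched the same way on a made-up table (the `toy` data frame, including the critic "D" with no ratings, is an assumption for illustration). It compares apply with base R's rowMeans and shows what happens for a critic who has not rated anything.

```r
# Hypothetical toy ratings; critic "D" has no ratings at all.
toy <- data.frame(
  Critic = c("A", "B", "C", "D"),
  M1 = c(5, NA, 3, NA),
  M2 = c(4, 2, 3, NA)
)
toy_ratings <- toy[, c("M1", "M2")]

# Row-wise mean via apply with MARGIN = 1, as in the report.
avg_apply <- apply(toy_ratings, 1, mean, na.rm = TRUE)

# Same calculation via rowMeans.
avg_rowmeans <- rowMeans(toy_ratings, na.rm = TRUE)

print(unname(avg_apply))
print(unname(avg_rowmeans))

# With na.rm = TRUE, a row that is entirely NA averages to NaN,
# so a critic with no ratings has no usable user average.
print(is.nan(unname(avg_rowmeans)[4]))
```

None of the 16 critics in the assignment data hit this case, since every user average printed above is a number.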

User Bias

This is the same approach as the movie bias, now applied to users (user average - global mean).

user_bias <- user_avgs - global_mean
print(user_bias)
 [1]  0.06557377 -0.43442623  1.06557377  0.73224044 -0.68442623 -0.43442623
 [7] -0.60109290  0.06557377 -0.43442623 -0.26775956  0.86557377  0.06557377
[13]  0.73224044  0.06557377 -0.33442623  1.06557377

Identifying our Data-Cells

We now have the bias for every user and every movie. However, I am focusing on the user Burton and the movie Captain America, so I need to pick out the user bias for Burton and the movie bias for Captain America before I can plug them into the final equation.

burton_id <- which(data$Critic == "Burton")
print(burton_id)
[1] 1
burton_bias <- user_bias[burton_id]
print(burton_bias)
[1] 0.06557377
captain_bias <- movie_bias["CaptainAmerica"]
print(captain_bias)
CaptainAmerica 
      0.338301 
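
An alternative to looking up the row index with which() is to name the user bias vector by critic, so that users can be selected by name just like movies. This is a sketch on a made-up `toy` data frame (an assumption for illustration), not a change to the analysis above.

```r
# Hypothetical toy ratings; the values are made up for illustration only.
toy <- data.frame(
  Critic = c("A", "B", "C"),
  M1 = c(5, NA, 3),
  M2 = c(4, 2, 3)
)
toy_ratings <- toy[, c("M1", "M2")]

toy_global_mean <- mean(unlist(toy_ratings), na.rm = TRUE)
toy_user_bias <- rowMeans(toy_ratings, na.rm = TRUE) - toy_global_mean

# Attach critic names so the vector can be indexed by name.
names(toy_user_bias) <- toy$Critic
b_bias <- toy_user_bias[["B"]]
print(b_bias)

# A misspelled or absent critic makes which() return an empty index,
# so it is worth checking the length before using it.
missing_id <- which(toy$Critic == "Z")
print(length(missing_id))
```

Using `[[` with a name raises an error for an unknown critic, whereas indexing with an empty which() result silently returns an empty vector.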

Calculating the Predicted Rating

Predicted Rating = Global Mean + User Bias + Item Bias

predicted_rating <- global_mean + captain_bias + burton_bias
print(predicted_rating)
CaptainAmerica 
      4.338301 
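
The same formula can be applied to every critic and movie at once. The sketch below wraps the steps in a helper, `baseline_matrix`, which is a hypothetical name introduced here for illustration, and runs it on a made-up `toy` data frame rather than the assignment data.

```r
# Hypothetical helper: global baseline estimate for every user/movie pair.
# `ratings` is a data frame of numeric rating columns; `critics` are row labels.
baseline_matrix <- function(ratings, critics) {
  global_mean <- mean(unlist(ratings), na.rm = TRUE)
  movie_bias <- colMeans(ratings, na.rm = TRUE) - global_mean
  user_bias <- rowMeans(ratings, na.rm = TRUE) - global_mean

  # outer() adds every user bias to every movie bias.
  est <- global_mean + outer(unname(user_bias), unname(movie_bias), "+")
  dimnames(est) <- list(critics, colnames(ratings))
  est
}

# Hypothetical toy ratings; the values are made up for illustration only.
toy <- data.frame(
  Critic = c("A", "B", "C"),
  M1 = c(5, NA, 3),
  M2 = c(4, 2, 3)
)

toy_est <- baseline_matrix(toy[, c("M1", "M2")], toy$Critic)
print(toy_est)

# Estimate for the one missing cell: critic B, movie M1.
pred_B_M1 <- toy_est["B", "M1"]
print(pred_B_M1)
```

Because the estimate is a plain sum, it is not guaranteed to stay within the 1 to 5 rating scale; whether to clip it is a separate choice that this report does not make.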

Conclusion

The final equation for the predicted rating was 3.934426 (Global Mean) + 0.06557377 (User Bias) + 0.338301 (Item Bias) = 4.338301 (Predicted Rating).

Using the global baseline estimate, the predicted rating for Burton on Captain America is 4.34 out of 5. This is higher than the global mean of 3.93 and higher than Burton’s own average of 4.00. Therefore, using the algorithm, we would recommend Captain America to Burton: he would be expected to enjoy the movie.