SPS_DATA607_Week3A_DC

Author

David Chen

Global Baseline Estimate

Using the information you collected on movie ratings, implement a Global Baseline Estimate recommendation system in R.   The attached spreadsheet provides the implementation algorithm.

Most recommender systems use personalized algorithms like “content-based filtering” and “item-item collaborative filtering.” Sometimes non-personalized recommenders are also useful or necessary. One of the best non-personalized recommender system algorithms is the “Global Baseline Estimate.”

The job here is to use the survey data collected and write the R code that makes a movie recommendation using the Global Baseline Estimate algorithm.  Please see the attached spreadsheet for implementation details.

Approach

The Excel file lays out the Global Baseline Estimate calculation. The plan is to identify the quantities it uses (the global mean, each user’s bias, and each movie’s bias), compute them from all available ratings, and then generate an estimated rating for every critic and movie so that the estimates can be compared with the actual ratings.
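As a sketch of the calculation, the estimate for critic u and movie i adds two bias terms to the global mean. The symbols below are notation introduced here for illustration and are not taken from the spreadsheet:

```latex
\hat{r}_{ui} = \mu + b_u + b_i,
\qquad b_u = \bar{r}_u - \mu,
\qquad b_i = \bar{r}_i - \mu
```

Here \mu is the mean of all observed ratings, \bar{r}_u is critic u’s mean rating, and \bar{r}_i is movie i’s mean rating. Using the values reported later in this analysis for Burton and CaptainAmerica: 3.934426 + 0.065574 + 0.338301 = 4.338301.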

Running Code

Convert the Excel file to a CSV format, upload it to GitHub, and then load the CSV file to preview the dataset.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ lubridate 1.9.4     ✔ tibble    3.3.1
✔ purrr     1.2.1     ✔ tidyr     1.3.2
✔ readr     2.1.6     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
mydata <- read.csv("https://raw.githubusercontent.com/dyc-sps/SPS_DATA607_Week3A/refs/heads/main/movierating.csv")
summary(mydata)
    Critic          CaptainAmerica     Deadpool         Frozen     
 Length:16          Min.   :4.000   Min.   :3.000   Min.   :1.000  
 Class :character   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:3.000  
 Mode  :character   Median :4.000   Median :5.000   Median :4.000  
                    Mean   :4.273   Mean   :4.444   Mean   :3.727  
                    3rd Qu.:4.500   3rd Qu.:5.000   3rd Qu.:5.000  
                    Max.   :5.000   Max.   :5.000   Max.   :5.000  
                    NA's   :5       NA's   :7       NA's   :5      
   JungleBook  PitchPerfect2   StarWarsForce  
 Min.   :2.0   Min.   :2.000   Min.   :3.000  
 1st Qu.:3.0   1st Qu.:2.000   1st Qu.:4.000  
 Median :4.0   Median :2.000   Median :4.000  
 Mean   :3.9   Mean   :2.714   Mean   :4.154  
 3rd Qu.:5.0   3rd Qu.:3.500   3rd Qu.:5.000  
 Max.   :5.0   Max.   :4.000   Max.   :5.000  
 NA's   :6     NA's   :9       NA's   :3      
head(mydata)
     Critic CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2
1    Burton             NA       NA     NA          4            NA
2   Charley              4        5      4          3             2
3       Dan             NA        5     NA         NA            NA
4 Dieudonne              5        4     NA         NA            NA
5      Matt              4       NA      2         NA             2
6  Mauricio              4       NA      3          3             4
  StarWarsForce
1             4
2             3
3             5
4             5
5             5
6            NA
glimpse(mydata)
Rows: 16
Columns: 7
$ Critic         <chr> "Burton", "Charley", "Dan", "Dieudonne", "Matt", "Mauri…
$ CaptainAmerica <int> NA, 4, NA, 5, 4, 4, 4, NA, 4, 4, 5, NA, 5, 4, 4, NA
$ Deadpool       <int> NA, 5, 5, 4, NA, NA, 4, NA, 4, 3, 5, NA, 5, NA, 5, NA
$ Frozen         <int> NA, 4, NA, NA, 2, 3, 4, NA, 1, 5, 5, 4, 5, NA, 3, 5
$ JungleBook     <int> 4, 3, NA, NA, NA, 3, 2, NA, NA, 5, 5, 5, 4, NA, 3, 5
$ PitchPerfect2  <int> NA, 2, NA, NA, 2, 4, 2, NA, NA, 2, NA, NA, 4, NA, 3, NA
$ StarWarsForce  <int> 4, 3, 5, 5, 5, NA, 4, 4, 5, 3, 4, 3, 5, 4, NA, NA

Calculate each user’s average rating across the movies they rated, ignoring any missing (NA) values.

user_mean_df <- mydata %>%
  mutate(avg = rowMeans(across(where(is.numeric)), na.rm = TRUE))

Compute the average rating for each movie (ignoring NA values) and save the results to a new data frame.

rating_cols <- sapply(mydata, is.numeric)  # TRUE for the six movie columns
movie_mean_df <- data.frame(
  movie = colnames(mydata)[rating_cols],
  col_avg = colMeans(mydata[rating_cols], na.rm = TRUE)
)
print(movie_mean_df)
                        movie  col_avg
CaptainAmerica CaptainAmerica 4.272727
Deadpool             Deadpool 4.444444
Frozen                 Frozen 3.727273
JungleBook         JungleBook 3.900000
PitchPerfect2   PitchPerfect2 2.714286
StarWarsForce   StarWarsForce 4.153846
#summary(movie_mean_df)
colnames(movie_mean_df)
[1] "movie"   "col_avg"

Calculate the overall mean of all ratings to get the global average.

global_mean <- mean(as.matrix(mydata[sapply(mydata, is.numeric)]), na.rm = TRUE)
global_mean
[1] 3.934426
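The global mean weights every observed rating equally, which is not the same as averaging the six movie means. The sketch below illustrates the difference using the ratings copied from the glimpse() output above; the names ratings, mean_of_movie_means, and n_ratings are introduced here for illustration only.

```r
# Ratings copied from the glimpse() output above (critic names omitted).
ratings <- data.frame(
  CaptainAmerica = c(NA, 4, NA, 5, 4, 4, 4, NA, 4, 4, 5, NA, 5, 4, 4, NA),
  Deadpool       = c(NA, 5, 5, 4, NA, NA, 4, NA, 4, 3, 5, NA, 5, NA, 5, NA),
  Frozen         = c(NA, 4, NA, NA, 2, 3, 4, NA, 1, 5, 5, 4, 5, NA, 3, 5),
  JungleBook     = c(4, 3, NA, NA, NA, 3, 2, NA, NA, 5, 5, 5, 4, NA, 3, 5),
  PitchPerfect2  = c(NA, 2, NA, NA, 2, 4, 2, NA, NA, 2, NA, NA, 4, NA, 3, NA),
  StarWarsForce  = c(4, 3, 5, 5, 5, NA, 4, 4, 5, 3, 4, 3, 5, 4, NA, NA)
)

# Number of observed (non-missing) ratings.
n_ratings <- sum(!is.na(ratings))

# Global mean: every observed rating counts once.
global_mean <- mean(as.matrix(ratings), na.rm = TRUE)

# Mean of the movie means: every movie counts once,
# no matter how many critics rated it.
mean_of_movie_means <- mean(colMeans(ratings, na.rm = TRUE))

print(c(n_ratings = n_ratings,
        global_mean = global_mean,
        mean_of_movie_means = mean_of_movie_means))
```

The mean of the movie means comes out lower (about 3.87) because PitchPerfect2, with only 7 ratings and the lowest average, counts as much as StarWarsForce with 13. The Global Baseline Estimate uses the first quantity.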

Compute the user bias as each user’s mean rating (ignoring NA values) minus the global mean. This represents each user’s tendency to rate higher or lower than the global average.

user_mean_df <- user_mean_df %>%
  mutate(user_bias = avg - global_mean) %>%
  select(Critic, avg, user_bias)
print(user_mean_df)
      Critic      avg   user_bias
1     Burton 4.000000  0.06557377
2    Charley 3.500000 -0.43442623
3        Dan 5.000000  1.06557377
4  Dieudonne 4.666667  0.73224044
5       Matt 3.250000 -0.68442623
6   Mauricio 3.500000 -0.43442623
7        Max 3.333333 -0.60109290
8     Nathan 4.000000  0.06557377
9      Param 3.500000 -0.43442623
10    Parshu 3.666667 -0.26775956
11 Prashanth 4.800000  0.86557377
12    Shipra 4.000000  0.06557377
13  Sreejaya 4.666667  0.73224044
14     Steve 4.000000  0.06557377
15     Vuthy 3.600000 -0.33442623
16   Xingjia 5.000000  1.06557377

Compute the movie bias as each movie’s mean rating (ignoring NA values) minus the global mean. This represents how each movie tends to be rated relative to the global average. Note that the code stores the movie bias in a column that is also named user_bias.

movie_mean_df <- movie_mean_df %>%
  mutate(user_bias = col_avg - global_mean)
print(movie_mean_df)
                        movie  col_avg   user_bias
CaptainAmerica CaptainAmerica 4.272727  0.33830104
Deadpool             Deadpool 4.444444  0.51001821
Frozen                 Frozen 3.727273 -0.20715350
JungleBook         JungleBook 3.900000 -0.03442623
PitchPerfect2   PitchPerfect2 2.714286 -1.22014052
StarWarsForce   StarWarsForce 4.153846  0.21941992

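Cross-join the two data frames so that every movie is paired with every critic (6 movies × 16 critics = 96 rows); merge() with by = NULL does this. Because both inputs have a user_bias column, the merged result holds the movie bias in user_bias.x and the user bias in user_bias.y. Then apply the Global Baseline Estimate calculation (global mean + movie bias + user bias) and reshape the results into a wide data frame, with one row per critic and one column per movie, for easier analysis or visualization.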

df_predic <- merge(movie_mean_df,user_mean_df, by = NULL)
print(df_predic)
            movie  col_avg user_bias.x    Critic      avg user_bias.y
1  CaptainAmerica 4.272727  0.33830104    Burton 4.000000  0.06557377
2        Deadpool 4.444444  0.51001821    Burton 4.000000  0.06557377
3          Frozen 3.727273 -0.20715350    Burton 4.000000  0.06557377
4      JungleBook 3.900000 -0.03442623    Burton 4.000000  0.06557377
5   PitchPerfect2 2.714286 -1.22014052    Burton 4.000000  0.06557377
6   StarWarsForce 4.153846  0.21941992    Burton 4.000000  0.06557377
7  CaptainAmerica 4.272727  0.33830104   Charley 3.500000 -0.43442623
8        Deadpool 4.444444  0.51001821   Charley 3.500000 -0.43442623
9          Frozen 3.727273 -0.20715350   Charley 3.500000 -0.43442623
10     JungleBook 3.900000 -0.03442623   Charley 3.500000 -0.43442623
11  PitchPerfect2 2.714286 -1.22014052   Charley 3.500000 -0.43442623
12  StarWarsForce 4.153846  0.21941992   Charley 3.500000 -0.43442623
13 CaptainAmerica 4.272727  0.33830104       Dan 5.000000  1.06557377
14       Deadpool 4.444444  0.51001821       Dan 5.000000  1.06557377
15         Frozen 3.727273 -0.20715350       Dan 5.000000  1.06557377
16     JungleBook 3.900000 -0.03442623       Dan 5.000000  1.06557377
17  PitchPerfect2 2.714286 -1.22014052       Dan 5.000000  1.06557377
18  StarWarsForce 4.153846  0.21941992       Dan 5.000000  1.06557377
19 CaptainAmerica 4.272727  0.33830104 Dieudonne 4.666667  0.73224044
20       Deadpool 4.444444  0.51001821 Dieudonne 4.666667  0.73224044
21         Frozen 3.727273 -0.20715350 Dieudonne 4.666667  0.73224044
22     JungleBook 3.900000 -0.03442623 Dieudonne 4.666667  0.73224044
23  PitchPerfect2 2.714286 -1.22014052 Dieudonne 4.666667  0.73224044
24  StarWarsForce 4.153846  0.21941992 Dieudonne 4.666667  0.73224044
25 CaptainAmerica 4.272727  0.33830104      Matt 3.250000 -0.68442623
26       Deadpool 4.444444  0.51001821      Matt 3.250000 -0.68442623
27         Frozen 3.727273 -0.20715350      Matt 3.250000 -0.68442623
28     JungleBook 3.900000 -0.03442623      Matt 3.250000 -0.68442623
29  PitchPerfect2 2.714286 -1.22014052      Matt 3.250000 -0.68442623
30  StarWarsForce 4.153846  0.21941992      Matt 3.250000 -0.68442623
31 CaptainAmerica 4.272727  0.33830104  Mauricio 3.500000 -0.43442623
32       Deadpool 4.444444  0.51001821  Mauricio 3.500000 -0.43442623
33         Frozen 3.727273 -0.20715350  Mauricio 3.500000 -0.43442623
34     JungleBook 3.900000 -0.03442623  Mauricio 3.500000 -0.43442623
35  PitchPerfect2 2.714286 -1.22014052  Mauricio 3.500000 -0.43442623
36  StarWarsForce 4.153846  0.21941992  Mauricio 3.500000 -0.43442623
37 CaptainAmerica 4.272727  0.33830104       Max 3.333333 -0.60109290
38       Deadpool 4.444444  0.51001821       Max 3.333333 -0.60109290
39         Frozen 3.727273 -0.20715350       Max 3.333333 -0.60109290
40     JungleBook 3.900000 -0.03442623       Max 3.333333 -0.60109290
41  PitchPerfect2 2.714286 -1.22014052       Max 3.333333 -0.60109290
42  StarWarsForce 4.153846  0.21941992       Max 3.333333 -0.60109290
43 CaptainAmerica 4.272727  0.33830104    Nathan 4.000000  0.06557377
44       Deadpool 4.444444  0.51001821    Nathan 4.000000  0.06557377
45         Frozen 3.727273 -0.20715350    Nathan 4.000000  0.06557377
46     JungleBook 3.900000 -0.03442623    Nathan 4.000000  0.06557377
47  PitchPerfect2 2.714286 -1.22014052    Nathan 4.000000  0.06557377
48  StarWarsForce 4.153846  0.21941992    Nathan 4.000000  0.06557377
49 CaptainAmerica 4.272727  0.33830104     Param 3.500000 -0.43442623
50       Deadpool 4.444444  0.51001821     Param 3.500000 -0.43442623
51         Frozen 3.727273 -0.20715350     Param 3.500000 -0.43442623
52     JungleBook 3.900000 -0.03442623     Param 3.500000 -0.43442623
53  PitchPerfect2 2.714286 -1.22014052     Param 3.500000 -0.43442623
54  StarWarsForce 4.153846  0.21941992     Param 3.500000 -0.43442623
55 CaptainAmerica 4.272727  0.33830104    Parshu 3.666667 -0.26775956
56       Deadpool 4.444444  0.51001821    Parshu 3.666667 -0.26775956
57         Frozen 3.727273 -0.20715350    Parshu 3.666667 -0.26775956
58     JungleBook 3.900000 -0.03442623    Parshu 3.666667 -0.26775956
59  PitchPerfect2 2.714286 -1.22014052    Parshu 3.666667 -0.26775956
60  StarWarsForce 4.153846  0.21941992    Parshu 3.666667 -0.26775956
61 CaptainAmerica 4.272727  0.33830104 Prashanth 4.800000  0.86557377
62       Deadpool 4.444444  0.51001821 Prashanth 4.800000  0.86557377
63         Frozen 3.727273 -0.20715350 Prashanth 4.800000  0.86557377
64     JungleBook 3.900000 -0.03442623 Prashanth 4.800000  0.86557377
65  PitchPerfect2 2.714286 -1.22014052 Prashanth 4.800000  0.86557377
66  StarWarsForce 4.153846  0.21941992 Prashanth 4.800000  0.86557377
67 CaptainAmerica 4.272727  0.33830104    Shipra 4.000000  0.06557377
68       Deadpool 4.444444  0.51001821    Shipra 4.000000  0.06557377
69         Frozen 3.727273 -0.20715350    Shipra 4.000000  0.06557377
70     JungleBook 3.900000 -0.03442623    Shipra 4.000000  0.06557377
71  PitchPerfect2 2.714286 -1.22014052    Shipra 4.000000  0.06557377
72  StarWarsForce 4.153846  0.21941992    Shipra 4.000000  0.06557377
73 CaptainAmerica 4.272727  0.33830104  Sreejaya 4.666667  0.73224044
74       Deadpool 4.444444  0.51001821  Sreejaya 4.666667  0.73224044
75         Frozen 3.727273 -0.20715350  Sreejaya 4.666667  0.73224044
76     JungleBook 3.900000 -0.03442623  Sreejaya 4.666667  0.73224044
77  PitchPerfect2 2.714286 -1.22014052  Sreejaya 4.666667  0.73224044
78  StarWarsForce 4.153846  0.21941992  Sreejaya 4.666667  0.73224044
79 CaptainAmerica 4.272727  0.33830104     Steve 4.000000  0.06557377
80       Deadpool 4.444444  0.51001821     Steve 4.000000  0.06557377
81         Frozen 3.727273 -0.20715350     Steve 4.000000  0.06557377
82     JungleBook 3.900000 -0.03442623     Steve 4.000000  0.06557377
83  PitchPerfect2 2.714286 -1.22014052     Steve 4.000000  0.06557377
84  StarWarsForce 4.153846  0.21941992     Steve 4.000000  0.06557377
85 CaptainAmerica 4.272727  0.33830104     Vuthy 3.600000 -0.33442623
86       Deadpool 4.444444  0.51001821     Vuthy 3.600000 -0.33442623
87         Frozen 3.727273 -0.20715350     Vuthy 3.600000 -0.33442623
88     JungleBook 3.900000 -0.03442623     Vuthy 3.600000 -0.33442623
89  PitchPerfect2 2.714286 -1.22014052     Vuthy 3.600000 -0.33442623
90  StarWarsForce 4.153846  0.21941992     Vuthy 3.600000 -0.33442623
91 CaptainAmerica 4.272727  0.33830104   Xingjia 5.000000  1.06557377
92       Deadpool 4.444444  0.51001821   Xingjia 5.000000  1.06557377
93         Frozen 3.727273 -0.20715350   Xingjia 5.000000  1.06557377
94     JungleBook 3.900000 -0.03442623   Xingjia 5.000000  1.06557377
95  PitchPerfect2 2.714286 -1.22014052   Xingjia 5.000000  1.06557377
96  StarWarsForce 4.153846  0.21941992   Xingjia 5.000000  1.06557377
df_final <- df_predic %>%
  # estimate = global mean + movie bias (user_bias.x) + user bias (user_bias.y)
  mutate(estimate = global_mean + user_bias.x + user_bias.y) %>%
  select(Critic, movie, estimate)
print(df_final)
      Critic          movie estimate
1     Burton CaptainAmerica 4.338301
2     Burton       Deadpool 4.510018
3     Burton         Frozen 3.792846
4     Burton     JungleBook 3.965574
5     Burton  PitchPerfect2 2.779859
6     Burton  StarWarsForce 4.219420
7    Charley CaptainAmerica 3.838301
8    Charley       Deadpool 4.010018
9    Charley         Frozen 3.292846
10   Charley     JungleBook 3.465574
11   Charley  PitchPerfect2 2.279859
12   Charley  StarWarsForce 3.719420
13       Dan CaptainAmerica 5.338301
14       Dan       Deadpool 5.510018
15       Dan         Frozen 4.792846
16       Dan     JungleBook 4.965574
17       Dan  PitchPerfect2 3.779859
18       Dan  StarWarsForce 5.219420
19 Dieudonne CaptainAmerica 5.004968
20 Dieudonne       Deadpool 5.176685
21 Dieudonne         Frozen 4.459513
22 Dieudonne     JungleBook 4.632240
23 Dieudonne  PitchPerfect2 3.446526
24 Dieudonne  StarWarsForce 4.886087
25      Matt CaptainAmerica 3.588301
26      Matt       Deadpool 3.760018
27      Matt         Frozen 3.042846
28      Matt     JungleBook 3.215574
29      Matt  PitchPerfect2 2.029859
30      Matt  StarWarsForce 3.469420
31  Mauricio CaptainAmerica 3.838301
32  Mauricio       Deadpool 4.010018
33  Mauricio         Frozen 3.292846
34  Mauricio     JungleBook 3.465574
35  Mauricio  PitchPerfect2 2.279859
36  Mauricio  StarWarsForce 3.719420
37       Max CaptainAmerica 3.671634
38       Max       Deadpool 3.843352
39       Max         Frozen 3.126180
40       Max     JungleBook 3.298907
41       Max  PitchPerfect2 2.113193
42       Max  StarWarsForce 3.552753
43    Nathan CaptainAmerica 4.338301
44    Nathan       Deadpool 4.510018
45    Nathan         Frozen 3.792846
46    Nathan     JungleBook 3.965574
47    Nathan  PitchPerfect2 2.779859
48    Nathan  StarWarsForce 4.219420
49     Param CaptainAmerica 3.838301
50     Param       Deadpool 4.010018
51     Param         Frozen 3.292846
52     Param     JungleBook 3.465574
53     Param  PitchPerfect2 2.279859
54     Param  StarWarsForce 3.719420
55    Parshu CaptainAmerica 4.004968
56    Parshu       Deadpool 4.176685
57    Parshu         Frozen 3.459513
58    Parshu     JungleBook 3.632240
59    Parshu  PitchPerfect2 2.446526
60    Parshu  StarWarsForce 3.886087
61 Prashanth CaptainAmerica 5.138301
62 Prashanth       Deadpool 5.310018
63 Prashanth         Frozen 4.592846
64 Prashanth     JungleBook 4.765574
65 Prashanth  PitchPerfect2 3.579859
66 Prashanth  StarWarsForce 5.019420
67    Shipra CaptainAmerica 4.338301
68    Shipra       Deadpool 4.510018
69    Shipra         Frozen 3.792846
70    Shipra     JungleBook 3.965574
71    Shipra  PitchPerfect2 2.779859
72    Shipra  StarWarsForce 4.219420
73  Sreejaya CaptainAmerica 5.004968
74  Sreejaya       Deadpool 5.176685
75  Sreejaya         Frozen 4.459513
76  Sreejaya     JungleBook 4.632240
77  Sreejaya  PitchPerfect2 3.446526
78  Sreejaya  StarWarsForce 4.886087
79     Steve CaptainAmerica 4.338301
80     Steve       Deadpool 4.510018
81     Steve         Frozen 3.792846
82     Steve     JungleBook 3.965574
83     Steve  PitchPerfect2 2.779859
84     Steve  StarWarsForce 4.219420
85     Vuthy CaptainAmerica 3.938301
86     Vuthy       Deadpool 4.110018
87     Vuthy         Frozen 3.392846
88     Vuthy     JungleBook 3.565574
89     Vuthy  PitchPerfect2 2.379859
90     Vuthy  StarWarsForce 3.819420
91   Xingjia CaptainAmerica 5.338301
92   Xingjia       Deadpool 5.510018
93   Xingjia         Frozen 4.792846
94   Xingjia     JungleBook 4.965574
95   Xingjia  PitchPerfect2 3.779859
96   Xingjia  StarWarsForce 5.219420
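Because the estimate is a plain sum, it is not bounded by the rating scale, and several rows above exceed 5. One possible post-processing step, not part of the spreadsheet algorithm, is to clamp estimates to the scale. The sketch below assumes a 1 to 5 scale (the minimum and maximum seen in summary()) and uses three rows copied from the output above; the column name estimate_clamped is hypothetical.

```r
# Three estimates copied from the printed output above.
estimates <- data.frame(
  Critic   = c("Dan", "Dan", "Matt"),
  movie    = c("Deadpool", "Frozen", "PitchPerfect2"),
  estimate = c(5.510018, 4.792846, 2.029859)
)

# Assumed scale bounds: 1 and 5 are the observed min and max ratings.
scale_min <- 1
scale_max <- 5

# pmax() raises anything below the floor; pmin() lowers anything above the cap.
estimates$estimate_clamped <- pmin(pmax(estimates$estimate, scale_min), scale_max)

print(estimates)
```

Clamping changes only the out-of-range values, so Dan’s Deadpool estimate becomes 5 while the other two rows are unchanged.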
df_final <- df_final %>%
  pivot_wider(names_from = movie, values_from = estimate)
df_final
# A tibble: 16 × 7
   Critic  CaptainAmerica Deadpool Frozen JungleBook PitchPerfect2 StarWarsForce
   <chr>            <dbl>    <dbl>  <dbl>      <dbl>         <dbl>         <dbl>
 1 Burton            4.34     4.51   3.79       3.97          2.78          4.22
 2 Charley           3.84     4.01   3.29       3.47          2.28          3.72
 3 Dan               5.34     5.51   4.79       4.97          3.78          5.22
 4 Dieudo…           5.00     5.18   4.46       4.63          3.45          4.89
 5 Matt              3.59     3.76   3.04       3.22          2.03          3.47
 6 Mauric…           3.84     4.01   3.29       3.47          2.28          3.72
 7 Max               3.67     3.84   3.13       3.30          2.11          3.55
 8 Nathan            4.34     4.51   3.79       3.97          2.78          4.22
 9 Param             3.84     4.01   3.29       3.47          2.28          3.72
10 Parshu            4.00     4.18   3.46       3.63          2.45          3.89
11 Prasha…           5.14     5.31   4.59       4.77          3.58          5.02
12 Shipra            4.34     4.51   3.79       3.97          2.78          4.22
13 Sreeja…           5.00     5.18   4.46       4.63          3.45          4.89
14 Steve             4.34     4.51   3.79       3.97          2.78          4.22
15 Vuthy             3.94     4.11   3.39       3.57          2.38          3.82
16 Xingjia           5.34     5.51   4.79       4.97          3.78          5.22
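The table above contains an estimate for every critic and movie, including movies a critic has already rated. As a minimal sketch of the remaining steps, the base R code below recomputes the estimates as a matrix, recommends each critic’s highest-estimated unrated movie, and computes a root mean squared error (RMSE) against the actual ratings. The data are copied from the glimpse() and user bias output above; the helper names recommend_one and rmse are hypothetical, and RMSE is one assumed choice of comparison metric.

```r
critics <- c("Burton", "Charley", "Dan", "Dieudonne", "Matt", "Mauricio",
             "Max", "Nathan", "Param", "Parshu", "Prashanth", "Shipra",
             "Sreejaya", "Steve", "Vuthy", "Xingjia")

# Ratings copied from the glimpse() output; rows follow the critic order above.
ratings <- cbind(
  CaptainAmerica = c(NA, 4, NA, 5, 4, 4, 4, NA, 4, 4, 5, NA, 5, 4, 4, NA),
  Deadpool       = c(NA, 5, 5, 4, NA, NA, 4, NA, 4, 3, 5, NA, 5, NA, 5, NA),
  Frozen         = c(NA, 4, NA, NA, 2, 3, 4, NA, 1, 5, 5, 4, 5, NA, 3, 5),
  JungleBook     = c(4, 3, NA, NA, NA, 3, 2, NA, NA, 5, 5, 5, 4, NA, 3, 5),
  PitchPerfect2  = c(NA, 2, NA, NA, 2, 4, 2, NA, NA, 2, NA, NA, 4, NA, 3, NA),
  StarWarsForce  = c(4, 3, 5, 5, 5, NA, 4, 4, 5, 3, 4, 3, 5, 4, NA, NA)
)
rownames(ratings) <- critics

global_mean <- mean(ratings, na.rm = TRUE)
user_bias   <- rowMeans(ratings, na.rm = TRUE) - global_mean
movie_bias  <- colMeans(ratings, na.rm = TRUE) - global_mean

# outer() builds the critic-by-movie grid of user_bias + movie_bias.
estimate <- global_mean + outer(user_bias, movie_bias, "+")

# Blank out movies the critic has already rated.
unseen <- estimate
unseen[!is.na(ratings)] <- NA

# Highest-estimated unrated movie; NA if the critic rated every movie.
recommend_one <- function(x) {
  if (all(is.na(x))) NA_character_ else names(x)[which.max(x)]
}
recommendation <- apply(unseen, 1, recommend_one)
print(recommendation)

# RMSE over the observed ratings only (NA cells are skipped).
rmse <- function(pred) sqrt(mean((ratings - pred)^2, na.rm = TRUE))
rmse_baseline <- rmse(estimate)     # global mean + biases
rmse_global   <- rmse(global_mean)  # global mean alone
print(c(baseline = rmse_baseline, global_only = rmse_global))
```

Critics who rated all six movies, such as Charley, get NA. Because the user bias shifts all of a critic’s estimates by the same amount, the ranking of movies is identical for every critic, which is what makes this recommender non-personalized: each critic is simply offered the best-rated movie they have not yet rated.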

Conclusion

The Global Baseline Estimate is a simple method for predicting ratings: it starts from the global mean of all ratings and adds two adjustments. The movie bias captures how a movie is rated relative to that mean, and the user bias captures whether a critic tends to rate higher or lower than average. Summing the three components captures the main systematic effects in the data, so the baseline should generally track actual ratings more closely than the global mean alone. Because the biases are simply added, some estimates exceed 5, the highest rating in the survey data. The estimate is best treated as a benchmark for evaluating more sophisticated recommendation systems, and it shows why both item-specific and user-specific factors matter in rating predictions.