Probability and Statistics
| FILM | RottenTomatoes | RottenTomatoes_User | Metacritic | Metacritic_User | IMDB | Fandango_Stars | Fandango_Ratingvalue | RT_norm | RT_user_norm | Metacritic_norm | Metacritic_user_nom | IMDB_norm | RT_norm_round | RT_user_norm_round | Metacritic_norm_round | Metacritic_user_norm_round | IMDB_norm_round | Metacritic_user_vote_count | IMDB_user_vote_count | Fandango_votes | Fandango_Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avengers: Age of Ultron (2015) | 74 | 86 | 66 | 7.1 | 7.8 | 5.0 | 4.5 | 3.70 | 4.3 | 3.30 | 3.55 | 3.90 | 3.5 | 4.5 | 3.5 | 3.5 | 4.0 | 1330 | 271107 | 14846 | 0.5 |
| Cinderella (2015) | 85 | 80 | 67 | 7.5 | 7.1 | 5.0 | 4.5 | 4.25 | 4.0 | 3.35 | 3.75 | 3.55 | 4.5 | 4.0 | 3.5 | 4.0 | 3.5 | 249 | 65709 | 12640 | 0.5 |
| Ant-Man (2015) | 80 | 90 | 64 | 8.1 | 7.8 | 5.0 | 4.5 | 4.00 | 4.5 | 3.20 | 4.05 | 3.90 | 4.0 | 4.5 | 3.0 | 4.0 | 4.0 | 627 | 103660 | 12055 | 0.5 |
| Do You Believe? (2015) | 18 | 84 | 22 | 4.7 | 5.4 | 5.0 | 4.5 | 0.90 | 4.2 | 1.10 | 2.35 | 2.70 | 1.0 | 4.0 | 1.0 | 2.5 | 2.5 | 31 | 3136 | 1793 | 0.5 |
| Hot Tub Time Machine 2 (2015) | 14 | 28 | 29 | 3.4 | 5.1 | 3.5 | 3.0 | 0.70 | 1.4 | 1.45 | 1.70 | 2.55 | 0.5 | 1.5 | 1.5 | 1.5 | 2.5 | 88 | 19560 | 1021 | 0.5 |
| The Water Diviner (2015) | 63 | 62 | 50 | 6.8 | 7.2 | 4.5 | 4.0 | 3.15 | 3.1 | 2.50 | 3.40 | 3.60 | 3.0 | 3.0 | 2.5 | 3.5 | 3.5 | 34 | 39373 | 397 | 0.5 |
[1] "FILM" "RottenTomatoes"
[3] "RottenTomatoes_User" "Metacritic"
[5] "Metacritic_User" "IMDB"
[7] "Fandango_Stars" "Fandango_Ratingvalue"
[9] "RT_norm" "RT_user_norm"
[11] "Metacritic_norm" "Metacritic_user_nom"
[13] "IMDB_norm" "RT_norm_round"
[15] "RT_user_norm_round" "Metacritic_norm_round"
[17] "Metacritic_user_norm_round" "IMDB_norm_round"
[19] "Metacritic_user_vote_count" "IMDB_user_vote_count"
[21] "Fandango_votes" "Fandango_Difference"
movie_ratings %>% filter(IMDB_user_vote_count>60000) %>% select(FILM, IMDB) %>% top_n(10, IMDB) %>% arrange(desc(IMDB))
FILM IMDB
1 Inside Out (2015) 8.6
2 Mad Max: Fury Road (2015) 8.3
3 The Imitation Game (2014) 8.1
4 Birdman (2014) 7.9
5 Avengers: Age of Ultron (2015) 7.8
6 Ant-Man (2015) 7.8
7 Kingsman: The Secret Service (2015) 7.8
8 Mission: Impossible – Rogue Nation (2015) 7.8
9 Ex Machina (2015) 7.7
10 The Hobbit: The Battle of the Five Armies (2014) 7.5
summary(movie_ratings[,2:3])
RottenTomatoes RottenTomatoes_User
Min. : 5.00 Min. :20.00
1st Qu.: 31.25 1st Qu.:50.00
Median : 63.50 Median :66.50
Mean : 60.85 Mean :63.88
3rd Qu.: 89.00 3rd Qu.:81.00
Max. :100.00 Max. :94.00
movie_ratings <- movie_ratings %>% mutate(
z_RT =
(RottenTomatoes - mean(RottenTomatoes))/sd(RottenTomatoes),
z_RT_users =
(RottenTomatoes_User - mean(RottenTomatoes_User))/sd(RottenTomatoes_User)
)
movie_ratings %>% ggplot(aes(x=z_RT , y=z_RT_users))+geom_point()
with(movie_ratings, cor(RottenTomatoes, RottenTomatoes_User))
[1] 0.7814396
with(movie_ratings, cor(z_RT, z_RT_users))
[1] 0.7814396
lm(z_RT_users~z_RT, data=movie_ratings)
Call:
lm(formula = z_RT_users ~ z_RT, data = movie_ratings)
Coefficients:
(Intercept) z_RT
-2.982e-16 7.814e-01
For every 1 standard deviation increase in Rotten Tomato Rating we expect a 0.78 increase in Rotten Tomato User Rating
movie_ratings %>% ggplot(aes(x=z_RT , y=z_RT_users))+geom_point()+geom_smooth(method="lm", se=FALSE)
Recalling our dear friend, \( y = m \cdot x + b, \) we can write:
\[ zscore(y) = r \cdot zscore(x) \]
and with some plugging in and rearranging…
\[ \frac{y - \mu_y}{\sigma_y} = r \cdot \frac{x - \mu_x}{\sigma_x} \]
\[ y - \mu_y = r \cdot (x - \mu_x)\cdot \frac{\sigma_y}{\sigma_x} \]
\[ y = (r \cdot \frac{\sigma_y}{\sigma_x}) x + (\mu_y - r \cdot \mu_x \frac{\sigma_y}{\sigma_x} + ) \]
\[ y = (r \cdot \frac{\sigma_y}{\sigma_x}) x + (\mu_y - r \cdot \mu_x \frac{\sigma_y}{\sigma_x} + ) \]
BestFit <- function(x, y){
m = cor(x,y)*sd(y)/sd(x)
b = mean(y) - cor(x,y)*mean(x)*(sd(y)/sd(x))
return(list(m=m, b=b))
}
with(movie_ratings, BestFit(RottenTomatoes, RottenTomatoes_User))
$m
[1] 0.5186777
$b
[1] 32.31553
lm(RottenTomatoes_User~RottenTomatoes, data=movie_ratings)
Call:
lm(formula = RottenTomatoes_User ~ RottenTomatoes, data = movie_ratings)
Coefficients:
(Intercept) RottenTomatoes
32.3155 0.5187
If RottenTomatoes gave a more a score of 70, what score do we predict from RottenTomatoes_User?
fitline <- with(movie_ratings, BestFit(RottenTomatoes, RottenTomatoes_User))
fitline$m * 70 + fitline$b
[1] 68.62297
What if RottenTomatoes gave a more a score of 90? Or a score of 50?
fitline$m * 90 + fitline$b; fitline$m * 50 + fitline$b
[1] 78.99652
[1] 58.24941