2022-04-26

Introduction

Misinformation is everywhere. You may assume that a website provides accurate information about, in this example, movies. However, that may not be the case, as observed by the article [Be Suspicious Of Online Movie Ratings, Especially Fandango’s](http://fivethirtyeight.com/features/fandango-movies-ratings/). Using a data set of films and their ratings from several popular sites, we explore how a film’s rating on Fandango compares with its ratings on the other movie-review sites: Rotten Tomatoes, Metacritic, and IMDb. More specifically, we use plots and tables to explore:

  • “Is Fandango rating their movies higher than other websites?”

  • “Are any of the provided websites unbiased?”

R Preliminaries

  • tidyverse: to reorganize the dataset
    • tidyr: to tidy data
    • ggplot2: to plot charts
    • dplyr: to manipulate data
    • magrittr: to chain operations with the pipe operator (%>%)
  • patchwork: to place plots side by side, used alongside the ggplot2 package
  • plotly: to plot interactive charts
  • knitr: to generate tables
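A minimal sketch of how the packages listed above could be attached. It assumes they are already installed; the check for missing packages is added here for illustration and is not part of the original analysis.

```r
# Packages named in the list above; tidyverse attaches tidyr, ggplot2, and dplyr
pkgs <- c("tidyverse", "patchwork", "plotly", "knitr")

# Check which packages are installed without attaching them yet
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
}

# Attach whatever is available
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```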

Data Set Inspection

This analysis uses the Fandango directory on GitHub that contains the data behind the article mentioned above. It includes every film that has a Rotten Tomatoes rating, an RT User rating, a Metacritic score, a Metacritic User score, an IMDb score, and at least 30 fan reviews on Fandango. The Fandango data was pulled on Aug. 24, 2015; the films in the data set were released in 2014 or 2015.
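A minimal sketch of how the data could be read into the `movies` data frame used in the rest of the analysis. The file name `fandango_score_comparison.csv`, its location in the working directory, and the helper `read_movies()` are assumptions for illustration; the text does not state how the file was loaded.

```r
# Assumed local copy of the CSV from the GitHub directory (file name is an assumption)
csv_path <- "fandango_score_comparison.csv"

# Hypothetical helper: read the file as UTF-8 and keep film titles as character strings
read_movies <- function(path) {
  read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
}

# Only read and preview the data when the file is actually present
if (file.exists(csv_path)) {
  movies <- read_movies(csv_path)
  print(head(movies))
}
```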

                            FILM RottenTomatoes RottenTomatoes_User Metacritic
1 Avengers: Age of Ultron (2015)             74                  86         66
2              Cinderella (2015)             85                  80         67
3                 Ant-Man (2015)             80                  90         64
4         Do You Believe? (2015)             18                  84         22
5  Hot Tub Time Machine 2 (2015)             14                  28         29
6       The Water Diviner (2015)             63                  62         50
  Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm
1             7.1  7.8            5.0                  4.5    3.70          4.3
2             7.5  7.1            5.0                  4.5    4.25          4.0
3             8.1  7.8            5.0                  4.5    4.00          4.5
4             4.7  5.4            5.0                  4.5    0.90          4.2
5             3.4  5.1            3.5                  3.0    0.70          1.4
6             6.8  7.2            4.5                  4.0    3.15          3.1
  Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round
1            3.30                3.55      3.90           3.5
2            3.35                3.75      3.55           4.5
3            3.20                4.05      3.90           4.0
4            1.10                2.35      2.70           1.0
5            1.45                1.70      2.55           0.5
6            2.50                3.40      3.60           3.0
  RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round
1                4.5                   3.5                        3.5
2                4.0                   3.5                        4.0
3                4.5                   3.0                        4.0
4                4.0                   1.0                        2.5
5                1.5                   1.5                        1.5
6                3.0                   2.5                        3.5
  IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count
1             4.0                       1330               271107
2             3.5                        249                65709
3             4.0                        627               103660
4             2.5                         31                 3136
5             2.5                         88                19560
6             3.5                         34                39373
  Fandango_votes Fandango_Difference
1          14846                 0.5
2          12640                 0.5
3          12055                 0.5
4           1793                 0.5
5           1021                 0.5
6            397                 0.5

Data Management

# Load dplyr for the pipe (%>%), rename(), and select()
library(dplyr)

# Replace the en dash in one film title with a plain hyphen
movies$FILM[movies$FILM == "Mission: Impossible – Rogue Nation (2015)"] <- "Mission: Impossible - Rogue Nation (2015)"
# Fix the misspelled column name with rename(),
# then group the columns by website with select()
movies <- movies %>% 
  rename(Metacritic_user_norm = Metacritic_user_nom) %>%
  select(FILM, 
         Fandango_Stars, Fandango_Ratingvalue, Fandango_Difference, Fandango_votes, 
         RottenTomatoes, RottenTomatoes_User, RT_user_norm, RT_user_norm_round, RT_norm, RT_norm_round, 
         Metacritic, Metacritic_User, Metacritic_user_norm, Metacritic_user_norm_round, Metacritic_norm, Metacritic_norm_round, Metacritic_user_vote_count, 
         IMDB, IMDB_norm, IMDB_norm_round, IMDB_user_vote_count) 

Data Description

Rows: 146
Columns: 22
$ FILM                       <chr> "Avengers: Age of Ultron (2015)", "Cinderel…
$ Fandango_Stars             <dbl> 5.0, 5.0, 5.0, 5.0, 3.5, 4.5, 4.0, 4.0, 4.5…
$ Fandango_Ratingvalue       <dbl> 4.5, 4.5, 4.5, 4.5, 3.0, 4.0, 3.5, 3.5, 4.0…
$ Fandango_Difference        <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5…
$ Fandango_votes             <int> 14846, 12640, 12055, 1793, 1021, 397, 252, …
$ RottenTomatoes             <int> 74, 85, 80, 18, 14, 63, 42, 86, 99, 89, 84,…
$ RottenTomatoes_User        <int> 86, 80, 90, 84, 28, 62, 53, 64, 82, 87, 77,…
$ RT_user_norm               <dbl> 4.30, 4.00, 4.50, 4.20, 1.40, 3.10, 2.65, 3…
$ RT_user_norm_round         <dbl> 4.5, 4.0, 4.5, 4.0, 1.5, 3.0, 2.5, 3.0, 4.0…
$ RT_norm                    <dbl> 3.70, 4.25, 4.00, 0.90, 0.70, 3.15, 2.10, 4…
$ RT_norm_round              <dbl> 3.5, 4.5, 4.0, 1.0, 0.5, 3.0, 2.0, 4.5, 5.0…
$ Metacritic                 <int> 66, 67, 64, 22, 29, 50, 53, 81, 81, 80, 71,…
$ Metacritic_User            <dbl> 7.1, 7.5, 8.1, 4.7, 3.4, 6.8, 7.6, 6.8, 8.8…
$ Metacritic_user_norm       <dbl> 3.55, 3.75, 4.05, 2.35, 1.70, 3.40, 3.80, 3…
$ Metacritic_user_norm_round <dbl> 3.5, 4.0, 4.0, 2.5, 1.5, 3.5, 4.0, 3.5, 4.5…
$ Metacritic_norm            <dbl> 3.30, 3.35, 3.20, 1.10, 1.45, 2.50, 2.65, 4…
$ Metacritic_norm_round      <dbl> 3.5, 3.5, 3.0, 1.0, 1.5, 2.5, 2.5, 4.0, 4.0…
$ Metacritic_user_vote_count <int> 1330, 249, 627, 31, 88, 34, 17, 124, 62, 54…
$ IMDB                       <dbl> 7.8, 7.1, 7.8, 5.4, 5.1, 7.2, 6.9, 6.5, 7.4…
$ IMDB_norm                  <dbl> 3.90, 3.55, 3.90, 2.70, 2.55, 3.60, 3.45, 3…
$ IMDB_norm_round            <dbl> 4.0, 3.5, 4.0, 2.5, 2.5, 3.5, 3.5, 3.5, 3.5…
$ IMDB_user_vote_count       <int> 271107, 65709, 103660, 3136, 19560, 39373, …

Column                      Definition
FILM                        The film in question
Fandango_Stars              The number of stars the film had on its Fandango movie page
Fandango_Ratingvalue        The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained.
Fandango_Difference         The difference between the presented Fandango_Stars and the actual Fandango_Ratingvalue
Fandango_votes              The number of user votes the film had on Fandango
RottenTomatoes              The Rotten Tomatoes Tomatometer score for the film
RottenTomatoes_User         The Rotten Tomatoes user score for the film
RT_user_norm                The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system
RT_user_norm_round          The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
RT_norm                     The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system
RT_norm_round               The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic                  The Metacritic critic score for the film
Metacritic_User             The Metacritic user score for the film
Metacritic_user_norm        The Metacritic user score for the film, normalized to a 0 to 5 point system
Metacritic_user_norm_round  The Metacritic user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_norm             The Metacritic critic score for the film, normalized to a 0 to 5 point system
Metacritic_norm_round       The Metacritic critic score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_user_vote_count  The number of user votes the film had on Metacritic
IMDB                        The IMDb user score for the film
IMDB_norm                   The IMDb user score for the film, normalized to a 0 to 5 point system
IMDB_norm_round             The IMDb user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
IMDB_user_vote_count        The number of user votes the film had on IMDb
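The normalized and rounded columns can be reproduced from the raw scores. The sketch below uses the first six films from the head() output and two hypothetical helpers, `normalize_to_5()` and `round_half_star()`. The rule that ties round up is inferred from those six rows (85 becomes 4.25, which is listed as 4.5) and is an assumption about how the data set was built. R's own `round()` would turn 8.5 into 8, so it is not used here.

```r
# First six films from the head() output above
scores <- data.frame(
  RottenTomatoes  = c(74, 85, 80, 18, 14, 63),        # 0 to 100 scale
  Metacritic_User = c(7.1, 7.5, 8.1, 4.7, 3.4, 6.8)   # 0 to 10 scale
)

# Hypothetical helper: rescale a score with the given maximum onto 0 to 5
normalize_to_5 <- function(x, max_score) x * 5 / max_score

# Hypothetical helper: round to the nearest half star, with ties going up
round_half_star <- function(x) floor(x * 2 + 0.5) / 2

scores$RT_norm <- normalize_to_5(scores$RottenTomatoes, 100)
scores$RT_norm_round <- round_half_star(scores$RT_norm)
scores$Metacritic_user_norm <- normalize_to_5(scores$Metacritic_User, 10)
scores$Metacritic_user_norm_round <- round_half_star(scores$Metacritic_user_norm)

print(scores)
```

For these six films the computed values agree with the RT_norm, RT_norm_round, Metacritic_user_norm, and Metacritic_user_norm_round columns shown in the head() output.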

Exploratory Data Analysis

I used most of the packages above to create plots that show the distribution of the ratings and some relationships among the movie-rating websites, with brief explanations. The plots cover a range of relationships through histograms, density plots, and correlation heatmaps. After that, I use a hypothesis test and ANOVA to check whether a general pattern holds for all or some of the sites.
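As a rough illustration of the numbers behind a correlation heatmap and a histogram, the sketch below uses base R on the first six films from the head() output. The data frame name `movies_head` is hypothetical, and six rows are far too few for real conclusions; with the full data, `movies` would take its place. The original plots were drawn with ggplot2, which is not reproduced here.

```r
# First six films from the head() output (user scores on the 0 to 5 scale)
movies_head <- data.frame(
  Fandango_Ratingvalue = c(4.5, 4.5, 4.5, 4.5, 3.0, 4.0),
  RT_user_norm         = c(4.3, 4.0, 4.5, 4.2, 1.4, 3.1),
  Metacritic_user_norm = c(3.55, 3.75, 4.05, 2.35, 1.70, 3.40),
  IMDB_norm            = c(3.90, 3.55, 3.90, 2.70, 2.55, 3.60)
)

# Pairwise Pearson correlations: the values a correlation heatmap would color
cors <- cor(movies_head)
print(round(cors, 2))

# Distribution of one site's ratings in half-star bins
hist(movies_head$Fandango_Ratingvalue,
     breaks = seq(0, 5, by = 0.5),
     main = "Fandango rating value",
     xlab = "Rating (0 to 5)")
```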

Data Distribution

The Normal Q-Q plot helps us judge whether data are well approximated by a Normal distribution. Departures from Normality show up as:

  • skew (right skew or left skew)
  • unusual behavior in the tails (heavy-tailed [more outliers than expected] or light-tailed)

  1. Normally distributed data would be indicated by close adherence of the points to the diagonal reference line.
  2. Skew is indicated by substantial curving of the points away from the reference line at both ends of the distribution: if both ends curve up, we have right skew; if both ends curve down, we have left skew.
  3. An abundance or dearth of outliers (as compared to the expectations of a Normal model) is indicated in the tails of the distribution by an “S” shape or reverse “S” shape in the points.
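The patterns described above can be seen with base R's `qqnorm()` and `qqline()`. The sketch below uses simulated samples rather than the movie data, so the sample sizes, seed, and variable names are assumptions for illustration. A rating column such as `movies$Fandango_Ratingvalue` would be passed in the same way.

```r
set.seed(1)                      # make the simulated samples reproducible
normal_sample <- rnorm(200)      # approximately Normal
skewed_sample <- rexp(200)       # right-skewed

op <- par(mfrow = c(1, 2))       # two panels side by side

# Points should stay close to the reference line
qq_normal <- qqnorm(normal_sample, main = "Approximately Normal")
qqline(normal_sample)

# Both ends should curve up, away from the reference line
qq_skewed <- qqnorm(skewed_sample, main = "Right skew")
qqline(skewed_sample)

par(op)                          # restore the previous plot layout
```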

Data Distribution for Fandango

Data Distribution for Rotten Tomatoes

Data Distribution for Metacritic

Data Distribution for IMDb

Hypothesis Test & ANOVA

Null Hypothesis: Film ratings from each website are the same

Alternative Hypothesis: At least one website’s ratings are different

Use a hypothesis test to identify whether a difference exists, and then use ANOVA to examine where the differences lie.

Call:
lm(formula = Fandango_Ratingvalue ~ RT_user_norm + Metacritic_user_norm + 
    IMDB_norm, data = movies)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.87281 -0.23192  0.01277  0.23064  0.72506 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.06683    0.27616  11.105  < 2e-16 ***
RT_user_norm          0.48036    0.06180   7.773 1.41e-12 ***
Metacritic_user_norm -0.19702    0.05578  -3.532 0.000556 ***
IMDB_norm            -0.03373    0.14334  -0.235 0.814319    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3317 on 142 degrees of freedom
Multiple R-squared:  0.5738,    Adjusted R-squared:  0.5648 
F-statistic: 63.72 on 3 and 142 DF,  p-value: < 2.2e-16
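The Call line in the output above gives the model formula, so the fit can be sketched as follows. To stay self-contained, this example uses only the first six films from the head() output in a hypothetical data frame `movies_head`, so its numbers will not match the output above; with the full data, `data = movies` would be used instead.

```r
# First six films from the head() output (the full analysis uses all 146)
movies_head <- data.frame(
  Fandango_Ratingvalue = c(4.5, 4.5, 4.5, 4.5, 3.0, 4.0),
  RT_user_norm         = c(4.3, 4.0, 4.5, 4.2, 1.4, 3.1),
  Metacritic_user_norm = c(3.55, 3.75, 4.05, 2.35, 1.70, 3.40),
  IMDB_norm            = c(3.90, 3.55, 3.90, 2.70, 2.55, 3.60)
)

# Same formula as in the Call line above
fit <- lm(Fandango_Ratingvalue ~ RT_user_norm + Metacritic_user_norm + IMDB_norm,
          data = movies_head)
fit_summary <- summary(fit)
print(fit_summary)

# p-value of the overall F-test, the one reported on the last line of the summary
f <- fit_summary$fstatistic
overall_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(overall_p)
```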

At the 0.05 significance level, the p-value of the overall F-test (< 2.2e-16) is below 0.05, so we can reject the null hypothesis. Therefore, we can conclude that at least one website’s ratings differ from the others.

ANOVA for Rotten Tomatoes

Pr(>F)
0

ANOVA for Metacritic

Pr(>F)
3.2e-05

ANOVA for IMDb

Pr(>F)
0
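The text does not show the ANOVA calls behind these three tables. One plausible reading, offered here as an assumption, is a separate single-predictor model of Fandango’s rating value for each site, with Pr(>F) taken from `anova()`. Whether user or critic scores were used is also not stated; the sketch uses the normalized user scores from the regression above, and only the first six films, so its p-values will not match the tables.

```r
# First six films from the head() output (the full analysis uses all 146)
movies_head <- data.frame(
  Fandango_Ratingvalue = c(4.5, 4.5, 4.5, 4.5, 3.0, 4.0),
  RT_user_norm         = c(4.3, 4.0, 4.5, 4.2, 1.4, 3.1),
  Metacritic_user_norm = c(3.55, 3.75, 4.05, 2.35, 1.70, 3.40),
  IMDB_norm            = c(3.90, 3.55, 3.90, 2.70, 2.55, 3.60)
)

# One predictor per site (assumed choice of columns)
predictors <- c(RottenTomatoes = "RT_user_norm",
                Metacritic     = "Metacritic_user_norm",
                IMDb           = "IMDB_norm")

# Fit Fandango_Ratingvalue ~ <site score> and pull Pr(>F) from the ANOVA table
anova_p <- vapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "Fandango_Ratingvalue"), data = movies_head)
  anova(fit)[["Pr(>F)"]][1]
}, numeric(1))

print(signif(anova_p, 2))
```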

Summary

In this study, I analyzed film ratings from several well-known websites and summarized how they differ. No other film-rating website rates its movies quite like Fandango. Rotten Tomatoes, Metacritic, and IMDb each show a different rating scale, unrelated to one another. Therefore, one should be wary of the misinformation Fandango provides, and it is best to compare all these sites against each other before deciding whether the movie you want to see is worth your time.