Project Description:

In this project, I work with two datasets of movie ratings from Fandango.com. Fandango is a media company that sells movie tickets online and also provides television and streaming media information, e.g., through its subsidiaries Flixster, Movies.com, and Rotten Tomatoes.

1. Background

In October 2015, data journalist Walt Hickey analyzed movie ratings data and found strong evidence suggesting that Fandango’s rating system was biased and dishonest. He published his analysis in the article “Be Suspicious Of Online Movie Ratings, Especially Fandango’s”, which can be found here: https://fivethirtyeight.com/features/fandango-movies-ratings/.

Fandango displays a 5-star rating system on their website, where the minimum rating is 0 stars and the maximum is 5 stars. Hickey found that there’s a significant discrepancy between the ratings displayed to users and the actual ratings which he was able to find on Fandango. The actual rating was almost always rounded up to the nearest half-star. For instance, a 4.1 movie would be rounded off to 4.5 stars, not to 4 stars.
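
As a minimal sketch of the discrepancy described above, the snippet below contrasts rounding to the nearest half-star with always rounding up to the next half-star. The two helper functions and the example ratings are hypothetical illustrations, not Fandango’s actual implementation.

```r
# Hypothetical "actual" ratings; 4.1 is the example mentioned in the text.
actual <- c(4.1, 4.5, 3.6)

# Conventional rounding to the nearest half-star.
round_nearest_half <- function(x) round(x * 2) / 2

# Always rounding up to the next half-star (the behavior Hickey described).
round_up_half <- function(x) ceiling(x * 2) / 2

data.frame(actual,
           nearest = round_nearest_half(actual),
           rounded_up = round_up_half(actual))
```

For the 4.1 rating, the nearest half-star is 4.0, while rounding up gives 4.5, which is the inflation described above.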

Fandango’s officials responded that the biased rounding off was caused by a bug in their system rather than being intentional, and they promised to fix the bug as soon as possible. Presumably, this has already happened, although we can’t tell for sure since the actual rating value doesn’t seem to be displayed anymore.

2. Goal

The goal of this project is to analyze more recent movie ratings data from Fandango to find out whether Fandango’s rating system changed after Hickey’s analysis.

3. Data Sources

The analysis uses two samples of movie ratings data:

library(readr)

fandango_before <- read_csv("fandango_score_comparison.csv")
fandango_after <- read_csv("movie_ratings_16_17.csv")
head(fandango_before)
head(fandango_after)

I. Determining if the samples are representative of the population

If the goal here is to find out if there has been any change after Hickey’s analysis, then ideally we would want to sample all movie ratings at two different periods of time for comparison. Thus, the population of interest for this goal is: all the movie ratings on Fandango.com.

The good news is that the two samples we are working with were taken before and after Hickey’s analysis. However, we still need to make sure the samples are representative of the population of interest, in order to describe the population as precisely as possible and minimize the sampling error.

library(dplyr)
# Cleaning dataframes to include only relevant data
fandango_before_cleaned <- fandango_before %>%
  select(FILM, Fandango_Stars, Fandango_Ratingvalue, Fandango_votes, Fandango_Difference)
fandango_after_cleaned <- fandango_after %>%
  select(movie, year, fandango)
head(fandango_before_cleaned)

According to Hickey’s article, here are the sampling criteria for Sample #1 (ratings before Hickey’s analysis):

head(fandango_after_cleaned)

According to Dataquest’s GitHub repository, here are the sampling criteria for Sample #2 (ratings after Hickey’s analysis):

Conclusion

Neither sample is random, because not all movies had an equal probability of being selected. The sampling method applied here is purposive sampling, where samples are produced under pre-defined and subjective conditions. Therefore, these two samples are not representative of the population (all movie ratings on Fandango) we want to describe.


II. Adjusting the goal of the analysis and re-examining sample representativeness

Using non-representative samples to describe a population could lead to a wrong conclusion. Moving forward, I am going to slightly adjust the goal of this project to a new goal that is still a fairly good proxy for the initial one, so that the population of interest changes and the samples we are working with become representative.

We could also collect new data and obtain representative samples here. However, time has passed since Hickey’s analysis and it would be difficult or even impossible to access precise data prior to Hickey’s analysis at this time.


Adjusted Goal

Instead of trying to find out if there has been any change in Fandango’s rating system after Hickey’s analysis, we will try to determine if there is any difference between Fandango’s ratings for popular movies in 2015 and in 2016. If yes, then it could be an indication that Fandango has made changes.

Adjusted Populations of Interest

I will use Hickey’s benchmark of 30 ratings to define “popular”, which seems reasonable: a movie is considered “popular” in this case if it has at least 30 ratings.

Sample #1 now becomes representative, since it was randomly selected under the condition of having at least 30 ratings.

For Sample #2, however, it is unclear how the researcher defined “popular”, so it is important to check whether the selected movies have at least 30 ratings.


The dataframe of Sample #2 above has no variable showing a movie’s rating count. Moreover, I wasn’t able to find the rating counts on the movies’ Fandango pages, because Fandango replaced its 5-Star Fan Ratings with the Rotten Tomatoes Audience Score in 2019, so the rating counts are now shown on Rottentomatoes.com.

Therefore, I am going to sample 10 observations from Sample #2 and find their rating counts from Rottentomatoes.com.

set.seed(1)
after_sample <- sample_n(fandango_after_cleaned, size = 10)

# Found rating counts of these 10 samples
sample_rating_counts <- tibble(reviews = c(7384, 7259, 7265, 12212, 30271, 281, 13684, 1212, 56851, 9222))

bind_cols(after_sample, sample_rating_counts)

All 10 randomly sampled movies received more than 30 ratings. Although 10 movies are only a small check, this suggests that the movies in Sample #2 are “popular” as well. Let’s move on.
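
The check against the 30-rating benchmark can also be done programmatically rather than by eye. The sketch below uses base R and reuses the ten rating counts listed above; the variable names are hypothetical.

```r
# Rating counts copied from the ten sampled movies above
# (looked up manually on Rottentomatoes.com).
reviews <- c(7384, 7259, 7265, 12212, 30271, 281, 13684, 1212, 56851, 9222)

# Hickey's benchmark for a "popular" movie.
popular_threshold <- 30

# Smallest rating count in the sample.
min(reviews)

# TRUE only if every sampled movie meets the benchmark.
all(reviews >= popular_threshold)
```

The smallest count in the sample is 281, so every sampled movie clears the benchmark by a wide margin.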


III. Reshaping dataframes and extracting relevant data

Since now we are only interested in popular movies released in 2015 and 2016, I am going to extract only the relevant data for the following analysis.

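library(stringr)
# Split each FILM value at the opening parenthesis:
# column 1 holds the title, column 2 starts with the release year
film_name_year <- str_split(fandango_before_cleaned$FILM, "\\(", simplify = TRUE)

fandango_2015 <- fandango_before_cleaned %>%
  # str_trim() removes the trailing space left in front of the parenthesis
  mutate(FILM = str_trim(film_name_year[,1]), year = film_name_year[,2]) %>%
  # Keep only the four-digit year, dropping the closing parenthesis
  mutate(year = str_sub(year, 1, 4)) %>%
  filter(year == "2015")

head(fandango_2015)

# Sample #2 already has a year column, so it can be filtered directly
fandango_2016 <- fandango_after_cleaned %>%
  filter(year == "2016")

head(fandango_2016)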
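
Splitting on the first opening parenthesis would misread any title that itself contains parentheses. As a hedged alternative, the base R sketch below pulls the year from the last parenthesized group instead. It assumes that FILM values follow a "Title (Year)" pattern, which is inferred from the code above; the titles used here are made up for illustration.

```r
# Hypothetical titles in the assumed "Title (Year)" format.
films <- c("Example Movie (2015)",
           "Another Film (2014)",
           "Title (With Parens) (2015)")

# Capture the four digits inside the final pair of parentheses.
year <- sub("^.*\\(([0-9]{4})\\)\\s*$", "\\1", films)

# Remove that trailing "(YYYY)" to recover the bare title.
title <- sub("\\s*\\([0-9]{4}\\)\\s*$", "", films)

# Keep only the 2015 releases.
data.frame(title, year)[year == "2015", ]
```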

IV. Visualizing the frequency distributions to get an overview

First of all, I am going to use kernel density plots to compare the distribution shapes of the two samples and get a general overview.

library(ggplot2)

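ggplot(data = fandango_2015,
       aes(x = Fandango_Stars)) +
       # Map color to a year label so that a legend identifies each curve
       geom_density(aes(color = "2015")) +
       geom_density(data = fandango_2016, 
                    aes(x = fandango, color = "2016")) +
       scale_color_manual(name = "Year",
                          values = c("2015" = "black", "2016" = "blue")) +
       labs(title = "Comparing distribution shapes for Fandango's ratings (2015 vs. 2016)",
           x = "Ratings", y = "Density") +
       scale_x_continuous(breaks = seq(0, 5, by = 0.5), limits = c(0, 5))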


V. Generating frequency distribution tables to further explore the distributions

Now I am going to generate frequency distribution tables for further details. Since the two samples have different numbers of observations, I will use relative frequencies (percentages) instead of absolute frequencies.

fandango_2015_freq <- fandango_2015 %>% 
  group_by(Fandango_Stars) %>% 
  summarize(Percentage = n() / nrow(fandango_2015) * 100)

print(fandango_2015_freq)

fandango_2016_freq <- fandango_2016 %>% 
  group_by(fandango) %>% 
  summarize(Percentage = n() / nrow(fandango_2016) * 100)

print(fandango_2016_freq)

The tables show noticeably lower percentages of ratings in the 4.5 to 5.0 star range and higher percentages in the 3.5 to 4.0 star range in 2016 than in 2015.
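
To make the relative-frequency calculation concrete, here is a small base R sketch on a made-up vector of star ratings. The values are hypothetical and are not taken from either sample. `prop.table()` is used as a base R counterpart to the `n() / nrow(...)` computation above.

```r
# Hypothetical star ratings, not taken from either data set.
stars <- c(4.5, 4.0, 4.5, 3.5, 5.0, 4.0, 4.5, 4.0)

# Absolute frequencies: how many movies received each rating.
counts <- table(stars)

# Relative frequencies as percentages, comparable across samples of different sizes.
percentages <- prop.table(counts) * 100

round(percentages, 1)
```

Because percentages always sum to 100, two samples with different numbers of movies can be compared on the same scale.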

Now I am going to compare some key summary statistics for the two samples to get a more precise picture of the direction of the difference.

library(tidyr)
# Return the most frequent value of x;
# named get_mode to avoid masking base::mode(), which reports an object's storage mode
get_mode <- function(x) {
        ux <- unique(x)
        ux[which.max(tabulate(match(x, ux)))]
}

freq_summary_2015 <- fandango_2015 %>% 
  summarize(year = "2015",
    mean = mean(Fandango_Stars),
    median = median(Fandango_Stars),
    mode = get_mode(Fandango_Stars))

freq_summary_2016 <- fandango_2016 %>% 
  summarize(year = "2016",
            mean = mean(fandango),
            median = median(fandango),
            mode = get_mode(fandango))

summary_stat <- bind_rows(freq_summary_2015, freq_summary_2016)

summary_stat2 <- summary_stat %>% 
  gather(key = "statistic", value = "value", - year)

print(summary_stat2)

ggplot(data = summary_stat2, 
       aes(x = statistic, y = value, fill = year)) +
       geom_bar(stat = "identity", position = "dodge") +
       labs(title = "Comparing Summary Statistics: 2015 vs. 2016", x = "", y = "Rating Stars") + 
       scale_fill_brewer(palette = "Paired")

summary_stat2 %>%
  filter(statistic == "mean") %>%
  summarize(`Change of Mean` = (value[2] - value[1]) / value[1] *100)

summary_stat2 %>%
  filter(statistic == "mode") %>%
  summarize(`Change of Mode` = (value[2] - value[1]) / value[1] *100)

Taken together, the summary statistics above point to the conclusion that there is a difference between Fandango’s ratings for popular movies in 2015 (before Hickey’s analysis) and in 2016 (after Hickey’s analysis), and that the movie ratings were lower in 2016 than in 2015.
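
The two calculations behind this comparison, the most frequent value and the percent change between years, can be checked on a toy example. The sketch below is base R only; the function names `most_frequent` and `percent_change` and both rating vectors are hypothetical and are not the real samples.

```r
# Most frequent value, using the same tabulation idea as the helper above.
most_frequent <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Percent change from an old value to a new one.
percent_change <- function(old, new) (new - old) / old * 100

# Hypothetical half-star ratings for two years (not the real samples).
ratings_a <- c(4.5, 4.5, 4.0, 5.0, 4.5, 3.5)
ratings_b <- c(4.0, 4.0, 3.5, 4.5, 4.0, 3.0)

# Negative results mean the statistic dropped from year A to year B.
percent_change(mean(ratings_a), mean(ratings_b))
percent_change(most_frequent(ratings_a), most_frequent(ratings_b))
```

A negative percent change indicates a drop from the first year to the second, which is how the sign of the results above should be read.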


Summary

I started this project with the goal of finding out whether Fandango’s rating system changed after Hickey’s analysis. However, while examining sample representativeness, I discovered that the samples are not representative of the population of interest for the initial goal. Therefore, I adopted an adjusted goal that is a fairly good proxy for the initial one: to determine whether there is any difference between Fandango’s ratings for popular movies in 2015 and in 2016.

So far, the analysis has shown that there is indeed a difference between Fandango’s ratings for popular movies in 2015 and in 2016, and that ratings were lower on average in 2016 than in 2015. On the one hand, this could be because the movies released in 2015 were overall “better” than those released in 2016. On the other hand, it could be because Fandango did change its rating system (for example, by fixing the rounding). The cause of the difference deserves further, deeper analysis.

