Preparation

  1. Make sure you have MySQL installed. Then please run the following sql script in order to create the movie_review database:

Github_Link

  1. You will need to have the following R packages installed and loaded: RMySQL, dplyr, dbplyr

Loading the SQL tables into R

Be sure you use the correct username and password for your particular MySQL setup!

mr_db <- src_mysql(dbname = 'movie_reviews', username = 'user', password = 'password')

movie_df <- tbl(mr_db, "movie") %>% collect(n=Inf)
critic_df <- tbl(mr_db, "critic") %>% collect(n=Inf)
review_df <- tbl(mr_db, "review") %>% collect(n=Inf)

Analyzing the data

  1. First let’s see all of the reviews, sorted by movie title and rating, along with the critic’s name:
review_enhanced_df <- review_df %>% 
  left_join(movie_df, by="movie_id") %>% 
  left_join(critic_df, by="critic_id") %>% 
  arrange(title, desc(movie_rating))

knitr::kable(review_enhanced_df)
movie_id critic_id movie_rating title critic_name
1 2 3.9 Acquaman Beverly
1 3 3.9 Acquaman Charlie
1 5 3.5 Acquaman Edward
1 1 3.1 Acquaman Aaron
1 4 3.1 Acquaman Dolores
6 1 4.8 Annihilation Aaron
6 3 4.3 Annihilation Charlie
6 4 4.2 Annihilation Dolores
6 5 4.1 Annihilation Edward
6 2 3.1 Annihilation Beverly
2 4 5.0 Black Panther Dolores
2 5 5.0 Black Panther Edward
2 1 4.8 Black Panther Aaron
2 2 4.6 Black Panther Beverly
2 3 4.1 Black Panther Charlie
5 1 3.7 Mary Poppins Returns Aaron
5 2 3.5 Mary Poppins Returns Beverly
5 3 3.5 Mary Poppins Returns Charlie
5 5 3.1 Mary Poppins Returns Edward
5 4 2.6 Mary Poppins Returns Dolores
4 3 3.8 Ready Player One Charlie
4 5 3.8 Ready Player One Edward
4 4 3.7 Ready Player One Dolores
4 2 3.1 Ready Player One Beverly
4 1 2.5 Ready Player One Aaron
3 2 2.3 The Greatest Showman Beverly
3 4 2.3 The Greatest Showman Dolores
3 3 2.0 The Greatest Showman Charlie
3 5 1.8 The Greatest Showman Edward
3 1 1.6 The Greatest Showman Aaron
  1. Which movie had the highest average rating?
average_ratings_df <- review_enhanced_df %>% 
  group_by(movie_id) %>% 
  summarise(movie_title=tail(title,1),
            avg_rating=mean(movie_rating, na.rm=T)) %>% 
  ungroup() %>% arrange(desc(avg_rating))
knitr::kable(average_ratings_df)
movie_id movie_title avg_rating
2 Black Panther 4.70
6 Annihilation 4.10
1 Acquaman 3.50
4 Ready Player One 3.38
5 Mary Poppins Returns 3.28
3 The Greatest Showman 2.00
  1. On average, what rating did each critic give across all movies?
critic_ratings_df <- review_enhanced_df %>% 
  group_by(critic_id) %>% 
  summarise(name=tail(critic_name,1),
            avg_rating=mean(movie_rating, na.rm=T)) %>% 
  ungroup() %>% arrange(desc(avg_rating))
knitr::kable(critic_ratings_df)
critic_id name avg_rating
3 Charlie 3.600000
5 Edward 3.550000
4 Dolores 3.483333
1 Aaron 3.416667
2 Beverly 3.416667

As expected, the spread of ratings by movie is wider than the spread by critic.

boxplot(critic_ratings_df$avg_rating, average_ratings_df$avg_rating, names=c("Ratings, by Critic", "Ratings, by Movie"))