Preparation
- Make sure you have MySQL installed. Then please run the following sql script in order to create the movie_review database:
Github_Link
- You will need to have the following R packages installed and loaded: RMySQL, dplyr, dbplyr
Loading the SQL tables into R
Be sure you use the correct username and password for your particular MySQL setup!
mr_db <- src_mysql(dbname = 'movie_reviews', username = 'user', password = 'password')
movie_df <- tbl(mr_db, "movie") %>% collect(n=Inf)
critic_df <- tbl(mr_db, "critic") %>% collect(n=Inf)
review_df <- tbl(mr_db, "review") %>% collect(n=Inf)
Analyzing the data
- First let’s see all of the reviews, sorted by movie title and rating, along with the critic’s name:
review_enhanced_df <- review_df %>%
left_join(movie_df, by="movie_id") %>%
left_join(critic_df, by="critic_id") %>%
arrange(title, desc(movie_rating))
knitr::kable(review_enhanced_df)
| 1 |
2 |
3.9 |
Acquaman |
Beverly |
| 1 |
3 |
3.9 |
Acquaman |
Charlie |
| 1 |
5 |
3.5 |
Acquaman |
Edward |
| 1 |
1 |
3.1 |
Acquaman |
Aaron |
| 1 |
4 |
3.1 |
Acquaman |
Dolores |
| 6 |
1 |
4.8 |
Annihilation |
Aaron |
| 6 |
3 |
4.3 |
Annihilation |
Charlie |
| 6 |
4 |
4.2 |
Annihilation |
Dolores |
| 6 |
5 |
4.1 |
Annihilation |
Edward |
| 6 |
2 |
3.1 |
Annihilation |
Beverly |
| 2 |
4 |
5.0 |
Black Panther |
Dolores |
| 2 |
5 |
5.0 |
Black Panther |
Edward |
| 2 |
1 |
4.8 |
Black Panther |
Aaron |
| 2 |
2 |
4.6 |
Black Panther |
Beverly |
| 2 |
3 |
4.1 |
Black Panther |
Charlie |
| 5 |
1 |
3.7 |
Mary Poppins Returns |
Aaron |
| 5 |
2 |
3.5 |
Mary Poppins Returns |
Beverly |
| 5 |
3 |
3.5 |
Mary Poppins Returns |
Charlie |
| 5 |
5 |
3.1 |
Mary Poppins Returns |
Edward |
| 5 |
4 |
2.6 |
Mary Poppins Returns |
Dolores |
| 4 |
3 |
3.8 |
Ready Player One |
Charlie |
| 4 |
5 |
3.8 |
Ready Player One |
Edward |
| 4 |
4 |
3.7 |
Ready Player One |
Dolores |
| 4 |
2 |
3.1 |
Ready Player One |
Beverly |
| 4 |
1 |
2.5 |
Ready Player One |
Aaron |
| 3 |
2 |
2.3 |
The Greatest Showman |
Beverly |
| 3 |
4 |
2.3 |
The Greatest Showman |
Dolores |
| 3 |
3 |
2.0 |
The Greatest Showman |
Charlie |
| 3 |
5 |
1.8 |
The Greatest Showman |
Edward |
| 3 |
1 |
1.6 |
The Greatest Showman |
Aaron |
- Which movie had the highest average rating?
average_ratings_df <- review_enhanced_df %>%
group_by(movie_id) %>%
summarise(movie_title=tail(title,1),
avg_rating=mean(movie_rating, na.rm=T)) %>%
ungroup() %>% arrange(desc(avg_rating))
knitr::kable(average_ratings_df)
| 2 |
Black Panther |
4.70 |
| 6 |
Annihilation |
4.10 |
| 1 |
Acquaman |
3.50 |
| 4 |
Ready Player One |
3.38 |
| 5 |
Mary Poppins Returns |
3.28 |
| 3 |
The Greatest Showman |
2.00 |
- On average, what rating did each critic give across all movies?
critic_ratings_df <- review_enhanced_df %>%
group_by(critic_id) %>%
summarise(name=tail(critic_name,1),
avg_rating=mean(movie_rating, na.rm=T)) %>%
ungroup() %>% arrange(desc(avg_rating))
knitr::kable(critic_ratings_df)
| 3 |
Charlie |
3.600000 |
| 5 |
Edward |
3.550000 |
| 4 |
Dolores |
3.483333 |
| 1 |
Aaron |
3.416667 |
| 2 |
Beverly |
3.416667 |
As expected, the spread of ratings by movie is wider than the spread by critic.
boxplot(critic_ratings_df$avg_rating, average_ratings_df$avg_rating, names=c("Ratings, by Critic", "Ratings, by Movie"))
