Data 607 Week 2 Assignment

Introduction

This week’s assignment called for students to choose six recent popular movies and to solicit reviews of those films, each scored from 1 to 5, from at least five people. The results were to be stored in a SQL database and then migrated into a data frame in R. I chose the ten highest rated movies released since 1990 on IMDB’s “Top 250” Titles list (IMDB Top 250) for review. I collected ratings from five friends and family members by phone. I then entered those scores, along with my own, into a MySQL database together with data about each film sourced from IMDB, including year of release, genre(s), MPAA rating, runtime, and IMDB rating.

The SQL script I wrote creates a movie reviews database and its tables, inserts the collected data, and exports a result set with headers as a .csv file to a public folder on my machine. It can be found here: SQL Script. I copied the .csv file from the public folder into the working directory of my R project. For convenience, that directory also served as my GitHub repository for this assignment, which can be accessed here: Week 2 Assignment Repository. With this approach, the .csv file could be read into R from the remote GitHub repository rather than from a local path.

Load required libraries

library(readr)
library(ggplot2)

Import CSV file from GitHub repository into R data frame

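# MySQL exports NULL values as \N, so map them to NA on import
reviews <- read_csv("https://github.com/juddanderman/Week2_Assignment/raw/master/movie_reviews.csv", 
    col_names = TRUE, na = c("\\N"))
# convert the tibble returned by read_csv() to a plain data frame
reviews <- as.data.frame(reviews)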
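The `na = c("\\N")` argument matters because MySQL writes NULL values as `\N` when it exports query results to a file. Below is a minimal sketch of the same mapping. It assumes a hypothetical four-line excerpt of the export and uses base R's `read.csv()` with `na.strings` in place of `readr::read_csv()`, so it runs without a network connection.

```r
# Hypothetical excerpt of a MySQL export in which one score is NULL;
# "\\N" in an R string is the two characters \N that MySQL writes for NULL
csv_text <- paste(
    "title,critic,critic_score",
    "Pulp Fiction,SA1,4",
    "Fight Club,SA1,\\N",
    "Fight Club,JA3,3",
    sep = "\n")

# na.strings plays the same role as read_csv()'s na argument
sample_reviews <- read.csv(text = csv_text, na.strings = "\\N")

str(sample_reviews)
# the \N row is now NA, so it can be skipped with na.rm = TRUE
mean(sample_reviews$critic_score, na.rm = TRUE)
```

Without the `na` mapping, the `\N` entries would be read as text. The whole `critic_score` column would then be parsed as character rather than numeric.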

Check for successful data import

is.data.frame(reviews)

## [1] TRUE

head(reviews)

##                                           title release_year imdb_rating
## 1                      The Shawshank Redemption         1994         9.3
## 2                               The Dark Knight         2008         8.9
## 3                                  Pulp Fiction         1994         8.9
## 4                              Schindler's List         1993         8.9
## 5 The Lord of the Rings: The Return of the King         2003         8.9
## 6                                    Fight Club         1999         8.8
##                     genre mpaa_rating  gross  director critic
## 1             Crime/Drama           R  28.34  Darabont    SA1
## 2      Action/Crime/Drama       PG-13 533.32     Nolan    SA1
## 3             Crime/Drama           R 107.93 Tarantino    SA1
## 4 Biography/Drama/History           R  96.07 Spielberg    SA1
## 5  Action/Adventure/Drama       PG-13 377.02   Jackson    SA1
## 6                   Drama           R  37.02   Fincher    SA1
##   critic_gender critic_score
## 1             m            5
## 2             m            3
## 3             m            4
## 4             m            5
## 5             m            4
## 6             m            3
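`is.data.frame()` and `head()` confirm that the import ran, but they do not check the contents. Below is a sketch of a stricter check. `check_reviews()` is a hypothetical helper. It assumes the column names shown in the `head()` output above and the assignment's 1 to 5 scoring scale.

```r
# Stop with an informative error if expected columns are missing or if any
# non-missing score falls outside the 1-5 scale; return TRUE invisibly otherwise
check_reviews <- function(df) {
    expected <- c("title", "release_year", "imdb_rating", "genre",
                  "mpaa_rating", "gross", "director", "critic",
                  "critic_gender", "critic_score")
    missing_cols <- setdiff(expected, names(df))
    if (length(missing_cols) > 0) {
        stop("missing columns: ", paste(missing_cols, collapse = ", "))
    }
    scores <- df$critic_score[!is.na(df$critic_score)]
    if (any(scores < 1 | scores > 5)) {
        stop("critic_score values outside 1-5 found")
    }
    invisible(TRUE)
}
```

With the imported data, the call would be `check_reviews(reviews)`. An error at this step points to a problem in the SQL export rather than in the plots that follow.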

Visualizing the `reviews` data frame

Below are bar plots of my personal review scores, mean scores by reviewer gender, and each reviewer's mean score across my small sample of amateur film critics. Perhaps not surprisingly given my method of selecting films, the typical scores provided by my reviewers were quite high, with a median value of 4 and a mean of 4.226.
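The median and mean quoted above can be computed directly from the `critic_score` column. `na.rm = TRUE` is needed because any `\N` values in the export were imported as `NA`. Below is a minimal sketch, run on a small hypothetical vector so it works on its own. `score_summary()` is an illustrative name, not part of the original analysis.

```r
# Median and mean (rounded to 3 places) of a vector of review scores,
# ignoring missing (NA) entries
score_summary <- function(scores) {
    c(median = median(scores, na.rm = TRUE),
      mean = round(mean(scores, na.rm = TRUE), 3))
}

# Hypothetical scores for illustration only; with the real data this would be
# score_summary(reviews$critic_score)
score_summary(c(5, 3, 4, NA, 5, 4))
```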

# keep only my own reviews (critic JA3)
myscores <- subset(reviews, critic == "JA3", select = c(title, critic_score))

ggplot(myscores, aes(x = title, y = critic_score, fill = title)) + 
    geom_bar(stat = "identity") + ggtitle("My Reviews") + xlab("Film") + 
    ylab("Score") + guides(fill = "none") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

# mean score by reviewer gender, ignoring missing scores
gen_avg <- data.frame(
    mean_score = c(mean(reviews$critic_score[reviews$critic_gender == "f"], na.rm = TRUE),
                   mean(reviews$critic_score[reviews$critic_gender == "m"], na.rm = TRUE)),
    gender = c("Female", "Male"))

ggplot(gen_avg, aes(x = gender, y = mean_score, fill = gender)) + 
    geom_bar(stat = "identity") + ggtitle("Average Scores by Gender") + 
    xlab("Gender") + ylab("Mean Score") + guides(fill = "none")

# mean score for each reviewer; aggregate()'s formula method drops NA scores
meanscores <- aggregate(critic_score ~ critic, data = reviews, FUN = mean)
colnames(meanscores) <- c("critic", "mean_score")

ggplot(meanscores, aes(x = critic, y = mean_score, fill = critic)) + 
    geom_bar(stat = "identity") + ggtitle("Average Scores by Critic") + 
    xlab("Reviewer") + ylab("Mean Score") + guides(fill = "none")
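The gender and per-critic summaries follow the same pattern: the mean `critic_score` within each level of a grouping column. A single helper could therefore produce either data frame. Below is a sketch that assumes the column names shown in the `head()` output. `mean_by()` is a hypothetical helper name, and the toy rows are illustrative only.

```r
# Mean critic_score for each level of a grouping column, returned as a
# two-column data frame ready for ggplot(); rows with a missing score are dropped
mean_by <- function(df, group_col) {
    keep <- !is.na(df$critic_score)
    means <- tapply(df$critic_score[keep], df[[group_col]][keep], mean)
    data.frame(group = names(means), mean_score = as.numeric(means),
               stringsAsFactors = FALSE)
}

# Hypothetical rows for illustration; with the real data, use
# mean_by(reviews, "critic") or mean_by(reviews, "critic_gender")
toy <- data.frame(critic = c("SA1", "SA1", "JA3", "JA3"),
                  critic_score = c(5, 4, 4, NA))
mean_by(toy, "critic")
```

For the gender plot, the `f`/`m` codes returned in `group` would still need relabelling to "Female" and "Male".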

Data 607 Week 2 Assignment

Judd Anderman

September 11, 2016
