This week’s assignment asked students to choose six recent popular movies and to collect ratings of those films, on a scale from 1 to 5, from at least five people. The results were to be stored in a SQL database and then loaded into a data frame in R. I chose the ten highest-rated movies released since 1990 from IMDB’s “Top 250” titles list (IMDB Top 250). I collected ratings from five friends and family members by phone, then entered those scores, along with my own, into a MySQL database together with data about each film that I sourced from IMDB, including year of release, genre(s), MPAA rating, runtime, and IMDB rating.
The SQL script I wrote creates a movie reviews database and its tables, inserts the collected data, and writes a result set with headers to a .csv file in a public folder on my machine; it can be found here: SQL Script. I copied the .csv file from the public folder into the working directory of my R project, which for convenience also serves as my GitHub repository for this assignment and can be accessed here: Week 2 Assignment Repository. This way, the .csv file can be read into R from the remote GitHub repository rather than from a local source.
library(readr)
library(ggplot2)
reviews <- read_csv("https://github.com/juddanderman/Week2_Assignment/raw/master/movie_reviews.csv",
col_names = TRUE, na = c("\\N"))
reviews <- as.data.frame(reviews)
is.data.frame(reviews)
## [1] TRUE
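The `na = c("\\N")` argument in the `read_csv()` call above is worth a note. MySQL writes NULL values as `\N` in file exports such as `SELECT ... INTO OUTFILE`, which is presumably why the marker appears in this .csv, so it has to be mapped to `NA` on the way into R. Below is a minimal sketch of that behavior. It uses a small inline table with made-up titles and scores (not the actual review data) and assumes readr 2.0 or later, where `I()` marks a string as literal input.

```r
library(readr)

# Hypothetical two-row export; "\N" stands in for a SQL NULL score
demo_csv <- "title,critic_score\nFilm A,5\nFilm B,\\N\n"

# I() marks the string as literal data rather than a file path
demo <- read_csv(I(demo_csv), col_names = TRUE, na = c("\\N"),
                 show_col_types = FALSE)

# The missing score is read as NA and the column stays numeric
is.na(demo$critic_score)
is.numeric(demo$critic_score)
```

Without the `na` argument, the `\N` entry would be read as text and the whole score column would become character rather than numeric.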
head(reviews)
##                                           title release_year imdb_rating
## 1                      The Shawshank Redemption         1994         9.3
## 2                               The Dark Knight         2008         8.9
## 3                                  Pulp Fiction         1994         8.9
## 4                              Schindler's List         1993         8.9
## 5 The Lord of the Rings: The Return of the King         2003         8.9
## 6                                    Fight Club         1999         8.8
##                     genre mpaa_rating  gross  director critic
## 1             Crime/Drama           R  28.34  Darabont    SA1
## 2      Action/Crime/Drama       PG-13 533.32     Nolan    SA1
## 3             Crime/Drama           R 107.93 Tarantino    SA1
## 4 Biography/Drama/History           R  96.07 Spielberg    SA1
## 5  Action/Adventure/Drama       PG-13 377.02   Jackson    SA1
## 6                   Drama           R  37.02   Fincher    SA1
##   critic_gender critic_score
## 1             m            5
## 2             m            3
## 3             m            4
## 4             m            5
## 5             m            4
## 6             m            3
reviews data frame
Below are bar plots of my personal review scores, mean scores by reviewer gender, and mean scores for each reviewer in my small sample of amateur film critics. Perhaps not surprisingly, given my method of selecting films, the scores my reviewers gave were quite high, with a median of 4 and a mean of 4.226.
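The summary figures quoted above can be computed directly from the score column. Here is a minimal sketch, using a hypothetical helper `summarise_scores()` and made-up toy scores rather than the collected reviews; it assumes missing scores are stored as `NA` and should be ignored.

```r
# Hypothetical helper: median and mean of a score vector, skipping NAs
summarise_scores <- function(scores) {
  c(median = median(scores, na.rm = TRUE),
    mean   = mean(scores, na.rm = TRUE))
}

# Toy scores for illustration only (not the collected data)
toy_scores <- c(5, 3, 4, NA, 5, 4)
summarise_scores(toy_scores)
```

Applied to the real data, the call would be `summarise_scores(reviews$critic_score)`.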
# Keep only my own scores (critic ID "JA3")
myscores <- subset(reviews, critic == "JA3", select = c(title, critic_score))
ggplot(myscores, aes(x = title, y = critic_score, fill = title)) +
  geom_bar(stat = "identity") +
  ggtitle("My Reviews") + xlab("Film") + ylab("Score") +
  guides(fill = "none") +  # fill = FALSE is deprecated in recent ggplot2
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Mean score by reviewer gender, ignoring missing scores
gen_avg <- data.frame(
  mean_score = c(
    mean(reviews$critic_score[reviews$critic_gender == "f"], na.rm = TRUE),
    mean(reviews$critic_score[reviews$critic_gender == "m"], na.rm = TRUE)
  ),
  gender = c("Female", "Male")
)
ggplot(gen_avg, aes(x = gender, y = mean_score, fill = gender)) +
  geom_bar(stat = "identity") + ggtitle("Average Scores by Gender") +
  xlab("Gender") + ylab("Mean Score") + guides(fill = "none")
# Mean score per critic; the formula interface omits missing scores by default
meanscores <- aggregate(critic_score ~ critic, data = reviews, FUN = mean)
colnames(meanscores) <- c("critic", "mean_score")
ggplot(meanscores, aes(x = critic, y = mean_score, fill = critic)) +
  geom_bar(stat = "identity") + ggtitle("Average Scores by Critic") +
  xlab("Reviewer") + ylab("Mean Score") + guides(fill = "none")