Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe.Your deliverables should include your SQL scripts and your R Markdown code, posted to GitHub.
mySQLWorkbench code can be found here: https://github.com/KatherineEvers/Movies-Data/blob/master/movie
library(readr)
library(DT)
library(ggplot2)
movieDF <- read_csv("movie.csv")
## Parsed with column specification:
## cols(
## personID = col_double(),
## movieID = col_double(),
## rating = col_double(),
## ratingsID = col_double(),
## movieTitle = col_character(),
## rated = col_character(),
## name = col_character()
## )
DT::datatable(movieDF, editable = TRUE)
movieSubset <- subset(movieDF, select=c(movieTitle, rating, name))
DT::datatable(movieSubset, editable = TRUE)
table(movieSubset$movieTitle, movieSubset$rating)
##
## 1 2 3 4 5
## Aquaman 0 0 2 2 0
## Bohemian Rhapsody 0 0 1 2 2
## Bumblebee 0 1 1 3 0
## Mary Poppins Returns 1 1 1 0 0
## The Lego Movie 2 0 0 2 1 0
## The Upside 0 2 1 1 0
ratingMean <- aggregate(rating~movieTitle,movieSubset, mean, y = range(x$VALUE, na.rm=TRUE))
ratingMean
## movieTitle rating
## 1 Aquaman 3.500000
## 2 Bohemian Rhapsody 4.200000
## 3 Bumblebee 3.400000
## 4 Mary Poppins Returns 2.000000
## 5 The Lego Movie 2 3.333333
## 6 The Upside 2.750000
plot1 <- ggplot(data=ratingMean, aes(x=movieTitle, y=rating, fill=movieTitle)) +
geom_bar(stat = "identity") +
geom_text(aes(label = rating), vjust = -0.3) +
ggtitle("Movie Mean Ratings") +
ylab("Mean Rating") +
xlab("Movie") +
theme(axis.text.x = element_text(angle = 35, hjust = 1))
plot1
plot2 <- ggplot(movieSubset, aes(x=movieTitle, y=rating)) +
geom_boxplot(fill = "steelblue", alpha = 0.5) +
ggtitle("Boxplot of Mean Ratings") +
ylab("Mean Rating") +
xlab("Movie") +
theme(axis.text.x = element_text(angle = 35, hjust = 1))
plot2
Based on this data and analysis, Mary Poppins Returns received the poorest ratings with a mean of 2 out of 5 and Bohemian Rhapsody was best received with a mean rating of 4.2 out of 5. However, these results are limited by the small sample size.