Assignment:

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe.Your deliverables should include your SQL scripts and your R Markdown code, posted to GitHub.

Load libraries and import csv file:

mySQLWorkbench code can be found here: https://github.com/KatherineEvers/Movies-Data/blob/master/movie

library(readr)
library(DT)
library(ggplot2)

movieDF <- read_csv("movie.csv")
## Parsed with column specification:
## cols(
##   personID = col_double(),
##   movieID = col_double(),
##   rating = col_double(),
##   ratingsID = col_double(),
##   movieTitle = col_character(),
##   rated = col_character(),
##   name = col_character()
## )
DT::datatable(movieDF, editable = TRUE)

Create a subset of the data frame:

movieSubset <- subset(movieDF, select=c(movieTitle, rating, name))
DT::datatable(movieSubset, editable = TRUE)

Create a table that shows a count of each rating by movie:

table(movieSubset$movieTitle, movieSubset$rating)
##                       
##                        1 2 3 4 5
##   Aquaman              0 0 2 2 0
##   Bohemian Rhapsody    0 0 1 2 2
##   Bumblebee            0 1 1 3 0
##   Mary Poppins Returns 1 1 1 0 0
##   The Lego Movie 2     0 0 2 1 0
##   The Upside           0 2 1 1 0

Find the mean rating of each movie:

ratingMean <- aggregate(rating~movieTitle,movieSubset, mean, y = range(x$VALUE, na.rm=TRUE))
ratingMean
##             movieTitle   rating
## 1              Aquaman 3.500000
## 2    Bohemian Rhapsody 4.200000
## 3            Bumblebee 3.400000
## 4 Mary Poppins Returns 2.000000
## 5     The Lego Movie 2 3.333333
## 6           The Upside 2.750000

Visualize the data by creating a histogram and box plots:

plot1 <- ggplot(data=ratingMean, aes(x=movieTitle, y=rating, fill=movieTitle)) +
        geom_bar(stat = "identity") +
        geom_text(aes(label = rating), vjust = -0.3) +
        ggtitle("Movie Mean Ratings") +
        ylab("Mean Rating") +
        xlab("Movie") +
        theme(axis.text.x = element_text(angle = 35, hjust = 1))
plot1

plot2 <- ggplot(movieSubset, aes(x=movieTitle, y=rating)) +
         geom_boxplot(fill = "steelblue", alpha = 0.5) +
         ggtitle("Boxplot of Mean Ratings") +
         ylab("Mean Rating") +
         xlab("Movie") +
         theme(axis.text.x = element_text(angle = 35, hjust = 1))
plot2

Based on this data and analysis, Mary Poppins Returns received the poorest ratings with a mean of 2 out of 5 and Bohemian Rhapsody was best received with a mean rating of 4.2 out of 5. However, these results are limited by the small sample size.