SQL and R

Elina Azrilyan

8 September 2018

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe.

Your deliverables should include your SQL scripts and your R Markdown code, posted to GitHub. This is by design a very open ended assignment. A variety of reasonable approaches are acceptable. You can (and should) blank out your SQL password if your solution requires it; otherwise, full credit requires that your code is “reproducible,” with the assumption that I have the same database server and R software. You may work in a small group.

# loading package
# install.packages("RMySQL")
# install.packages("dplyr")
library(RMySQL)
## Loading required package: DBI
Establish connection
con <- dbConnect(MySQL(), user='root', password='root', dbname='MR', host='localhost')
Run a query to create a table which now contains the summary of movies and their respective ratings
mr <- dbGetQuery(con, "SELECT m.movieid AS ID, m.movietitle AS Title, m.releaseyear AS 'Year', m.genre AS Genre, m.rtscore AS 'RTScore', r.friendname, r.friendrating
  FROM Movies m
    INNER JOIN Ratings r ON m.movieid = r.movieid
  ORDER BY m.movieid;")

# Disconecting from Database
dbDisconnect(con)
## [1] TRUE
head(mr)
##   ID          Title Year              Genre RTScore friendname
## 1  1 Doctor Strange 2016 Action & Adventure      89       Anna
## 2  1 Doctor Strange 2016 Action & Adventure      89     Andrey
## 3  1 Doctor Strange 2016 Action & Adventure      89    Marusya
## 4  1 Doctor Strange 2016 Action & Adventure      89      Lilya
## 5  1 Doctor Strange 2016 Action & Adventure      89       Alla
## 6  2 Thor: Ragnarok 2017 Action & Adventure      92       Anna
##   friendrating
## 1            4
## 2            5
## 3            4
## 4            4
## 5            5
## 6            3
Calculate average movie rating based on opinion of my friends. RTScore = Rotten Tomatoes Score.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.1
newmr <- mr %>%
 group_by(Title,Year, Genre, RTScore) %>%
 summarise(AvgRating = mean(friendrating))
newmrframe <- as.data.frame(newmr)
newmrframe
##                              Title Year              Genre RTScore
## 1            Avengers Infinity War 2018 Action & Adventure      84
## 2                    Black Panther 2018 Action & Adventure      97
## 3                   Doctor Strange 2016 Action & Adventure      89
## 4               The Imitation Game 2014              Drama      90
## 5                   Thor: Ragnarok 2017 Action & Adventure      92
## 6 TO ALL THE BOYS IVE LOVED BEFORE 2018            Romance      95
##   AvgRating
## 1       3.4
## 2       3.0
## 3       4.4
## 4       4.2
## 5       4.0
## 6       4.0
Let’s convert Rotten Tomatoes Score to a 5 point scale to make comparison easier.
newmrframe$RTScoreConv<-newmrframe$RTScore/100*5
newmrframe
##                              Title Year              Genre RTScore
## 1            Avengers Infinity War 2018 Action & Adventure      84
## 2                    Black Panther 2018 Action & Adventure      97
## 3                   Doctor Strange 2016 Action & Adventure      89
## 4               The Imitation Game 2014              Drama      90
## 5                   Thor: Ragnarok 2017 Action & Adventure      92
## 6 TO ALL THE BOYS IVE LOVED BEFORE 2018            Romance      95
##   AvgRating RTScoreConv
## 1       3.4        4.20
## 2       3.0        4.85
## 3       4.4        4.45
## 4       4.2        4.50
## 5       4.0        4.60
## 6       4.0        4.75
The obvious difference in scores is due to the sample size I was using. A sample size of 5 is clearly not large enough. Opinion of 1 person can skew the results where significantly.
Let’s identify which genre my friends prefer.
newmrfriend <- mr %>%
 group_by(Genre) %>%
 summarise(AvgRating = mean(friendrating))
newmrfriendframe <- as.data.frame(newmrfriend)
newmrfriendframe
##                Genre AvgRating
## 1 Action & Adventure       3.7
## 2              Drama       4.2
## 3            Romance       4.0
My friends seem to prefer Drama.
Let’s identify who is the harshest critic.
newmrcr <- mr %>%
 group_by(friendname) %>%
 summarise(AvgRating = round(mean(friendrating), digits = 2))
newmrcrframe <- as.data.frame(newmrcr)
newmrcrframe
##   friendname AvgRating
## 1       Alla      3.83
## 2     Andrey      4.17
## 3       Anna      3.83
## 4      Lilya      3.50
## 5    Marusya      3.83
Not surprisingly, my picky friend Lilya is showing the lowest average results and Andrey who likes Action movies is showing the highest score (3 out of 6 films were from that Genre).