Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe.
Below I will connect MySQL database “popular” to R Studio
library(RMySQL)
## Loading required package: DBI
library(dbConnect)
## Loading required package: gWidgets
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gWidgets':
##
## id
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mydb = dbConnect(MySQL(), user='root', password='haley', dbname='popular', host='localhost')
popular <- dbGetQuery(mydb, "select * from reviewers")
popular
## user_id raters_name
## 1 1 Haley
## 2 2 Nick
## 3 3 Alice
## 4 4 Mariano
## 5 5 John
## 6 6 Ana
Above I have queried the reviewers table, followed by the ratings table below.
popular <- dbGetQuery(mydb, "select * from ratings")
popular
## user_id movie_id rating
## 1 1 1 5.0
## 2 1 2 5.0
## 3 1 3 4.5
## 4 1 4 4.5
## 5 1 5 4.0
## 6 1 6 4.0
## 7 2 1 5.0
## 8 2 2 5.0
## 9 2 3 5.0
## 10 2 4 4.0
## 11 2 5 3.5
## 12 2 6 3.5
## 13 3 1 4.5
## 14 3 2 4.5
## 15 3 3 4.0
## 16 3 4 3.0
## 17 3 5 4.0
## 18 3 6 3.0
## 19 4 1 4.0
## 20 4 2 4.0
## 21 4 3 4.5
## 22 4 4 5.0
## 23 4 5 4.0
## 24 4 6 3.0
## 25 5 1 4.5
## 26 5 2 4.5
## 27 5 3 4.0
## 28 5 4 4.5
## 29 5 5 3.5
## 30 5 6 3.5
Here I have queried the popular movies table
popular <- dbGetQuery(mydb, "select * from popular_movies ")
popular
## movie_id title
## 1 1 Black Panther
## 2 2 The Avengers
## 3 3 Mission Impossible: Fall Out
## 4 4 Star Wars: The Force Awakens
## 5 5 Deadpool 2
## 6 6 Jurassic World: Falledn Kingdom
At this point, I will now create a join query to analyze the title of the movies and how rated them.
popular <- dbGetQuery(mydb,
"SELECT
user_id, title, ratings.movie_id, rating
FROM
popular_movies
JOIN
ratings
on popular_movies.movie_id = ratings.movie_id")
popular
## user_id title movie_id rating
## 1 1 Black Panther 1 5.0
## 2 1 The Avengers 2 5.0
## 3 1 Mission Impossible: Fall Out 3 4.5
## 4 1 Star Wars: The Force Awakens 4 4.5
## 5 1 Deadpool 2 5 4.0
## 6 1 Jurassic World: Falledn Kingdom 6 4.0
## 7 2 Black Panther 1 5.0
## 8 2 The Avengers 2 5.0
## 9 2 Mission Impossible: Fall Out 3 5.0
## 10 2 Star Wars: The Force Awakens 4 4.0
## 11 2 Deadpool 2 5 3.5
## 12 2 Jurassic World: Falledn Kingdom 6 3.5
## 13 3 Black Panther 1 4.5
## 14 3 The Avengers 2 4.5
## 15 3 Mission Impossible: Fall Out 3 4.0
## 16 3 Star Wars: The Force Awakens 4 3.0
## 17 3 Deadpool 2 5 4.0
## 18 3 Jurassic World: Falledn Kingdom 6 3.0
## 19 4 Black Panther 1 4.0
## 20 4 The Avengers 2 4.0
## 21 4 Mission Impossible: Fall Out 3 4.5
## 22 4 Star Wars: The Force Awakens 4 5.0
## 23 4 Deadpool 2 5 4.0
## 24 4 Jurassic World: Falledn Kingdom 6 3.0
## 25 5 Black Panther 1 4.5
## 26 5 The Avengers 2 4.5
## 27 5 Mission Impossible: Fall Out 3 4.0
## 28 5 Star Wars: The Force Awakens 4 4.5
## 29 5 Deadpool 2 5 3.5
## 30 5 Jurassic World: Falledn Kingdom 6 3.5
On this next query, I decided to add the “Reviwers” table to bring in the “Raters names”
popular <- dbGetQuery(mydb,
"
SELECT
ratings.user_id,raters_name, title, rating
FROM
ratings
JOIN
popular_movies ON popular_movies.movie_id = ratings.movie_id
JOIN
reviewers ON reviewers.user_id = ratings.user_id
ORDER BY
rating DESC
")
popular
## user_id raters_name title rating
## 1 2 Nick Black Panther 5.0
## 2 4 Mariano Star Wars: The Force Awakens 5.0
## 3 2 Nick The Avengers 5.0
## 4 1 Haley Black Panther 5.0
## 5 2 Nick Mission Impossible: Fall Out 5.0
## 6 1 Haley The Avengers 5.0
## 7 3 Alice Black Panther 4.5
## 8 4 Mariano Mission Impossible: Fall Out 4.5
## 9 3 Alice The Avengers 4.5
## 10 1 Haley Mission Impossible: Fall Out 4.5
## 11 5 John Black Panther 4.5
## 12 1 Haley Star Wars: The Force Awakens 4.5
## 13 5 John The Avengers 4.5
## 14 5 John Star Wars: The Force Awakens 4.5
## 15 1 Haley Jurassic World: Falledn Kingdom 4.0
## 16 2 Nick Star Wars: The Force Awakens 4.0
## 17 3 Alice Mission Impossible: Fall Out 4.0
## 18 4 Mariano Black Panther 4.0
## 19 5 John Mission Impossible: Fall Out 4.0
## 20 4 Mariano Deadpool 2 4.0
## 21 1 Haley Deadpool 2 4.0
## 22 4 Mariano The Avengers 4.0
## 23 3 Alice Deadpool 2 4.0
## 24 5 John Deadpool 2 3.5
## 25 5 John Jurassic World: Falledn Kingdom 3.5
## 26 2 Nick Deadpool 2 3.5
## 27 2 Nick Jurassic World: Falledn Kingdom 3.5
## 28 3 Alice Jurassic World: Falledn Kingdom 3.0
## 29 3 Alice Star Wars: The Force Awakens 3.0
## 30 4 Mariano Jurassic World: Falledn Kingdom 3.0
Now I will see what the average movie rating were from my raters.
ggplot(data = popular, aes(popular$title, popular$rating)) +
stat_summary(fun.y = mean,
geom = "bar", aes(fill = popular$title)) +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(legend.position="none") +
xlab("Movie Name") +
ylab("Rating") +
ggtitle("Avg Rating for Movies")

Conclusion:
Based on my analysis, my friends have chosen Black Panther to be one of the highest ratings out of the six movies; followed by The Avengers.