Introduction:
This rpub contains data for several movies with 5 user reviews for each. All users are rating the same movies.
This R markdown uses Azure MySQL to retrieve data.
There is a movies table, a users table, and a movieratings table. There is a many to many relationship between users and movieratings.
### Install Required Packages
install.packages("DBI", repos = "http://cran.us.r-project.org")
install.packages("tidyverse",repos = "http://cran.us.r-project.org")
install.packages("magrittr", repos = "http://cran.us.r-project.org")
install.packages("dplyr", repos = "http://cran.us.r-project.org")
### Load Required Packages
library(DBI)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.3.0
## ✔ tibble 2.0.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
library(dplyr)
In this section we will make a connection to a Azure MySQL hosting the schema referenced above
### Create SQL Connection
con <- dbConnect(RMariaDB::MariaDB(), user='rreadonly@cunyspsmysql', password="", dbname='moviereviews' ,host='cunyspsmysql.mysql.database.azure.com')
### Get a list of all SQL tables
dbListTables(con)
## [1] "movierating" "movies" "users"
## [4] "users_movierating"
In this section we will pull the revelent information into a dataframe.
## Gather all important fields from the SQL DB for analysis
mysql_movies <- dbGetQuery(con, 'select u.username, u.firstname, u.lastname, m.title, m.category, m.length, mr.rating, mr.review from users as u
join users_movierating as umr on u.userid = umr.userid
join movierating as mr on umr.movieratingid = mr.movieratingid
join movies as m on mr.movieid = m.movieid;')
## Let's look at the first 10 rows
head(mysql_movies, n= 10)
## username firstname lastname title category
## 1 RussellS Russell S Mission Impossible: Fallout action
## 2 RussellS Russell S Avengers: Infinity War action
## 3 RussellS Russell S The Quiet Place horror
## 4 RussellS Russell S DeadPool 2 comedy
## 5 RussellS Russell S Incredibles 2 animation
## 6 RussellS Russell S The Equalizer 2 crime
## 7 JoelH Joel H Mission Impossible: Fallout action
## 8 JoelH Joel H Avengers: Infinity War action
## 9 JoelH Joel H The Quiet Place horror
## 10 JoelH Joel H DeadPool 2 comedy
## length rating
## 1 148 5
## 2 149 2
## 3 90 5
## 4 119 5
## 5 118 5
## 6 121 3
## 7 148 4
## 8 149 5
## 9 90 4
## 10 119 3
## review
## 1 The best thrilling movie ever made! With the best movie character ever portrayed: Tom Cruise as Ethan Hunt!
## 2 At stake, as ever, is the fate of the universe. Why must it always be the universe? What's wrong with the fate of Hackensack? Doesn't anyone care what happens to South Dakota, or Denmark, or Peru?
## 3 This is a movie about the sound of fear, but it gives us a great deal more to listen to.
## 4 Deadpool 2 is an R-rated, potty-mouthed splatterfest and a funny one.
## 5 How great to see them back in this funny, exciting, if less audacious film.
## 6 Once it's over, you won't necessarily be itching for an Equalizer 3.
## 7 While Mission Impossible will certainly require rewatching a couple of times to fully fathom the convolutions of the plot, its nonetheless one very well constructed and directed peice of movie making. The tension never lets up and it kept me intrigued right through to the final action set piece that's so ludicrous I didn't know whether to laugh, cheer or throw something at the screen. The accompanying score goes a long way to making it more acceptable and to be fair this is still my favourite movie in the franchise and I do like them all.
## 8 Never has the Marvel Cinematic Universe seemed like more of a universe, in ways both good and bad. Infinity War-the title is almost too apt-is far from a perfect movie, but it is probably close to the best movie it could have been.
## 9 A Quiet Place is an undoubtedly taxing affair for the nerves; fortunately, it's also a deeply affecting one.
## 10 For all the impulsive flamboyance of Deadpool's patter, the liberating power of personal virtue, and the disinhibiting promise of second chances, Deadpool 2 feels narrowly impersonal and oppressively unfree.
## Check the dimensions of the data
dim(mysql_movies)
## [1] 30 8
## Let's view our table headers
names(mysql_movies)
## [1] "username" "firstname" "lastname" "title" "category" "length"
## [7] "rating" "review"
I want to group and create graphs based on the rating and create a chart. For this section, I will use dplyr
## Group the titles by name and average the scores using dplyr.
avg_movie_rating <- mysql_movies %>% group_by(title) %>% summarise(AvgRating=mean(rating))
Graph: Average Movie Review for each title
## Graph movie titles vs the average score.
gg <- ggplot(data = avg_movie_rating) + geom_point(mapping =aes(x= title, y=AvgRating, color = AvgRating))
gg <- gg + theme(axis.text.x = element_text(angle = 90), axis.line = element_line(color = "blue", size = 1, linetype = "solid")) + xlab("Movie Title") + ylab("Average User Review")
gg + ggtitle("Average Movie Score Review")
##
Now, let’s create a boxplot to determine how movie ratings compared for each category
## Group Movies by category and average score using dplyr
avg_movie_rating <- mysql_movies %>% group_by(title) %>% summarise(AvgRating=mean(rating))
gg2 <- ggplot(data = mysql_movies, aes(x =category, y=rating)) + geom_boxplot(fill='#A4A4A4', color='blue')
gg2 <- gg2 + theme(axis.text.x = element_text(angle = 0)) + xlab("Movie Category") + ylab("User Rating")
gg2 + ggtitle("User Rating per movie category")
Summary
Based on the sample of movies selected, it appears that animation and horror had a good year. We would need a much larger sample to determine which movies were the most popular.
dbDisconnect(con)