CUNY 607 Week 2 Assignment

In this week’s assignment, I had chosen 10 recent movies and had asked my family and friends to give a 1 to 5 rating. The information is saved in an SQL that has been uploaded to GitHub, which can be downloaded.

https://raw.githubusercontent.com/jcp9010/MSDA/master/movies.sql

Below is connecting the SQL database to R via package RMySQL.

library(RMySQL)
## Loading required package: DBI
#Creating a connection with the MySQL Database and showing available tables
#To access my MySQL, there is an and password associated on my local machine.
#For reproducibility, the user = "root", password = "MSDA" may need to be replaced with your user and password or deleted from your device, depending on the way your MySQL was set up.
con <- dbConnect(RMySQL::MySQL(), dbname = "movies", user = "root", password = "MSDA", host = "localhost")
dbListTables(con)
## [1] "complete_ratings" "movies"           "ratings"         
## [4] "reviewers"        "sample"
# Assigning data.frame variables in R from MySQL
reviewers.query <- "SELECT * FROM reviewers;"
movies.query <- "SELECT * FROM movies;"
ratings.query <- "SELECT * FROM ratings;"
complete_ratings.query <- "SELECT * FROM complete_ratings;"

reviewers <- dbGetQuery(con, reviewers.query)
movies <- dbGetQuery(con, movies.query)
ratings <- dbGetQuery(con, ratings.query)

# In this case, I will demonstrate another way of obtaining a query from MySQL into R using the dbSendQuery.

rs <- dbSendQuery(con, complete_ratings.query)
complete_ratings <- dbFetch(rs)
dbHasCompleted(rs)
## [1] TRUE
dbClearResult(rs)
## [1] TRUE
# Now to close out the con connection.
dbDisconnect(con)
## [1] TRUE
complete_ratings
##     name                   movie_name review
## 1    S.D Star Wars: The Force Awakens      5
## 2    S.D           Mad Max: Fury Road      4
## 3    S.D                 Interstellar      5
## 4    S.D      The Wolf of Wall Street     NA
## 5    S.D Rogue One: A Star Wars Story     NA
## 6    S.D               The Accountant      5
## 7    S.D  Keeping Up with the Joneses     NA
## 8    S.D                      Ben-Hur      3
## 9    S.D                Casion Royale      4
## 10   S.D              Bridge of Spies      5
## 11   A.D Star Wars: The Force Awakens      5
## 12   A.D           Mad Max: Fury Road     NA
## 13   A.D                 Interstellar     NA
## 14   A.D      The Wolf of Wall Street      3
## 15   A.D Rogue One: A Star Wars Story     NA
## 16   A.D               The Accountant      4
## 17   A.D  Keeping Up with the Joneses      2
## 18   A.D                      Ben-Hur      3
## 19   A.D                Casion Royale      4
## 20   A.D              Bridge of Spies      4
## 21   J.P Star Wars: The Force Awakens      5
## 22   J.P           Mad Max: Fury Road      2
## 23   J.P                 Interstellar      5
## 24   J.P      The Wolf of Wall Street      2
## 25   J.P Rogue One: A Star Wars Story      5
## 26   J.P               The Accountant      4
## 27   J.P  Keeping Up with the Joneses      3
## 28   J.P                      Ben-Hur      4
## 29   J.P                Casion Royale      4
## 30   J.P              Bridge of Spies      5
## 31   P.P Star Wars: The Force Awakens     NA
## 32   P.P           Mad Max: Fury Road      2
## 33   P.P                 Interstellar      3
## 34   P.P      The Wolf of Wall Street     NA
## 35   P.P Rogue One: A Star Wars Story      5
## 36   P.P               The Accountant      4
## 37   P.P  Keeping Up with the Joneses      4
## 38   P.P                      Ben-Hur      2
## 39   P.P                Casion Royale      3
## 40   P.P              Bridge of Spies      5
## 41  C.G. Star Wars: The Force Awakens      4
## 42  C.G.           Mad Max: Fury Road      5
## 43  C.G.                 Interstellar      4
## 44  C.G.      The Wolf of Wall Street      5
## 45  C.G. Rogue One: A Star Wars Story     NA
## 46  C.G.               The Accountant     NA
## 47  C.G.  Keeping Up with the Joneses     NA
## 48  C.G.                      Ben-Hur      3
## 49  C.G.                Casion Royale      4
## 50  C.G.              Bridge of Spies      4
## 51  C.H. Star Wars: The Force Awakens      4
## 52  C.H.           Mad Max: Fury Road     NA
## 53  C.H.                 Interstellar      4
## 54  C.H.      The Wolf of Wall Street      1
## 55  C.H. Rogue One: A Star Wars Story      4
## 56  C.H.               The Accountant      3
## 57  C.H.  Keeping Up with the Joneses      4
## 58  C.H.                      Ben-Hur     NA
## 59  C.H.                Casion Royale      5
## 60  C.H.              Bridge of Spies      5
## 61 Craig Star Wars: The Force Awakens      4
## 62 Craig           Mad Max: Fury Road     NA
## 63 Craig                 Interstellar      4
## 64 Craig      The Wolf of Wall Street      1
## 65 Craig Rogue One: A Star Wars Story      4
## 66 Craig               The Accountant      3
## 67 Craig  Keeping Up with the Joneses      4
## 68 Craig                      Ben-Hur     NA
## 69 Craig                Casion Royale      5
## 70 Craig              Bridge of Spies      5
## 71 Jason Star Wars: The Force Awakens      5
## 72 Jason           Mad Max: Fury Road      4
## 73 Jason                 Interstellar      4
## 74 Jason      The Wolf of Wall Street      5
## 75 Jason Rogue One: A Star Wars Story      5
## 76 Jason               The Accountant      3
## 77 Jason  Keeping Up with the Joneses      4
## 78 Jason                      Ben-Hur      4
## 79 Jason                Casion Royale      5
## 80 Jason              Bridge of Spies      5

With the available information, this is an opportunity to perform some analysis of the information that is provided.

Below is some information to help characterize the age of the reviewers.

summary(reviewers$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26.00   28.00   30.50   37.25   41.75   60.00
hist(reviewers$age, xlab = "Age", main = "Histogram of Reviewer's Age", breaks = 20)

Below are the number of movie genres for the movies reviewed.

genres <- table(movies$genre)
genres
## 
## Action Comedy  Crime  Drama Sci-Fi 
##      4      1      1      1      3

The next step is to find the average rating of each movie.

mov <- table(complete_ratings$movie_name, complete_ratings$review)
rating.scale <- c(1,2,3,4,5)

movie.sum <- vector()

for (i in 1:nrow(mov)){
  total <- 0
  j <- 1
  for (j in 1:5){
   total <- total + mov[i,j]*rating.scale[j]
   j <- j + 1
  }
  movie.sum <- c(movie.sum, total)
}

movie.total.count <- vector()

for (i in 1:nrow(mov)){
  movie.total.count[i] <- sum(mov[i,])
}

movie.avg <- data.frame(Movie = unique(complete_ratings$movie_name), Ratings = movie.sum/movie.total.count)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
pl <- ggplot(movie.avg, aes(x= Movie, y= Ratings)) 
pl2 <- pl + geom_point(aes(color = 'blue', alpha = 0.5, size = 5)) + ggtitle("Average Ratings for the Movies")
pl3 <- pl2 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
pl3

Above were some examples how databases can be imported from MySQL, and the data can be manipulated and analyzed in R.