R Markdown

we start by accessing the database. To do this, run the two SQL scripts provided in the github repository (https://github.com/mkollontai/DATA607/tree/master/HW2).

First run the schema script, then the data.

We must now upload this data to R using RMYSQL

library(RMySQL)
## Loading required package: DBI
usr <- 'root'
pw <- 'password'
db <- 'movieratings'

connection <- dbConnect(MySQL(), user = usr, password = pw, dbname = db, host = "localhost")

selection <- "SELECT people.name, people.gender, movies.title, movies.year_released, reviews.rating, movies.imdb_rating FROM people INNER JOIN reviews ON people.person_id=reviews.person_id INNER JOIN movies ON movies.movie_id=reviews.movie_id ORDER BY title;"

ratings <- dbGetQuery(connection, selection)
ratings
##        name gender        title year_released rating imdb_rating
## 1     Temur      m         Clue          1985      8         7.3
## 2     Vadim      m         Clue          1985      5         7.3
## 3      Sara      f         Clue          1985     10         7.3
## 4  Fernando      m         Clue          1985      5         7.3
## 5       Kim      f         Clue          1985      6         7.3
## 6     Temur      m Interstellar          2014      7         8.6
## 7     Vadim      m Interstellar          2014      7         8.6
## 8      Sara      f Interstellar          2014      8         8.6
## 9  Fernando      m Interstellar          2014      9         8.6
## 10    Temur      m    Lion King          1994      9         8.5
## 11    Vadim      m    Lion King          1994      7         8.5
## 12     Sara      f    Lion King          1994      9         8.5
## 13 Fernando      m    Lion King          1994      8         8.5
## 14      Kim      f    Lion King          1994      9         8.5
## 15    Temur      m Pulp Fiction          1994      9         8.9
## 16    Vadim      m Pulp Fiction          1994      7         8.9
## 17     Sara      f Pulp Fiction          1994      9         8.9
## 18 Fernando      m Pulp Fiction          1994      9         8.9
## 19      Kim      f Pulp Fiction          1994      9         8.9
## 20    Temur      m  Rear Window          1954      9         8.5
## 21     Sara      f  Rear Window          1954      8         8.5
## 22     Sara      f          REC          2007      9         7.4
## 23 Fernando      m          REC          2007      6         7.4

Now that we have the data in R, let’s take a look at how different my friends’ ratings are from the IMDB ratings available online.

Let’s average the ratings from the friends who had seen each movie and compare that to IMDB ratings.

reviews_only <- "SELECT movies.title, avg(reviews.rating), movies.imdb_rating FROM movies INNER JOIN reviews ON movies.movie_id=reviews.movie_id GROUP BY title;"

review.compare <- dbGetQuery(connection, reviews_only)
## Warning in .local(conn, statement, ...): Decimal MySQL column 1 imported as
## numeric
names(review.compare)[2] <- "friend_rating"
review.compare
##          title friend_rating imdb_rating
## 1 Pulp Fiction          8.60         8.9
## 2 Interstellar          7.75         8.6
## 3         Clue          6.80         7.3
## 4    Lion King          8.40         8.5
## 5  Rear Window          8.50         8.5
## 6          REC          7.50         7.4
compare_matrix = matrix(,nrow = 2, ncol = 6)
compare_matrix[1,] = review.compare$friend_rating
compare_matrix[2,] = review.compare$imdb_rating

#plot the ratings side-by-side to determine how in-line with IMDB my friends' views are
barplot(compare_matrix, names.arg = review.compare$title, beside = TRUE, col = c("blue", "red"), legend.text = c("Friends","IMDB"), cex.names = 0.9,args.legend = list(x="bottomright"), ylim = c(0,10))

We must always take care to disconnect from the db

dbDisconnect(connection)
## [1] TRUE

As we can see, the ratings collected from my friends were fairly similar to the average available on IMDB (generally from hundreds of thousands or millions of averaged reviews), though the IMDB ratings did seem to be slightly higher on average.