Install required packages
install.packages("RMySQL", repos = "http://cran.us.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/qp/xyrhr9kx6md2dyjyq9w6grrh0000gn/T//Rtmpv90qur/downloaded_packages
library(RMySQL)
## Loading required package: DBI
We create a connection to the MySQL server and load the movies data in the dataframe ‘movies’
mydb = dbConnect(MySQL(), user = user, password = password, dbname = 'movies', host = 'localhost')
movies <- dbGetQuery(mydb, "select * from movies")
There are 36 observations in this data set representing 6 movies titles and 6 reviewers. A decent portion of the reviews are NULL since not every movie had been seen by all participants. The data set is in tidy data format with one observation per line.
summary(movies)
## review_id title reviewer rating
## Min. : 1.00 Length:36 Length:36 Min. :2
## 1st Qu.: 9.75 Class :character Class :character 1st Qu.:3
## Median :18.50 Mode :character Mode :character Median :4
## Mean :18.50 Mean :4
## 3rd Qu.:27.25 3rd Qu.:5
## Max. :36.00 Max. :5
## NA's :11
movies
## review_id title reviewer rating
## 1 1 Roma Jason 5
## 2 2 Roma Bruce 3
## 3 3 Roma Elie 3
## 4 4 Roma George 4
## 5 5 Roma Brian NA
## 6 6 Roma Frank NA
## 7 7 Venom Jason 3
## 8 8 Venom Bruce NA
## 9 9 Venom Elie 4
## 10 10 Venom George 3
## 11 11 Venom Brian 2
## 12 12 Venom Frank 3
## 13 13 John Wick Jason 4
## 14 14 John Wick Bruce NA
## 15 15 John Wick Elie 5
## 16 16 John Wick George 4
## 17 17 John Wick Brian 3
## 18 18 John Wick Frank NA
## 19 19 The Lion King Jason 5
## 20 20 The Lion King Bruce 5
## 21 21 The Lion King Elie 5
## 22 22 The Lion King Brian NA
## 23 23 The Lion King George NA
## 24 24 The Lion King Frank 5
## 25 25 Avengers Jason 5
## 26 26 Avengers Bruce 4
## 27 27 Avengers Elie 5
## 28 28 Avengers George 5
## 29 29 Avengers Brian 5
## 30 30 Avengers Frank 4
## 31 31 Hobbes & Shaw Jason 3
## 32 32 Hobbes & Shaw Bruce NA
## 33 33 Hobbes & Shaw Elie 3
## 34 34 Hobbes & Shaw George NA
## 35 35 Hobbes & Shaw Brian NA
## 36 36 Hobbes & Shaw Frank NA
Here we omit the NA values from the data.
movies <- na.omit(movies)
movies
## review_id title reviewer rating
## 1 1 Roma Jason 5
## 2 2 Roma Bruce 3
## 3 3 Roma Elie 3
## 4 4 Roma George 4
## 7 7 Venom Jason 3
## 9 9 Venom Elie 4
## 10 10 Venom George 3
## 11 11 Venom Brian 2
## 12 12 Venom Frank 3
## 13 13 John Wick Jason 4
## 15 15 John Wick Elie 5
## 16 16 John Wick George 4
## 17 17 John Wick Brian 3
## 19 19 The Lion King Jason 5
## 20 20 The Lion King Bruce 5
## 21 21 The Lion King Elie 5
## 24 24 The Lion King Frank 5
## 25 25 Avengers Jason 5
## 26 26 Avengers Bruce 4
## 27 27 Avengers Elie 5
## 28 28 Avengers George 5
## 29 29 Avengers Brian 5
## 30 30 Avengers Frank 4
## 31 31 Hobbes & Shaw Jason 3
## 33 33 Hobbes & Shaw Elie 3
Let’s take a look at the average reviews for each movies title by running another MySQL query.
movies_avg_sql <- dbGetQuery(mydb, "select title, avg(rating) from movies group by title")
movies_avg_sql
## title avg(rating)
## 1 Roma 3.7500
## 2 Venom 3.0000
## 3 John Wick 4.0000
## 4 The Lion King 5.0000
## 5 Avengers 4.6667
## 6 Hobbes & Shaw 3.0000
Let’s get the averages in R directly, but sorted by title this time.
aggregate(movies[, 4], list(movies$title), mean)
## Group.1 x
## 1 Avengers 4.666667
## 2 Hobbes & Shaw 3.000000
## 3 John Wick 4.000000
## 4 Roma 3.750000
## 5 The Lion King 5.000000
## 6 Venom 3.000000