Assignment 2: Database Connection With R

Mario Pena

9/8/2019

In this assignment I have chosen six of the most popular Hollywood movies, and I have asked 6 people to rate each of the movies they’ve seen on a scale from 1 (being terrible) to 5 (being great).

I have stored the observations in a SQL database, so I will first need to load this data into an R dataframe.

We’ll need to load the following libraries:

library(RMySQL)
## Loading required package: DBI
library(dbConnect)
## Loading required package: gWidgets

And connect to MySQL server:

con = dbConnect (MySQL(), user = 'root', password = getOption('database_password'), dbname = 'movie_ratings', Host = 'localhost')

Now I can see a list of the tables in my “movie_ratings” database:

dbListTables(con)
## [1] "audience" "movies"   "ratings"

As you can see, I have created 3 different tables, each stores information regarding audience, movies and ratings.

I will create 3 queries within R using these tables:

myquery1 <- "select * from audience;"
myquery2 <- "select * from movies;"
myquery3 <- "select * from ratings;"

I will save them as dataframes in this next step and display their content:

audience <- dbGetQuery(con, myquery1)
audience
##   nameid firstname lastname
## 1      1      Amel        H
## 2      2      Dina        H
## 3      3     Mario        P
## 4      4       Ana        J
## 5      5    Teresa        C
## 6      6      Alba        P
movies <- dbGetQuery(con, myquery2)
movies
##   movieid                    title           genre
## 1       1 The Lion King (Original)          Family
## 2       2           The Dark Night          Action
## 3       3                  Titanic           Drama
## 4       4             Forrest Gump           Drama
## 5       5                     Jaws          Horror
## 6       6                     E.T. Science Fiction
ratings <- dbGetQuery(con, myquery3)
ratings
##    ratingid nameid movieid rating
## 1         1      1       1      5
## 2         2      1       2      5
## 3         3      1       3      5
## 4         4      1       4     NA
## 5         5      1       5      3
## 6         6      1       6      4
## 7         7      2       1      5
## 8         8      2       2      5
## 9         9      2       3      5
## 10       10      2       4      5
## 11       11      2       5      2
## 12       12      2       6      4
## 13       13      3       1      4
## 14       14      3       2      5
## 15       15      3       3      4
## 16       16      3       4      5
## 17       17      3       5      1
## 18       18      3       6      3
## 19       19      4       1      5
## 20       20      4       2      4
## 21       21      4       3      5
## 22       22      4       4      5
## 23       23      4       5      4
## 24       24      4       6      5
## 25       25      5       1      5
## 26       26      5       2      3
## 27       27      5       3      4
## 28       28      5       4      3
## 29       29      5       5      2
## 30       30      5       6      4
## 31       31      6       1      5
## 32       32      6       2     NA
## 33       33      6       3      5
## 34       34      6       4     NA
## 35       35      6       5      5
## 36       36      6       6      5

I will merge ‘movies’ and ‘ratings’ so we can have a clearer picture of how each movie was rated:

movie_ratings <- merge(movies, ratings, by.x="movieid")
movie_ratings
##    movieid                    title           genre ratingid nameid rating
## 1        1 The Lion King (Original)          Family        1      1      5
## 2        1 The Lion King (Original)          Family       19      4      5
## 3        1 The Lion King (Original)          Family        7      2      5
## 4        1 The Lion King (Original)          Family       25      5      5
## 5        1 The Lion King (Original)          Family       31      6      5
## 6        1 The Lion King (Original)          Family       13      3      4
## 7        2           The Dark Night          Action        2      1      5
## 8        2           The Dark Night          Action       32      6     NA
## 9        2           The Dark Night          Action        8      2      5
## 10       2           The Dark Night          Action       14      3      5
## 11       2           The Dark Night          Action       20      4      4
## 12       2           The Dark Night          Action       26      5      3
## 13       3                  Titanic           Drama        3      1      5
## 14       3                  Titanic           Drama        9      2      5
## 15       3                  Titanic           Drama       15      3      4
## 16       3                  Titanic           Drama       21      4      5
## 17       3                  Titanic           Drama       27      5      4
## 18       3                  Titanic           Drama       33      6      5
## 19       4             Forrest Gump           Drama        4      1     NA
## 20       4             Forrest Gump           Drama       10      2      5
## 21       4             Forrest Gump           Drama       16      3      5
## 22       4             Forrest Gump           Drama       22      4      5
## 23       4             Forrest Gump           Drama       28      5      3
## 24       4             Forrest Gump           Drama       34      6     NA
## 25       5                     Jaws          Horror       11      2      2
## 26       5                     Jaws          Horror       17      3      1
## 27       5                     Jaws          Horror       23      4      4
## 28       5                     Jaws          Horror       29      5      2
## 29       5                     Jaws          Horror        5      1      3
## 30       5                     Jaws          Horror       35      6      5
## 31       6                     E.T. Science Fiction       24      4      5
## 32       6                     E.T. Science Fiction        6      1      4
## 33       6                     E.T. Science Fiction       12      2      4
## 34       6                     E.T. Science Fiction       30      5      4
## 35       6                     E.T. Science Fiction       18      3      3
## 36       6                     E.T. Science Fiction       36      6      5

Let’s take a look at what the summmary of this new table looks like:

summary(movie_ratings)
##     movieid       title              genre              ratingid    
##  Min.   :1.0   Length:36          Length:36          Min.   : 1.00  
##  1st Qu.:2.0   Class :character   Class :character   1st Qu.: 9.75  
##  Median :3.5   Mode  :character   Mode  :character   Median :18.50  
##  Mean   :3.5                                         Mean   :18.50  
##  3rd Qu.:5.0                                         3rd Qu.:27.25  
##  Max.   :6.0                                         Max.   :36.00  
##                                                                     
##      nameid        rating     
##  Min.   :1.0   Min.   :1.000  
##  1st Qu.:2.0   1st Qu.:4.000  
##  Median :3.5   Median :5.000  
##  Mean   :3.5   Mean   :4.212  
##  3rd Qu.:5.0   3rd Qu.:5.000  
##  Max.   :6.0   Max.   :5.000  
##                NA's   :3

We can see that the average rating for all the movies in this database is 4.212. This tells us these movies were very well received by the audience who watched them.

Now let’s see what is the average movie rating per movie:

avg_movie_rating <- aggregate(x=movie_ratings["rating"], by = list(movie=movie_ratings$title), FUN = mean, na.rm=TRUE)
avg_movie_rating
##                      movie   rating
## 1                     E.T. 4.166667
## 2             Forrest Gump 4.500000
## 3                     Jaws 2.833333
## 4           The Dark Night 4.400000
## 5 The Lion King (Original) 4.833333
## 6                  Titanic 4.666667

We can see that “The Lion King (Original)” has the highest average rating in this list, while “Jaws” has the lowest average rating as also shown in the barplot below:

library(ggplot2)
ggplot(aes(x = movie, y = rating), data = avg_movie_rating) + geom_bar(stat = "identity")