Author: Romerl Elizes

Part I. Load Data to Database

I will use a straightforward MySQL Connection to create a database called MoviesDatabase. Prior to that, I will remove the database MoviesDatabase if it already exists so that I have a clean database. Upon successful creation of MoviesDatabase, I will create three tables: Viewers, Movie, and Rating. I will use insert commands to load the necessary data onto the tables. Assuming that I have the file, RomerlElizes_Assignment2.sql at the same directory, the file contents are:

# display file contents to screen.
# ref: R Markdown: How do I show file contents. Website: https://stackoverflow.com/questions/29257985/r-markdown-how-do-i-show-file-contents
cat (readLines('RomerlElizes_Assignment2.sql'), sep = '\n')
## -- Romerl Elizes
## -- DATA 607
## -- Week 2 - Part 1 SQL Programming Assignment 
## 
## -- Part 1 - create and populate database based on given information.
## -- a. - drop and create database MoviesDatabase
## drop database if exists MoviesDatabase;
## create database MoviesDatabase;
## use MoviesDatabase;
## 
## -- b. create Viewers table and insert values into table.
## create table Viewers (
##    viewerid mediumint,
##    name varchar(80),
##    primary key (viewerid)
## );
## insert into Viewers (viewerid, name) VALUES (1,'Charles');
## insert into Viewers (viewerid, name) VALUES (2,'Liam');
## insert into Viewers (viewerid, name) VALUES (3,'Emily');
## insert into Viewers (viewerid, name) VALUES (4,'Eric');
## insert into Viewers (viewerid, name) VALUES (5,'Evelyn');
## insert into Viewers (viewerid, name) VALUES (6,'Steven');
## 
## -- c. create Movie table and insert values into table.
## create table Movie (
##    movieid varchar(2),
##    name varchar(40),
##    primary key (movieid)
## );
## insert into Movie (movieid, name) VALUES ('SP','Spaceballs');
## insert into Movie (movieid, name) VALUES ('BE','Better Off Dead');
## insert into Movie (movieid, name) VALUES ('LB','Legally Blonde');
## insert into Movie (movieid, name) VALUES ('PA','Police Academy');
## insert into Movie (movieid, name) VALUES ('GB','Ghostbusters');
## insert into Movie (movieid, name) VALUES ('GF','Goodfellas');
## 
## 
## -- d. create Rating association table and insert values into table.
## create table Rating (
##  viewerid mediumint,
##  movieid varchar(2),
##  rating mediumint
## );
## 
## -- d1. - ratings by Chalres
## insert into Rating (viewerid, movieid, rating) VALUES (1,'SP',4);
## insert into Rating (viewerid, movieid, rating) VALUES (1,'BE',3);
## insert into Rating (viewerid, movieid, rating) VALUES (1,'LB',2);
## insert into Rating (viewerid, movieid, rating) VALUES (1,'PA',2);
## insert into Rating (viewerid, movieid, rating) VALUES (1,'GB',1);
## insert into Rating (viewerid, movieid, rating) VALUES (1,'GF',1);
## 
## -- d2. - ratings by Liam
## insert into Rating (viewerid, movieid, rating) VALUES (2,'SP',5);
## insert into Rating (viewerid, movieid, rating) VALUES (2,'BE',4);
## insert into Rating (viewerid, movieid, rating) VALUES (2,'LB',3);
## insert into Rating (viewerid, movieid, rating) VALUES (2,'PA',4);
## insert into Rating (viewerid, movieid, rating) VALUES (2,'GB',1);
## insert into Rating (viewerid, movieid, rating) VALUES (2,'GF',2);
## 
## -- d3. - ratings by Emily
## insert into Rating (viewerid, movieid, rating) VALUES (3,'SP',2);
## insert into Rating (viewerid, movieid, rating) VALUES (3,'BE',4);
## insert into Rating (viewerid, movieid, rating) VALUES (3,'LB',5);
## insert into Rating (viewerid, movieid, rating) VALUES (3,'PA',3);
## insert into Rating (viewerid, movieid, rating) VALUES (3,'GB',4);
## insert into Rating (viewerid, movieid, rating) VALUES (3,'GF',1);
## 
## -- d4. - ratings by Eric
## insert into Rating (viewerid, movieid, rating) VALUES (4,'SP',1);
## insert into Rating (viewerid, movieid, rating) VALUES (4,'BE',4);
## insert into Rating (viewerid, movieid, rating) VALUES (4,'LB',2);
## insert into Rating (viewerid, movieid, rating) VALUES (4,'PA',1);
## insert into Rating (viewerid, movieid, rating) VALUES (4,'GB',5);
## insert into Rating (viewerid, movieid, rating) VALUES (4,'GF',3);
## 
## -- d5. - ratings by Evelyn
## insert into Rating (viewerid, movieid, rating) VALUES (5,'SP',3);
## insert into Rating (viewerid, movieid, rating) VALUES (5,'BE',2);
## insert into Rating (viewerid, movieid, rating) VALUES (5,'LB',4);
## insert into Rating (viewerid, movieid, rating) VALUES (5,'PA',1);
## insert into Rating (viewerid, movieid, rating) VALUES (5,'GB',5);
## insert into Rating (viewerid, movieid, rating) VALUES (5,'GF',1);
## 
## -- d6. - ratings by Steven
## insert into Rating (viewerid, movieid, rating) VALUES (6,'SP',2);
## insert into Rating (viewerid, movieid, rating) VALUES (6,'BE',5);
## insert into Rating (viewerid, movieid, rating) VALUES (6,'LB',1);
## insert into Rating (viewerid, movieid, rating) VALUES (6,'PA',3);
## insert into Rating (viewerid, movieid, rating) VALUES (6,'GB',4);
## insert into Rating (viewerid, movieid, rating) VALUES (6,'GF',5);

Part II. Connect to Database

When connecting to the database, I created a new database called MoviesDatabase with a user called school with no password. If you wish to optionally add a username or password, modify the dbConnect method to read: con <- dbConnect(RMySQL::MySQL(), dbname = “MoviesDatabase”, username = “”, password = “”) or if you wish to open up a database connection with no username and password, type: con <- dbConnect(RMySQL::MySQL(), dbname = “MoviesDatabase”).

I used the library RMySQL which was stated in the textbook and to verify that the database connection was working, I used the summary function for the connection which gives me useful information about the table.

# loadgin package
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 3.4.4
## Loading required package: DBI
con <- dbConnect(RMySQL::MySQL(), dbname = "MoviesDatabase", username = "school")
summary(con)
## <MySQLConnection:0,0>
##   User:   school 
##   Host:   localhost 
##   Dbname: MoviesDatabase 
##   Connection type: localhost via TCP/IP 
## 
## Results:

Part III. Test Basic Queries

This section will focus on doing basic queries on each of the tables: Viewers, Movie, and Rating. Each output will be inputted into a data frame and displayed onto the screen.

A. View all rows of Viewers

sql <- "SELECT * FROM Viewers"
res <- dbGetQuery(con, sql)
res
##   viewerid    name
## 1        1 Charles
## 2        2    Liam
## 3        3   Emily
## 4        4    Eric
## 5        5  Evelyn
## 6        6  Steven

B. View all rows of Movie

sql <- "SELECT * FROM Movie"
res <- dbGetQuery(con, sql)
res
##   movieid            name
## 1      BE Better Off Dead
## 2      GB    Ghostbusters
## 3      GF      Goodfellas
## 4      LB  Legally Blonde
## 5      PA  Police Academy
## 6      SP      Spaceballs

C. View all rows of Rating

sql <- "SELECT * FROM Rating"
res <- dbGetQuery(con, sql)
res
##    viewerid movieid rating
## 1         1      SP      4
## 2         1      BE      3
## 3         1      LB      2
## 4         1      PA      2
## 5         1      GB      1
## 6         1      GF      1
## 7         2      SP      5
## 8         2      BE      4
## 9         2      LB      3
## 10        2      PA      4
## 11        2      GB      1
## 12        2      GF      2
## 13        3      SP      2
## 14        3      BE      4
## 15        3      LB      5
## 16        3      PA      3
## 17        3      GB      4
## 18        3      GF      1
## 19        4      SP      1
## 20        4      BE      4
## 21        4      LB      2
## 22        4      PA      1
## 23        4      GB      5
## 24        4      GF      3
## 25        5      SP      3
## 26        5      BE      2
## 27        5      LB      4
## 28        5      PA      1
## 29        5      GB      5
## 30        5      GF      1
## 31        6      SP      2
## 32        6      BE      5
## 33        6      LB      1
## 34        6      PA      3
## 35        6      GB      4
## 36        6      GF      5

Part IV. Testing More Complex Queries

This section will focus on executing more complex queries on the tables: Viewers, Movie, and Rating. Each output will be inputted into a data frame and displayed onto the screen.

I will be using the paste method and vector storage to build the queries. The vector will be used to see the complex queries more clearly.

ref: How can two strings be concatenated? Website: https://stackoverflow.com/questions/7201341/how-can-two-strings-be-concatenated

A. List of Viewers Grouped by Highest Average Ratings

The query will join two tables, viewer and rating and display the highest average rating for each viewer. I had to experiment with group by and order by in order to get the proper results displayed. Based on the results, it shows that Steven has the highest average movie rating of 3.33 and Charles has the lowest average movie rating of 2.17. It appears Steven generally likes the listed movies while Charles is more critical of the listed movies.

sqlvector <- c ("select v.name as viewer, avg(r.rating) as averagerating", 
                "from viewers v", 
                "join rating r on v.viewerid = r.viewerid", 
                "group by viewer", 
                "order by averagerating desc" )
sql <- paste(sqlvector, collapse=" ")
res <- dbGetQuery(con, sql)
## Warning in .local(conn, statement, ...): Decimal MySQL column 1 imported as
## numeric
res
##    viewer averagerating
## 1  Steven        3.3333
## 2    Liam        3.1667
## 3   Emily        3.1667
## 4    Eric        2.6667
## 5  Evelyn        2.6667
## 6 Charles        2.1667

B. List of Movies Grouped by Highest Average Ratings

The query will join two tables, movie and rating and display the highest average rating for each movie. Based on the results, it shows that the movie, Better Off Dead has the highest average movie rating of 3.67 and Goodfellas has the lowest average movie rating of 2.17. It appears that our listed viewers prefer a comedy like Better Off Dead than a Gangster movie like Goodfellas.

sqlvector <- c ("select m.name as movie, avg(r.rating) as averagerating", 
                "from movie m", 
                "join rating r on m.movieid = r.movieid", 
                "group by movie", 
                "order by averagerating desc" )
sql <- paste(sqlvector, collapse=" ")
res <- dbGetQuery(con, sql)
## Warning in .local(conn, statement, ...): Decimal MySQL column 1 imported as
## numeric
res
##             movie averagerating
## 1 Better Off Dead        3.6667
## 2    Ghostbusters        3.3333
## 3  Legally Blonde        2.8333
## 4      Spaceballs        2.8333
## 5  Police Academy        2.3333
## 6      Goodfellas        2.1667

C. List of Viewers and Their Favorite Movies By Name and Highest Rating

The query will join three tables, viewers, movie, and rating and display each viewer in alphabetical order and the list of movies the watched and their assigned ratings sorted by highest rating. Based on the results, it shows that Charles, Emily, Eric, and Evelyn liked the movies Spaceballs, Legally Blonde, Ghostbusters, and Ghostbusters respectively. Moreover, Liam and Steven did not like the movies Ghostbusters and Legally Blonde respectively.

sqlvector <- c ("select v.name as viewer, m.name as movie, r.rating as rating", 
                "from viewers v", 
                "join rating r on v.viewerid = r.viewerid", 
                "join movie m on m.movieid = r.movieid", 
                "order by viewer, rating desc" )
sql <- paste(sqlvector, collapse=" ")
res <- dbGetQuery(con, sql)
res
##     viewer           movie rating
## 1  Charles      Spaceballs      4
## 2  Charles Better Off Dead      3
## 3  Charles  Legally Blonde      2
## 4  Charles  Police Academy      2
## 5  Charles    Ghostbusters      1
## 6  Charles      Goodfellas      1
## 7    Emily  Legally Blonde      5
## 8    Emily Better Off Dead      4
## 9    Emily    Ghostbusters      4
## 10   Emily  Police Academy      3
## 11   Emily      Spaceballs      2
## 12   Emily      Goodfellas      1
## 13    Eric    Ghostbusters      5
## 14    Eric Better Off Dead      4
## 15    Eric      Goodfellas      3
## 16    Eric  Legally Blonde      2
## 17    Eric      Spaceballs      1
## 18    Eric  Police Academy      1
## 19  Evelyn    Ghostbusters      5
## 20  Evelyn  Legally Blonde      4
## 21  Evelyn      Spaceballs      3
## 22  Evelyn Better Off Dead      2
## 23  Evelyn      Goodfellas      1
## 24  Evelyn  Police Academy      1
## 25    Liam      Spaceballs      5
## 26    Liam  Police Academy      4
## 27    Liam Better Off Dead      4
## 28    Liam  Legally Blonde      3
## 29    Liam      Goodfellas      2
## 30    Liam    Ghostbusters      1
## 31  Steven Better Off Dead      5
## 32  Steven      Goodfellas      5
## 33  Steven    Ghostbusters      4
## 34  Steven  Police Academy      3
## 35  Steven      Spaceballs      2
## 36  Steven  Legally Blonde      1

Part V. Disconnecting from the Database.

To disconnect from the database connection, the call to dbDisconnect function will be executed.

dbDisconnect(con)
## [1] TRUE