Instructions for Prof to run this code:

Assignment 2 - SQL & R

Creating database in SQL and importing into R and displaying the results in R

library("DT")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotrix)
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
library(RMySQL)
## Loading required package: DBI
library(DBI)
library(rstudioapi)
require (ggplot2)
## Loading required package: ggplot2
# connecting to projet-final database in MYSQL

# This line do not ask for user and password
con <- dbConnect(MySQL(), user ="root", password = "Chicke0n", port = 3306, dbname = "projet-final",host = "localhost")

# This line ask for user name and password
#con <- dbConnect(MySQL(), user = showPrompt('Username',"Username",default="root"), password = #askForPassword(), port = 3306, dbname = "projet-final",host = "localhost")

#query my reviews & videos tables from projet-final database

ret1 <- dbGetQuery(con,"select * FROM Videos")

ret2 <- dbGetQuery(con,"select * FROM Reviewer")


# show the tables
#movies table
datatable ( ret1 )
#reviewer table
datatable ( ret2 )
# check data structure of ret2
str(ret2)
## 'data.frame':    36 obs. of  6 variables:
##  $ id              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Movie_Name      : chr  "superman" "wonderwoman" "The International" "Something about Mary" ...
##  $ Name_of_Reviewer: chr  "ssufian" "ssufian" "ssufian" "ssufian" ...
##  $ Rating          : int  1 5 5 5 4 NA 5 5 3 2 ...
##  $ review          : chr  "nah" "great!!!" "Incredible Film!" "super funny !!" ...
##  $ review_date     : chr  "2019-09-06" "2019-09-06" "2019-09-06" "2019-09-06" ...
# Converting NA to integer zero, to account for NULL Values in SQL Database
ret2[is.na(ret2)] <- 0

ret2
##    id           Movie_Name Name_of_Reviewer Rating
## 1   1             superman          ssufian      1
## 2   2          wonderwoman          ssufian      5
## 3   3    The International          ssufian      5
## 4   4 Something about Mary          ssufian      5
## 5   5         The Souvenir          ssufian      4
## 6   6            Lion King          ssufian      0
## 7   7             superman         Rosemary      5
## 8   8          wonderwoman         Rosemary      5
## 9   9    The International         Rosemary      3
## 10 10 Something about Mary         Rosemary      2
## 11 11         The Souvenir         Rosemary      0
## 12 12            Lion King         Rosemary      0
## 13 13             superman             Jeff      1
## 14 14          wonderwoman             Jeff      0
## 15 15    The International             Jeff      4
## 16 16 Something about Mary             Jeff      5
## 17 17         The Souvenir             Jeff      0
## 18 18            Lion King             Jeff      1
## 19 19             superman           Steffi      1
## 20 20          wonderwoman           Steffi      3
## 21 21    The International           Steffi      5
## 22 22 Something about Mary           Steffi      4
## 23 23         The Souvenir           Steffi      5
## 24 24            Lion King           Steffi      5
## 25 25             superman              Tom      5
## 26 26          wonderwoman              Tom      4
## 27 27    The International              Tom      2
## 28 28 Something about Mary              Tom      5
## 29 29         The Souvenir              Tom      4
## 30 30            Lion King              Tom      3
## 31 31             superman              Apu      3
## 32 32          wonderwoman              Apu      4
## 33 33    The International              Apu      2
## 34 34 Something about Mary              Apu      0
## 35 35         The Souvenir              Apu      1
## 36 36            Lion King              Apu      0
##                                review review_date
## 1                                 nah  2019-09-06
## 2                            great!!!  2019-09-06
## 3                    Incredible Film!  2019-09-06
## 4                      super funny !!  2019-09-06
## 5                       quite intense  2019-09-06
## 6                          no coments  2019-09-06
## 7                            Love it!  2019-04-06
## 8                   more woman power!  2018-09-06
## 9  too convoluted, did not understand  2018-04-06
## 10                          Not Funny  2018-04-06
## 11                        no comments  2018-04-06
## 12                        no comments  2018-04-06
## 13                           Hate it!  2019-02-06
## 14                        no comments  2019-02-06
## 15                    great thriller!  2019-02-06
## 16                   Had a great time  2019-02-06
## 17                        no comments  2019-02-06
## 18                        Fell Asleep  2019-02-06
## 19                             boring  2019-07-06
## 20                          Just Okay  2019-07-06
## 21                       great Movie!  2019-07-06
## 22                         very funny  2019-07-06
## 23                           Love it!  2019-07-06
## 24                 Love the animation  2019-07-06
## 25                        great movie  2018-02-06
## 26                           Fabolous  2018-02-06
## 27               Not my type of movie  2018-02-06
## 28                             wacky!  2018-02-06
## 29                    love the acting  2018-02-06
## 30                              so so  2018-02-06
## 31                               good  2019-04-06
## 32                      Above average  2019-04-06
## 33                     Do not like it  2019-04-06
## 34                        no comments  2019-04-06
## 35                       very average  2019-04-06
## 36                        no comments  2019-04-06
#bar plot of Total raw ratings
ggplot(data=ret2, aes(x=Name_of_Reviewer, y=Rating)) + geom_bar(stat="identity" )+
    geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity" ) + 
    guides(fill=FALSE) +
    xlab("Participants") + ylab("Raw Ratings") +
    ggtitle("Sum of All Raw Ratings per 6 participants")

# Adding mean rating for each movie in the datatable

setDT(ret2)[, Mean_Rating := round(mean(Rating),digits = 1), by = Movie_Name]
datatable ( ret2)
# Get Unique Rows for Each movie with Rating and selecting only Movie_Name and Mean rating
dt_uniqe <- ret2[!duplicated(ret2[,'Movie_Name']),c(2,7)]
datatable ( dt_uniqe  )
#bar plot of Mean ratings of all Movies
ggplot(data=dt_uniqe, aes(x=Movie_Name, y=Mean_Rating)) + geom_bar(stat="identity")+
    geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity") + 
    guides(fill=FALSE) +
    xlab("Movies") + ylab("Mean Ratings") +
    ggtitle("Mean Ratings by Movies")

#Creating Dataframe with all the movies and their mean rating and movie types
df <- merge(dt_uniqe, ret1, by="Movie_Name")
df
##              Movie_Name Mean_Rating id      movie_type
## 1:            Lion King         1.5  6        Animated
## 2: Something about Mary         3.5  4          Comedy
## 3:    The International         3.5  3        Suspense
## 4:         The Souvenir         2.3  5 Mystery Romance
## 5:             superman         2.7  1         Fantasy
## 6:          wonderwoman         3.5  2         Fantasy
##                                   web_link released_date
## 1:   https://www.imdb.com/title/tt6105098/    2016-05-01
## 2:   https://www.imdb.com/title/tt0129387/    2018-07-05
## 3:   https://www.imdb.com/title/tt0963178/    2017-07-01
## 4:   https://www.imdb.com/title/tt6920356/    2017-01-12
## 5: https://en.wikipedia.org/wiki/Supermanw    2018-01-01
## 6:        https://wonderwomanfilm.com/#_=_    2019-01-12

Summary Results:

First table - The movie table

2nd table - Reviewer table with original NA

3rd table - Reviewer table with NA replaced by zero

Observation #1

Estimating the participation rate (how actively they were involved) by sum of raw ratings:

  1. Participants Steffi & Tom gave the most ratings; Adding up all their ratings per movie. This means these 2 participants was actively involved in the survey by providing scores

  2. Participant Apu was on the other extreme end followed closely by Jeff; these 2 participants either did not watch the movies or did was not actively participating in the survey at all

Observation #2

Estimating which movies scored best or worst via their average rating scores:

  1. The movies “something about Mary”, “The International” and “Wonder woman” scored the highest respectively

  2. The movies “Lion King” was the worst rated followed by distance second “The souvenir”

links to RPubs: http://rpubs.com/ssufian/525717

link to Github:

https://github.com/ssufian/Data_607_HW2_SQL_R