Instructions for Prof to run this code:
Create a MYSQL Database called “projet-final”
Run Hw2_v1 SQL script
Run R Script
Creating database in SQL and importing into R and displaying the results in R
library("DT")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plotrix)
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
library(RMySQL)
## Loading required package: DBI
library(DBI)
library(rstudioapi)
require (ggplot2)
## Loading required package: ggplot2
# connecting to projet-final database in MYSQL
# This line do not ask for user and password
con <- dbConnect(MySQL(), user ="root", password = "Chicke0n", port = 3306, dbname = "projet-final",host = "localhost")
# This line ask for user name and password
#con <- dbConnect(MySQL(), user = showPrompt('Username',"Username",default="root"), password = #askForPassword(), port = 3306, dbname = "projet-final",host = "localhost")
#query my reviews & videos tables from projet-final database
ret1 <- dbGetQuery(con,"select * FROM Videos")
ret2 <- dbGetQuery(con,"select * FROM Reviewer")
# show the tables
#movies table
datatable ( ret1 )
#reviewer table
datatable ( ret2 )
# check data structure of ret2
str(ret2)
## 'data.frame': 36 obs. of 6 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Movie_Name : chr "superman" "wonderwoman" "The International" "Something about Mary" ...
## $ Name_of_Reviewer: chr "ssufian" "ssufian" "ssufian" "ssufian" ...
## $ Rating : int 1 5 5 5 4 NA 5 5 3 2 ...
## $ review : chr "nah" "great!!!" "Incredible Film!" "super funny !!" ...
## $ review_date : chr "2019-09-06" "2019-09-06" "2019-09-06" "2019-09-06" ...
# Converting NA to integer zero, to account for NULL Values in SQL Database
ret2[is.na(ret2)] <- 0
ret2
## id Movie_Name Name_of_Reviewer Rating
## 1 1 superman ssufian 1
## 2 2 wonderwoman ssufian 5
## 3 3 The International ssufian 5
## 4 4 Something about Mary ssufian 5
## 5 5 The Souvenir ssufian 4
## 6 6 Lion King ssufian 0
## 7 7 superman Rosemary 5
## 8 8 wonderwoman Rosemary 5
## 9 9 The International Rosemary 3
## 10 10 Something about Mary Rosemary 2
## 11 11 The Souvenir Rosemary 0
## 12 12 Lion King Rosemary 0
## 13 13 superman Jeff 1
## 14 14 wonderwoman Jeff 0
## 15 15 The International Jeff 4
## 16 16 Something about Mary Jeff 5
## 17 17 The Souvenir Jeff 0
## 18 18 Lion King Jeff 1
## 19 19 superman Steffi 1
## 20 20 wonderwoman Steffi 3
## 21 21 The International Steffi 5
## 22 22 Something about Mary Steffi 4
## 23 23 The Souvenir Steffi 5
## 24 24 Lion King Steffi 5
## 25 25 superman Tom 5
## 26 26 wonderwoman Tom 4
## 27 27 The International Tom 2
## 28 28 Something about Mary Tom 5
## 29 29 The Souvenir Tom 4
## 30 30 Lion King Tom 3
## 31 31 superman Apu 3
## 32 32 wonderwoman Apu 4
## 33 33 The International Apu 2
## 34 34 Something about Mary Apu 0
## 35 35 The Souvenir Apu 1
## 36 36 Lion King Apu 0
## review review_date
## 1 nah 2019-09-06
## 2 great!!! 2019-09-06
## 3 Incredible Film! 2019-09-06
## 4 super funny !! 2019-09-06
## 5 quite intense 2019-09-06
## 6 no coments 2019-09-06
## 7 Love it! 2019-04-06
## 8 more woman power! 2018-09-06
## 9 too convoluted, did not understand 2018-04-06
## 10 Not Funny 2018-04-06
## 11 no comments 2018-04-06
## 12 no comments 2018-04-06
## 13 Hate it! 2019-02-06
## 14 no comments 2019-02-06
## 15 great thriller! 2019-02-06
## 16 Had a great time 2019-02-06
## 17 no comments 2019-02-06
## 18 Fell Asleep 2019-02-06
## 19 boring 2019-07-06
## 20 Just Okay 2019-07-06
## 21 great Movie! 2019-07-06
## 22 very funny 2019-07-06
## 23 Love it! 2019-07-06
## 24 Love the animation 2019-07-06
## 25 great movie 2018-02-06
## 26 Fabolous 2018-02-06
## 27 Not my type of movie 2018-02-06
## 28 wacky! 2018-02-06
## 29 love the acting 2018-02-06
## 30 so so 2018-02-06
## 31 good 2019-04-06
## 32 Above average 2019-04-06
## 33 Do not like it 2019-04-06
## 34 no comments 2019-04-06
## 35 very average 2019-04-06
## 36 no comments 2019-04-06
#bar plot of Total raw ratings
ggplot(data=ret2, aes(x=Name_of_Reviewer, y=Rating)) + geom_bar(stat="identity" )+
geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity" ) +
guides(fill=FALSE) +
xlab("Participants") + ylab("Raw Ratings") +
ggtitle("Sum of All Raw Ratings per 6 participants")
# Adding mean rating for each movie in the datatable
setDT(ret2)[, Mean_Rating := round(mean(Rating),digits = 1), by = Movie_Name]
datatable ( ret2)
# Get Unique Rows for Each movie with Rating and selecting only Movie_Name and Mean rating
dt_uniqe <- ret2[!duplicated(ret2[,'Movie_Name']),c(2,7)]
datatable ( dt_uniqe )
#bar plot of Mean ratings of all Movies
ggplot(data=dt_uniqe, aes(x=Movie_Name, y=Mean_Rating)) + geom_bar(stat="identity")+
geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity") +
guides(fill=FALSE) +
xlab("Movies") + ylab("Mean Ratings") +
ggtitle("Mean Ratings by Movies")
#Creating Dataframe with all the movies and their mean rating and movie types
df <- merge(dt_uniqe, ret1, by="Movie_Name")
df
## Movie_Name Mean_Rating id movie_type
## 1: Lion King 1.5 6 Animated
## 2: Something about Mary 3.5 4 Comedy
## 3: The International 3.5 3 Suspense
## 4: The Souvenir 2.3 5 Mystery Romance
## 5: superman 2.7 1 Fantasy
## 6: wonderwoman 3.5 2 Fantasy
## web_link released_date
## 1: https://www.imdb.com/title/tt6105098/ 2016-05-01
## 2: https://www.imdb.com/title/tt0129387/ 2018-07-05
## 3: https://www.imdb.com/title/tt0963178/ 2017-07-01
## 4: https://www.imdb.com/title/tt6920356/ 2017-01-12
## 5: https://en.wikipedia.org/wiki/Supermanw 2018-01-01
## 6: https://wonderwomanfilm.com/#_=_ 2019-01-12
Summary Results:
First table - The movie table
2nd table - Reviewer table with original NA
3rd table - Reviewer table with NA replaced by zero
Observation #1
Estimating the participation rate (how actively they were involved) by sum of raw ratings:
Participants Steffi & Tom gave the most ratings; Adding up all their ratings per movie. This means these 2 participants was actively involved in the survey by providing scores
Participant Apu was on the other extreme end followed closely by Jeff; these 2 participants either did not watch the movies or did was not actively participating in the survey at all
Observation #2
Estimating which movies scored best or worst via their average rating scores:
The movies “something about Mary”, “The International” and “Wonder woman” scored the highest respectively
The movies “Lion King” was the worst rated followed by distance second “The souvenir”
links to RPubs: http://rpubs.com/ssufian/525717
link to Github: