Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe. Your deliverables should include your SQL scripts and your R Markdown code, posted to GitHub. This is by design a very open ended assignment. A variety of reasonable approaches are acceptable. You can (and should) blank out your SQL password if your solution requires it; otherwise, full credit requires that your code is “reproducible,” with the assumption that I have the same database server and R software.

Loading packages

The data, sql command and .rmd used in this exercise can be found at https://github.com/forhadakbar/data607fall2019/tree/master/Week%2002

Install the RMySQL and odbc packages by running install.packages(“RMySQL”) and install.packages(“odbc”) in RStudio if you don’t already have it installed and then load the RMySQL package. In addition, i am loading tidyverse and dplyr package.

Establish Connection

To hide the password i store the password in a variable called “localpassword” and run it in the console. Each time we start R we will need to reset this variable, but that is better than publishing a hardcoded password as i will share the .rmd using GitHub. There is a better way of storing the connection info in a configuration file outside R. I used {r echo= FALSE} to hide “localpassword” variable while kniting to HTML.

Load data

Exploratorory Analysis

## [1] "data.frame"
## [1] 36  3
## [1] "ReviewerName" "MovieTitle"   "MovieRating"
## [1] FALSE
## [1] FALSE
##  ReviewerName        MovieTitle         MovieRating   
##  Length:36          Length:36          Min.   :1.000  
##  Class :character   Class :character   1st Qu.:3.000  
##  Mode  :character   Mode  :character   Median :4.000  
##                                        Mean   :3.639  
##                                        3rd Qu.:5.000  
##                                        Max.   :5.000

Aggregate

Joker has the best rating

Optional: Google Cloud and Survey Monkey

Mysql Google Cloud Instance

I used google cloud to create a mysql instance and connect the instance with my mysql workbench. So, if i load any file or create a table using workbench i will be able to access same table from google clould mysql instance. The data i am using here is a Survey Monkey Data for this section. I run a survey in my social media and slack class slack channel. I discover people tend not to participate in a survey eventhough the survey i created is takes less than 1 min to complete. My survey reached at least 500 people but i only received 24 responses. I had to manually get the survey monkey data as i am using a free version of survey monkey.

Survey Monkey result link: https://www.surveymonkey.com/results/SM-JNYCRQ3C7/

Connection

Establish Connection to google clound mysql instance. Hide host and password in two varibles.

Load data

Load data collected from survey monkey

Exploratorory Analysis

## [1] "data.frame"
## [1] 120   3
## [1] "ResponseID" "ShowTitle"  "Rating"
## [1] FALSE
## [1] TRUE
##    ResponseID     ShowTitle             Rating     
##  Min.   : 1.00   Length:120         Min.   :1.000  
##  1st Qu.: 6.75   Class :character   1st Qu.:3.000  
##  Median :12.50   Mode  :character   Median :4.000  
##  Mean   :12.50                      Mean   :3.926  
##  3rd Qu.:18.25                      3rd Qu.:5.000  
##  Max.   :24.00                      Max.   :5.000  
##                                     NA's   :26
## [1] 26
## [1] 0.07222222

There are 26 missing values within 120 rows.

Removing missing values

Again check for missing value and run summary command to see stats.

## [1] FALSE
##    ResponseID     ShowTitle             Rating     
##  Min.   : 1.00   Length:94          Min.   :1.000  
##  1st Qu.: 6.00   Class :character   1st Qu.:3.000  
##  Median :11.00   Mode  :character   Median :4.000  
##  Mean   :11.78                      Mean   :3.926  
##  3rd Qu.:18.00                      3rd Qu.:5.000  
##  Max.   :24.00                      Max.   :5.000

We were able to get rid of 26 missing values.

Aggregate

Breaking Bad (my favourite show of alltime) has the best rating even from real world survey data.

Survey Monkey Result

Survey Monkey Result

Survey Monkey Result

dbDisconnect(congsql)