Source files: https://github.com/djlofland/DATA607_F2019/tree/master/Assignment2

Instructions

Assignment

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe. Your deliverables should include your SQL scripts and your R Markdown code, posted to GitHub. This is by design a very open ended assignment. A variety of reasonable approaches are acceptable. You can (and should) blank out your SQL password if your solution requires it; otherwise, full credit requires that your code is “reproducible,” with the assumption that I have the same database server and R software.

Setup Project

Load Libraries

## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date()        masks base::date()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ lubridate::intersect()   masks base::intersect()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ lubridate::setdiff()     masks base::setdiff()
## ✖ lubridate::union()       masks base::union()
## Loading required package: DBI
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Setup of .Renvron variables

To prevent sensitive information from being exposed, I created a file called .Renviron in my home directory. RStudio will read this file and load system environment variables. We can then just use Sys.getenv(“VAR_NAME”) to reference the value without exposing it.

Renviron settings

Renviron settings

Load Survey Data to DB

Load movies, options, ages

## [1] "I didn't see it"         "Terrible Movie"         
## [3] "Not Good"                "I Don't have an opinion"
## [5] "Decent Movie"            "Great Movie"            
## [7] "Best Movie Ever!"
##                MovieID                MovieName
## 1    AnnabelleComeHome     Annabelle Comes Home
## 2         HobbsAndShaw             Hobbs & Shaw
## 3         LionKing2019     The Lion King (2019)
## 4         MetersDown47           47 Meters Down
## 5 SpidermanFarFromHome Spiderman: Far From Home
## 6            ToyStory4              Toy Story 4
## [1] "Under 18" "18-24"    "25-34"    "35-44"    "45-54"    "55-64"   
## [7] "65+"

Load SurveyMonkey CSV file

## Parsed with column specification:
## cols(
##   `Respondent ID` = col_double(),
##   `Collector ID` = col_double(),
##   `Start Date` = col_character(),
##   `End Date` = col_character(),
##   `IP Address` = col_character(),
##   `Email Address` = col_logical(),
##   `First Name` = col_logical(),
##   `Last Name` = col_logical(),
##   `Custom Data 1` = col_logical(),
##   `Spiderman: Far from Home` = col_character(),
##   `Hobbs & Shaw` = col_character(),
##   `The Lion King (2019 Recent Version)` = col_character(),
##   `Annabelle Come Home` = col_character(),
##   `Toy Story 4` = col_character(),
##   `47 Meters Down` = col_character(),
##   `Your Gender` = col_character(),
##   `Your Age Bracket` = col_character()
## )

Cleanup CSV data-prep for DB

## # A tibble: 21 x 12
##    RespondentID CollectorID StartDate EndDate SpidermanFarFro… HobbsAndShaw
##           <dbl>       <dbl> <chr>     <chr>   <chr>            <chr>       
##  1  10976555523   245709500 09/07/20… 09/07/… Decent Movie     I didn't se…
##  2  10972842969   245715358 09/05/20… 09/05/… I didn't see it  I didn't se…
##  3  10972794947   245715358 09/05/20… 09/05/… I didn't see it  I didn't se…
##  4  10972751462   245715358 09/05/20… 09/05/… I didn't see it  I didn't se…
##  5  10972471453   245715358 09/05/20… 09/05/… I Don't have an… I Don't hav…
##  6  10971873184   245715358 09/05/20… 09/05/… Not Good         I didn't se…
##  7  10971861372   245715358 09/05/20… 09/05/… I didn't see it  I didn't se…
##  8  10971809011   245715358 09/05/20… 09/05/… Best Movie Ever! Decent Movie
##  9  10971398086   245709500 09/05/20… 09/05/… I didn't see it  I didn't se…
## 10  10971350360   245715358 09/05/20… 09/05/… Great Movie      I didn't se…
## # … with 11 more rows, and 6 more variables: LionKing2019 <chr>,
## #   AnnabelleComeHome <chr>, ToyStory4 <chr>, MetersDown47 <chr>,
## #   Gender <chr>, Age <chr>

Load Data for Analaysis

Read responses from DB

## # A tibble: 21 x 12
##    RespondentID CollectorID StartDate EndDate SpidermanFarFro… HobbsAndShaw
##    <chr>        <chr>       <chr>     <chr>   <chr>            <chr>       
##  1 10970180147  245709500   2019-04-… 2019-0… Great Movie      Decent Movie
##  2 10970192077  245709500   2019-04-… 2019-0… Decent Movie     Decent Movie
##  3 10970285008  245709500   2019-04-… 2019-0… Decent Movie     I didn't se…
##  4 10970920112  245715358   2019-05-… 2019-0… Great Movie      Great Movie 
##  5 10971077610  245709500   2019-05-… 2019-0… Great Movie      Decent Movie
##  6 10971094944  245709500   2019-05-… 2019-0… Decent Movie     I didn't se…
##  7 10971112419  245709500   2019-05-… 2019-0… I didn't see it  I didn't se…
##  8 10971157469  245709500   2019-05-… 2019-0… I didn't see it  I didn't se…
##  9 10971160987  245709500   2019-05-… 2019-0… I didn't see it  I didn't se…
## 10 10971322807  245715358   2019-05-… 2019-0… I didn't see it  I didn't se…
## # … with 11 more rows, and 6 more variables: LionKing2019 <chr>,
## #   AnnabelleComeHome <chr>, ToyStory4 <chr>, MetersDown47 <chr>,
## #   Gender <chr>, Age <chr>

Clean up Columns

## # A tibble: 21 x 12
##    RespondentID CollectorID StartDate           EndDate            
##    <chr>        <fct>       <dttm>              <dttm>             
##  1 10970180147  245709500   2019-04-09 22:00:52 2019-04-09 22:01:59
##  2 10970192077  245709500   2019-04-09 22:07:14 2019-04-09 22:09:46
##  3 10970285008  245709500   2019-04-09 23:11:45 2019-04-09 23:12:31
##  4 10970920112  245715358   2019-05-09 06:42:04 2019-05-09 06:42:40
##  5 10971077610  245709500   2019-05-09 07:57:59 2019-05-09 07:58:41
##  6 10971094944  245709500   2019-05-09 08:05:00 2019-05-09 08:05:40
##  7 10971112419  245709500   2019-05-09 08:11:36 2019-05-09 08:12:32
##  8 10971157469  245709500   2019-05-09 08:29:44 2019-05-09 08:30:27
##  9 10971160987  245709500   2019-05-09 08:31:02 2019-05-09 08:31:52
## 10 10971322807  245715358   2019-05-09 09:30:08 2019-05-09 09:30:50
## # … with 11 more rows, and 8 more variables: SpidermanFarFromHome <ord>,
## #   HobbsAndShaw <ord>, LionKing2019 <ord>, AnnabelleComeHome <ord>,
## #   ToyStory4 <ord>, MetersDown47 <ord>, Gender <fct>, Age <ord>

Analysis

Responses

## 
##   Man Woman 
##    11    10
## 
## Under 18    18-24    25-34    35-44    45-54    55-64      65+ 
##        2        3        3        4        5        3        1
## 
## Call:
##  density.default(x = table(survey$Age))
## 
## Data: table(survey$Age) (7 obs.);    Bandwidth 'bw' = 0.4551
## 
##        x                 y           
##  Min.   :-0.3653   Min.   :0.001406  
##  1st Qu.: 1.3173   1st Qu.:0.064824  
##  Median : 3.0000   Median :0.139551  
##  Mean   : 3.0000   Mean   :0.148373  
##  3rd Qu.: 4.6827   3rd Qu.:0.190352  
##  Max.   : 6.3653   Max.   :0.398055

Plots

Total Movies Seen

##    
##     Under 18 18-24 25-34 35-44 45-54 55-64 65+
##   1        0     1     1     0     1     0   0
##   2        1     1     0     2     1     1   0
##   3        0     0     1     0     1     1   0
##   4        0     0     0     0     1     0   0
##   6        0     0     1     0     0     0   0
##    
##       Under 18      18-24      25-34      35-44      45-54      55-64
##   1 0.00000000 0.07142857 0.07142857 0.00000000 0.07142857 0.00000000
##   2 0.07142857 0.07142857 0.00000000 0.14285714 0.07142857 0.07142857
##   3 0.00000000 0.00000000 0.07142857 0.00000000 0.07142857 0.07142857
##   4 0.00000000 0.00000000 0.00000000 0.00000000 0.07142857 0.00000000
##   6 0.00000000 0.00000000 0.07142857 0.00000000 0.00000000 0.00000000
##    
##            65+
##   1 0.00000000
##   2 0.00000000
##   3 0.00000000
##   4 0.00000000
##   6 0.00000000
##    
##     Man Woman
##   1   2     1
##   2   4     2
##   3   3     0
##   4   0     1
##   6   1     0
##    
##            Man      Woman
##   1 0.14285714 0.07142857
##   2 0.28571429 0.14285714
##   3 0.21428571 0.00000000
##   4 0.00000000 0.07142857
##   6 0.07142857 0.00000000