Assignment SQL - R

Anil Akyildirim

2019-09-06

Introduction

In this assignment, we have been asked to do a survey within our family and friends, where they rate six recent popular movies from scale of 1 to 5. The observations would be captured and stored in an SQL database and loaded into an R dataframe.

About the Survey

The focus of the survey was mainly to capture everyone’s ratings. Questions and description of each variable are outlined below;

1- What is your name?

2- From 1 to 5, what do you rate movie (Avengers, Lion King, Joker, Spiderman, Aladdin, Captain_Marvel)?

3- What is your age?

4- What is your gender?

5- What is our relationship?

1- name: Name of the individual

2- rating: From 1 to 5 the rating of the movie provided by the individual.

3- age: age of the individual

4- gender: gender of the individiual.

5- type: relationship between the surveyer and the individual.

6- Addition to the questions since we selected the movies as a drop down option for individuals to select, movie_name: Name of the Movie.

Creating the MySQL Database

The survey results are collected in a spreadhseet and saved here. Addition to the survey, we can use google to find additional information for the particular movies such as, genre, year made and cost (in millions). This spreadhseet is saved here

In order to create the database, we have to create an sql instance , create a schema and required tables for further load into the R dataframe. The .sql file can be found here. In the process we can create multiple tables, filter and merge variables if we want to.

Loading the MySQL Database and Tables in R

## Installing package into 'C:/Users/Anil Akyildirim/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)
## package 'RMariaDB' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'RMariaDB'
## Warning in file.copy(savedcopy, lib, recursive = TRUE):
## problem copying C:\Users\Anil Akyildirim\Documents\R\win-
## library\3.6\00LOCK\RMariaDB\libs\x64\RMariaDB.dll to C:\Users\Anil
## Akyildirim\Documents\R\win-library\3.6\RMariaDB\libs\x64\RMariaDB.dll:
## Permission denied
## Warning: restored 'RMariaDB'
## 
## The downloaded binary packages are in
##  C:\Users\Anil Akyildirim\AppData\Local\Temp\RtmpMTLsYi\downloaded_packages
## <MariaDBConnection>
##   Host:    localhost
##   Server:  8.0.17
##   Client:  5.5.1
## [1] "movie_data"     "movie_data_set" "movie_ratings"  "movie_table"
##   name     movie_name   type gender age rating
## 1 Tara       Avengers Family Female  35      3
## 2 Tara      Lion King Family Female  35      3
## 3 Tara          Joker Family Female  35      4
## 4 Tara      Spiderman Family Female  35      2
## 5 Tara        Aladdin Family Female  35     NA
## 6 Tara Captain_Marvel Family Female  35      5

We have successfully created our data frame of the surveys. We can further review it briefly to see if there are any missing values.

##   name     movie_name   type gender age rating
## 1 Tara          Joker Family Female  35      4
## 2 Tara Captain_Marvel Family Female  35      5

We can also filter through the table to see findings of the survey. For example; Tara’s favorite movies are Joker and Captain Marvel.

## [1] 22

We have 22 missing values. We can replace the NA values with average movie rating.

## [1] 3.631579

If we dont count the NA missing values within the movie ratings, the average rating is 3.6. Since we have defined the rating to be 1 to 5 integer value, we can use the average movie rating is 3.

##      name     movie_name   type gender age rating
## 1    Tara       Avengers Family Female  35      3
## 2    Tara      Lion King Family Female  35      3
## 3    Tara          Joker Family Female  35      4
## 4    Tara      Spiderman Family Female  35      2
## 5    Tara        Aladdin Family Female  35      3
## 6    Tara Captain_Marvel Family Female  35      5
## 7   Kevin       Avengers Family   Male  10      5
## 8   Kevin      Lion King Family   Male  10      4
## 9   Kevin          Joker Family   Male  10      3
## 10  Kevin      Spiderman Family   Male  10      5
## 11  Kevin        Aladdin Family   Male  10      3
## 12  Kevin Captain_Marvel Family   Male  10      3
## 13 Connor       Avengers Family   Male   7      3
## 14 Connor      Lion King Family   Male   7      4
## 15 Connor          Joker Family   Male   7      3
## 16 Connor      Spiderman Family   Male   7      3
## 17 Connor        Aladdin Family   Male   7      5
## 18 Connor Captain_Marvel Family   Male   7      3
## 19  Isaac       Avengers Friend   Male  46      2
## 20  Isaac      Lion King Friend   Male  46      2
## 21  Isaac          Joker Friend   Male  46      4
## 22  Isaac      Spiderman Friend   Male  46      2
## 23  Isaac        Aladdin Friend   Male  46      1
## 24  Isaac Captain_Marvel Friend   Male  46      3
## 25   John       Avengers Friend   Male  43      2
## 26   John      Lion King Friend   Male  43      5
## 27   John          Joker Friend   Male  43      2
## 28   John      Spiderman Friend   Male  43      3
## 29   John        Aladdin Friend   Male  43      3
## 30   John Captain_Marvel Friend   Male  43      3
## 31     JP       Avengers Friend   Male  45      2
## 32     JP      Lion King Friend   Male  45      5
## 33     JP          Joker Friend   Male  45      3
## 34     JP      Spiderman Friend   Male  45      3
## 35     JP        Aladdin Friend   Male  45      3
## 36     JP Captain_Marvel Friend   Male  45      5
## 37    Can       Avengers Family   Male   9      5
## 38    Can      Lion King Family   Male   9      3
## 39    Can          Joker Family   Male   9      3
## 40    Can      Spiderman Family   Male   9      5
## 41    Can        Aladdin Family   Male   9      5
## 42    Can Captain_Marvel Family   Male   9      3
## 43 Zeynep       Avengers Family Female   6      3
## 44 Zeynep      Lion King Family Female   6      4
## 45 Zeynep          Joker Family Female   6      3
## 46 Zeynep      Spiderman Family Female   6      3
## 47 Zeynep        Aladdin Family Female   6      3
## 48 Zeynep Captain_Marvel Family Female   6      3
## 49  Seray       Avengers Family Female  38      3
## 50  Seray      Lion King Family Female  38      5
## 51  Seray          Joker Family Female  38      4
## 52  Seray      Spiderman Family Female  38      3
## 53  Seray        Aladdin Family Female  38      3
## 54  Seray Captain_Marvel Family Female  38      4
## 55   Bell       Avengers Friend Female  41      3
## 56   Bell      Lion King Friend Female  41      5
## 57   Bell          Joker Friend Female  41      3
## 58   Bell      Spiderman Friend Female  41      3
## 59   Bell        Aladdin Friend Female  41      3
## 60   Bell Captain_Marvel Friend Female  41      5

We also have another table we created in the SQL database that provides information about the movies. We can load that into R as a dataframe to review as well.

##       movie_name year     genre cost
## 1       Avengers 2019    Action  220
## 2      Lion King 2019 Animation   45
## 3          Joker 2019    Action   50
## 4      Spiderman 2019    Action  160
## 5        Aladdin 2019    Comedy  183
## 6 Captain_Marvel 2019    Action  152

We combined the survey observetation input and movie information in our SQL database and we can simply load that in R as a dataframe as well. We can also join them in R as a dataframe too.

##   name   type gender age rating     genre year cost
## 1 Tara Family Female  35      3    Action 2019  220
## 2 Tara Family Female  35      3 Animation 2019   45
## 3 Tara Family Female  35      4    Action 2019   50
## 4 Tara Family Female  35      2    Action 2019  160
## 5 Tara Family Female  35     NA    Comedy 2019  183
## 6 Tara Family Female  35      5    Action 2019  152

Since we pulled the tables from the SQL server again and merge the tables here, we need to replace the NULL values again here.

##      name   type gender age rating     genre year cost
## 1    Tara Family Female  35      3    Action 2019  220
## 2    Tara Family Female  35      3 Animation 2019   45
## 3    Tara Family Female  35      4    Action 2019   50
## 4    Tara Family Female  35      2    Action 2019  160
## 5    Tara Family Female  35      3    Comedy 2019  183
## 6    Tara Family Female  35      5    Action 2019  152
## 7   Kevin Family   Male  10      5    Action 2019  220
## 8   Kevin Family   Male  10      4 Animation 2019   45
## 9   Kevin Family   Male  10      3    Action 2019   50
## 10  Kevin Family   Male  10      5    Action 2019  160
## 11  Kevin Family   Male  10      3    Comedy 2019  183
## 12  Kevin Family   Male  10      3    Action 2019  152
## 13 Connor Family   Male   7      3    Action 2019  220
## 14 Connor Family   Male   7      4 Animation 2019   45
## 15 Connor Family   Male   7      3    Action 2019   50
## 16 Connor Family   Male   7      3    Action 2019  160
## 17 Connor Family   Male   7      5    Comedy 2019  183
## 18 Connor Family   Male   7      3    Action 2019  152
## 19  Isaac Friend   Male  46      2    Action 2019  220
## 20  Isaac Friend   Male  46      2 Animation 2019   45
## 21  Isaac Friend   Male  46      4    Action 2019   50
## 22  Isaac Friend   Male  46      2    Action 2019  160
## 23  Isaac Friend   Male  46      1    Comedy 2019  183
## 24  Isaac Friend   Male  46      3    Action 2019  152
## 25   John Friend   Male  43      2    Action 2019  220
## 26   John Friend   Male  43      5 Animation 2019   45
## 27   John Friend   Male  43      2    Action 2019   50
## 28   John Friend   Male  43      3    Action 2019  160
## 29   John Friend   Male  43      3    Comedy 2019  183
## 30   John Friend   Male  43      3    Action 2019  152
## 31     JP Friend   Male  45      2    Action 2019  220
## 32     JP Friend   Male  45      5 Animation 2019   45
## 33     JP Friend   Male  45      3    Action 2019   50
## 34     JP Friend   Male  45      3    Action 2019  160
## 35     JP Friend   Male  45      3    Comedy 2019  183
## 36     JP Friend   Male  45      5    Action 2019  152
## 37    Can Family   Male   9      5    Action 2019  220
## 38    Can Family   Male   9      3 Animation 2019   45
## 39    Can Family   Male   9      3    Action 2019   50
## 40    Can Family   Male   9      5    Action 2019  160
## 41    Can Family   Male   9      5    Comedy 2019  183
## 42    Can Family   Male   9      3    Action 2019  152
## 43 Zeynep Family Female   6      3    Action 2019  220
## 44 Zeynep Family Female   6      4 Animation 2019   45
## 45 Zeynep Family Female   6      3    Action 2019   50
## 46 Zeynep Family Female   6      3    Action 2019  160
## 47 Zeynep Family Female   6      3    Comedy 2019  183
## 48 Zeynep Family Female   6      3    Action 2019  152
## 49  Seray Family Female  38      3    Action 2019  220
## 50  Seray Family Female  38      5 Animation 2019   45
## 51  Seray Family Female  38      4    Action 2019   50
## 52  Seray Family Female  38      3    Action 2019  160
## 53  Seray Family Female  38      3    Comedy 2019  183
## 54  Seray Family Female  38      4    Action 2019  152
## 55   Bell Friend Female  41      3    Action 2019  220
## 56   Bell Friend Female  41      5 Animation 2019   45
## 57   Bell Friend Female  41      3    Action 2019   50
## 58   Bell Friend Female  41      3    Action 2019  160
## 59   Bell Friend Female  41      3    Comedy 2019  183
## 60   Bell Friend Female  41      5    Action 2019  152

We can further look at the distribution on age and correlation between cost and rating. Do movies have to cost more to be better?

## Installing package into 'C:/Users/Anil Akyildirim/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Anil Akyildirim\AppData\Local\Temp\RtmpMTLsYi\downloaded_packages

There is no linear correlation between cost and rating of a movie.

This shows us that , none of my family members or friends ages between 10-35 partricipated in the survey.

Conclusion

We have collected movie survey observetaions from selected friends and family members. It is clear that the experiment requires non sample bias data(as it seems there is no response, voluntery response and convinience sample bias in the sampling) to make proper analysis and exploration.