Introduction
In this assignment, we have been asked to do a survey within our family and friends, where they rate six recent popular movies from scale of 1 to 5. The observations would be captured and stored in an SQL database and loaded into an R dataframe.
About the Survey
The focus of the survey was mainly to capture everyone’s ratings. Questions and description of each variable are outlined below;
1- What is your name?
2- From 1 to 5, what do you rate movie (Avengers, Lion King, Joker, Spiderman, Aladdin, Captain_Marvel)?
3- What is your age?
4- What is your gender?
5- What is our relationship?
1- name: Name of the individual
2- rating: From 1 to 5 the rating of the movie provided by the individual.
3- age: age of the individual
4- gender: gender of the individiual.
5- type: relationship between the surveyer and the individual.
6- Addition to the questions since we selected the movies as a drop down option for individuals to select, movie_name: Name of the Movie.
Creating the MySQL Database
The survey results are collected in a spreadhseet and saved here. Addition to the survey, we can use google to find additional information for the particular movies such as, genre, year made and cost (in millions). This spreadhseet is saved here
In order to create the database, we have to create an sql instance , create a schema and required tables for further load into the R dataframe. The .sql file can be found here. In the process we can create multiple tables, filter and merge variables if we want to.
Loading the MySQL Database and Tables in R
## Installing package into 'C:/Users/Anil Akyildirim/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)
## package 'RMariaDB' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'RMariaDB'
## Warning in file.copy(savedcopy, lib, recursive = TRUE):
## problem copying C:\Users\Anil Akyildirim\Documents\R\win-
## library\3.6\00LOCK\RMariaDB\libs\x64\RMariaDB.dll to C:\Users\Anil
## Akyildirim\Documents\R\win-library\3.6\RMariaDB\libs\x64\RMariaDB.dll:
## Permission denied
## Warning: restored 'RMariaDB'
##
## The downloaded binary packages are in
## C:\Users\Anil Akyildirim\AppData\Local\Temp\RtmpMTLsYi\downloaded_packages
library(RMariaDB)
# setting up database credentials as variables
user_name <- 'root'
user_password <- "GGanuil1"
database <- 'movies'
host_name <- 'localhost'
#connecting to the MySQL database
myDb <- dbConnect(RMariaDB::MariaDB(), user=user_name, password=user_password, dbname=database, host=host_name)
myDb
## <MariaDBConnection>
## Host: localhost
## Server: 8.0.17
## Client: 5.5.1
## [1] "movie_data" "movie_data_set" "movie_ratings" "movie_table"
## name movie_name type gender age rating
## 1 Tara Avengers Family Female 35 3
## 2 Tara Lion King Family Female 35 3
## 3 Tara Joker Family Female 35 4
## 4 Tara Spiderman Family Female 35 2
## 5 Tara Aladdin Family Female 35 NA
## 6 Tara Captain_Marvel Family Female 35 5
We have successfully created our data frame of the surveys. We can further review it briefly to see if there are any missing values.
# review movie_survey table
tara_fav_movies <- dbGetQuery(myDb, 'select * from movie_ratings where name="Tara" and rating > 3')
tara_fav_movies
## name movie_name type gender age rating
## 1 Tara Joker Family Female 35 4
## 2 Tara Captain_Marvel Family Female 35 5
We can also filter through the table to see findings of the survey. For example; Tara’s favorite movies are Joker and Captain Marvel.
## [1] 22
We have 22 missing values. We can replace the NA values with average movie rating.
## [1] 3.631579
If we dont count the NA missing values within the movie ratings, the average rating is 3.6. Since we have defined the rating to be 1 to 5 integer value, we can use the average movie rating is 3.
## name movie_name type gender age rating
## 1 Tara Avengers Family Female 35 3
## 2 Tara Lion King Family Female 35 3
## 3 Tara Joker Family Female 35 4
## 4 Tara Spiderman Family Female 35 2
## 5 Tara Aladdin Family Female 35 3
## 6 Tara Captain_Marvel Family Female 35 5
## 7 Kevin Avengers Family Male 10 5
## 8 Kevin Lion King Family Male 10 4
## 9 Kevin Joker Family Male 10 3
## 10 Kevin Spiderman Family Male 10 5
## 11 Kevin Aladdin Family Male 10 3
## 12 Kevin Captain_Marvel Family Male 10 3
## 13 Connor Avengers Family Male 7 3
## 14 Connor Lion King Family Male 7 4
## 15 Connor Joker Family Male 7 3
## 16 Connor Spiderman Family Male 7 3
## 17 Connor Aladdin Family Male 7 5
## 18 Connor Captain_Marvel Family Male 7 3
## 19 Isaac Avengers Friend Male 46 2
## 20 Isaac Lion King Friend Male 46 2
## 21 Isaac Joker Friend Male 46 4
## 22 Isaac Spiderman Friend Male 46 2
## 23 Isaac Aladdin Friend Male 46 1
## 24 Isaac Captain_Marvel Friend Male 46 3
## 25 John Avengers Friend Male 43 2
## 26 John Lion King Friend Male 43 5
## 27 John Joker Friend Male 43 2
## 28 John Spiderman Friend Male 43 3
## 29 John Aladdin Friend Male 43 3
## 30 John Captain_Marvel Friend Male 43 3
## 31 JP Avengers Friend Male 45 2
## 32 JP Lion King Friend Male 45 5
## 33 JP Joker Friend Male 45 3
## 34 JP Spiderman Friend Male 45 3
## 35 JP Aladdin Friend Male 45 3
## 36 JP Captain_Marvel Friend Male 45 5
## 37 Can Avengers Family Male 9 5
## 38 Can Lion King Family Male 9 3
## 39 Can Joker Family Male 9 3
## 40 Can Spiderman Family Male 9 5
## 41 Can Aladdin Family Male 9 5
## 42 Can Captain_Marvel Family Male 9 3
## 43 Zeynep Avengers Family Female 6 3
## 44 Zeynep Lion King Family Female 6 4
## 45 Zeynep Joker Family Female 6 3
## 46 Zeynep Spiderman Family Female 6 3
## 47 Zeynep Aladdin Family Female 6 3
## 48 Zeynep Captain_Marvel Family Female 6 3
## 49 Seray Avengers Family Female 38 3
## 50 Seray Lion King Family Female 38 5
## 51 Seray Joker Family Female 38 4
## 52 Seray Spiderman Family Female 38 3
## 53 Seray Aladdin Family Female 38 3
## 54 Seray Captain_Marvel Family Female 38 4
## 55 Bell Avengers Friend Female 41 3
## 56 Bell Lion King Friend Female 41 5
## 57 Bell Joker Friend Female 41 3
## 58 Bell Spiderman Friend Female 41 3
## 59 Bell Aladdin Friend Female 41 3
## 60 Bell Captain_Marvel Friend Female 41 5
We also have another table we created in the SQL database that provides information about the movies. We can load that into R as a dataframe to review as well.
## movie_name year genre cost
## 1 Avengers 2019 Action 220
## 2 Lion King 2019 Animation 45
## 3 Joker 2019 Action 50
## 4 Spiderman 2019 Action 160
## 5 Aladdin 2019 Comedy 183
## 6 Captain_Marvel 2019 Action 152
We combined the survey observetation input and movie information in our SQL database and we can simply load that in R as a dataframe as well. We can also join them in R as a dataframe too.
movie_df <- dbGetQuery(myDb, "SELECT name, type, gender, age, rating, genre, year, cost FROM movie_ratings JOIN movie_data ON movie_ratings.movie_name=movie_data.movie_name")
head(movie_df)
## name type gender age rating genre year cost
## 1 Tara Family Female 35 3 Action 2019 220
## 2 Tara Family Female 35 3 Animation 2019 45
## 3 Tara Family Female 35 4 Action 2019 50
## 4 Tara Family Female 35 2 Action 2019 160
## 5 Tara Family Female 35 NA Comedy 2019 183
## 6 Tara Family Female 35 5 Action 2019 152
Since we pulled the tables from the SQL server again and merge the tables here, we need to replace the NULL values again here.
## name type gender age rating genre year cost
## 1 Tara Family Female 35 3 Action 2019 220
## 2 Tara Family Female 35 3 Animation 2019 45
## 3 Tara Family Female 35 4 Action 2019 50
## 4 Tara Family Female 35 2 Action 2019 160
## 5 Tara Family Female 35 3 Comedy 2019 183
## 6 Tara Family Female 35 5 Action 2019 152
## 7 Kevin Family Male 10 5 Action 2019 220
## 8 Kevin Family Male 10 4 Animation 2019 45
## 9 Kevin Family Male 10 3 Action 2019 50
## 10 Kevin Family Male 10 5 Action 2019 160
## 11 Kevin Family Male 10 3 Comedy 2019 183
## 12 Kevin Family Male 10 3 Action 2019 152
## 13 Connor Family Male 7 3 Action 2019 220
## 14 Connor Family Male 7 4 Animation 2019 45
## 15 Connor Family Male 7 3 Action 2019 50
## 16 Connor Family Male 7 3 Action 2019 160
## 17 Connor Family Male 7 5 Comedy 2019 183
## 18 Connor Family Male 7 3 Action 2019 152
## 19 Isaac Friend Male 46 2 Action 2019 220
## 20 Isaac Friend Male 46 2 Animation 2019 45
## 21 Isaac Friend Male 46 4 Action 2019 50
## 22 Isaac Friend Male 46 2 Action 2019 160
## 23 Isaac Friend Male 46 1 Comedy 2019 183
## 24 Isaac Friend Male 46 3 Action 2019 152
## 25 John Friend Male 43 2 Action 2019 220
## 26 John Friend Male 43 5 Animation 2019 45
## 27 John Friend Male 43 2 Action 2019 50
## 28 John Friend Male 43 3 Action 2019 160
## 29 John Friend Male 43 3 Comedy 2019 183
## 30 John Friend Male 43 3 Action 2019 152
## 31 JP Friend Male 45 2 Action 2019 220
## 32 JP Friend Male 45 5 Animation 2019 45
## 33 JP Friend Male 45 3 Action 2019 50
## 34 JP Friend Male 45 3 Action 2019 160
## 35 JP Friend Male 45 3 Comedy 2019 183
## 36 JP Friend Male 45 5 Action 2019 152
## 37 Can Family Male 9 5 Action 2019 220
## 38 Can Family Male 9 3 Animation 2019 45
## 39 Can Family Male 9 3 Action 2019 50
## 40 Can Family Male 9 5 Action 2019 160
## 41 Can Family Male 9 5 Comedy 2019 183
## 42 Can Family Male 9 3 Action 2019 152
## 43 Zeynep Family Female 6 3 Action 2019 220
## 44 Zeynep Family Female 6 4 Animation 2019 45
## 45 Zeynep Family Female 6 3 Action 2019 50
## 46 Zeynep Family Female 6 3 Action 2019 160
## 47 Zeynep Family Female 6 3 Comedy 2019 183
## 48 Zeynep Family Female 6 3 Action 2019 152
## 49 Seray Family Female 38 3 Action 2019 220
## 50 Seray Family Female 38 5 Animation 2019 45
## 51 Seray Family Female 38 4 Action 2019 50
## 52 Seray Family Female 38 3 Action 2019 160
## 53 Seray Family Female 38 3 Comedy 2019 183
## 54 Seray Family Female 38 4 Action 2019 152
## 55 Bell Friend Female 41 3 Action 2019 220
## 56 Bell Friend Female 41 5 Animation 2019 45
## 57 Bell Friend Female 41 3 Action 2019 50
## 58 Bell Friend Female 41 3 Action 2019 160
## 59 Bell Friend Female 41 3 Comedy 2019 183
## 60 Bell Friend Female 41 5 Action 2019 152
We can further look at the distribution on age and correlation between cost and rating. Do movies have to cost more to be better?
## Installing package into 'C:/Users/Anil Akyildirim/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Anil Akyildirim\AppData\Local\Temp\RtmpMTLsYi\downloaded_packages
There is no linear correlation between cost and rating of a movie.
This shows us that , none of my family members or friends ages between 10-35 partricipated in the survey.
Conclusion
We have collected movie survey observetaions from selected friends and family members. It is clear that the experiment requires non sample bias data(as it seems there is no response, voluntery response and convinience sample bias in the sampling) to make proper analysis and exploration.