Watching movie is one of the best thing to do during our free time, and Netflix may become your best friends after Covid-19 hit. I reach out to my friends and ask them to rate 6 of the movies on survey monkey. Let’s see what movie they like, and what movie they are not interested much!
For this project, I decided to use survey monkey to collect my data from audience
URL:https://www.surveymonkey.com/r/36V9CBW
After getting the data from survey monkey, I have created following 4 local tables to store the info and data by writing SQL code.
movies: This table stores movies name and assigns movies id friends: This table stores friends name and and assignes friends id review: it stores review data from survey money, csv file are linked down below movieruslt: it join 3 table together to get the final data for anlaysis, SQL code are linked down below
check the review data in cvs format link out here
check the SQL code link out here
We can install R package and connect R to local PostgreSQL server
library(DBI)
db <- 'movie_assignment'
host_db <- 'localhost'
db_port <- '5432'
db_user <- 'jayleesps'
db_password <- myLocalPassword
con <- dbConnect(RPostgres::Postgres(), dbname = db, host=host_db, port=db_port, user=db_user, password=db_password)
dbListTables(con)
## [1] "movies" "friends" "review" "movieresult"
## Showing Data
Under this section, I will show you each of the table!
##moviestable <- dbSendQuery(con, "SELECT * FROM movies")
##dbFetch(moviestable)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
tbl(con, "movies")
## # Source: table<movies> [?? x 2]
## # Database: postgres [jayleesps@localhost:5432/movie_assignment]
## movie_id movie_name
## <int> <chr>
## 1 1 SpiderMan
## 2 2 Sing2
## 3 3 Uncharted
## 4 4 Moonfall
## 5 5 Blacklight
## 6 6 Marry Me
tbl(con, "friends")
## # Source: table<friends> [?? x 2]
## # Database: postgres [jayleesps@localhost:5432/movie_assignment]
## friends_id friends_name
## <int> <chr>
## 1 1 Victor
## 2 2 Pak
## 3 3 Daisy
## 4 4 Mike
## 5 5 Greg
tbl(con, "review")
## # Source: table<review> [?? x 3]
## # Database: postgres [jayleesps@localhost:5432/movie_assignment]
## movie_id friends_id ranking
## <int> <int> <int>
## 1 1 1 4
## 2 2 1 NA
## 3 3 1 3
## 4 4 1 2
## 5 5 1 3
## 6 6 1 4
## 7 1 2 5
## 8 2 2 5
## 9 3 2 3
## 10 4 2 NA
## # … with more rows
tbl(con, "movieresult")
## # Source: table<movieresult> [?? x 5]
## # Database: postgres [jayleesps@localhost:5432/movie_assignment]
## movie_id friends_id ranking movie_name friends_name
## <int> <int> <int> <chr> <chr>
## 1 1 1 4 SpiderMan Victor
## 2 2 1 NA Sing2 Victor
## 3 3 1 3 Uncharted Victor
## 4 4 1 2 Moonfall Victor
## 5 5 1 3 Blacklight Victor
## 6 6 1 4 Marry Me Victor
## 7 1 2 5 SpiderMan Pak
## 8 2 2 5 Sing2 Pak
## 9 3 2 3 Uncharted Pak
## 10 4 2 NA Moonfall Pak
## # … with more rows
We can just pull the data that the ranking is not null since we have already join the table together by using SQL.
finaltable <- dbGetQuery(con,'
select *
from "movieresult"
where ranking >= 1
')
We can see Spiderman is the most watched movie and people enjoy the most which can explain why it is famous!
aggregate(ranking ~ movie_name,finaltable,mean)
## movie_name ranking
## 1 Blacklight 3.500000
## 2 Marry Me 3.000000
## 3 Moonfall 2.333333
## 4 Sing2 3.750000
## 5 SpiderMan 4.400000
## 6 Uncharted 3.000000
aggregate(ranking ~ friends_name,finaltable,mean)
## friends_name ranking
## 1 Daisy 4.000000
## 2 Greg 2.666667
## 3 Mike 3.600000
## 4 Pak 4.000000
## 5 Victor 3.200000
aggregate(movie_id ~ movie_name, data = finaltable, FUN = length)
## movie_name movie_id
## 1 Blacklight 4
## 2 Marry Me 4
## 3 Moonfall 3
## 4 Sing2 4
## 5 SpiderMan 5
## 6 Uncharted 4
aggregate(friends_id ~ friends_name, data = finaltable, FUN = length)
## friends_name friends_id
## 1 Daisy 4
## 2 Greg 6
## 3 Mike 5
## 4 Pak 4
## 5 Victor 5
library(ggplot2)
ggplot (finaltable, aes(x=movie_name, y=ranking,fill=friends_name)) + geom_col() + xlab("movie name") + ylab("ranking") +ggtitle("Ranking of the movies")
moviescount <- table(finaltable$friends_name)
lbls <- c("Daisy","Greg","Mike","Pak","Victor")
pct <- round(moviescount/sum(moviescount)*100)
lbls <-paste(lbls,pct)
lbls <-paste(lbls,"%")
pie(moviescount, labels=lbls, main ="Pie chart of movie watched")
peoplecount <- table(finaltable$movie_name)
lbls <- c("Blacklight","Marry Me","Moonfall","Sing2","SpiderMan","Uncharted")
pct <- round(peoplecount/sum(peoplecount)*100)
lbls <-paste(lbls,pct)
lbls <-paste(lbls,"%")
pie(peoplecount, labels=lbls, main ="Movies of people watched")