The goal of this assignment is to create and analyze a survey of six movies and add the data to SQL database where we implement the data from a secure password host to the R markdown. the finding of this report is analyzing the movies and find out which one has the most rating.
I conducted the survey by asking friends and coworkers about the six movies, some of them have seen most of the movies and some saw only few of the movies.
I uploaded the files the Github and you can find the Github
And you can check the files on Github
And the Rpubs
urlA <- "https://raw.githubusercontent.com/akarimhammoud/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/master/Week%202%20R%20and%20SQL/Data/Week%202%20movie_data.csv"
urlB <- "https://raw.githubusercontent.com/akarimhammoud/607-Data-Acquisition-and-Management-CUNY-SPS-Fall2020/master/Week%202%20R%20and%20SQL/Data/Week2_people.csv"
rating <- read.csv(urlA, header = TRUE, sep = ",")
rating## ID People Frozen.2 Murder.Mystery Moana Joker The.Big.Kick Get.out
## 1 1 Alice 5 4 5 NA 3 4
## 2 2 Tony 4 3 3 4 NA 5
## 3 3 Zen 5 2 5 5 1 5
## 4 4 Majed NA 5 NA 5 2 3
## 5 5 Gary NA 4 NA 4 3 3
## ID People sex relationship
## 1 1 Alice female friend
## 2 2 Tony male coworker
## 3 3 Zen female girlfriend
## 4 4 Majed male friend
## 5 5 Gary male coworker
Connect to MySql on Google Cloud host
conn <- dbConnect(MySQL(),
user="root", password= "dav",
dbname="data607", host= "35.188.162.1")
summary(conn)## <MySQLConnection:0,0>
## User: root
## Host: 35.188.162.1
## Dbname: data607
## Connection type: 35.188.162.1 via TCP/IP
##
## Results:
Creating table for rating and people on MySql
## [1] TRUE
Creating table for the people
## [1] TRUE
join the rating and people tables and getting the data
## row_names ID People Frozen.2 Murder.Mystery Moana Joker The.Big.Kick Get.out
## 1 1 1 Alice 5 4 5 NA 3 4
## 2 2 2 Tony 4 3 3 4 NA 5
## 3 3 3 Zen 5 2 5 5 1 5
## 4 4 4 Majed NA 5 NA 5 2 3
## 5 5 5 Gary NA 4 NA 4 3 3
## row_names ID People sex relationship
## 1 1 1 Alice female friend
## 2 2 2 Tony male coworker
## 3 3 3 Zen female girlfriend
## 4 4 4 Majed male friend
## 5 5 5 Gary male coworker
We get rid of the duplicate columns
## row_names ID People Frozen.2 Murder.Mystery Moana Joker The.Big.Kick Get.out
## 1 1 1 Alice 5 4 5 NA 3 4
## 2 2 2 Tony 4 3 3 4 NA 5
## 3 3 3 Zen 5 2 5 5 1 5
## 4 4 4 Majed NA 5 NA 5 2 3
## 5 5 5 Gary NA 4 NA 4 3 3
## sex relationship
## 1 female friend
## 2 male coworker
## 3 female girlfriend
## 4 male friend
## 5 male coworker
Calculating the mean average of rating for the movies, and considering there are NA
movies <- subset(data, select = c(Frozen.2, Murder.Mystery, Moana, Joker, The.Big.Kick, Get.out))
movies <- colMeans(movies, na.rm = TRUE, dims = 1)
movies## Frozen.2 Murder.Mystery Moana Joker The.Big.Kick
## 4.666667 3.600000 4.333333 4.500000 2.250000
## Get.out
## 4.000000
After pulling the average rating for each movie we present the data on the chart
par(mar=c(5,2,2,0))
barplot(movies, main="Movies Rating", xlab="Movies", ylim=c(0,6), horiz=FALSE,cex.lab=1 , col= c("lightcyan","mistyrose","beige","yellow","red","green"), names.arg=c("Frozen 2", "MurderMystery", "Moana", "Joker", "The Big Kick", "Get out" ))As we see in the barchart, the movies Frozen 2 and Joker got the highest average rating evethough Frozen2 had been watched only by 3 people, while Joker had been watched by all of the people in the rating list. on the other side The Big Sick movie got the worst rating of around only 2 out of 5 and it was watched by 4 people, it seems like they didn’t enjoy it a lot.
Thanks, Karim Hammoud