Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends if necessary) to rate each of these movies that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database of your choosing. Load the information from the SQL database into an R dataframe.
Pior to created code, I created 2 CSV files and uploaded them into GitHub.
Additionally, I created and connected to a cloud-based database.
In the below code, I first captured the data on CSV files into dataframes. Followed by establishing a connection to a dattabase. I, then, queried the tables and saved the results to a dataframe. Lastly, I performed analyses on the results.
Here, I loaded the readr package to read the csv files on GitHub and saved them to dataframes.
library(readr)
friendsUrl <- 'https://raw.githubusercontent.com/dcorrea614/MSDS/master/assignment2%20-%20friends.csv'
ratingURL <- 'https://raw.githubusercontent.com/dcorrea614/MSDS/master/assignment2%20-%20rating.csv'
dfFriends <- read_csv(file = friendsUrl)
## Parsed with column specification:
## cols(
## FriendsID = col_double(),
## FirstName = col_character(),
## LastName = col_character()
## )
dfRating <- read_csv(file = ratingURL)
## Parsed with column specification:
## cols(
## RatingID = col_double(),
## FriendsID = col_double(),
## Movie = col_character(),
## Rating = col_double()
## )
dfFriends
## # A tibble: 5 x 3
## FriendsID FirstName LastName
## <dbl> <chr> <chr>
## 1 1 John Doe
## 2 2 Jane Doe
## 3 3 Peter Pascal
## 4 4 Paul Panini
## 5 5 Mary Meyer
dfRating
## # A tibble: 30 x 4
## RatingID FriendsID Movie Rating
## <dbl> <dbl> <chr> <dbl>
## 1 1 1 Ford v Ferrari 3
## 2 2 1 The Irishman NA
## 3 3 1 Jojo Rabbit 5
## 4 4 1 Joker 5
## 5 5 1 Marriage Story 4
## 6 6 1 Parasite NA
## 7 7 2 Ford v Ferrari 3
## 8 8 2 The Irishman NA
## 9 9 2 Jojo Rabbit 5
## 10 10 2 Joker 5
## # ... with 20 more rows
To connect to the cloud based database, I used the RMySQL package.
library(RMySQL)
## Loading required package: DBI
con <- dbConnect(MySQL(),
user = 'root',
host = '34.122.92.218',
dbname = 'DATA607')
summary(con)
## <MySQLConnection:0,0>
## User: root
## Host: 34.122.92.218
## Dbname: DATA607
## Connection type: 34.122.92.218 via TCP/IP
##
## Results:
Dumping the dataframes into tables.
dbWriteTable(con, 'friends', dfFriends, overwrite = TRUE)
## [1] TRUE
dbWriteTable(con, 'rating', dfRating, overwrite = TRUE)
## [1] TRUE
Query the tables and joined on FriendsID.
res <- dbGetQuery(con, 'select concat(FirstName, " ", LastName) as Friend, Movie, Rating
from rating inner join friends on rating.FriendsID = friends.FriendsID;')
res
## Friend Movie Rating
## 1 John Doe Ford v Ferrari 3
## 2 John Doe The Irishman NA
## 3 John Doe Jojo Rabbit 5
## 4 John Doe Joker 5
## 5 John Doe Marriage Story 4
## 6 John Doe Parasite NA
## 7 Jane Doe Ford v Ferrari 3
## 8 Jane Doe The Irishman NA
## 9 Jane Doe Jojo Rabbit 5
## 10 Jane Doe Joker 5
## 11 Jane Doe Marriage Story 4
## 12 Jane Doe Parasite 4
## 13 Peter Pascal Ford v Ferrari 3
## 14 Peter Pascal The Irishman NA
## 15 Peter Pascal Jojo Rabbit 5
## 16 Peter Pascal Joker 5
## 17 Peter Pascal Marriage Story NA
## 18 Peter Pascal Parasite 2
## 19 Paul Panini Ford v Ferrari 5
## 20 Paul Panini The Irishman 3
## 21 Paul Panini Jojo Rabbit 4
## 22 Paul Panini Joker 4
## 23 Paul Panini Marriage Story 5
## 24 Paul Panini Parasite 4
## 25 Mary Meyer Ford v Ferrari NA
## 26 Mary Meyer The Irishman NA
## 27 Mary Meyer Jojo Rabbit 5
## 28 Mary Meyer Joker 5
## 29 Mary Meyer Marriage Story NA
## 30 Mary Meyer Parasite 3
Using the psych package, we are able to see summary statics of the Rating, grouped by Movie.
Finally, I used the ggplot package to visually see the ratings by Movie.
library(psych)
describeBy(res$Rating, group = res$Movie)
##
## Descriptive statistics by group
## group: Ford v Ferrari
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 4 3.5 1 3 3.5 0 3 5 2 0.75 -1.69 0.5
## ------------------------------------------------------------
## group: Jojo Rabbit
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 5 4.8 0.45 5 4.8 0 4 5 1 -1.07 -0.92 0.2
## ------------------------------------------------------------
## group: Joker
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 5 4.8 0.45 5 4.8 0 4 5 1 -1.07 -0.92 0.2
## ------------------------------------------------------------
## group: Marriage Story
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3 4.33 0.58 4 4.33 0 4 5 1 0.38 -2.33 0.33
## ------------------------------------------------------------
## group: Parasite
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 4 3.25 0.96 3.5 3.25 0.74 2 4 2 -0.32 -2.08 0.48
## ------------------------------------------------------------
## group: The Irishman
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1 3 NA 3 3 0 3 3 0 NA NA NA
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
ggplot(res, aes(x = Rating)) + geom_histogram(fill = 'blue', binwidth = 1) +
facet_wrap(~Movie)
## Warning: Removed 8 rows containing non-finite values (stat_bin).
Based on the info, Jojo Rabbit and Joker have the highest average ratings, while The Irishman received the worst.