Week Two - R and SQL Part 1: Build Table • Choose six recent popular
movies. • Ask at least five people that you know (friends, family,
classmates, imaginary friends if necessary) to rate each of these movies
that they have seen on a scale of 1 to 5. Part 2: Store data in SQL
database • Take the results (observations) and store them in the class
MySQL database: - Server name: cunydata607sql.mysql.database.azure.com -
Username / password: will be given to you in an email Note: it is good
practice to change your password. To do so, use this SQL command: SET
PASSWORD = ‘
Installed MYSQL and build popular movie ratings table. Populated table with sample data
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RMySQL)
## Loading required package: DBI
mydb <- dbConnect(RMySQL::MySQL(),
dbname= 'erick.hadi87',
host= 'cunydata607sql.mysql.database.azure.com',
port=3306,
user='erick.hadi87',
password='erick.hadi87')
dbListTables(mydb)
## [1] "popular_movie_ratings"
ratings <- dbSendQuery(mydb, "select * from popular_movie_ratings")
ratings_results <- fetch(ratings)
print(ratings_results)
## rater_ID rater_name Barbie John_Wick Mission_Impossible Oppenheimer Spiderman
## 1 1 Kevin 5 3 4 2 1
## 2 2 Adam 0 1 2 3 4
## 3 3 Tom 3 4 2 1 5
## 4 4 Erick 0 2 3 0 5
## 5 5 John 1 3 2 5 4
## Minecraft
## 1 3
## 2 5
## 3 5
## 4 5
## 5 2
The approach taken to handle raters with missing data was to input a 0 for the missing ratings. This approach was selected so that there is a value and not a blank cell in the data frame. If any calculation were to be made then we can omit the amount of ratings with 0’s and subtract them from the total. This would however entail that the survey now includes an option for 0 if the rater has not seen a select movie. This would also entail that any calculation made on the dataset must consider that a value of 0 must be omitted.