Assignment – R and SQL

Week Two - R and SQL Part 1: Build Table • Choose six recent popular movies. • Ask at least five people that you know (friends, family, classmates, imaginary friends if necessary) to rate each of these movies that they have seen on a scale of 1 to 5. Part 2: Store data in SQL database • Take the results (observations) and store them in the class MySQL database: - Server name: cunydata607sql.mysql.database.azure.com - Username / password: will be given to you in an email Note: it is good practice to change your password. To do so, use this SQL command: SET PASSWORD = ‘’; Part 3: Transfer data from SQL database to R dataframe • Load the information from the SQL database into an R dataframe. Part 4: Missing data strategy • Implement an approach to missing data • Explain why you decided to take the chosen approach Note: consider that later in the course you will revisit this information you have collected and will use it to implement a Recommender.

Build Table and Store Data in SQL Database

Installed MYSQL and build popular movie ratings table. Populated table with sample data

Transfer data from SQL database to R dataframe

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RMySQL)
## Loading required package: DBI
mydb <- dbConnect(RMySQL::MySQL(),
                            dbname= 'erick.hadi87',
                            host= 'cunydata607sql.mysql.database.azure.com',
                            port=3306,
                            user='erick.hadi87',
                            password='erick.hadi87')
                            
dbListTables(mydb)
## [1] "popular_movie_ratings"
ratings <- dbSendQuery(mydb, "select * from popular_movie_ratings")
ratings_results <- fetch(ratings)
print(ratings_results)
##   rater_ID rater_name Barbie John_Wick Mission_Impossible Oppenheimer Spiderman
## 1        1      Kevin      5         3                  4           2         1
## 2        2       Adam      0         1                  2           3         4
## 3        3        Tom      3         4                  2           1         5
## 4        4      Erick      0         2                  3           0         5
## 5        5       John      1         3                  2           5         4
##   Minecraft
## 1         3
## 2         5
## 3         5
## 4         5
## 5         2

Missing data strategy

The approach taken to handle raters with missing data was to input a 0 for the missing ratings. This approach was selected so that there is a value and not a blank cell in the data frame. If any calculation were to be made then we can omit the amount of ratings with 0’s and subtract them from the total. This would however entail that the survey now includes an option for 0 if the rater has not seen a select movie. This would also entail that any calculation made on the dataset must consider that a value of 0 must be omitted.