1. Build Table

In this assignment, we will recall tables from the SQL database into R. The data is stored in the database server: cunydata607sql.mysql.database.azure.com. The two tables created were Friends and Ratings. Friends have names, ages, occupancy, and favorite genres. Ratings contain the ratings recorded for each movie.

# install packages needed
library(DBI)
library(odbc)
library(RSQLite)
library(RMySQL)
## 
## Attaching package: 'RMySQL'
## The following object is masked from 'package:RSQLite':
## 
##     isIdCurrent
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

2. Store data in SQL database

The tables were made in EXCEL and uploaded through the “Table Data Import Wizard” Available on MySQL Workbench.

3. Transfer data from SQL database to R dataframe

In the code below, we will establish a connection with MySQL server with the stored data, and view insights from the tables.

con <- dbConnect(MySQL(), user='bishoy.sokkar60', password='<bishoy.sokkar607>', dbname='bishoy.sokkar60', host='cunydata607sql.mysql.database.azure.com')

To view the tables in the database:

dbListTables(con)
## [1] "friends" "ratings"

To learn more about the table variables in the data base:

dbListFields(con, "friends") 
## [1] "my_row_id"      "Friend"      "Age"            "Occupancy"     
## [5] "Favorite Genre"
dbListFields(con, "ratings")
## [1] "my_row_id"      "Friend"      "Wonka"          "The Beekeeper" 
## [5] "A Ghost Story"  "Anyone But You" "Origin"         "Past Lives"

To retrieve tables from MySQL server, we use the following function:

#to create tables in R with the data
Friends_data = dbGetQuery(con, "SELECT * FROM friends ")
Ratings_data = dbGetQuery(con, "SELECT * FROM ratings")
# to glimpse to confirm all information transfered correctly.
glimpse(Friends_data)
## Rows: 5
## Columns: 4
## $ `Friend`      <chr> "Alicia", "Lauren", "Jacky", "Vivian ", "Jacob"
## $ Age              <int> 29, 24, 32, 34, 27
## $ Occupancy        <chr> "Area Manager", "Area Manager", "Area Manager", "Oper…
## $ `Favorite Genre` <chr> "Romance", "Horror", "Comedy", "Fantasy", "Action"
glimpse(Ratings_data)
## Rows: 5
## Columns: 7
## $ `Friend`      <chr> "Alicia", "Lauren", "Jacky", "Vivian ", "Jacob"
## $ Wonka            <int> 2, 3, 1, 5, 1
## $ `The Beekeeper`  <int> 3, 4, 2, 4, 3
## $ `A Ghost Story`  <int> 1, 5, 5, 2, 2
## $ `Anyone But You` <int> 1, 5, 1, 4, 1
## $ Origin           <int> 4, 4, 3, 3, 2
## $ `Past Lives`     <int> 4, 2, 5, 1, 2

4. Missing data strategy

When uploading the tables into the SQL database, I selected NN (Not Null) which constrains the column from containing null values.