Title: CUNY SPS MDS Data607_Assignment2"

Author: Charles Ugiagbe

Date: “9/3/2021”

Introduction

This assignment demonstrates ability to connect to a SQL database from R

Load the required libraries

library(RODBC)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.1
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## Warning: package 'tibble' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Connect to the MySQL database using odbc connect and a dsn

conn <- odbcConnect("MySQL_dsn")

Read files

Read the reviewers information from MySQL database into R

sqlQuery(conn, "Select * FROM Review limit 15")
##    Reviewer_ID Reviewer_Name  Movies_title Rating
## 1            1       Osamede       Avenger      5
## 2            1       Osamede       Aquaman      4
## 3            1       Osamede Black Panther      5
## 4            1       Osamede      Creed II      4
## 5            1       Osamede   Incredibles      3
## 6            1       Osamede The Lion King     NA
## 7            2      Kingsley       Avenger      3
## 8            2      Kingsley       Aquaman      4
## 9            2      Kingsley Black Panther      5
## 10           2      Kingsley      Creed II      5
## 11           2      Kingsley   Incredibles      2
## 12           2      Kingsley The Lion King      5
## 13           3         Joyce       Avenger      5
## 14           3         Joyce       Aquaman      3
## 15           3         Joyce Black Panther      4

Read the Dataframe into R

query <- "Select * from Review"
df <- sqlQuery(conn, query)

check the head to have an idea of the data

head(df)
##   Reviewer_ID Reviewer_Name  Movies_title Rating
## 1           1       Osamede       Avenger      5
## 2           1       Osamede       Aquaman      4
## 3           1       Osamede Black Panther      5
## 4           1       Osamede      Creed II      4
## 5           1       Osamede   Incredibles      3
## 6           1       Osamede The Lion King     NA

Structure

Check the Structure of the data
str(df)
## 'data.frame':    30 obs. of  4 variables:
##  $ Reviewer_ID  : int  1 1 1 1 1 1 2 2 2 2 ...
##  $ Reviewer_Name: chr  "Osamede" "Osamede" "Osamede" "Osamede" ...
##  $ Movies_title : chr  "Avenger" "Aquaman" "Black Panther" "Creed II" ...
##  $ Rating       : int  5 4 5 4 3 NA 3 4 5 5 ...

Check the dataframe when review is “3”

sqlQuery(conn, "Select * FROM Review where rating = 3")
##   Reviewer_ID Reviewer_Name  Movies_title Rating
## 1           1       Osamede   Incredibles      3
## 2           2      Kingsley       Avenger      3
## 3           3         Joyce       Aquaman      3
## 4           3         Joyce The Lion King      3
## 5           4         Smith       Avenger      3
## 6           4         Smith   Incredibles      3
## 7           5        Loveth       Aquaman      3

Check if the dataframe contains any missing values

anyNA.data.frame(df)
## [1] TRUE
anyNA.data.frame(df$Rating)
## [1] TRUE