Assignment Two - R and SQL

Part 1: Build Table

• Choose six recent popular movies.

• Ask at least five people that you know (friends, family, classmates, imaginary friends if necessary) to rate each of these movies that they have seen on a scale of 1 to 5.

Part 2: Store data in SQL database

• Take the results (observations) and store them in the class MySQL database:

Note: it is good practice to change your password. To do so, use this SQL command: SET PASSWORD = ‘’;

Part 3: Transfer data from SQL database to R dataframe

• Load the information from the SQL database into an R dataframe. Part 4: Missing data strategy

• Implement an approach to missing data

• Explain why you decided to take the chosen approach

Note: consider that later in the course you will revisit this information you have collected and will use it to implement a Recommender.

Overview

This assignment involves choosing six recent popular movies and asking at least five people to rate each movie on a scale of 1 to 5. The results should be stored in a SQL database of your choosing and then loaded into an R dataframe.

The assignment also requires demonstrating a reasonable approach for handling missing data.

Create the database

I have created a table in the server cunydata607sql.mysql.database.azure.com named movies_ratings.

library(DBI)
library(RSQLite)
## Warning: package 'RSQLite' was built under R version 4.3.2
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 4.3.2
## 
## Attaching package: 'RMySQL'
## The following object is masked from 'package:RSQLite':
## 
##     isIdCurrent
mydb = dbConnect(RMySQL::MySQL(), user='mohammed.rahman76', password='mohammed.rahman76', dbname='mohammed.rahman76', port=3306, host='cunydata607sql.mysql.database.azure.com')

summary(mydb)
## <MySQLConnection:0,0>
##   User:   mohammed.rahman76 
##   Host:   cunydata607sql.mysql.database.azure.com 
##   Dbname: mohammed.rahman76 
##   Connection type: cunydata607sql.mysql.database.azure.com via TCP/IP 
## 
## Results:
#showing tables list
dbListTables(mydb)
## [1] "movies_ratings" "tb"

Fetching results from movies_ratings table

result = dbSendQuery(mydb, "select * from movies_ratings")

movies_rating <- fetch(result)

print(movies_rating)
##   ID   Name The Shawshank Redemption The Dark Knight Inception The Godfather
## 1     1 Ashfak                        5               5         4             4
## 2     2  Tarek                        5               4         4             5
## 3     3 Moynul                        5               4         5             5
## 4     4  Taher                        4               4         5             5
## 5     5  Hasan                        5               5         4             4
##   Pulp Fiction 12 Angry Men
## 1            5            4
## 2            5            4
## 3            4            4
## 4            4            4
## 5            5            4

Handling missing data

As all of them have watched all the movies and they have provided highly ratings unlikely, there wouldn’t be any missing data. Although, we can easily check if there is a missing value.

anyNA(movies_rating)
## [1] FALSE

As the result is false, we can say that there is no missing values.