Part 1: Build Table
• Choose six recent popular movies.
• Ask at least five people that you know (friends, family, classmates, imaginary friends if necessary) to rate each of these movies that they have seen on a scale of 1 to 5.
Part 2: Store data in SQL database
• Take the results (observations) and store them in the class MySQL database:
Server name: cunydata607sql.mysql.database.azure.com
Username / password: will be given to you in an email
Note: it is good practice to change your password. To do so, use this
SQL command: SET PASSWORD = ‘
Part 3: Transfer data from SQL database to R dataframe
• Load the information from the SQL database into an R dataframe. Part 4: Missing data strategy
• Implement an approach to missing data
• Explain why you decided to take the chosen approach
Note: consider that later in the course you will revisit this information you have collected and will use it to implement a Recommender.
This assignment involves choosing six recent popular movies and asking at least five people to rate each movie on a scale of 1 to 5. The results should be stored in a SQL database of your choosing and then loaded into an R dataframe.
The assignment also requires demonstrating a reasonable approach for handling missing data.
Here are the ones I chose:
I have created a table in the server cunydata607sql.mysql.database.azure.com named movies_ratings.
library(DBI)
library(RSQLite)
## Warning: package 'RSQLite' was built under R version 4.3.2
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 4.3.2
##
## Attaching package: 'RMySQL'
## The following object is masked from 'package:RSQLite':
##
## isIdCurrent
mydb = dbConnect(RMySQL::MySQL(), user='mohammed.rahman76', password='mohammed.rahman76', dbname='mohammed.rahman76', port=3306, host='cunydata607sql.mysql.database.azure.com')
summary(mydb)
## <MySQLConnection:0,0>
## User: mohammed.rahman76
## Host: cunydata607sql.mysql.database.azure.com
## Dbname: mohammed.rahman76
## Connection type: cunydata607sql.mysql.database.azure.com via TCP/IP
##
## Results:
#showing tables list
dbListTables(mydb)
## [1] "movies_ratings" "tb"
Fetching results from movies_ratings table
result = dbSendQuery(mydb, "select * from movies_ratings")
movies_rating <- fetch(result)
print(movies_rating)
## ID Name The Shawshank Redemption The Dark Knight Inception The Godfather
## 1 1 Ashfak 5 5 4 4
## 2 2 Tarek 5 4 4 5
## 3 3 Moynul 5 4 5 5
## 4 4 Taher 4 4 5 5
## 5 5 Hasan 5 5 4 4
## Pulp Fiction 12 Angry Men
## 1 5 4
## 2 5 4
## 3 4 4
## 4 4 4
## 5 5 4
As all of them have watched all the movies and they have provided highly ratings unlikely, there wouldn’t be any missing data. Although, we can easily check if there is a missing value.
anyNA(movies_rating)
## [1] FALSE
As the result is false, we can say that there is no missing values.