The purpose of the assigment is to create a movie table on a cloud database and retrieve the data from R code. Create text file consisting of movie names and the ratings with the related attributes. Once created, load the data into a online cloud mySQL database and query the table in the R code and display the results
suppressMessages(library(tidyr))
suppressMessages(library(dplyr))
suppressMessages(library(DT))
suppressMessages(library(RMySQL))
suppressMessages(library(DBI))
The file contains list of Netflix movies with different ratings. The file is updated on GitHub here
(https://raw.githubusercontent.com/DataScienceAR/silver-octo-funicular/master/Netflix%20Shows.csv)
theurl <- url('https://raw.githubusercontent.com/DataScienceAR/silver-octo-funicular/master/Netflix%20Shows.csv')
movies.df <- tbl_df(read.csv(theurl))
datatable(head(movies.df,3))
A new “Movie” database and tables are created in Google Cloud. The .csv files available on GitHub is used a source to feed the table creation. Two tables are created under “Movie” Database. Calling the SQL in R
con <- dbConnect(RMySQL::MySQL(),
dbname = "Movies",
host = "34.73.184.91",
port = 3306,
user = "Ruser",
password = "Ruser"
)
dbListTables(con)
## [1] "IMDB" "NetflixShows"
dbGetQuery(con,"select * from `NetflixShows` limit 5 ")
## title rating
## 1 Bones TV-14
## 2 Dexter TV-MA
## 3 Leverage TV-PG
## 4 Grey's Anatomy TV-14
## 5 Lie to Me TV-14
## ratingLevel
## 1 Parents strongly cautioned. May be unsuitable for children ages 14 and under.
## 2 For mature audiences. May not be suitable for children 17 and under.
## 3 Parental guidance suggested. May not be suitable for all children.
## 4 Parents strongly cautioned. May be unsuitable for children ages 14 and under.
## 5 Parents strongly cautioned. May be unsuitable for children ages 14 and under.
## ratingDescription release year user rating score user rating size
## 1 90 2016 97 80
## 2 110 2013 95 80
## 3 70 2012 86 80
## 4 90 2016 98 80
## 5 90 2010 89 80
datatable(dbGetQuery(con,"SELECT
nt.title `Movie Name`,
nt.rating `Rating`,
nt.ratingLevel `Rating Level`,
nt.ratingDescription,
nt.`release year` `Release Year`
FROM
`NetflixShows` nt
WHERE `user rating score` >97"))
### Get the count of movies by year that are more than 1 in count.
dbGetQuery(con,"SELECT
nt.`release year` `Release Year` ,
COUNT(DISTINCT nt.title) `Movie Name`
FROM
`NetflixShows` nt
WHERE `user rating score` >90
GROUP BY
nt.`release year`
HAVING COUNT(DISTINCT nt.title) >1
ORDER BY
`release year` DESC")
## Release Year Movie Name
## 1 2017 4
## 2 2016 31
## 3 2015 7
## 4 2014 3
## 5 2013 3
## 6 2012 2
## 7 2008 2
## 8 2005 3
## 9 2000 2