Assignment Description

The purpose of the assigment is to create a movie table on a cloud database and retrieve the data from R code. Create text file consisting of movie names and the ratings with the related attributes. Once created, load the data into a online cloud mySQL database and query the table in the R code and display the results

R packages used

  1. tidyr
  2. dplyr
  3. DT
  4. RMySQL
  5. DBI
suppressMessages(library(tidyr))
suppressMessages(library(dplyr))
suppressMessages(library(DT))
suppressMessages(library(RMySQL))
suppressMessages(library(DBI))

Raw movie file on .csv format

The file contains list of Netflix movies with different ratings. The file is updated on GitHub here

(https://raw.githubusercontent.com/DataScienceAR/silver-octo-funicular/master/Netflix%20Shows.csv)

theurl <- url('https://raw.githubusercontent.com/DataScienceAR/silver-octo-funicular/master/Netflix%20Shows.csv')
movies.df <- tbl_df(read.csv(theurl))
datatable(head(movies.df,3))

Creating Movie database and tables on Google Cloud

A new “Movie” database and tables are created in Google Cloud. The .csv files available on GitHub is used a source to feed the table creation. Two tables are created under “Movie” Database. Calling the SQL in R

con <- dbConnect(RMySQL::MySQL(),
                 dbname = "Movies",
                 host = "34.73.184.91",
                 port = 3306,
                 user = "Ruser",
                 password = "Ruser"
                 
              
                 
                 )

Querring the list of table in the “Movie” database

dbListTables(con)
## [1] "IMDB"         "NetflixShows"

Querring the “NetflixShows” table for top 5 rows

dbGetQuery(con,"select * from `NetflixShows` limit 5 ")
##            title rating
## 1          Bones  TV-14
## 2         Dexter  TV-MA
## 3       Leverage  TV-PG
## 4 Grey's Anatomy  TV-14
## 5      Lie to Me  TV-14
##                                                                     ratingLevel
## 1 Parents strongly cautioned. May be unsuitable for children ages 14 and under.
## 2         For mature audiences.  May not be suitable for children 17 and under.
## 3            Parental guidance suggested. May not be suitable for all children.
## 4 Parents strongly cautioned. May be unsuitable for children ages 14 and under.
## 5 Parents strongly cautioned. May be unsuitable for children ages 14 and under.
##   ratingDescription release year user rating score user rating size
## 1                90         2016                97               80
## 2               110         2013                95               80
## 3                70         2012                86               80
## 4                90         2016                98               80
## 5                90         2010                89               80

Get the list of observations that have user rating score greater than 97

datatable(dbGetQuery(con,"SELECT
    nt.title `Movie Name`,
    nt.rating `Rating`,
    nt.ratingLevel `Rating Level`,
    nt.ratingDescription,
    nt.`release year` `Release Year` 
FROM
    `NetflixShows` nt 
    WHERE `user rating score` >97"))

### Get the count of movies by year that are more than 1 in count.

dbGetQuery(con,"SELECT
    
        nt.`release year` `Release Year` ,
    COUNT(DISTINCT nt.title) `Movie Name`
FROM
    `NetflixShows` nt 
    WHERE `user rating score` >90


    GROUP BY    
    
    nt.`release year` 
    HAVING COUNT(DISTINCT nt.title) >1
    ORDER BY
    `release year` DESC")
##   Release Year Movie Name
## 1         2017          4
## 2         2016         31
## 3         2015          7
## 4         2014          3
## 5         2013          3
## 6         2012          2
## 7         2008          2
## 8         2005          3
## 9         2000          2