Assignment - SQL and R

Your deliverables should include your SQL scripts and your R Markdown code, posted to GitHub.

This is by design a very open ended assignment. A variety of reasonable approaches are acceptable. You can (and should) blank out your SQL password if your solution requires it; otherwise, full credit requires that your code is “reproducible,” with the assumption that I have the same database server and R software.

I choose a combination of English and Hindi movies that our family had watched recently

Movie Names: Bahubali2, Dunkrik, Judwaa2, Jumanji2, Mubarakan and Padmavaat

Setup the Table

I created a table ‘movies’ with three columns

1. mnane: movie name as varchar(100) not null

2. fname: friend name as varchar(100) not null

3. rating: ratings for the movie as an int

Please see the SQL script (movie.sql) for more details

  1. Load the Libraries

Load the required libraries

## Load the Libraries 
library(RMySQL)
## Loading required package: DBI
library(odbc)
library(ggplot2)
  1. Connect to the My SQL Database

Connect to mySQL Database

## connect using dbConnect
mydb = dbConnect(MySQL(), user='root', password='mysql57', host='localhost', dbname="tb")
  1. Query the Data base

Query the database

moviedf = fetch(dbSendQuery(mydb, "select * from movies;"))
  1. Review the results from the memory

Review the results

dim(moviedf)
## [1] 30  3
moviedf
##        mname    fname rating
## 1  Bahubali2  bhavika      5
## 2    Dunkrik  bhavika      2
## 3    Judwaa2  bhavika      4
## 4  Padmavaat  bhavika     NA
## 5   Jumanji2  bhavika      5
## 6  Mubarakan  bhavika      2
## 7  Bahubali2      raj      5
## 8    Dunkrik      raj      4
## 9    Judwaa2      raj      5
## 10 Padmavaat      raj      3
## 11  Jumanji2      raj      3
## 12 Mubarakan      raj      2
## 13 Bahubali2    aseem      5
## 14   Dunkrik    aseem      3
## 15   Judwaa2    aseem      5
## 16 Padmavaat    aseem      4
## 17  Jumanji2    aseem      3
## 18 Mubarakan    aseem      3
## 19 Bahubali2 dakshita      4
## 20   Dunkrik dakshita      2
## 21   Judwaa2 dakshita      5
## 22 Padmavaat dakshita      5
## 23  Jumanji2 dakshita      2
## 24 Mubarakan dakshita      4
## 25 Bahubali2     neil      5
## 26   Dunkrik     neil      2
## 27   Judwaa2     neil      4
## 28 Padmavaat     neil     NA
## 29  Jumanji2     neil      3
## 30 Mubarakan     neil      2
  1. Analysis of the data for each movie

Data Analysis for each movie

#Print the rating for each movie and find the means rating 


moviedf$rating[which(moviedf$mname=="Bahubali2")]
## [1] 5 5 5 4 5
m1 = mean(moviedf$rating[which(moviedf$mname=="Bahubali2")])
m1
## [1] 4.8
moviedf$rating[which(moviedf$mname=="Dunkrik")]
## [1] 2 4 3 2 2
m2 = mean(moviedf$rating[which(moviedf$mname=="Dunkrik")])
m2 
## [1] 2.6
moviedf$rating[which(moviedf$mname=="Judwaa2")]
## [1] 4 5 5 5 4
m3 = mean(moviedf$rating[which(moviedf$mname=="Judwaa2")])
m3
## [1] 4.6
moviedf$rating[which(moviedf$mname=="Jumanji2")]
## [1] 5 3 3 2 3
m4 = mean(moviedf$rating[which(moviedf$mname=="Jumanji2")])
m4
## [1] 3.2
moviedf$rating[which(moviedf$mname=="Mubarakan")]
## [1] 2 2 3 4 2
m5 = mean(moviedf$rating[which(moviedf$mname=="Mubarakan")], na.rm=TRUE)
m5
## [1] 2.6
moviedf$rating[which(moviedf$mname=="Padmavaat")]
## [1] NA  3  4  5 NA
m6 = mean(moviedf$rating[which(moviedf$mname=="Padmavaat")], na.rm = TRUE)
m6
## [1] 4

Lets find the Best and Worst movie… Drum rolls!!!

allrating <- c(m1, m2, m3, m4, m5, m6) 
names(allrating) <- names(table(moviedf$mname))

# And the highest rated movie was .... drum roll!!!
allrating[which(allrating==max(allrating))]
## Bahubali2 
##       4.8
# And the lowest rated movie was .... booooooo!!!
allrating[which(allrating==min(allrating))]
##   Dunkrik Mubarakan 
##       2.6       2.6
  1. Analysis using GGPLOT2 ### let draw a GGPlot for the ratings
# Draw a GGPlot to show how each movie was rated 
# I am using Box Plot as it can show the range of reviews and provide a mean for these ratings
ggplot(moviedf, aes(mname, rating))+
    geom_boxplot(na.rm = TRUE, color = 'blue', outlier.color = 'green')+
    theme_bw()+
    labs(title="Movie Review", x="Movie Name", y="Star Ratings")+
    theme(plot.title = element_text(hjust = 0.5))

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.