Vladimir Nimchenko

Introduction:

I asked 8 friends to rate 8 different movies. The movies I selected fall into three genres: comedy, action, and drama. I would like to find the mean rating for each genre and the mean rating for each friend. I will create the appropriate tables in the database and connect to the database from R.

Connecting to the Database.

library(RMySQL)
## Loading required package: DBI
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#prompt for credentials
username <- rstudioapi::askForPassword("Database username")
password <- rstudioapi::askForPassword("Database password")

#connecting to the mysql database
con <- dbConnect(MySQL(), user=username, password=password, dbname='sakila', host='localhost')

Retrieving the data by running an SQL query from R

#Running SQL in R
moviedata <- dbGetQuery(con, "SELECT p.name as friend, m.name as movie_name, m.year,m.genre,m.duration,r.rating
                           FROM review r
                           JOIN person p ON r.person_id = p.id
                           JOIN movie m ON r.movie_id = m.id")
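
To make the two joins concrete, here is a minimal base R sketch of what the query does, assuming three small tables shaped like person, movie, and review. The id values are hypothetical, since the real keys are not shown in this report; the names, genre, and ratings are taken from the first three rows of the head() output in the next section.

```r
# Toy versions of the three tables. The id values are assumptions made for
# illustration; only the names, genre, and ratings come from the report.
person <- data.frame(id = 1, name = "Mike")
movie <- data.frame(
  id = c(10, 11, 12),
  name = c("Shrek", "Anger Management", "Wedding Crashers"),
  genre = "Comedy"
)
review <- data.frame(
  person_id = c(1, 1, 1),
  movie_id = c(10, 11, 12),
  rating = c(4, 3, 5)
)

# JOIN person p ON r.person_id = p.id
step1 <- merge(review, person, by.x = "person_id", by.y = "id")
names(step1)[names(step1) == "name"] <- "friend"

# JOIN movie m ON r.movie_id = m.id
joined <- merge(step1, movie, by.x = "movie_id", by.y = "id")
names(joined)[names(joined) == "name"] <- "movie_name"

joined[, c("friend", "movie_name", "genre", "rating")]
```

Each review row picks up the friend's name from person and the title and genre from movie, which is the shape of the moviedata result.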

Storing the query result as a data frame and displaying the first rows

#dbGetQuery() already returns a data frame; as.data.frame() just makes that explicit
movie.rating <- as.data.frame(moviedata)
#Displaying the first six rows
head(movie.rating)
##   friend                                                    movie_name year
## 1   Mike                                                         Shrek 2001
## 2   Mike                                              Anger Management 2003
## 3   Mike                                              Wedding Crashers 2005
## 4   Mike                                                Fast & Furious 2009
## 5   Mike                                               The Incredibles 2004
## 6   Mike Pirates of the Caribbean: The Curse of the Black Pearl (2003) 2003
##    genre    duration rating
## 1 Comedy  90 minutes      4
## 2 Comedy 106 minutes      3
## 3 Comedy 119 minutes      5
## 4 Action 107 minutes      4
## 5 Action 115 minutes      3
## 6 Action 143 minutes      5

Finding the mean rating for each movie genre

#Finding the mean rating for each genre
mean.movie <- aggregate(rating ~ genre, movie.rating, mean)
#Displaying the data
mean.movie
##    genre rating
## 1 Action 3.2500
## 2 Comedy 3.3750
## 3  Drama 3.9375
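
As a small check of how aggregate() behaves with this formula, the sketch below rebuilds only the six rows visible in the head() output (all of them Mike's ratings) and groups them by genre. It is an illustration on a subset, so its means differ from the full table above; toy and by.genre are hypothetical names.

```r
# The six rows shown by head(movie.rating): Mike's ratings only.
toy <- data.frame(
  genre = c("Comedy", "Comedy", "Comedy", "Action", "Action", "Action"),
  rating = c(4, 3, 5, 4, 3, 5)
)

# Same call shape as in the report: one mean per genre.
by.genre <- aggregate(rating ~ genre, toy, mean)
by.genre

# tapply() computes the same group means as a named vector.
tapply(toy$rating, toy$genre, mean)
```

For this subset both genres average 4, since (4 + 3 + 5) / 3 = 4.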

Finding the mean rating for each person

#Finding the mean rating for each friend
mean.movie <- aggregate(rating ~ friend, movie.rating, mean)
#Displaying the data
mean.movie
##     friend rating
## 1     Alex  3.250
## 2   Harvey  2.875
## 3   Justin  3.375
## 4    Kayla  3.875
## 5   Marina  3.875
## 6     Mike  4.000
## 7    Oscar  3.250
## 8 Samantha  3.250
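
The genre table and the friend table can be cross-checked against each other, since both are built from the same 64 ratings (8 friends times 8 movies). The sketch below retypes the printed means by hand. The per-genre rating counts (24, 24, 16) are an inference, not something the report prints: they assume 3 Action, 3 Comedy, and 2 Drama movies, each rated by all 8 friends.

```r
# Means copied from the printed output above.
genre.mean <- c(Action = 3.25, Comedy = 3.375, Drama = 3.9375)
friend.mean <- c(Alex = 3.25, Harvey = 2.875, Justin = 3.375, Kayla = 3.875,
                 Marina = 3.875, Mike = 4, Oscar = 3.25, Samantha = 3.25)

# Assumed number of ratings per genre (movies in the genre x 8 friends).
genre.n <- c(Action = 24, Comedy = 24, Drama = 16)

# Overall mean recovered two ways.
overall.from.genre <- sum(genre.mean * genre.n) / sum(genre.n)
overall.from.friend <- mean(friend.mean)  # assumes 8 ratings per friend

c(overall.from.genre, overall.from.friend)
all.equal(overall.from.genre, overall.from.friend)
```

Under these assumptions both routes give an overall mean of 3.46875, so the two tables are consistent with each other.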

Graphing the mean rating of each movie

#Finding the mean rating for each movie
mean.movie <- aggregate(rating ~ movie_name, movie.rating, mean)
#Plotting the mean ratings as a bar chart
library(ggplot2)
library(stringr)
gginit.movie <- ggplot(mean.movie, aes(x = movie_name, y = rating))
plottype.movie <- geom_bar(stat = "identity", color = 'blue', fill = 'purple', alpha = 0.5)
plottheme.movie <- theme_bw()
#Wrapping long movie names at 10 characters so the axis labels do not overlap
gginit.movie + plottype.movie + plottheme.movie + xlab('Movie Name') + ylab('Mean Rating') + ylim(0, 5) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
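
The bars are drawn in the default order of movie_name rather than by rating, which makes the highest and lowest movies harder to pick out. One option, sketched here as a suggestion rather than part of the original analysis, is to reorder the factor levels by rating with reorder() from the stats package before plotting. The data frame demo is a hypothetical stand-in built from the six single ratings visible in the head() output, not the real mean.movie.

```r
# Hypothetical stand-in for mean.movie, built from Mike's six visible ratings.
demo <- data.frame(
  movie_name = c("Shrek", "Anger Management", "Wedding Crashers",
                 "Fast & Furious", "The Incredibles",
                 "Pirates of the Caribbean: The Curse of the Black Pearl (2003)"),
  rating = c(4, 3, 5, 4, 3, 5)
)

# Turn movie_name into a factor whose levels are sorted by rating, ascending.
demo$movie_name <- reorder(demo$movie_name, demo$rating)
levels(demo$movie_name)
```

With the real data, applying the same reorder() line to mean.movie before the ggplot() call should draw the bars from lowest to highest mean rating.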

Conclusion:

Looking at the mean rating by genre, we can see that Drama was rated the highest (3.9375) and Action the lowest (3.25). The bar chart of the mean rating per movie shows that “The Pianist” is rated the highest. We also see that “Fast & Furious” and “The Incredibles” had the second and third lowest ratings. Because each genre mean is built from the ratings of the movies in that genre, the individual movie results carry through to the genre level. “The Pianist” was the highest rated movie in this sample, and it pulled the Drama mean up to the highest of the three genres. In the same way, the low ratings of “Fast & Furious” and “The Incredibles” pulled the Action mean down to the lowest.