I selected eight friends to rate eight different movies spanning three genres: comedy, action, and drama. I want to find the mean rating of each genre and the mean rating given by each friend. To do this, I created the appropriate tables (person, movie, and review) in a MySQL database and queried them from R.
library(RMySQL)
library(dplyr)
# Prompt for credentials (rstudioapi::askForPassword() only works inside RStudio)
username <- rstudioapi::askForPassword("Database username")
password <- rstudioapi::askForPassword("Database password")
# Connect to the MySQL database
con <- dbConnect(MySQL(), user = username, password = password, dbname = 'sakila', host = 'localhost')
# Run the SQL query from R, joining the review, person, and movie tables
moviedata <- dbGetQuery(con, "SELECT p.name as friend, m.name as movie_name, m.year,m.genre,m.duration,r.rating
FROM review r
JOIN person p ON r.person_id = p.id
JOIN movie m ON r.movie_id = m.id")
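# dbGetQuery() already returns a data frame; as.data.frame() only makes that explicit
movie.rating <- as.data.frame(moviedata)
# Close the connection now that the data are in memory
dbDisconnect(con)
# Display the first six rows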
head(movie.rating)
## friend movie_name year
## 1 Mike Shrek 2001
## 2 Mike Anger Management 2003
## 3 Mike Wedding Crashers 2005
## 4 Mike Fast & Furious 2009
## 5 Mike The Incredibles 2004
## 6 Mike Pirates of the Caribbean: The Curse of the Black Pearl (2003) 2003
## genre duration rating
## 1 Comedy 90 minutes 4
## 2 Comedy 106 minutes 3
## 3 Comedy 119 minutes 5
## 4 Action 107 minutes 4
## 5 Action 115 minutes 3
## 6 Action 143 minutes 5
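To try the join without a MySQL server or credentials, the same query can be run against an in-memory SQLite database. This is a sketch under assumptions: it swaps RSQLite in for RMySQL, infers the columns of `person`, `movie`, and `review` from the query's SELECT and JOIN clauses (the real table definitions are not shown), and loads only Mike's six ratings printed by `head()` above.

```r
# Sketch only: RSQLite stands in for RMySQL, and the table/column names are
# inferred from the join query above (the real CREATE TABLE statements are not shown).
library(DBI)

con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")

# Minimal tables holding Mike's six ratings from head(movie.rating)
dbWriteTable(con_demo, "person", data.frame(id = 1L, name = "Mike"))
dbWriteTable(con_demo, "movie", data.frame(
  id = 1:6,
  name = c("Shrek", "Anger Management", "Wedding Crashers",
           "Fast & Furious", "The Incredibles",
           "Pirates of the Caribbean: The Curse of the Black Pearl (2003)"),
  year = c(2001L, 2003L, 2005L, 2009L, 2004L, 2003L),
  genre = rep(c("Comedy", "Action"), each = 3),
  duration = c("90 minutes", "106 minutes", "119 minutes",
               "107 minutes", "115 minutes", "143 minutes")
))
dbWriteTable(con_demo, "review", data.frame(
  person_id = 1L,
  movie_id = 1:6,
  rating = c(4L, 3L, 5L, 4L, 3L, 5L)
))

# Same SELECT/JOIN as the MySQL query
demo <- dbGetQuery(con_demo, "
  SELECT p.name AS friend, m.name AS movie_name, m.year, m.genre,
         m.duration, r.rating
  FROM review r
  JOIN person p ON r.person_id = p.id
  JOIN movie m ON r.movie_id = m.id")

dbDisconnect(con_demo)
demo
```

Because the query has no `ORDER BY`, neither SQLite nor MySQL guarantees the row order; add one if the order matters.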
# Find the mean rating for each genre
mean.genre <- aggregate(rating ~ genre, movie.rating, mean)
# Display the result
mean.genre
## genre rating
## 1 Action 3.2500
## 2 Comedy 3.3750
## 3 Drama 3.9375
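`dplyr` is loaded at the top but never used. As a sketch, the `aggregate()` call above has a direct `dplyr` equivalent. The data frame below holds only the six rows printed by `head(movie.rating)`, so its means (4 for both genres) are not the full-sample results.

```r
library(dplyr)

# Toy input: only Mike's six ratings from head(movie.rating)
movie_sample <- data.frame(
  friend = "Mike",
  genre  = rep(c("Comedy", "Action"), each = 3),
  rating = c(4, 3, 5, 4, 3, 5)
)

# Group by genre, then compute the mean rating and the number of ratings
genre_means <- movie_sample %>%
  group_by(genre) %>%
  summarise(mean_rating = mean(rating), n_ratings = n())

genre_means
```

On the full data, `movie.rating %>% group_by(genre) %>% summarise(mean_rating = mean(rating))` should reproduce the table above, and `group_by(friend)` gives the per-friend means.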
# Find the mean rating given by each friend
mean.friend <- aggregate(rating ~ friend, movie.rating, mean)
# Display the result
mean.friend
## friend rating
## 1 Alex 3.250
## 2 Harvey 2.875
## 3 Justin 3.375
## 4 Kayla 3.875
## 5 Marina 3.875
## 6 Mike 4.000
## 7 Oscar 3.250
## 8 Samantha 3.250
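The two tables can be cross-checked. If every friend rated all eight movies, both the plain average of the friend means and the movie-weighted average of the genre means equal the mean of all 64 ratings. The weights below assume three comedies, three action movies, and two dramas; that split is inferred rather than stated (Drama's 3.9375 = 63/16 implies 16 ratings, i.e. two movies rated by eight friends).

```r
# Means copied from the two tables above
friend_means <- c(Alex = 3.250, Harvey = 2.875, Justin = 3.375, Kayla = 3.875,
                  Marina = 3.875, Mike = 4.000, Oscar = 3.250, Samantha = 3.250)
reported_genre_means <- c(Action = 3.2500, Comedy = 3.3750, Drama = 3.9375)

# Assumption: 3 action, 3 comedy, 2 drama movies, each rated by all 8 friends
movies_per_genre <- c(Action = 3, Comedy = 3, Drama = 2)

overall_from_friends <- mean(friend_means)
overall_from_genres  <- weighted.mean(reported_genre_means, movies_per_genre)

c(overall_from_friends, overall_from_genres)
```

Both values come out to 3.46875, so the two aggregations agree under that assumption.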
# Find the mean rating for each movie
mean.movie <- aggregate(rating ~ movie_name, movie.rating, mean)
# Plot the mean rating of each movie as a bar chart
library(ggplot2)
library(stringr)
gginit.movie <- ggplot(mean.movie, aes(x = movie_name, y = rating))
plottype.movie <- geom_bar(stat = "identity", color = 'blue', fill = 'purple', alpha = 0.5)
plottheme.movie <- theme_bw()
gginit.movie + plottype.movie + plottheme.movie + xlab('Movie Name') + ylab('Mean Rating') + ylim(0,5) + scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
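Sorting the bars makes the ranking discussed below easier to read. This sketch reorders the movies by descending mean with `reorder()`, uses `geom_col()` (shorthand for `geom_bar(stat = "identity")`), and uses `coord_cartesian()` instead of `ylim()`, which zooms the axis without discarding data outside the limits. The toy input is again Mike's six ratings from `head()`, standing in for the full `mean.movie` table.

```r
library(ggplot2)
library(stringr)

# Toy input: Mike's six ratings from head(movie.rating), not the full sample
ratings <- data.frame(
  movie_name = c("Shrek", "Anger Management", "Wedding Crashers",
                 "Fast & Furious", "The Incredibles",
                 "Pirates of the Caribbean: The Curse of the Black Pearl (2003)"),
  rating = c(4, 3, 5, 4, 3, 5)
)
movie_means <- aggregate(rating ~ movie_name, ratings, mean)

# Order the factor levels (and hence the bars) by descending mean rating
movie_means$movie_name <- reorder(movie_means$movie_name, -movie_means$rating)

p <- ggplot(movie_means, aes(x = movie_name, y = rating)) +
  geom_col(color = "blue", fill = "purple", alpha = 0.5) +
  theme_bw() +
  xlab("Movie Name") + ylab("Mean Rating") +
  coord_cartesian(ylim = c(0, 5)) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10))
```

Print `p` to draw the chart. On the real data, applying the same `reorder()` line to `mean.movie` should put the highest-rated movie first.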
The mean rating by genre shows that Drama was rated the highest (3.94) and Action the lowest (3.25). The bar chart of each movie's mean rating shows that “The Pianist” is rated the highest, while “Fast & Furious” and “The Incredibles” have the second and third lowest ratings. These per-movie averages explain the genre averages: because every friend rated every movie, each genre's mean is simply the average of its movies' means. “The Pianist,” the highest-rated movie in the sample, therefore pulled Drama to the top, and the low individual ratings of “Fast & Furious” and “The Incredibles” pulled Action to the bottom.