Overview

In this assignment, I have chosen six of the most popular Hollywood movies, and I have asked seven of my coworkers to rate each of the movies they have watched on a scale from 1 (being not good) to 5 (being great). From the responses of my survey, I have created a SQL database. Here, I will first load these data into an R dataframe. For our purposes, we will be comparing which movies has higher average ratings than others.

Step 1: Load the following libraries:

library(RMySQL)
## Loading required package: DBI
library(DBI)
library(ggplot2)

Step 2: Connect to MySQL server:

con = dbConnect (MySQL(), user = 'root', password = getOption('database_password'), dbname = 'movie_ratings', Host = 'localhost')

Step 3: List the tables in “movie_ratings” database:

dbListTables(con)
## [1] "Audience" "Movies"   "Ratings"

As you can see, 3 different tables have been created with information regarding audience, movies and ratings.

Step 4: Create 3 queries with the follwing SQL syntax within R:

Q1 <- "select * from audience;"
Q2 <- "select * from movies;"
Q3 <- "select * from ratings;"

Step 5: Save queries as dataframes and display their content:

audience <- dbGetQuery(con, Q1)
audience
##   Nameid Firstname Lastname
## 1      1     Mario        P
## 2      2    Teresa        C
## 3      3    Tyleah        C
## 4      4   Marisol        L
## 5      5    Leslie        B
## 6      6 Elizabeth        F
## 7      7    Denzil        R
movies <- dbGetQuery(con, Q2)
movies
##   Movieid                    Title           Genre
## 1       1            Jurassic Park Science Fiction
## 2       2                   Avatar Sicence Fiction
## 3       3                  Titanic           Drama
## 4       4 The Pursuit of Happyness           Drama
## 5       5 The Lion King (Oreginal)          Family
## 6       6          The Dark Knight          Action
ratings <- dbGetQuery(con, Q3)
ratings
##    Ratingid Nameid Movieid Rating
## 1         1      1       1      5
## 2         2      1       2      5
## 3         3      1       3      4
## 4         4      1       4      4
## 5         5      1       5      3
## 6         6      1       6      5
## 7         7      2       1      2
## 8         8      2       2     NA
## 9         9      2       3      4
## 10       10      2       4      5
## 11       11      2       5      5
## 12       12      2       6      3
## 13       13      3       1      4
## 14       14      3       2      5
## 15       15      3       3      5
## 16       16      3       4      5
## 17       17      3       5      3
## 18       18      3       6      1
## 19       19      4       1      4
## 20       20      4       2      5
## 21       21      4       3      4
## 22       22      4       4      5
## 23       23      4       5      5
## 24       24      4       6      4
## 25       25      5       1      3
## 26       26      5       2      4
## 27       27      5       3      4
## 28       28      5       4      5
## 29       29      5       5      5
## 30       30      5       6     NA
## 31       31      6       1      5
## 32       32      6       2      3
## 33       33      6       3      4
## 34       34      6       4     NA
## 35       35      6       5      5
## 36       36      6       6      5
## 37       37      7       1      5
## 38       38      7       2      5
## 39       39      7       3      4
## 40       40      7       4      5
## 41       41      7       5      4
## 42       42      7       6      5

Step 6: Merge and create a subset of ‘movies’ and ‘ratings’:

This way, we can have a clear picture of how each movie was rated:

movie_ratings <- merge(movies, ratings, by.x="Movieid")
movie_ratings2 <- subset(movie_ratings, select=c(Title, Genre, Rating))
movie_ratings2
##                       Title           Genre Rating
## 1             Jurassic Park Science Fiction      5
## 2             Jurassic Park Science Fiction      4
## 3             Jurassic Park Science Fiction      2
## 4             Jurassic Park Science Fiction      3
## 5             Jurassic Park Science Fiction      5
## 6             Jurassic Park Science Fiction      5
## 7             Jurassic Park Science Fiction      4
## 8                    Avatar Sicence Fiction      5
## 9                    Avatar Sicence Fiction     NA
## 10                   Avatar Sicence Fiction      5
## 11                   Avatar Sicence Fiction      3
## 12                   Avatar Sicence Fiction      5
## 13                   Avatar Sicence Fiction      5
## 14                   Avatar Sicence Fiction      4
## 15                  Titanic           Drama      4
## 16                  Titanic           Drama      4
## 17                  Titanic           Drama      5
## 18                  Titanic           Drama      4
## 19                  Titanic           Drama      4
## 20                  Titanic           Drama      4
## 21                  Titanic           Drama      4
## 22 The Pursuit of Happyness           Drama      4
## 23 The Pursuit of Happyness           Drama      5
## 24 The Pursuit of Happyness           Drama      5
## 25 The Pursuit of Happyness           Drama      5
## 26 The Pursuit of Happyness           Drama      5
## 27 The Pursuit of Happyness           Drama     NA
## 28 The Pursuit of Happyness           Drama      5
## 29 The Lion King (Oreginal)          Family      3
## 30 The Lion King (Oreginal)          Family      3
## 31 The Lion King (Oreginal)          Family      5
## 32 The Lion King (Oreginal)          Family      5
## 33 The Lion King (Oreginal)          Family      5
## 34 The Lion King (Oreginal)          Family      5
## 35 The Lion King (Oreginal)          Family      4
## 36          The Dark Knight          Action     NA
## 37          The Dark Knight          Action      5
## 38          The Dark Knight          Action      3
## 39          The Dark Knight          Action      1
## 40          The Dark Knight          Action      5
## 41          The Dark Knight          Action      4
## 42          The Dark Knight          Action      5

Step 7: Summmary of this new table:

summary(movie_ratings2)
##     Title              Genre               Rating     
##  Length:42          Length:42          Min.   :1.000  
##  Class :character   Class :character   1st Qu.:4.000  
##  Mode  :character   Mode  :character   Median :5.000  
##                                        Mean   :4.256  
##                                        3rd Qu.:5.000  
##                                        Max.   :5.000  
##                                        NA's   :3

In summary above we can see that the average rating for all the movies in this database is 4.256. This tells us that these movies were well received by the audience who watched them.

Step 7: The average movie rating per movie in table:

avg_movie_rating <- aggregate(x=movie_ratings2["Rating"], by = list(movie=movie_ratings2$Title), FUN = mean, , na.rm=TRUE)
avg_movie_rating
##                      movie   Rating
## 1                   Avatar 4.500000
## 2            Jurassic Park 4.000000
## 3          The Dark Knight 3.833333
## 4 The Lion King (Oreginal) 4.285714
## 5 The Pursuit of Happyness 4.833333
## 6                  Titanic 4.142857

From the table above, we can see that “The Pursuit of Happyness” has the highest average rating in this list, while “The Dark Knight” has the lowest average rating.

Step 8: Graphical representaion of average movie ratings per movie:

# Draw plot
theme_set(theme_bw())
ggplot(avg_movie_rating, aes(x= reorder(movie, -Rating), y=Rating)) + 
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Movie Ratings", 
       subtitle="Movie vs Avg. Rating") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) + 
  scale_x_discrete(name="Movies")

Conclusion

Based on my survey analysis, and the data visualization, The Pursuit of Happyness has the highest average rating of 4.8 compared to The Dark Knight which received a 3.8 out of 5. In the future, what I would be interested in exploring is the profit margins of each movie and compare them to my survey to analyze the relationship between them. To do so, I would research online and go through each movies budget amount vs. the profit as well as compared to my survey vs general popularity. ```