Overview

In this assignment, I have chosen six of the most popular Hollywood movies, and I have asked seven of my coworkers to rate each of the movies they have watched on a scale from 1 (being not good) to 5 (being great). From the responses of my survey, I have created a SQL database. Here, I will first load these data into an R dataframe. For our purposes, we will be comparing which movies has higher average ratings than others.

Step 1: Load the following libraries:

library(RMySQL)

## Loading required package: DBI

library(DBI)
library(ggplot2)

Step 2: Connect to MySQL server:

con = dbConnect (MySQL(), user = 'root', password = getOption('database_password'), dbname = 'movie_ratings', Host = 'localhost')

Step 3: List the tables in “movie_ratings” database:

dbListTables(con)

## [1] "Audience" "Movies"   "Ratings"

As you can see, 3 different tables have been created with information regarding audience, movies and ratings.

Step 4: Create 3 queries with the follwing SQL syntax within R:

Q1 <- "select * from audience;"
Q2 <- "select * from movies;"
Q3 <- "select * from ratings;"

Step 5: Save queries as dataframes and display their content:

audience <- dbGetQuery(con, Q1)
audience

##   Nameid Firstname Lastname
## 1      1     Mario        P
## 2      2    Teresa        C
## 3      3    Tyleah        C
## 4      4   Marisol        L
## 5      5    Leslie        B
## 6      6 Elizabeth        F
## 7      7    Denzil        R

movies <- dbGetQuery(con, Q2)
movies

##   Movieid                    Title           Genre
## 1       1            Jurassic Park Science Fiction
## 2       2                   Avatar Sicence Fiction
## 3       3                  Titanic           Drama
## 4       4 The Pursuit of Happyness           Drama
## 5       5 The Lion King (Oreginal)          Family
## 6       6          The Dark Knight          Action

ratings <- dbGetQuery(con, Q3)
ratings

##    Ratingid Nameid Movieid Rating
## 1         1      1       1      5
## 2         2      1       2      5
## 3         3      1       3      4
## 4         4      1       4      4
## 5         5      1       5      3
## 6         6      1       6      5
## 7         7      2       1      2
## 8         8      2       2     NA
## 9         9      2       3      4
## 10       10      2       4      5
## 11       11      2       5      5
## 12       12      2       6      3
## 13       13      3       1      4
## 14       14      3       2      5
## 15       15      3       3      5
## 16       16      3       4      5
## 17       17      3       5      3
## 18       18      3       6      1
## 19       19      4       1      4
## 20       20      4       2      5
## 21       21      4       3      4
## 22       22      4       4      5
## 23       23      4       5      5
## 24       24      4       6      4
## 25       25      5       1      3
## 26       26      5       2      4
## 27       27      5       3      4
## 28       28      5       4      5
## 29       29      5       5      5
## 30       30      5       6     NA
## 31       31      6       1      5
## 32       32      6       2      3
## 33       33      6       3      4
## 34       34      6       4     NA
## 35       35      6       5      5
## 36       36      6       6      5
## 37       37      7       1      5
## 38       38      7       2      5
## 39       39      7       3      4
## 40       40      7       4      5
## 41       41      7       5      4
## 42       42      7       6      5

Step 6: Merge and create a subset of ‘movies’ and ‘ratings’:

This way, we can have a clear picture of how each movie was rated:

movie_ratings <- merge(movies, ratings, by.x="Movieid")
movie_ratings2 <- subset(movie_ratings, select=c(Title, Genre, Rating))
movie_ratings2

##                       Title           Genre Rating
## 1             Jurassic Park Science Fiction      5
## 2             Jurassic Park Science Fiction      4
## 3             Jurassic Park Science Fiction      2
## 4             Jurassic Park Science Fiction      3
## 5             Jurassic Park Science Fiction      5
## 6             Jurassic Park Science Fiction      5
## 7             Jurassic Park Science Fiction      4
## 8                    Avatar Sicence Fiction      5
## 9                    Avatar Sicence Fiction     NA
## 10                   Avatar Sicence Fiction      5
## 11                   Avatar Sicence Fiction      3
## 12                   Avatar Sicence Fiction      5
## 13                   Avatar Sicence Fiction      5
## 14                   Avatar Sicence Fiction      4
## 15                  Titanic           Drama      4
## 16                  Titanic           Drama      4
## 17                  Titanic           Drama      5
## 18                  Titanic           Drama      4
## 19                  Titanic           Drama      4
## 20                  Titanic           Drama      4
## 21                  Titanic           Drama      4
## 22 The Pursuit of Happyness           Drama      4
## 23 The Pursuit of Happyness           Drama      5
## 24 The Pursuit of Happyness           Drama      5
## 25 The Pursuit of Happyness           Drama      5
## 26 The Pursuit of Happyness           Drama      5
## 27 The Pursuit of Happyness           Drama     NA
## 28 The Pursuit of Happyness           Drama      5
## 29 The Lion King (Oreginal)          Family      3
## 30 The Lion King (Oreginal)          Family      3
## 31 The Lion King (Oreginal)          Family      5
## 32 The Lion King (Oreginal)          Family      5
## 33 The Lion King (Oreginal)          Family      5
## 34 The Lion King (Oreginal)          Family      5
## 35 The Lion King (Oreginal)          Family      4
## 36          The Dark Knight          Action     NA
## 37          The Dark Knight          Action      5
## 38          The Dark Knight          Action      3
## 39          The Dark Knight          Action      1
## 40          The Dark Knight          Action      5
## 41          The Dark Knight          Action      4
## 42          The Dark Knight          Action      5

Step 7: Summmary of this new table:

summary(movie_ratings2)

##     Title              Genre               Rating     
##  Length:42          Length:42          Min.   :1.000  
##  Class :character   Class :character   1st Qu.:4.000  
##  Mode  :character   Mode  :character   Median :5.000  
##                                        Mean   :4.256  
##                                        3rd Qu.:5.000  
##                                        Max.   :5.000  
##                                        NA's   :3

In summary above we can see that the average rating for all the movies in this database is 4.256. This tells us that these movies were well received by the audience who watched them.

DATA 607: Assignment 2

Farhana Akther

2023-02-03