Database creation and Insertion of Data
I decided to create a postgres instance on my computer to store the movie survey data. This is the SQL database I am the most familiar with, so I chose it over MySQL. The method I used to create the table and insert data was through SQL code. No csv was used during this process. Below is the code used to initiate table and insert the survey data.
CREATE TABLE movie_reviews (
personId SERIAL PRIMARY KEY,
firstName varchar,
lastName varchar,
movie varchar,
rating int
);
INSERT INTO movie_reviews
(firstName, lastName, movie, rating)
VALUES
('Jordan','Glendrange','Goodfellas',1),
('Jordan','Glendrange','Princess Bride',1),
('Jordan','Glendrange','Akira',1),
('Jordan','Glendrange','Moonlight',1),
('Jordan','Glendrange','Killer Clowns From Outer Space',5),
('Evit','Markarian','Goodfellas',5),
('Evit','Markarian','Princess Bride',3),
('Evit','Markarian','Akira',3),
('Evit','Markarian','Moonlight',4),
('Evit','Markarian','Killer Clowns From Outer Space',1),
('Charlene','Nercess','Goodfellas',5),
('Charlene','Nercess','Princess Bride',5),
('Charlene','Nercess','Akira',1),
('Charlene','Nercess','Moonlight',5),
('Charlene','Nercess','Killer Clowns From Outer Space',2),
('Ray','Glendrange','Goodfellas',5),
('Ray','Glendrange','Princess Bride',3),
('Ray','Glendrange','Akira',3),
('Ray','Glendrange','Moonlight',4),
('Ray','Glendrange','Killer Clowns From Outer Space',1),
('Jenny','Glendrange','Goodfellas',4),
('Jenny','Glendrange','Princess Bride',5),
('Jenny','Glendrange','Akira',2),
('Jenny','Glendrange','Moonlight',5),
('Jenny','Glendrange','Killer Clowns From Outer Space',1)
Connecting to Database
DBI and RPostgreSQL are the libraries I am using in order to connect to the postgres database (hosted on my computer). Keep in mind this code is not reproducible, because 1. I am setting my working directory in the code and 2. The data is hosted on my personal computer.
library("DBI")
library("RPostgreSQL")
setwd("/Users/jordanglendrange/Documents/DATA607/Homework2")
source("loginCredentials.R")
To hide the database credentials I saved them in an R script file and brought them in as variables using the source command.
con <- dbConnect(RPostgres::Postgres(), user=username,dbname=dbname,password=password)
Querying Data
Next I want to query the entire table and store it in a dataframe. This will allow me to work in R on it.
movie_reviews <- dbGetQuery(con, 'select * from movie_reviews')
head(movie_reviews)
## personid firstname lastname movie rating
## 1 1 Jordan Glendrange Goodfellas 1
## 2 2 Jordan Glendrange Princess Bride 1
## 3 3 Jordan Glendrange Akira 1
## 4 4 Jordan Glendrange Moonlight 1
## 5 5 Jordan Glendrange Killer Clowns From Outer Space 5
## 6 6 Evit Markarian Goodfellas 5
Analysis
In the chart below we can see Goodfellas got the highest review. This is to be expected, it’s a great movie! Killer Clowns From Outer Space and Akira were tied for last. Most surveyed hadn’t seen Akira. Let’s dig into the Clown movie.
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("ggplot2")
avg_movie <- movie_reviews %>%
group_by(movie) %>%
summarise(avg_score = mean(rating))
ggplot(data=avg_movie, aes(y=movie, x=avg_score)) + geom_col()
This is surprising, Jordan gave KC a 5 star rating while almost everyone else gave it a 1 star rating. I’m interested to see how Jordan rated the other movies.
killer_clown <- movie_reviews %>%
filter(movie == "Killer Clowns From Outer Space")
ggplot(data=killer_clown, aes(y=firstname, x=rating)) +geom_col()
Wow. He gave every movie a 1 star rating except Killer Clowns from OuterSpace. What’s wrong with this guy?
jordan_ratings <- movie_reviews %>%
filter(firstname == "Jordan")
ggplot(data=jordan_ratings, aes(y=movie, x=rating)) +geom_col()
Creating the Table in the postgres terminal
Table Creation.
Inserting data and querying it in the postgres terminal
Inserting Data and Querying Data