Storing Data in a Relational Database

The first step was to create a new database in MySQL.

CREATE DATABASE hw2_607;

Next, I created a table in the database called responses.

CREATE TABLE hw2_607.responses ( id VARCHAR(45) NOT NULL, jojo_rank INT NULL, joker_rank INT NULL, endgame_rank INT NULL, rocket_rank INT NULL, little_rank INT NULL, sonic_rank INT NULL, PRIMARY KEY (id));

Lastly, I used MySQL’s Table Import Wizard to import data from my .csv file into the responses table.

Loading Data Into R

Libraries

library(RMySQL)

## Loading required package: DBI

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(wesanderson)
library(keyring)

Importing Data

key_set('user', 'test')

## Please enter password in TK window (Alt+Tab)

db = dbConnect(MySQL(), user='test', password=key_get('user', 'test'), dbname='hw2_607', host='localhost')
movie_data <- dbGetQuery(db, "SELECT * FROM responses")
colnames(movie_data)[1] <- gsub('^...','',colnames(movie_data)[1])

movie_data %>%
  kbl(caption = "Original Movie Ranking Data Exported from SQL") %>%
  kable_styling(bootstrap_options = "striped")

Original Movie Ranking Data Exported from SQL
id	jojo_rank	joker_rank	endgame_rank	rocket_rank	little_rank	sonic_rank
A	4	3	3	4	4	2
B	0	3	4	4	0	0
C	4	3	5	3	4	1
D	4	3	2	4	5	2
E	5	4	3	3	4	4
F	5	4	5	5	0	1

Each participant has been identified with an id instead of with their name. However, if we did need to retrieve the participant’s names, those are stored in a separate table in the database. For the purpose of this assignment, the names have been fictionalized.

names <- dbGetQuery(db, "SELECT * FROM names")
colnames(names)[1] <- gsub('^...','',colnames(names)[1])
names %>%
  kbl(caption = "Names and IDs") %>%
  kable_styling(bootstrap_options = "striped")

Names and IDs
id	name
A	Adelbert Anderson
B	Boris Brown
C	Cordelia Cohen
D	Dorcas Donovan
E	Ernest Englewort
F	Felicity Forrester

Handling Missing Data

In this scenario, a score of 0 means the participant did not see the movie. Let’s make sure that we don’t include that when we compute rankings.

movie_data[movie_data == 0] <- NA
movie_data %>%
  kbl(caption="Movie Rankings With Missing Values Handled") %>%
  kable_styling(bootstrap_options = "striped")

Movie Rankings With Missing Values Handled
id	jojo_rank	joker_rank	endgame_rank	rocket_rank	little_rank	sonic_rank
A	4	3	3	4	4	2
B	NA	3	4	4	NA	NA
C	4	3	5	3	4	1
D	4	3	2	4	5	2
E	5	4	3	3	4	4
F	5	4	5	5	NA	1

Analysis

To further analyze the data, I created a bar graph with the average movie rankings using ggplot and customized the colors using the wesanderson Zissou1 color palette. Missing rankings were not included in the averages.

jojo_mean <- mean(movie_data[,'jojo_rank'], na.rm=TRUE)
joker_mean <- mean(movie_data[,'joker_rank'], na.rm=TRUE)
endgame_mean <- mean(movie_data[,'endgame_rank'], na.rm=TRUE)
rocket_mean <- mean(movie_data[,'rocket_rank'], na.rm=TRUE)
little_mean <- mean(movie_data[,'little_rank'], na.rm=TRUE)
sonic_mean <- mean(movie_data[,'sonic_rank'], na.rm=TRUE)

mean_df <- data.frame("Movie_Name"= c('Jojo Rabbit', 'The Joker', 'The Avengers: End Game', 'Rocketman', 'Little Women', 'Sonic the Hedgehog'), 'Average_Ranking'= c(jojo_mean, joker_mean, endgame_mean, rocket_mean, little_mean, sonic_mean))

ggplot(mean_df, aes(x=Average_Ranking, y=Movie_Name, fill=Movie_Name)) + geom_bar(stat="identity") + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(axis.line = element_line(colour = "black")) + scale_fill_manual(values=wes_palette( name="Zissou1", 6, type="continuous")) +labs(title= "Average Movie Ranking Bar Graph")

Reproducibility

Github links to the .csv files can be found here and here

Recommendations

Based on the original data set, there are a few clear recommendations to make . Participant A’s top ranked movies were Jojo Rabbit, Rocket Man, and Little Women, indicating a preference for historical films. Based on this, I would recommend they watch Bohemian Rhapsody, which is a historical film about the band Queen. Participant C only ranked Avengers: End Game as a 5, which may denote a preference for films set in the Marvel Cinematic Universe (MCU). Based on this, they may enjoy other movies set in the MCU, such as Avengers: Infinity War or Doctor Strange.

607 HW 2

Carlisle Ferguson

2/10/2021

Introduction

Data Collection