Introduction

I asked five friends to rank the movies in the Star Wars franchise from most to least favorite. The goal of this project was to find which movie, and which ordering of the movies, was the overall favorite among my friends.

Libraries

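I used the libraries listed below for this task: mice to fill in missing data, keyring to store my database password, and DBI to connect to the database.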

# Open up required libraries
library(mice)

## Loading required package: lattice

## 
## Attaching package: 'mice'

## The following objects are masked from 'package:base':
## 
##     cbind, rbind

library(keyring)
library(DBI)

Code

After opening the required libraries, I connected to my MySQL database (using keyring to store my password) and pulled in all of the data from the rankings table.

# Connect to the database; the password is read from the system keyring
con <- dbConnect(RMySQL::MySQL(), dbname = 'star_wars', host = 'localhost',
                 port = 3306, user = 'root',
                 password = keyring::key_get("star_wars", "root"))

# Query every row of the rankings table, then close the connection
sql <- "SELECT * FROM rankings"
ranking_table <- dbGetQuery(con, sql)
dbDisconnect(con)
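
For anyone without a MySQL server, the same query pattern can be tried against an in-memory SQLite database. This is only a sketch: it assumes the RSQLite package is installed, and the toy table is made up. The movieid and ranking columns match the names used in the code in this write-up, while friendid is a hypothetical column added for illustration.

```r
library(DBI)

# Toy stand-in for the rankings table (made-up values).
# movieid and ranking are the column names used in this write-up;
# friendid is a hypothetical column.
toy_rankings <- data.frame(
  friendid = c(1, 1, 2, 2),
  movieid  = c(4, 5, 4, 5),
  ranking  = c(1, 2, 2, 1)
)

# In-memory SQLite database instead of MySQL (assumes RSQLite is installed)
toy_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(toy_con, "rankings", toy_rankings)

# Same SELECT statement, run against the toy table
toy_result <- dbGetQuery(toy_con, "SELECT * FROM rankings")

# Close the connection once the data is in memory
dbDisconnect(toy_con)
```

Calling dbDisconnect() once the query has returned works the same way for a MySQL connection.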

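Next, I used the mice package to fill in any missing data in the rankings table. Then I averaged the rankings for each movie and rescaled the averages linearly so that an average rank of 1 (the best possible) became a score of 10 and an average rank of 9 (the worst possible) became a score of 1.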

# Impute missing values with predictive mean matching (pmm);
# mice builds m = 5 imputed data sets
imputed_Data <- mice(ranking_table, m = 5, maxit = 50, method = 'pmm', seed = 500)
# Keep the second of the five imputed data sets
ranking_table <- complete(imputed_Data, 2)

# Average ranking for each movie
averages <- data.frame(1:9, tapply(ranking_table$ranking, ranking_table$movieid, mean))
names(averages) <- c("episode", "ranking")

# Normalize the rankings: average rank 1 -> score 10, average rank 9 -> score 1
averages$ranking <- (-1.125 * averages$ranking) + 11.125
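
The constants -1.125 and 11.125 come from the straight line through the points (1, 10) and (9, 1): the slope is (1 - 10) / (9 - 1) = -1.125, and the intercept is 10 + 1.125 = 11.125. A minimal sketch that derives them instead of hard-coding them is below; rescale_rank is a hypothetical helper name, not part of my original code.

```r
# Hypothetical helper: map an average rank onto a score scale linearly,
# so that rank 1 (best) becomes `best` and rank `n_movies` becomes `worst`.
rescale_rank <- function(avg_rank, n_movies = 9, best = 10, worst = 1) {
  slope <- (worst - best) / (n_movies - 1)  # (1 - 10) / (9 - 1) = -1.125
  intercept <- best - slope                 # 10 + 1.125 = 11.125
  slope * avg_rank + intercept
}

rescale_rank(c(1, 5, 9))  # 10.0, 5.5, 1.0
```

Writing the rescaling this way would also make it easy to reuse if the number of movies changed.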

Finally, I plotted the normalized ranking of each movie.

# Create barplot of rankings
barplot(averages$ranking, names.arg = 1:9, main = "Star Wars Rankings",
        sub = "Perfect score = 10", xlab = "Episode", ylab = "Ranking",
        col = "aquamarine1", ylim = c(0,10))
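
The bar plot shows the scores, but the overall favorite and the full ordering can also be read off directly. The sketch below uses a made-up data frame with the same episode and ranking columns as averages; the scores are invented for illustration and are not my friends' results.

```r
# Made-up scores on the 1-to-10 scale (not real survey results)
toy_averages <- data.frame(
  episode = 1:9,
  ranking = c(3.2, 2.5, 5.1, 8.0, 9.1, 7.4, 6.3, 4.0, 2.9)
)

# Episode with the highest score
favorite <- toy_averages$episode[which.max(toy_averages$ranking)]

# All episodes ordered from most to least liked
ordering <- toy_averages$episode[order(toy_averages$ranking, decreasing = TRUE)]
```

Swapping toy_averages for the real averages data frame would give the favorite movie and the favorite sequence that the introduction asks about.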

SQL and R

David Moste

2/8/2020
