Introduction

I asked five friends to rank the Star Wars fanchise in order of their favorite movies. The goal for this project was to find which movie and sequence of movies was the overall favorite amongst my friends.

Libraries

I used the libraries listed below for this task.

# Open up required libraries
library(mice)
## Loading required package: lattice
## 
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(keyring)
library(DBI)

Code

After opening the required libraries, I connected to my MySQL database (using keyring to store my password) and pulled in all of the data from the rankings table.

# Connect to database
con <- dbConnect(RMySQL::MySQL(), dbname = 'star_wars', host = 'localhost', port = 3306, user = 'root', password = keyring::key_get("star_wars", "root"))

# Create the query for all data from the ranking table and get the results of the query
sql = "SELECT * FROM rankings"
ranking_table <- dbGetQuery(con, sql)

Next, I used the mice package to fill in any missing data from the rankings table. Then I normalized the rankings so that a 10 was a perfect score and a 1 was the worst score.

# Impute data to correct for missing data; use pmm method
imputed_Data <- mice(ranking_table, m = 5, maxit = 50, method = 'pmm', seed = 500)
ranking_table <- complete(imputed_Data, 2)

# Get averages
averages <- data.frame(1:9, tapply(ranking_table$ranking, ranking_table$movieid, mean))
names(averages) <- c("episode", "ranking")

# Normalize the rankings
averages$ranking <- (-1.125 * averages$ranking) + 11.125

Finally, I plotted the ranking of each movie.

# Create barplot of rankings
barplot(averages$ranking, names.arg = 1:9, main = "Star Wars Rankings",
        sub = "Perfect score = 10", xlab = "Episode", ylab = "Ranking",
        col = "aquamarine1", ylim = c(0,10))

Conclusion

From the barplot, it’s clear that the middle series in the franchise is the favorite (this is not a surprise). The more recent movies come in second, and the first trio falls into last place. I’d really enjoy expanding my survey to include more people to get a more accurate representation of the individual movies, though I’m fairly confident that even with this ridiculously limited sample size, the overall preference for trios is accurate.