If you remember being a kid on Halloween, you’ll remember that fun-sized candy was a staple at most houses. But what makes for the best candy? FiveThirtyEight’s The Ultimate Halloween Candy Power Ranking seeks to determine both the best candy and the driving factors behind each fun-sized bar’s popularity.
```r
library(tidyverse)
```

```
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```
```r
candy_data <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv")
head(candy_data)
```

```
##   competitorname chocolate fruity caramel peanutyalmondy nougat
## 1      100 Grand         1      0       1              0      0
## 2   3 Musketeers         1      0       0              0      1
## 3       One dime         0      0       0              0      0
## 4    One quarter         0      0       0              0      0
## 5      Air Heads         0      1       0              0      0
## 6     Almond Joy         1      0       0              1      0
##   crispedricewafer hard bar pluribus sugarpercent pricepercent winpercent
## 1                1    0   1        0        0.732        0.860   66.97173
## 2                0    0   1        0        0.604        0.511   67.60294
## 3                0    0   0        0        0.011        0.116   32.26109
## 4                0    0   0        0        0.011        0.511   46.11650
## 5                0    0   0        0        0.906        0.511   52.34146
## 6                0    0   1        0        0.465        0.767   50.34755
```
My favorite type of candy is chocolate, so let’s subset the data to look only at the chocolate candies. Let’s also rename the “competitorname” column to “candyname” to make it extra clear that we’re talking about candy. The data set doesn’t contain any abbreviations, so there’s no need to replace any values in the table.
```r
chocolate_only <- subset(candy_data, chocolate == 1)
head(chocolate_only)
```

```
##      competitorname chocolate fruity caramel peanutyalmondy nougat
## 1         100 Grand         1      0       1              0      0
## 2      3 Musketeers         1      0       0              0      1
## 6        Almond Joy         1      0       0              1      0
## 7         Baby Ruth         1      0       1              1      1
## 11  Charleston Chew         1      0       0              0      1
## 23 HersheyÕs Kisses         1      0       0              0      0
##    crispedricewafer hard bar pluribus sugarpercent pricepercent winpercent
## 1                 1    0   1        0        0.732        0.860   66.97173
## 2                 0    0   1        0        0.604        0.511   67.60294
## 6                 0    0   1        0        0.465        0.767   50.34755
## 7                 0    0   1        0        0.604        0.767   56.91455
## 11                0    0   1        0        0.604        0.511   38.97504
## 23                0    0   0        1        0.127        0.093   55.37545
```
```r
# Base R rename: overwrite the name of the column currently called "competitorname"
names(chocolate_only)[names(chocolate_only) == "competitorname"] <- "candyname"
head(chocolate_only)
```

```
##           candyname chocolate fruity caramel peanutyalmondy nougat
## 1         100 Grand         1      0       1              0      0
## 2      3 Musketeers         1      0       0              0      1
## 6        Almond Joy         1      0       0              1      0
## 7         Baby Ruth         1      0       1              1      1
## 11  Charleston Chew         1      0       0              0      1
## 23 HersheyÕs Kisses         1      0       0              0      0
##    crispedricewafer hard bar pluribus sugarpercent pricepercent winpercent
## 1                 1    0   1        0        0.732        0.860   66.97173
## 2                 0    0   1        0        0.604        0.511   67.60294
## 6                 0    0   1        0        0.465        0.767   50.34755
## 7                 0    0   1        0        0.604        0.767   56.91455
## 11                0    0   1        0        0.604        0.511   38.97504
## 23                0    0   0        1        0.127        0.093   55.37545
```
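For comparison, the same two steps (subsetting to chocolate, then renaming the column) can be written as one dplyr pipeline with `filter()` and `rename()`. This is a minimal sketch that runs on a small hand-typed excerpt, the six rows printed by `head(candy_data)` above with only three of the columns, rather than on the full downloaded file; the names `candy_excerpt` and `chocolate_excerpt` are illustrative choices, not part of the original analysis.

```r
library(dplyr)

# Hand-typed excerpt: the six rows printed by head(candy_data) above,
# keeping only three columns. This is NOT the full data set.
candy_excerpt <- data.frame(
  competitorname = c("100 Grand", "3 Musketeers", "One dime",
                     "One quarter", "Air Heads", "Almond Joy"),
  chocolate = c(1, 1, 0, 0, 0, 1),
  winpercent = c(66.97173, 67.60294, 32.26109, 46.11650, 52.34146, 50.34755),
  stringsAsFactors = FALSE
)

# filter() keeps the chocolate rows; rename() takes new_name = old_name
chocolate_excerpt <- candy_excerpt %>%
  filter(chocolate == 1) %>%
  rename(candyname = competitorname)

chocolate_excerpt
```

Note the argument order in `rename()`: the new name goes on the left and the existing name on the right. Swapping `candy_excerpt` for `candy_data` applies the same pipeline to the full data.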
I really like chocolate, but how does everyone else feel about it? To find out, I’ll compare the top ten performers among all Halloween candies with the top ten in my chocolate-only subset by making two bar graphs.
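```r
# Top ten overall; reorder() sorts the bars by win percentage instead of alphabetically
winners_all <- candy_data %>% slice_max(winpercent, n = 10)
ggplot(winners_all, aes(x = winpercent, y = reorder(competitorname, winpercent), fill = winpercent)) +
  geom_col() + labs(y = NULL)
# Top ten within the chocolate-only subset
winners_chocolate <- chocolate_only %>% slice_max(winpercent, n = 10)
ggplot(winners_chocolate, aes(x = winpercent, y = reorder(candyname, winpercent), fill = winpercent)) +
  geom_col() + labs(y = NULL)
```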
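The two bar charts can be compared by eye, but the overlap between the two top-ten lists can also be checked directly. One detail worth knowing: `slice_max()` keeps tied rows by default (`with_ties = TRUE`), so a "top ten" can come back with more than ten rows; `with_ties = FALSE` forces exactly `n`. The sketch below uses a hypothetical helper, `same_top_n()`, on a hand-typed excerpt of eight candies copied from the outputs above (not the full data set), so `n` is kept small.

```r
library(dplyr)

# Hand-typed excerpt: eight rows copied from the outputs above
# (NOT the full data set), with only the columns the check needs.
candy_excerpt <- data.frame(
  competitorname = c("100 Grand", "3 Musketeers", "One dime", "One quarter",
                     "Air Heads", "Almond Joy", "Baby Ruth", "Charleston Chew"),
  chocolate  = c(1, 1, 0, 0, 0, 1, 1, 1),
  winpercent = c(66.97173, 67.60294, 32.26109, 46.11650,
                 52.34146, 50.34755, 56.91455, 38.97504),
  stringsAsFactors = FALSE
)

# Hypothetical helper: are the top n overall the same candies as the
# top n chocolate candies? with_ties = FALSE returns exactly n rows.
same_top_n <- function(data, n) {
  top_all <- data %>%
    slice_max(winpercent, n = n, with_ties = FALSE)
  top_chocolate <- data %>%
    filter(chocolate == 1) %>%
    slice_max(winpercent, n = n, with_ties = FALSE)
  setequal(top_all$competitorname, top_chocolate$competitorname)
}

same_top_n(candy_excerpt, 3)  # TRUE
same_top_n(candy_excerpt, 4)  # FALSE: Air Heads (not chocolate) enters the overall top four
```

Because the helper filters `candy_data`-style input itself, `same_top_n(candy_data, 10)` would run the same check on the full data set and confirm the top-ten comparison without relying on the charts.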
# Findings and Recommendations

The top ten winners overall are also the top ten winners in the chocolate-only division; in other words, every one of the ten most popular candies contains chocolate. Further analysis is needed to determine whether the same factors drive the general winners and the chocolate-only winners. For example, does having nougat carry more weight in the general population or in the chocolate-only population? To answer this, a method such as partition analysis (recursive partitioning, as used in decision trees) could be performed to identify which variables drive the winners in each population.
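As a very rough first step toward the nougat question, and not a substitute for partition analysis, one could compare the average `winpercent` of candies with and without nougat inside each population. The sketch below does this with a hypothetical `nougat_gap()` helper on eight rows hand-copied from the outputs above; with so few rows it only illustrates the mechanics and says nothing reliable about candy. For an actual partition analysis, R's `rpart` package (shipped with R as a recommended package, implementing CART-style recursive partitioning) would be a natural tool to explore, fitting `winpercent` on the ingredient columns separately for each population.

```r
library(dplyr)

# Hand-typed excerpt: eight rows copied from the outputs above.
# NOT the full data set; far too small to support any conclusion.
candy_excerpt <- data.frame(
  competitorname = c("100 Grand", "3 Musketeers", "One dime", "One quarter",
                     "Air Heads", "Almond Joy", "Baby Ruth", "Charleston Chew"),
  chocolate  = c(1, 1, 0, 0, 0, 1, 1, 1),
  nougat     = c(0, 1, 0, 0, 0, 0, 1, 1),
  winpercent = c(66.97173, 67.60294, 32.26109, 46.11650,
                 52.34146, 50.34755, 56.91455, 38.97504),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean winpercent with nougat minus mean without nougat
nougat_gap <- function(data) {
  means <- data %>%
    group_by(nougat) %>%
    summarise(mean_win = mean(winpercent), .groups = "drop")
  means$mean_win[means$nougat == 1] - means$mean_win[means$nougat == 0]
}

# Same comparison in the general population and in the chocolate-only subset
gap_all       <- nougat_gap(candy_excerpt)
gap_chocolate <- nougat_gap(filter(candy_excerpt, chocolate == 1))
round(c(all = gap_all, chocolate_only = gap_chocolate), 2)
```

Swapping `candy_excerpt` for `candy_data` runs the same comparison on the full data. A simple difference in means ignores every other ingredient, which is exactly the limitation a partition analysis is meant to address.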