Overview:
The data for the article can be found at: https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv
This article analyzes the most popular Halloween candy based on an internet survey. The survey presented two candies side by side and asked users to choose their favorite; the user was then shown a new pair, and the process repeated. Using the results from this survey, the author, Walt Hickey, determined which characteristics make up the most popular candy.
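To make the survey mechanics concrete, here is a minimal sketch of how a win percentage could be computed from a log of head-to-head matchups. The `matchups` data frame, its column names, and its values are hypothetical and invented purely for illustration; this is not the author's actual method or data.

```r
# Hypothetical matchup log: each row is one survey question (invented values).
matchups <- data.frame(
  winner = c("Candy A", "Candy A", "Candy B", "Candy A"),
  loser  = c("Candy B", "Candy C", "Candy C", "Candy C"),
  stringsAsFactors = FALSE
)

# Every candy that appeared in at least one matchup.
candies <- sort(unique(c(matchups$winner, matchups$loser)))

# Count wins and total appearances per candy (zero counts are kept).
wins <- table(factor(matchups$winner, levels = candies))
appearances <- table(factor(c(matchups$winner, matchups$loser), levels = candies))

# Win percentage = matchups won / matchups played.
win_pct <- as.numeric(wins) / as.numeric(appearances)
names(win_pct) <- candies
win_pct
```

In this toy log, a candy that wins every matchup it appears in gets a win percentage of 1, and one that never wins gets 0.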
Get the data from the GitHub repository.
library(readr)  # provides read_csv()
candy_raw <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv")
## Parsed with column specification:
## cols(
## competitorname = col_character(),
## chocolate = col_double(),
## fruity = col_double(),
## caramel = col_double(),
## peanutyalmondy = col_double(),
## nougat = col_double(),
## crispedricewafer = col_double(),
## hard = col_double(),
## bar = col_double(),
## pluribus = col_double(),
## sugarpercent = col_double(),
## pricepercent = col_double(),
## winpercent = col_double()
## )
# Work on a copy so the raw download stays untouched
candy <- candy_raw
# Convert the win percentage from a 0-100 scale to a 0-1 proportion
candy$winpercent <- candy$winpercent/100
# Give the columns more descriptive names
colnames(candy)[colnames(candy) == 'competitorname'] <- 'Candy'
colnames(candy)[colnames(candy) == 'peanutyalmondy'] <- 'Nuts'
colnames(candy)[colnames(candy) == 'sugarpercent'] <- 'Sugar_Percentile'
colnames(candy)[colnames(candy) == 'pricepercent'] <- 'Price_Percentile'
colnames(candy)[colnames(candy) == 'pluribus'] <- 'Multiple_Candies_In_Container'
colnames(candy)[colnames(candy) == 'winpercent'] <- 'Perct_Won_Matchup'
# Strip the stray "Õ" character (a mis-encoded apostrophe) from candy names
candy$Candy <- sub("Õ","",candy$Candy)
Do you know someone with a nut allergy? Here are the most popular candies with no peanuts or almonds, according to the dataset's peanutyalmondy flag (renamed Nuts above).
nut_free_candy <- subset(candy,Nuts == 0)
nut_free_candy[order(-nut_free_candy$Perct_Won_Matchup),c("Candy","Perct_Won_Matchup")]
## # A tibble: 71 x 2
## Candy Perct_Won_Matchup
## <chr> <dbl>
## 1 Twix 0.816
## 2 Kit Kat 0.768
## 3 Milky Way 0.731
## 4 3 Musketeers 0.676
## 5 Starburst 0.670
## 6 100 Grand 0.670
## 7 M&Ms 0.666
## 8 Nestle Crunch 0.665
## 9 Rolo 0.657
## 10 Milky Way Simply Caramel 0.644
## # … with 61 more rows
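Here are the candies that won at least 50% of matchups, ranked by price percentile. The sort below is descending, so the most expensive candies appear first and the most budget-friendly ones are at the end of the table: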
popular_candy <- subset(candy,Perct_Won_Matchup >= 0.50)
cheapest_candy <- popular_candy[order(-popular_candy$Price_Percentile),c("Candy","Price_Percentile","Perct_Won_Matchup")]
cheapest_candy
## # A tibble: 39 x 3
## Candy Price_Percentile Perct_Won_Matchup
## <chr> <dbl> <dbl>
## 1 Hersheys Krackel 0.918 0.623
## 2 Hersheys Milk Chocolate 0.918 0.565
## 3 Hersheys Special Dark 0.918 0.592
## 4 Mr Good Bar 0.918 0.545
## 5 Twix 0.906 0.816
## 6 100 Grand 0.860 0.670
## 7 Milky Way Simply Caramel 0.860 0.644
## 8 Rolo 0.860 0.657
## 9 Almond Joy 0.767 0.503
## 10 Baby Ruth 0.767 0.569
## # … with 29 more rows
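To list the cheapest candies first, the sort key can be left un-negated, since `order()` sorts ascending by default. The sketch below demonstrates this on four rows copied from the output above; `sample_candy` and `cheapest_first` are illustrative names, and on the full data the same call would be applied to `popular_candy`.

```r
# Four rows copied from the output above (not the full dataset).
sample_candy <- data.frame(
  Candy = c("Hersheys Krackel", "Twix", "100 Grand", "Almond Joy"),
  Price_Percentile = c(0.918, 0.906, 0.860, 0.767),
  Perct_Won_Matchup = c(0.623, 0.816, 0.670, 0.503),
  stringsAsFactors = FALSE
)

# Ascending price percentile puts the cheapest candy first;
# ties on price are broken by the higher win percentage.
cheapest_first <- sample_candy[
  order(sample_candy$Price_Percentile, -sample_candy$Perct_Won_Matchup),
]
cheapest_first
```

Breaking ties by win percentage is a design choice made for this sketch, so that among equally priced candies the more popular one is listed first.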
The author of this article did some wonderful analysis to determine which ingredients make for the most popular candy. He found that chocolate is very popular, while fruit flavors are less so. I would like to extend this analysis by replicating his work and investigating for myself why certain candies are popular based on their ingredients. I’d also like to investigate whether the most popular ingredients correlate with the most expensive candy.
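As a possible starting point for the price question, a rank correlation between price percentile and win percentage could be computed as sketched below. The ten rows are copied from the price listing above, which is a sorted subset of popular candies, so the resulting number says nothing reliable about the full dataset; on the real data the same call would be applied to the columns of `candy`. The choice of Spearman's method is an assumption made here because the price column is a percentile rank, and `price`, `won`, and `rho` are illustrative names.

```r
# Ten (price percentile, win proportion) pairs copied from the listing above.
# This is a sorted subset, used only to show the mechanics of the call.
price <- c(0.918, 0.918, 0.918, 0.918, 0.906, 0.860, 0.860, 0.860, 0.767, 0.767)
won   <- c(0.623, 0.565, 0.592, 0.545, 0.816, 0.670, 0.644, 0.657, 0.503, 0.569)

# Spearman rank correlation; tied values receive average ranks.
rho <- cor(price, won, method = "spearman")
rho
```

For the ingredient question, one option on the full data would be a linear model such as `lm(Perct_Won_Matchup ~ chocolate + fruity + caramel + Nuts, data = candy)`, though whether that matches the author's approach is not established here.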