Walt Hickey from FiveThirtyEight collected data from people within the United States to see if a risk-averse person would be more likely to order a steak well done. They found no evidence a person that was a higher risk taker would prefer their steaks rare.
library(pander)
library(tidyverse)
library(pollster)
The original data set columns included the full questions asked. They will be renamed with keywords for ease:
Keyword | Original Column |
---|---|
id | RespondentID |
lottery_pick | Consider the following hypothetical situations:   In Lottery A, you have a 50% chance of success, with a payout of $100.   In Lottery B, you have a 90% chance of success, with a payout of $20. Assuming you have $10 to bet, would you play Lottery A or Lottery B? |
smoker | Do you ever smoke cigarettes? |
alcohol | Do you ever drink alcohol? |
gamble | Do you ever gamble? |
skydiving | Have you ever been skydiving? |
speeding | Do you ever drive above the speed limit? |
cheated | Have you ever cheated on your significant other? |
eat_steak | Do you eat steak? |
doneness | How do you like your steak prepared? |
gender | Gender |
age_group | Age |
income_level | Household Income |
education_level | Education |
us_region | Location (Census Region) |
Import data from GitHub and rename columns
new_colnames <- c('id', 'lottery_pick', 'smoker', 'alcohol', 'gamble', 'skydiving', 'speeding', 'cheated', 'eat_steak', 'doneness', 'gender', 'age_group', 'income_level', 'education_level', 'us_region')
steak_survey <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/steak-survey/steak-risk-survey.csv', col.names = new_colnames)
View first few rows within the data frame.
knitr::kable(head(steak_survey, 5))
id | lottery_pick | smoker | alcohol | gamble | skydiving | speeding | cheated | eat_steak | doneness | gender | age_group | income_level | education_level | us_region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NA | Response | Response | Response | Response | Response | Response | Response | Response | Response | Response | Response | Response | Response | Response |
3237565956 | Lottery B | |||||||||||||
3234982343 | Lottery A | No | Yes | No | No | No | No | Yes | Medium rare | Male | > 60 | $50,000 - $99,999 | Some college or Associate degree | East North Central |
3234973379 | Lottery A | No | Yes | Yes | No | Yes | Yes | Yes | Rare | Male | > 60 | $150,000+ | Graduate degree | South Atlantic |
3234972383 | Lottery B | Yes | Yes | Yes | No | Yes | Yes | Yes | Medium | Male | > 60 | $50,000 - $99,999 | Bachelor degree | New England |
Remove data that is explicitly NA from the id column and people who do not eat steak. Subset data to focus on risks and their region.
steak_survey <- steak_survey %>%
drop_na(id)
subset_cols <- c('smoker', 'alcohol', 'gamble', 'skydiving', 'speeding', 'cheated', 'doneness', 'us_region')
risks <- steak_survey %>%
select(all_of(subset_cols))
non_null_risks <- risks %>%
filter(across(all_of(subset_cols), ~ .x != ''))
knitr::kable(head(non_null_risks, 5))
smoker | alcohol | gamble | skydiving | speeding | cheated | doneness | us_region |
---|---|---|---|---|---|---|---|
No | Yes | No | No | No | No | Medium rare | East North Central |
No | Yes | Yes | No | Yes | Yes | Rare | South Atlantic |
Yes | Yes | Yes | No | Yes | Yes | Medium | New England |
No | Yes | No | No | Yes | Yes | Medium | Middle Atlantic |
No | No | No | No | Yes | No | Medium rare | West South Central |
We can see that most individuals prefer Medium and Medium Rare cooking temperatures for their steaks.
ggplot(data = non_null_risks) +
geom_bar(aes(doneness))
Calculate total risks taken by each individual.
non_null_risks[non_null_risks=='Yes'] <- 1
non_null_risks[non_null_risks=='No'] <- 0
change_type <- c('smoker', 'alcohol', 'gamble', 'skydiving', 'speeding', 'cheated')
non_null_risks <- non_null_risks %>%
mutate_at(change_type, as.integer)
non_null_risks$doneness <- factor(non_null_risks$doneness, levels = c('Rare', 'Medium rare', 'Medium', 'Medium Well', 'Well'))
non_null_risks <- non_null_risks %>%
mutate(total_risks = smoker + alcohol + gamble + skydiving + speeding + cheated)
Using a crosstab view, we can determine how people like their steak within each region
knitr::kable(crosstab(non_null_risks, x = us_region, y = doneness, weight = total_risks))
us_region | Rare | Medium rare | Medium | Medium Well | Well | n |
---|---|---|---|---|---|---|
East North Central | 0.000000 | 40.36145 | 33.73494 | 15.662651 | 10.240964 | 166 |
East South Central | 0.000000 | 54.34783 | 23.91304 | 6.521739 | 15.217391 | 46 |
Middle Atlantic | 11.510791 | 20.14388 | 36.69065 | 22.302158 | 9.352518 | 139 |
Mountain | 0.000000 | 28.20513 | 39.74359 | 17.948718 | 14.102564 | 78 |
New England | 5.333333 | 44.00000 | 34.66667 | 12.000000 | 4.000000 | 75 |
Pacific | 8.510638 | 34.57447 | 38.82979 | 10.106383 | 7.978723 | 188 |
South Atlantic | 6.666667 | 38.46154 | 26.66667 | 23.076923 | 5.128205 | 195 |
West North Central | 11.578947 | 46.31579 | 29.47368 | 12.631579 | 0.000000 | 95 |
West South Central | 0.000000 | 51.66667 | 26.66667 | 13.333333 | 8.333333 | 60 |
grouped <- crosstab(non_null_risks, x = us_region, y = doneness, weight = total_risks, format = 'long')
grouped %>%
ggplot(aes(x = pct, y = us_region, fill = doneness)) +
geom_bar(stat = 'identity', position = position_fill(reverse = TRUE)) +
xlab("Total Region %") +
ylab(element_blank()) +
geom_text(aes(label = paste0(round(pct, 0), '%')), position = position_fill(reverse = TRUE, vjust = 0.5), size = 2.5, color = 'black', fontface = 'bold') +
scale_x_continuous(labels = scales::percent) +
theme(legend.position = "top", legend.title = element_blank()) +
labs(title = 'Steak Doneness by US Region') +
scale_y_discrete(limits=rev) +
scale_fill_brewer(palette="RdBu")
Surprisingly, not every region is represented by people who prefer their steaks rare.
Given how the survey was taken, more information would be beneficial to help determine riskiness. Each question only offered the options of Yes/No, however, the degree of risk may vary based on frequency. A person who drinks alcohol, may drink a few glasses per week, but another person may drink several per day. The survey is also at risk of self-selection bias and does not account for random sampling. Lastly, are certain race/religions more prone to how they like to eat steak?