How Americans Like Their Steak

Overview

Walt Hickey from FiveThirtyEight collected data from people within the United States to see if a risk-averse person would be more likely to order a steak well done. They found no evidence a person that was a higher risk taker would prefer their steaks rare.

‘FiveThirtyEight Article’

‘Data Source’

Required libraries

library(pander)
library(tidyverse)
library(pollster)

Data Dictionary

The original data set columns included the full questions asked. They will be renamed with keywords for ease:

Keyword	Original Column
id	RespondentID
lottery_pick	Consider the following hypothetical situations: In Lottery A, you have a 50% chance of success, with a payout of $100. In Lottery B, you have a 90% chance of success, with a payout of $20. Assuming you have $10 to bet, would you play Lottery A or Lottery B?
smoker	Do you ever smoke cigarettes?
alcohol	Do you ever drink alcohol?
gamble	Do you ever gamble?
skydiving	Have you ever been skydiving?
speeding	Do you ever drive above the speed limit?
cheated	Have you ever cheated on your significant other?
eat_steak	Do you eat steak?
doneness	How do you like your steak prepared?
gender	Gender
age_group	Age
income_level	Household Income
education_level	Education
us_region	Location (Census Region)

Data Preparation

Import data from GitHub and rename columns

new_colnames <- c('id', 'lottery_pick', 'smoker', 'alcohol', 'gamble', 'skydiving', 'speeding', 'cheated', 'eat_steak', 'doneness', 'gender', 'age_group', 'income_level', 'education_level', 'us_region')

steak_survey <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/steak-survey/steak-risk-survey.csv', col.names = new_colnames)

View first few rows within the data frame.

knitr::kable(head(steak_survey, 5))

id	lottery_pick	smoker	alcohol	gamble	skydiving	speeding	cheated	eat_steak	doneness	gender	age_group	income_level	education_level	us_region
NA	Response	Response	Response	Response	Response	Response	Response	Response	Response	Response	Response	Response	Response	Response
3237565956	Lottery B
3234982343	Lottery A	No	Yes	No	No	No	No	Yes	Medium rare	Male	> 60	$50,000 - $99,999	Some college or Associate degree	East North Central
3234973379	Lottery A	No	Yes	Yes	No	Yes	Yes	Yes	Rare	Male	> 60	$150,000+	Graduate degree	South Atlantic
3234972383	Lottery B	Yes	Yes	Yes	No	Yes	Yes	Yes	Medium	Male	> 60	$50,000 - $99,999	Bachelor degree	New England

Remove data that is explicitly NA from the id column and people who do not eat steak. Subset data to focus on risks and their region.

steak_survey <- steak_survey %>% 
  drop_na(id)

subset_cols <- c('smoker', 'alcohol', 'gamble', 'skydiving', 'speeding', 'cheated', 'doneness', 'us_region')
risks <- steak_survey %>% 
  select(all_of(subset_cols))

non_null_risks <- risks %>% 
  filter(across(all_of(subset_cols), ~ .x != ''))

knitr::kable(head(non_null_risks, 5))

smoker	alcohol	gamble	skydiving	speeding	cheated	doneness	us_region
No	Yes	No	No	No	No	Medium rare	East North Central
No	Yes	Yes	No	Yes	Yes	Rare	South Atlantic
Yes	Yes	Yes	No	Yes	Yes	Medium	New England
No	Yes	No	No	Yes	Yes	Medium	Middle Atlantic
No	No	No	No	Yes	No	Medium rare	West South Central

Findings

We can see that most individuals prefer Medium and Medium Rare cooking temperatures for their steaks.

ggplot(data = non_null_risks) +
  geom_bar(aes(doneness))

Calculate total risks taken by each individual.

non_null_risks[non_null_risks=='Yes'] <- 1
non_null_risks[non_null_risks=='No'] <- 0

change_type <- c('smoker', 'alcohol', 'gamble', 'skydiving', 'speeding', 'cheated')

non_null_risks <- non_null_risks %>% 
  mutate_at(change_type, as.integer)

non_null_risks$doneness <- factor(non_null_risks$doneness, levels = c('Rare', 'Medium rare', 'Medium', 'Medium Well', 'Well'))

non_null_risks <- non_null_risks %>% 
  mutate(total_risks = smoker + alcohol + gamble + skydiving + speeding + cheated)

Using a crosstab view, we can determine how people like their steak within each region

knitr::kable(crosstab(non_null_risks, x = us_region, y = doneness, weight = total_risks))

us_region	Rare	Medium rare	Medium	Medium Well	Well	n
East North Central	0.000000	40.36145	33.73494	15.662651	10.240964	166
East South Central	0.000000	54.34783	23.91304	6.521739	15.217391	46
Middle Atlantic	11.510791	20.14388	36.69065	22.302158	9.352518	139
Mountain	0.000000	28.20513	39.74359	17.948718	14.102564	78
New England	5.333333	44.00000	34.66667	12.000000	4.000000	75
Pacific	8.510638	34.57447	38.82979	10.106383	7.978723	188
South Atlantic	6.666667	38.46154	26.66667	23.076923	5.128205	195
West North Central	11.578947	46.31579	29.47368	12.631579	0.000000	95
West South Central	0.000000	51.66667	26.66667	13.333333	8.333333	60

grouped <- crosstab(non_null_risks, x = us_region, y = doneness, weight = total_risks, format = 'long')

grouped %>% 
  ggplot(aes(x = pct, y = us_region, fill = doneness)) +
  geom_bar(stat = 'identity', position = position_fill(reverse = TRUE)) +
  xlab("Total Region %") +
  ylab(element_blank()) +
  geom_text(aes(label = paste0(round(pct, 0), '%')), position = position_fill(reverse = TRUE, vjust = 0.5), size = 2.5, color = 'black', fontface = 'bold') +
  scale_x_continuous(labels = scales::percent) +
 theme(legend.position = "top", legend.title = element_blank()) +
  labs(title = 'Steak Doneness by US Region') +
  scale_y_discrete(limits=rev) +
  scale_fill_brewer(palette="RdBu")

Surprisingly, not every region is represented by people who prefer their steaks rare.

Recommendations

Given how the survey was taken, more information would be beneficial to help determine riskiness. Each question only offered the options of Yes/No, however, the degree of risk may vary based on frequency. A person who drinks alcohol, may drink a few glasses per week, but another person may drink several per day. The survey is also at risk of self-selection bias and does not account for random sampling. Lastly, are certain race/religions more prone to how they like to eat steak?