library(tidyverse)
library(openintro)
Sample Vignette
This is an example of how to use tidyverse packages such as dplyr,
forcats, and ggplot2. Our goal is to answer the question: does education
level impact a person’s steak preference?
Start by loading the data. The resulting dataframe will require a bit
of tidying before it can be used. The data is from a steak survey at https://github.com/fivethirtyeight/data/tree/master/steak-survey.
steak_url <- 'https://raw.githubusercontent.com/Megabuster/Data607/refs/heads/main/data/tidyverse/steak-risk-survey.csv'
raw_steak <- read.csv(steak_url, header = TRUE, sep = ',')
Using Dplyr
Using rename
from dplyr in tidyverse, we can rename the
otherwise long and unwieldy column names. Filter
, also from
dplyr, for respondents that say they eat steak because we are looking
for their opinions. Subset
from dplyr allows us to then
remove the eat_steak column as it no longer provides meaningful
information.
steak_df <- raw_steak %>%
rename(
respondent_id = 1,
lottery_risk = 2,
smoke = 3,
alcohol = 4,
gamble = 5,
skydiving = 6,
drive_limit = 7,
cheat = 8,
eat_steak = 9,
steak_prepared = 10,
gender = 11,
age = 12,
income = 13,
education = 14,
location = 15
) %>%
filter(eat_steak == 'Yes') %>%
subset(select = -eat_steak)
Checking education levels, we can see that most respondents have at
least some college background. Only 1 person was listed as having below
a high school degree. There are also 5 preparation styles for steak in
this dataset.
as.data.frame(table(steak_df$education))
## Var1 Freq
## 1 22
## 2 Bachelor degree 132
## 3 Graduate degree 109
## 4 High school degree 32
## 5 Less than high school degree 1
## 6 Some college or Associate degree 134
as.data.frame(table(steak_df$steak_prepared))
## Var1 Freq
## 1 Medium 132
## 2 Medium rare 166
## 3 Medium Well 74
## 4 Rare 23
## 5 Well 35
Using Forcats
We will use fct_lump_lowfreq
from forcats to group the
unknown and less than high school degree respondents together as they
represent a small amount of the respondents. Ggplot
from
ggplot2 within tidyverse can be used to show how these counts
compare.
steak_df <- steak_df %>%
mutate(education = fct_lump_min(education, table(steak_df$education)['High school degree']))
steak_df %>%
ggplot(aes(y = education)) +
geom_bar()

We can then check to see which education and steak style combinations
are most common.
steak_df %>% count(steak_prepared) %>% arrange(desc(n))
## steak_prepared n
## 1 Medium rare 166
## 2 Medium 132
## 3 Medium Well 74
## 4 Well 35
## 5 Rare 23
steak_df %>%
group_by(education) %>%
count(steak_prepared) %>%
arrange(desc(n))
## # A tibble: 25 × 3
## # Groups: education [5]
## education steak_prepared n
## <fct> <chr> <int>
## 1 Some college or Associate degree Medium rare 51
## 2 Bachelor degree Medium rare 47
## 3 Graduate degree Medium rare 46
## 4 Bachelor degree Medium 45
## 5 Some college or Associate degree Medium 39
## 6 Graduate degree Medium 33
## 7 Some college or Associate degree Medium Well 26
## 8 Bachelor degree Medium Well 21
## 9 Graduate degree Medium Well 18
## 10 High school degree Medium rare 14
## # ℹ 15 more rows
Medium rare and medium were the most preferred steak preparation
styles. This was apparent when counting out each combination as medium
rare was the most common order for some college/associate degree,
bachelor degree, and graduate degree. This implies that people who go to
college prefer medium rare the most.
Using Ggplot
This result can be better understood with visualizations from
ggplot
.
steak_df %>%
group_by(education) %>%
filter(steak_prepared == 'Medium rare') %>%
ggplot(aes(y = education)) +
stat_count() +
labs(title = 'Medium rare steak choosers by education level')

The counts for medium rare look similar to the education distribution
of the whole. We should check to make sure the high amount of medium
rare selectors that have gone to college is not just because of the
population distribution.
medium_rare_count <- steak_df %>%
group_by(education) %>%
filter(steak_prepared == 'Medium rare') %>%
count() %>%
group_by(education) %>%
summarise(total = sum(n))
education_count <- steak_df %>%
count(education) %>%
group_by(education) %>%
summarise(total = sum(n))
education_count$mr_count = medium_rare_count$total
education_count <- education_count %>%
mutate(percentage = mr_count / total) %>%
arrange(desc(percentage))
education_count
## # A tibble: 5 × 4
## education total mr_count percentage
## <fct> <int> <int> <dbl>
## 1 High school degree 32 14 0.438
## 2 Graduate degree 109 46 0.422
## 3 Some college or Associate degree 134 51 0.381
## 4 Bachelor degree 132 47 0.356
## 5 Other 23 8 0.348
education_count %>%
ggplot(aes(x = percentage, y = education)) +
geom_col() +
labs(title = 'Medium rare steak choosers by education level percentage')

These last results are much closer, but also paint a different
picture. Among the respondents of this data set, high school graduates
were the most likely to select medium rare as their favorite way of
having steak. Each group hovered around 35-40% in favor of medium
rare.
Conclusions
It is common to be able to use multiple tidyverse packages within a
workflow. Using dplyr, forcats, and ggplot, we were able to tidy the
original steak data using dplyr, reorganized education values using
forcats, and plotted education for medium rare steak choosers using
ggplot.
We hoped to answer if education level impacted a person’s steak
preference. Initially it appeared that medium rare steaks were more
popular with those with college experience or a college degree. However,
that was because the sample population was skewed toward college. When
we accounted for this skew, it turned out that high school graduates had
the highest rate of preferring medium rare, closely followed by those
with a graduate degree.
LS0tDQp0aXRsZTogIlRpZHl2ZXJzZSBDcmVhdGUiDQphdXRob3I6ICJMYXdyZW5jZSBZdSINCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCm91dHB1dDogb3BlbmludHJvOjpsYWJfcmVwb3J0DQotLS0NCg0KYGBge3IgbG9hZC1wYWNrYWdlcywgbWVzc2FnZT1GQUxTRX0NCmxpYnJhcnkodGlkeXZlcnNlKQ0KbGlicmFyeShvcGVuaW50cm8pDQoNCmBgYA0KDQojIyMgU2FtcGxlIFZpZ25ldHRlDQoNClRoaXMgaXMgYW4gZXhhbXBsZSBvZiBob3cgdG8gdXNlIHRpZHl2ZXJzZSBwYWNrYWdlcyBzdWNoIGFzIGRwbHlyLCBmb3JjYXRzLCBhbmQgZ2dwbG90Mi4gT3VyIGdvYWwgaXMgdG8gYW5zd2VyIHRoZSBxdWVzdGlvbjogZG9lcyBlZHVjYXRpb24gbGV2ZWwgaW1wYWN0IGEgcGVyc29uJ3Mgc3RlYWsgcHJlZmVyZW5jZT8gIA0KDQpTdGFydCBieSBsb2FkaW5nIHRoZSBkYXRhLiBUaGUgcmVzdWx0aW5nIGRhdGFmcmFtZSB3aWxsIHJlcXVpcmUgYSBiaXQgb2YgdGlkeWluZyBiZWZvcmUgaXQgY2FuIGJlIHVzZWQuIFRoZSBkYXRhIGlzIGZyb20gYSBzdGVhayBzdXJ2ZXkgYXQgaHR0cHM6Ly9naXRodWIuY29tL2ZpdmV0aGlydHllaWdodC9kYXRhL3RyZWUvbWFzdGVyL3N0ZWFrLXN1cnZleS4gDQoNCmBgYHtyfQ0Kc3RlYWtfdXJsIDwtICdodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vTWVnYWJ1c3Rlci9EYXRhNjA3L3JlZnMvaGVhZHMvbWFpbi9kYXRhL3RpZHl2ZXJzZS9zdGVhay1yaXNrLXN1cnZleS5jc3YnDQpyYXdfc3RlYWsgPC0gcmVhZC5jc3Yoc3RlYWtfdXJsLCBoZWFkZXIgPSBUUlVFLCBzZXAgPSAnLCcpDQpgYGANCg0KIyMjIFVzaW5nIERwbHlyDQoNClVzaW5nIGByZW5hbWVgIGZyb20gZHBseXIgaW4gdGlkeXZlcnNlLCB3ZSBjYW4gcmVuYW1lIHRoZSBvdGhlcndpc2UgbG9uZyBhbmQgdW53aWVsZHkgY29sdW1uIG5hbWVzLiBgRmlsdGVyYCwgYWxzbyBmcm9tIGRwbHlyLCBmb3IgcmVzcG9uZGVudHMgdGhhdCBzYXkgdGhleSBlYXQgc3RlYWsgYmVjYXVzZSB3ZSBhcmUgbG9va2luZyBmb3IgdGhlaXIgb3BpbmlvbnMuIGBTdWJzZXRgIGZyb20gZHBseXIgYWxsb3dzIHVzIHRvIHRoZW4gcmVtb3ZlIHRoZSBlYXRfc3RlYWsgY29sdW1uIGFzIGl0IG5vIGxvbmdlciBwcm92aWRlcyBtZWFuaW5nZnVsIGluZm9ybWF0aW9uLg0KDQpgYGB7cn0NCnN0ZWFrX2RmIDwtIHJhd19zdGVhayAlPiUgDQogIHJlbmFtZSgNCiAgICByZXNwb25kZW50X2lkID0gMSwNCiAgICBsb3R0ZXJ5X3Jpc2sgPSAyLA0KICAgIHNtb2tlID0gMywNCiAgICBhbGNvaG9sID0gNCwNCiAgICBnYW1ibGUgPSA1LA0KICAgIHNreWRpdmluZyA9IDYsDQogICAgZHJpdmVfbGltaXQgPSA3LA0KICAgIGNoZWF0ID0gOCwgDQogICAgZWF0X3N0ZWFrID0gOSwNCiAgICBzdGVha19wcmVwYXJlZCA9IDEwLA0KICAgIGdlbmRlciA9IDExLA0KICAgIGFnZSA9IDEyLA0KICAgIGluY29tZSA9IDEzLA0KICAgIGVkdWNhdGlvbiA9IDE0LA0KICAgIGxvY2F0aW9uID0gMTUNCiAgKSAlPiUNCiAgZmlsdGVyKGVhdF9zdGVhayA9PSAnWWVzJykgJT4lDQogIHN1YnNldChzZWxlY3QgPSAtZWF0X3N0ZWFrKQ0KYGBgDQoNCkNoZWNraW5nIGVkdWNhdGlvbiBsZXZlbHMsIHdlIGNhbiBzZWUgdGhhdCBtb3N0IHJlc3BvbmRlbnRzIGhhdmUgYXQgbGVhc3Qgc29tZSBjb2xsZWdlIGJhY2tncm91bmQuIE9ubHkgMSBwZXJzb24gd2FzIGxpc3RlZCBhcyBoYXZpbmcgYmVsb3cgYSBoaWdoIHNjaG9vbCBkZWdyZWUuIFRoZXJlIGFyZSBhbHNvIDUgcHJlcGFyYXRpb24gc3R5bGVzIGZvciBzdGVhayBpbiB0aGlzIGRhdGFzZXQuDQpgYGB7cn0NCmFzLmRhdGEuZnJhbWUodGFibGUoc3RlYWtfZGYkZWR1Y2F0aW9uKSkNCmFzLmRhdGEuZnJhbWUodGFibGUoc3RlYWtfZGYkc3RlYWtfcHJlcGFyZWQpKQ0KYGBgDQoNCiMjIyBVc2luZyBGb3JjYXRzDQoNCldlIHdpbGwgdXNlIGBmY3RfbHVtcF9sb3dmcmVxYCBmcm9tIGZvcmNhdHMgdG8gZ3JvdXAgdGhlIHVua25vd24gYW5kIGxlc3MgdGhhbiBoaWdoIHNjaG9vbCBkZWdyZWUgcmVzcG9uZGVudHMgdG9nZXRoZXIgYXMgdGhleSByZXByZXNlbnQgYSBzbWFsbCBhbW91bnQgb2YgdGhlIHJlc3BvbmRlbnRzLiBgR2dwbG90YCBmcm9tIGdncGxvdDIgd2l0aGluIHRpZHl2ZXJzZSBjYW4gIGJlIHVzZWQgdG8gc2hvdyBob3cgdGhlc2UgY291bnRzIGNvbXBhcmUuDQoNCmBgYHtyfQ0Kc3RlYWtfZGYgPC0gc3RlYWtfZGYgJT4lIA0KICBtdXRhdGUoZWR1Y2F0aW9uID0gZmN0X2x1bXBfbWluKGVkdWNhdGlvbiwgdGFibGUoc3RlYWtfZGYkZWR1Y2F0aW9uKVsnSGlnaCBzY2hvb2wgZGVncmVlJ10pKSANCnN0ZWFrX2RmICU+JQ0KICBnZ3Bsb3QoYWVzKHkgPSBlZHVjYXRpb24pKSArDQogIGdlb21fYmFyKCkNCmBgYA0KDQpXZSBjYW4gdGhlbiBjaGVjayB0byBzZWUgd2hpY2ggZWR1Y2F0aW9uIGFuZCBzdGVhayBzdHlsZSBjb21iaW5hdGlvbnMgYXJlIG1vc3QgY29tbW9uLg0KDQpgYGB7cn0NCnN0ZWFrX2RmICU+JSBjb3VudChzdGVha19wcmVwYXJlZCkgJT4lIGFycmFuZ2UoZGVzYyhuKSkNCnN0ZWFrX2RmICU+JSANCiAgZ3JvdXBfYnkoZWR1Y2F0aW9uKSAlPiUgDQogIGNvdW50KHN0ZWFrX3ByZXBhcmVkKSAlPiUgDQogIGFycmFuZ2UoZGVzYyhuKSkNCmBgYA0KDQpNZWRpdW0gcmFyZSBhbmQgbWVkaXVtIHdlcmUgdGhlIG1vc3QgcHJlZmVycmVkIHN0ZWFrIHByZXBhcmF0aW9uIHN0eWxlcy4gVGhpcyB3YXMgYXBwYXJlbnQgd2hlbiBjb3VudGluZyBvdXQgZWFjaCBjb21iaW5hdGlvbiBhcyBtZWRpdW0gcmFyZSB3YXMgdGhlIG1vc3QgY29tbW9uIG9yZGVyIGZvciBzb21lIGNvbGxlZ2UvYXNzb2NpYXRlIGRlZ3JlZSwgYmFjaGVsb3IgZGVncmVlLCBhbmQgZ3JhZHVhdGUgZGVncmVlLiBUaGlzIGltcGxpZXMgdGhhdCBwZW9wbGUgd2hvIGdvIHRvIGNvbGxlZ2UgcHJlZmVyIG1lZGl1bSByYXJlIHRoZSBtb3N0LiANCg0KIyMjIFVzaW5nIEdncGxvdA0KDQpUaGlzIHJlc3VsdCBjYW4gYmUgYmV0dGVyIHVuZGVyc3Rvb2Qgd2l0aCB2aXN1YWxpemF0aW9ucyBmcm9tIGBnZ3Bsb3RgLiANCg0KYGBge3J9DQpzdGVha19kZiAlPiUgDQogIGdyb3VwX2J5KGVkdWNhdGlvbikgJT4lIA0KICBmaWx0ZXIoc3RlYWtfcHJlcGFyZWQgPT0gJ01lZGl1bSByYXJlJykgJT4lDQogIGdncGxvdChhZXMoeSA9IGVkdWNhdGlvbikpICsNCiAgc3RhdF9jb3VudCgpICsNCiAgbGFicyh0aXRsZSA9ICdNZWRpdW0gcmFyZSBzdGVhayBjaG9vc2VycyBieSBlZHVjYXRpb24gbGV2ZWwnKQ0KYGBgDQoNClRoZSBjb3VudHMgZm9yIG1lZGl1bSByYXJlIGxvb2sgc2ltaWxhciB0byB0aGUgZWR1Y2F0aW9uIGRpc3RyaWJ1dGlvbiBvZiB0aGUgd2hvbGUuIFdlIHNob3VsZCBjaGVjayB0byBtYWtlIHN1cmUgdGhlIGhpZ2ggYW1vdW50IG9mIG1lZGl1bSByYXJlIHNlbGVjdG9ycyB0aGF0IGhhdmUgZ29uZSB0byBjb2xsZWdlIGlzIG5vdCBqdXN0IGJlY2F1c2Ugb2YgdGhlIHBvcHVsYXRpb24gZGlzdHJpYnV0aW9uLiANCmBgYHtyfQ0KbWVkaXVtX3JhcmVfY291bnQgPC0gc3RlYWtfZGYgJT4lIA0KICBncm91cF9ieShlZHVjYXRpb24pICU+JSANCiAgZmlsdGVyKHN0ZWFrX3ByZXBhcmVkID09ICdNZWRpdW0gcmFyZScpICU+JSANCiAgY291bnQoKSAlPiUNCiAgZ3JvdXBfYnkoZWR1Y2F0aW9uKSAlPiUgDQogIHN1bW1hcmlzZSh0b3RhbCA9IHN1bShuKSkNCg0KZWR1Y2F0aW9uX2NvdW50IDwtIHN0ZWFrX2RmICU+JSANCiAgY291bnQoZWR1Y2F0aW9uKSAlPiUNCiAgZ3JvdXBfYnkoZWR1Y2F0aW9uKSAlPiUgDQogIHN1bW1hcmlzZSh0b3RhbCA9IHN1bShuKSkNCiAgDQplZHVjYXRpb25fY291bnQkbXJfY291bnQgPSBtZWRpdW1fcmFyZV9jb3VudCR0b3RhbA0KDQplZHVjYXRpb25fY291bnQgPC0gZWR1Y2F0aW9uX2NvdW50ICU+JQ0KICBtdXRhdGUocGVyY2VudGFnZSA9IG1yX2NvdW50IC8gdG90YWwpICU+JQ0KICBhcnJhbmdlKGRlc2MocGVyY2VudGFnZSkpDQoNCmVkdWNhdGlvbl9jb3VudA0KDQplZHVjYXRpb25fY291bnQgJT4lDQogIGdncGxvdChhZXMoeCA9IHBlcmNlbnRhZ2UsIHkgPSBlZHVjYXRpb24pKSArDQogIGdlb21fY29sKCkgKyANCiAgbGFicyh0aXRsZSA9ICdNZWRpdW0gcmFyZSBzdGVhayBjaG9vc2VycyBieSBlZHVjYXRpb24gbGV2ZWwgcGVyY2VudGFnZScpDQpgYGANCg0KVGhlc2UgbGFzdCByZXN1bHRzIGFyZSBtdWNoIGNsb3NlciwgYnV0IGFsc28gcGFpbnQgYSBkaWZmZXJlbnQgcGljdHVyZS4gQW1vbmcgdGhlIHJlc3BvbmRlbnRzIG9mIHRoaXMgZGF0YSBzZXQsIGhpZ2ggc2Nob29sIGdyYWR1YXRlcyB3ZXJlIHRoZSBtb3N0IGxpa2VseSB0byBzZWxlY3QgbWVkaXVtIHJhcmUgYXMgdGhlaXIgZmF2b3JpdGUgd2F5IG9mIGhhdmluZyBzdGVhay4gRWFjaCBncm91cCBob3ZlcmVkIGFyb3VuZCAzNS00MCUgaW4gZmF2b3Igb2YgbWVkaXVtIHJhcmUuIA0KDQojIyMgQ29uY2x1c2lvbnMNCg0KSXQgaXMgY29tbW9uIHRvIGJlIGFibGUgdG8gdXNlIG11bHRpcGxlIHRpZHl2ZXJzZSBwYWNrYWdlcyB3aXRoaW4gYSB3b3JrZmxvdy4gVXNpbmcgZHBseXIsIGZvcmNhdHMsIGFuZCBnZ3Bsb3QsIHdlIHdlcmUgYWJsZSB0byB0aWR5IHRoZSBvcmlnaW5hbCBzdGVhayBkYXRhIHVzaW5nIGRwbHlyLCByZW9yZ2FuaXplZCBlZHVjYXRpb24gdmFsdWVzIHVzaW5nIGZvcmNhdHMsIGFuZCBwbG90dGVkIGVkdWNhdGlvbiBmb3IgbWVkaXVtIHJhcmUgc3RlYWsgY2hvb3NlcnMgdXNpbmcgZ2dwbG90LiANCg0KV2UgaG9wZWQgdG8gYW5zd2VyIGlmIGVkdWNhdGlvbiBsZXZlbCBpbXBhY3RlZCBhIHBlcnNvbidzIHN0ZWFrIHByZWZlcmVuY2UuIEluaXRpYWxseSBpdCBhcHBlYXJlZCB0aGF0IG1lZGl1bSByYXJlIHN0ZWFrcyB3ZXJlIG1vcmUgcG9wdWxhciB3aXRoIHRob3NlIHdpdGggY29sbGVnZSBleHBlcmllbmNlIG9yIGEgY29sbGVnZSBkZWdyZWUuIEhvd2V2ZXIsIHRoYXQgd2FzIGJlY2F1c2UgdGhlIHNhbXBsZSBwb3B1bGF0aW9uIHdhcyBza2V3ZWQgdG93YXJkIGNvbGxlZ2UuIFdoZW4gd2UgYWNjb3VudGVkIGZvciB0aGlzIHNrZXcsIGl0IHR1cm5lZCBvdXQgdGhhdCBoaWdoIHNjaG9vbCBncmFkdWF0ZXMgaGFkIHRoZSBoaWdoZXN0IHJhdGUgb2YgcHJlZmVycmluZyBtZWRpdW0gcmFyZSwgY2xvc2VseSBmb2xsb3dlZCBieSB0aG9zZSB3aXRoIGEgZ3JhZHVhdGUgZGVncmVlLg0K