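# Packages used below: ggplot2 for the plots, stringr for cleaning the age labels
library(ggplot2)
library(stringr)

# Load the TidyTuesday data for 2024, week 20
tuesdata <- tidytuesdayR::tt_load(2024, week = 20)
coffee_survey <- tuesdata$coffee_survey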
This report analyzes the coffee consumption data provided in the TidyTuesday dataset. Viewers filled out a survey about the 4 coffees they ordered from Cometeer.
head(coffee_survey)
## # A tibble: 6 × 57
## submission_id age cups where_drink brew brew_other purchase purchase_other
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 gMR29l 18-2… <NA> <NA> <NA> <NA> <NA> <NA>
## 2 BkPN0e 25-3… <NA> <NA> Pod/… <NA> <NA> <NA>
## 3 W5G8jj 25-3… <NA> <NA> Bean… <NA> <NA> <NA>
## 4 4xWgGr 35-4… <NA> <NA> Coff… <NA> <NA> <NA>
## 5 QD27Q8 25-3… <NA> <NA> Pour… <NA> <NA> <NA>
## 6 V0LPeM 55-6… <NA> <NA> Pod/… <NA> <NA> <NA>
## # ℹ 49 more variables: favorite <chr>, favorite_specify <chr>, additions <chr>,
## # additions_other <chr>, dairy <chr>, sweetener <chr>, style <chr>,
## # strength <chr>, roast_level <chr>, caffeine <chr>, expertise <dbl>,
## # coffee_a_bitterness <dbl>, coffee_a_acidity <dbl>,
## # coffee_a_personal_preference <dbl>, coffee_a_notes <chr>,
## # coffee_b_bitterness <dbl>, coffee_b_acidity <dbl>,
## # coffee_b_personal_preference <dbl>, coffee_b_notes <chr>, …
# Keep only respondents who answered the gender, cups, education, and ethnicity questions
data_filtered <- coffee_survey[
  !is.na(coffee_survey$gender) &
    !is.na(coffee_survey$cups) &
    !is.na(coffee_survey$education_level) &
    !is.na(coffee_survey$ethnicity_race),
]
Compared to the other genders, men have the highest self-rating for coffee expertise.
aggregate(expertise ~ gender, data = data_filtered, FUN = mean)
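Group means are easier to interpret next to the group sizes. The following is a minimal sketch on made-up data; the `toy` data frame and its values are illustrative assumptions, not survey results.

```r
# Toy data for illustration only; these are not survey values.
toy <- data.frame(
  gender    = c("Male", "Male", "Female", "Female", "Female"),
  expertise = c(7, 5, 6, 4, 5)
)

# Mean expertise and number of respondents per gender
group_means  <- tapply(toy$expertise, toy$gender, mean)
group_counts <- table(toy$gender)

print(group_means)
print(group_counts)
```

Applied to `data_filtered`, the same two calls would show whether a group with a high mean is also a very small group.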
To keep the axis labels short in plots with age on the horizontal axis, the " years old" suffix is removed from the age values.
data_filtered$age <- str_replace_all(data_filtered$age, " years old", "")
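Once the suffix is removed, the age labels are plain strings, and their order on a plot axis follows text sorting rather than age. One possible way to fix the order is an ordered factor. The label strings in `age_levels` below are an assumption about how the cleaned values look, and `toy_age` is made-up input.

```r
# Assumed age labels after cleaning, listed from youngest to oldest.
age_levels <- c("<18", "18-24", "25-34", "35-44", "45-54", "55-64", ">65")

# Made-up input in arbitrary order
toy_age <- c("25-34", ">65", "<18", "18-24")

# An ordered factor sorts and plots by the level order, not alphabetically
age_factor <- factor(toy_age, levels = age_levels, ordered = TRUE)

print(sort(age_factor))
```

Any value that does not match one of the assumed labels becomes `NA`, so it is worth checking `unique(data_filtered$age)` before applying this to the real column.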
In every age group surveyed, participants typically drank about 2 cups of coffee per day.
ggplot(data_filtered, aes(x = age, y = as.numeric(cups), fill = age)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Boxplot of Coffee Consumption by Age Group",
       x = "Age Group",
       y = "Cups of Coffee per Day") +
  theme_bw()
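Because `cups` is stored as text (`<chr>` in the `head()` output), `as.numeric(cups)` turns any label that is not a plain number into `NA`, and those rows drop out of the plot. The sketch below shows one way to map such labels to numbers explicitly. The helper name `cups_to_numeric`, the label strings, and the numeric stand-ins (0.5 and 5) are all assumptions for illustration.

```r
# Hypothetical helper: map survey labels for cups per day to numbers.
# The label strings and the stand-in values 0.5 and 5 are assumptions.
cups_to_numeric <- function(x) {
  lookup <- c(
    "Less than 1" = 0.5,
    "1" = 1, "2" = 2, "3" = 3, "4" = 4,
    "More than 4" = 5
  )
  # Labels not found in the lookup (including NA) come back as NA
  unname(lookup[as.character(x)])
}

print(cups_to_numeric(c("Less than 1", "2", "More than 4", NA)))
```

The stand-in values affect medians and densities at the ends of the scale, so treating `cups` as an ordered category may be the safer choice for some plots.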
Women drink less coffee on average than the other genders.
ggplot(data_filtered, aes(x = gender, y = as.numeric(cups), fill = gender)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Boxplot of Coffee Consumption by Gender Group",
       x = "Gender Group",
       y = "Cups of Coffee per Day") +
  theme_bw()
Many people’s coffee consumption is concentrated between 1 and 2 cups per day.
ggplot(data_filtered, aes(x = as.numeric(cups), fill = ethnicity_race)) +
  geom_density(alpha = 0.5) +
  scale_fill_brewer(palette = "Dark2") +
  labs(title = "Density Plot of Coffee Consumption by Ethnicity/Race",
       x = "Cups per Day",
       y = "Density") +
  theme_minimal()
Respondents with more education (e.g., master's or doctoral degrees) have consumption distributions that extend further toward 3 cups per day.
ggplot(data_filtered, aes(x = as.numeric(cups), fill = education_level)) +
  geom_density(alpha = 0.5) +
  scale_fill_brewer(palette = "Dark2") +
  labs(title = "Density Plot of Coffee Consumption by Education Level",
       x = "Cups per Day",
       y = "Density") +
  theme_classic()
Scores of 6-7 were the most common, and the distribution is left-skewed (its longer tail runs toward the low scores), indicating that most people felt they had a general knowledge of coffee.
ggplot(data_filtered, aes(x = as.numeric(expertise))) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "black", alpha = 0.7) +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_density(aes(y = after_stat(count)), color = "gray", linewidth = 0.5) +
  labs(title = "Participants' self-assessed coffee expertise",
       x = "Expertise assessment",
       y = "Frequency") +
  theme_bw()
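The direction of skew can also be checked numerically instead of by eye. The sketch below computes the moment-based sample skewness in base R: a negative value means the longer tail is on the low side (left-skewed), a positive value means it is on the high side (right-skewed). The function name `skewness` and the `toy_scores` vector are illustrative assumptions, not survey data.

```r
# Moment-based sample skewness: third central moment divided by
# the 1.5 power of the second central moment. Missing values are dropped.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

# Made-up scores that pile up at 6-7 with a tail toward low values
toy_scores <- c(1, 3, 5, 6, 6, 7, 7, 7, 8)

print(skewness(toy_scores))  # negative, i.e. left-skewed
```

Running `skewness(data_filtered$expertise)` would give the corresponding figure for the survey responses.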
Coffee consumption is highest among 25-44-year-olds and lower among the youngest and oldest respondents.
ggplot(data_filtered, aes(x = age, y = as.numeric(cups))) +
  # Jitter points for better visibility
  geom_jitter(width = 0.2, height = 0.2, color = "gray", alpha = 0.5) +
  labs(title = "Age vs Coffee Consumption",
       x = "Age",
       y = "Cups of Coffee per Day") +
  theme_bw()
Men are more inclined than women to rate their own coffee expertise highly.
# Compare only the two largest gender groups
df_filtered <- subset(data_filtered, gender %in% c("Male", "Female"))

# Scores of 5 or below count as low expertise, scores above 5 as high expertise
df_filtered$expertise_cat <- cut(df_filtered$expertise,
                                 breaks = c(0, 5, Inf),
                                 labels = c("Low_expertise", "High_expertise"))

ggplot(df_filtered, aes(x = gender, fill = expertise_cat)) +
  geom_bar() +
  labs(title = "Distribution of Expertise Categories by Gender",
       x = "Gender",
       y = "Count",
       fill = "Expertise Category") +
  theme_minimal()
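When the two groups differ in size, raw counts can hide the within-group shares. The sketch below computes row-wise proportions with base R's `prop.table()`; the `toy` data frame is made up for illustration and does not reflect the survey.

```r
# Toy data for illustration only: 4 men and 2 women.
toy <- data.frame(
  gender = c("Male", "Male", "Male", "Male", "Female", "Female"),
  expertise_cat = c("High_expertise", "High_expertise", "High_expertise",
                    "Low_expertise", "High_expertise", "Low_expertise")
)

# Cross-tabulate, then convert each row (gender) to proportions that sum to 1
counts <- table(toy$gender, toy$expertise_cat)
shares <- prop.table(counts, margin = 1)

print(shares)
```

In the plot itself, `geom_bar(position = "fill")` shows the same within-group proportions instead of stacked counts.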
1. Coffee consumption patterns of different groups: most people drink 1-3 cups per day, with 2 cups being the most common. Consumption habits vary by race and education level, but the overall trend is similar. Consumption also varies with age: 25-44-year-olds are the main group of drinkers, usually at 2-3 cups per day, while respondents under 18 and over 65 drink less coffee.
2. Self-assessment of coffee expertise: men are more likely than women to consider themselves coffee experts, with a higher percentage of high expertise among men and a larger percentage of low expertise among women. However, more men than women participated in the survey, so the sample may be gender-biased.