Analysis of Cereal Nutrition

Author

Patrick Berry

Introduction

This dataset contains 77 cereals and their nutritional information, such as protein, carbohydrates, fat, sugar, and fiber. It also records which shelf each cereal is placed on in a store, its manufacturer, and whether it is served hot or cold. Each row represents a different cereal.

library(tidyverse) # For Tidyverse functions
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)     # For better summary statistics
Cereal <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/berryp1_xavier_edu/EWp2YMBmFMRFkSUVJ_qbHZ4B6TU4_qrI56EaObOH7qBfbg?download=1")
Rows: 77 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): name, mfr, type
dbl (13): calories, protein, fat, sodium, fiber, carbo, sugars, potass, vita...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Research Question

Do cereals with the word "wheat" in their name have fewer calories than those without it? I find this interesting because keywords like this portray a cereal as healthy, so I wonder whether that branding is reflected in the nutrition facts.

Analysis

I will need to split the cereals into two categories: those with the word "wheat" in the name and those without it. I will do this by using mutate() to add a column that is TRUE if the name contains "wheat" and FALSE if it does not. Next, I will create two box plots comparing calories for cereals with and without "wheat" in the name.

Cereal %>%
  mutate(has_wheat = str_detect(name, regex("wheat", ignore_case = TRUE))) %>% 
  ggplot(aes(x = factor(has_wheat, labels = c("Other", "Includes Wheat")), y = calories)) + 
  geom_boxplot() + 
  labs(
    title = "Comparison of Calorie Content in Cereals with and without 'Wheat' in the Name",
    x = "Cereal Name",
    y = "Calories")

The boxplot shows that cereals with the word "wheat" in their name tend to have fewer calories per serving. The box on the right (Includes Wheat) has a lower median, Q1, and Q3 than the box for the other cereals. This is consistent with "wheat" being used to signal a healthier option. Cereals with "wheat" in the name also have a smaller IQR, indicating less variation in their calorie content.
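
The quartiles read off the plot could also be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper name summarise_calories_by_wheat and its output column names are hypothetical, and it assumes the data frame has the name and calories columns shown in the column specification above.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (an illustration, not the author's code): computes the
# count, quartiles, and IQR of calories for cereals whose name does or does
# not contain "wheat" (case-insensitive).
summarise_calories_by_wheat <- function(data) {
  data %>%
    mutate(has_wheat = str_detect(name, regex("wheat", ignore_case = TRUE))) %>%
    group_by(has_wheat) %>%
    summarise(
      n_cereals  = n(),
      q1_cal     = quantile(calories, 0.25, names = FALSE),
      median_cal = median(calories),
      q3_cal     = quantile(calories, 0.75, names = FALSE),
      iqr_cal    = IQR(calories),
      .groups    = "drop"
    )
}

# Usage, assuming `Cereal` has already been read in as above:
# summarise_calories_by_wheat(Cereal)
```

The n_cereals column is worth checking alongside the quartiles, since a group containing only a few cereals would make the comparison less reliable.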