This dataset contains 77 cereals and their nutritional information, such as protein, carbohydrates, fat, sugar, and fiber. It also records which shelf the cereal is placed on in a store, its manufacturer, and whether it is served hot or cold. Each row represents a different cereal.
library(tidyverse) # For Tidyverse functions
Rows: 77 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): name, mfr, type
dbl (13): calories, protein, fat, sodium, fiber, carbo, sugars, potass, vita...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
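The message above appears because readr guessed the column types. The following is a minimal sketch of the first alternative that message suggests: declaring the types explicitly with `cols()`. It uses a hypothetical three-row excerpt with made-up names and values (not rows from the dataset) written to a temporary file. This lets the example run without the original CSV, whose path is not shown here.

```r
library(readr)

# Hypothetical excerpt with made-up values, written to a temp file so the
# example is self-contained (the real file path is not shown in this analysis).
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,mfr,type,calories",
  "Toy Wheat Flakes,K,C,100",
  "Toy Oat Rings,G,C,110",
  "Toy Hot Wheat,N,H,90"
), demo_path)

# Supplying col_types means readr has nothing to guess, so by default it
# does not print the column-specification message.
cereal_demo <- read_csv(
  demo_path,
  col_types = cols(
    name = col_character(),
    mfr = col_character(),
    type = col_character(),
    calories = col_double()
  )
)
```

The same approach extends to all 16 columns of the real file. Alternatively, passing `show_col_types = FALSE` to `read_csv()` keeps the guessed types and only silences the message.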
Research Question
Do cereals with the word “Wheat” in their name have fewer calories than those without it? I find this interesting because keywords like this portray a cereal as healthy, so I wonder whether that branding matches the nutrition.
Analysis
I will split the cereals into two groups: those with the word “Wheat” in their name and those without it. To do this, I will mutate a new column that is TRUE if the name contains “wheat” (ignoring case) and FALSE if it does not. Next, I will create two box plots comparing calories for the two groups.
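This is a minimal sketch of the flagging step on a few hypothetical names (invented for illustration, not taken from the dataset). `str_detect()` with `regex(..., ignore_case = TRUE)` matches “wheat” anywhere in the name, in any capitalization. Because it matches substrings, a name such as the hypothetical “Buckwheat Puffs” would also be flagged. The second column shows one possible stricter variant that adds a leading word boundary `\\b`. This variant is an assumption for illustration, not part of the original analysis.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical cereal names, not from the dataset.
demo <- tibble(name = c("Toy Wheat Flakes", "Toy Oat Rings",
                        "toy shredded WHEAT", "Buckwheat Puffs"))

demo <- demo %>%
  mutate(
    # Same rule as the analysis: "wheat" anywhere, case-insensitive.
    has_wheat = str_detect(name, regex("wheat", ignore_case = TRUE)),
    # Stricter variant: "wheat" must start a word, so "Buckwheat" is excluded
    # while words like "Wheats" or "Wheaties" would still match.
    has_wheat_word = str_detect(name, regex("\\bwheat", ignore_case = TRUE))
  )
```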
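Cereal %>%
  # Flag cereals whose name contains "wheat" (case-insensitive)
  mutate(has_wheat = str_detect(name, regex("wheat", ignore_case = TRUE))) %>%
  # FALSE sorts before TRUE, so FALSE -> "Other", TRUE -> "Includes Wheat"
  ggplot(aes(x = factor(has_wheat, labels = c("Other", "Includes Wheat")), y = calories)) +
  geom_boxplot() +
  labs(
    title = "Comparison of Calorie Content in Cereals with and without 'Wheat' in the Name",
    x = "'Wheat' in Cereal Name",
    y = "Calories per Serving"
  )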
This boxplot shows that cereals with “Wheat” in their name tend to have fewer calories per serving. The box on the right has a lower median, Q1, and Q3 than the box for the other cereals. This is consistent with the healthier image that “Wheat” suggests, so in this dataset the branding appears to match the calorie counts. Cereals with “Wheat” also have a smaller IQR, indicating less variation in their calorie content.
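The comparison above is read off the plot. The following is a minimal sketch of how the same statistics (Q1, median, Q3, IQR, plus group size) could be computed numerically. `summarise_wheat_calories()` is a hypothetical helper that takes any data frame with `name` and `calories` columns, so it could be called as `summarise_wheat_calories(Cereal)`. Here it is demonstrated on made-up toy values rather than on the real data. It uses R's default `quantile()` type, which should match the box hinges drawn by `geom_boxplot()`.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical helper: per-group calorie summary using the same "wheat" rule.
summarise_wheat_calories <- function(df) {
  df %>%
    mutate(has_wheat = str_detect(name, regex("wheat", ignore_case = TRUE))) %>%
    group_by(has_wheat) %>%
    summarise(
      n = n(),                            # number of cereals in each group
      q1 = quantile(calories, 0.25),      # lower hinge of the box
      median = median(calories),          # line inside the box
      q3 = quantile(calories, 0.75),      # upper hinge of the box
      iqr = IQR(calories),                # box height
      .groups = "drop"
    )
}

# Made-up values for illustration only, not from the dataset.
toy <- tibble(
  name = c("Toy Wheat A", "Toy Wheat B", "Toy Wheat C",
           "Toy Oat A", "Toy Oat B", "Toy Oat C"),
  calories = c(90, 100, 110, 100, 120, 140)
)
toy_summary <- summarise_wheat_calories(toy)
```

Reporting `n` alongside the quartiles is a deliberate choice: with a small group, a narrow box may reflect only a handful of cereals.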