What Makes a Cereal Healthy? An Analysis of Nutritional Factors in Breakfast Cereals

Author

Sandeep Thapa Chhetri

Introduction

Breakfast cereals are a common food choice, but not all cereals are equally healthy. Some cereals have high sugar, calories, fat, and sodium, while others have more fiber and protein.

For this project, I analyzed a breakfast cereal dataset using R in Posit Cloud. The goal of this project is to understand which nutrition factors make a cereal healthier.

Since this dataset does not include a rating variable, I created a health score using nutrition values. Cereals with higher fiber and protein receive a better score, while cereals with higher sugar, calories, fat, and sodium receive a lower score.

Research Questions

  1. Do cereals with more sugar tend to be less healthy?
  2. Is there a relationship between calories and health score?
  3. Do cereals with more fiber have better health scores?
  4. Which cereals are the healthiest based on nutrition values?

Load Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the Dataset

cereal <- read_csv("Cereal (1) (1).csv")
Rows: 30 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Name, Company
dbl (8): Serving, Calories, Fat, Sodium, Carbs, Fiber, Sugars, Protein

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

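The dataset includes 30 cereals and 10 columns: two text columns (Name and Company) and eight numeric columns (Serving, Calories, Fat, Sodium, Carbs, Fiber, Sugars, and Protein). The nutrition columns are used below to score how healthy each cereal is.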
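The later code passes na.rm = TRUE in several places, so it can be worth confirming whether any values are actually missing before building a score. The following is a minimal sketch in base R; the `toy` data frame and its values are made up for illustration and are not taken from the cereal file.

```r
# Hypothetical stand-in for the cereal data; values are made up, one is missing
toy <- data.frame(
  Name   = c("Cereal A", "Cereal B", "Cereal C"),
  Sugars = c(4, NA, 12),
  Fiber  = c(3, 1, 2)
)

# Count missing values in each column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Count rows that have no missing values at all
complete_rows <- sum(complete.cases(toy))
complete_rows
```

Running the same two calls on the real `cereal` object would show whether any column needs attention.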

Clean and Prepare the Data

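cereal_clean <- cereal |>
  mutate(
    # Standardize each nutrient to a z-score so all six share one scale,
    # then add the beneficial nutrients and subtract the rest
    Health_Score =
      scale(Fiber)[, 1] +
      scale(Protein)[, 1] -
      scale(Sugars)[, 1] -
      scale(Calories)[, 1] -
      scale(Fat)[, 1] -
      scale(Sodium)[, 1],

    # Label roughly the top third "Healthier", the bottom third
    # "Less Healthy", and everything in between "Moderate"
    Health_Category = case_when(
      Health_Score >= quantile(Health_Score, 0.67, na.rm = TRUE) ~ "Healthier",
      Health_Score <= quantile(Health_Score, 0.33, na.rm = TRUE) ~ "Less Healthy",
      TRUE ~ "Moderate"
    )
  )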


head(cereal_clean)
# A tibble: 6 × 12
  Name          Company Serving Calories   Fat Sodium Carbs Fiber Sugars Protein
  <chr>         <chr>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>   <dbl>
1 AppleJacks    K          1         117   0.6    143    27   0.5   15       1  
2 Boo Berry     G          1         118   0.8    211    27   0.1   14       1  
3 Cap'n Crunch  Q          0.75      144   2.1    269    31   1.1   16       1.3
4 Cinnamon Toa… G          0.75      169   4.4    408    32   1.7   13.3     2.7
5 Cocoa Blasts  Q          1         130   1.2    135    29   0.8   16       1  
6 Cocoa Puffs   G          1         117   1      171    26   0.8   14       1  
# ℹ 2 more variables: Health_Score <dbl>, Health_Category <chr>

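The table shows the first six cereals with their nutrition values. The two new columns, Health_Score and Health_Category, have been added but are cut off in the printed output because the table is too wide. Since the score subtracts standardized sugar, calories, fat, and sodium, cereals that are high in those nutrients receive lower (more negative) health scores.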
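The health score depends on `scale()`, which converts each nutrient to z-scores (the value minus the column mean, divided by the column standard deviation). A small sketch, using made-up sugar values rather than the cereal data, shows that `scale(x)[, 1]` matches the hand calculation:

```r
# Toy sugar values (hypothetical, not from the cereal dataset)
sugar <- c(2, 6, 10, 14, 18)

# scale() returns a one-column matrix; [, 1] extracts a plain numeric vector
z_scale <- scale(sugar)[, 1]

# The same z-scores computed by hand: (value - mean) / standard deviation
z_manual <- (sugar - mean(sugar)) / sd(sugar)

# TRUE when both approaches give the same result
all.equal(z_scale, z_manual)
```

Because every nutrient ends up with a mean of 0 and a standard deviation of 1, each one carries equal weight in the score regardless of its original units.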

Summary of the Data

cereal_clean |>
  summarize(
    number_of_cereals = n(),
    average_calories = mean(Calories, na.rm = TRUE),
    average_sugar = mean(Sugars, na.rm = TRUE),
    average_fiber = mean(Fiber, na.rm = TRUE),
    average_protein = mean(Protein, na.rm = TRUE),
    average_fat = mean(Fat, na.rm = TRUE),
    average_sodium = mean(Sodium, na.rm = TRUE)
  )
# A tibble: 1 × 7
  number_of_cereals average_calories average_sugar average_fiber average_protein
              <int>            <dbl>         <dbl>         <dbl>           <dbl>
1                30             134.          10.4          1.80            2.48
# ℹ 2 more variables: average_fat <dbl>, average_sodium <dbl>

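The summary shows that the 30 cereals average about 134 calories and 10.4 g of sugar, while average fiber (1.80) and protein (2.48) are relatively low. Average fat and sodium are cut off in the printed output above; average sodium is about 220. Together, these values suggest that many cereals in the dataset are not very high in nutritional value.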
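The printed summary hides two of its columns because a tibble only prints as many columns as fit the console. One way to see every column is to pass width = Inf to print(). The sketch below assumes the dplyr package is installed and uses a made-up `toy` table, not the cereal data:

```r
library(dplyr)

# Hypothetical three-row stand-in for the cereal data (values are made up)
toy <- tibble(
  Calories = c(100, 120, 140),
  Sugars   = c(4, 10, 16),
  Fat      = c(0.5, 1, 1.5),
  Sodium   = c(150, 200, 250)
)

# Average every column in one step
toy_summary <- toy |>
  summarize(across(everything(), ~ mean(.x, na.rm = TRUE)))

# width = Inf tells the tibble printer to show all columns, however wide
print(toy_summary, width = Inf)
```

The same print(..., width = Inf) call could be applied to the summary of `cereal_clean` to show the average fat and sodium alongside the other averages.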

Visualization 1: Distribution of Sugar

ggplot(cereal_clean, aes(x = Sugars)) +
  geom_histogram(bins = 10, color = "white") +
  labs(
    title = "Distribution of Sugar in Breakfast Cereals",
    x = "Sugar",
    y = "Number of Cereals"
  ) +
  theme_minimal()

Explanation

The histogram shows that most cereals have moderate to high sugar levels, with many cereals clustered around 10 to 15 grams of sugar, indicating that sugar content is generally high in breakfast cereals.

Visualization 2: Sugar and Health Score

ggplot(cereal_clean, aes(x = Sugars, y = Health_Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Sugar and Health Score",
    x = "Sugar",
    y = "Health Score"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Explanation

The scatterplot shows a negative relationship between sugar and health score: cereals with more sugar tend to have lower health scores. Part of this relationship is built in, because standardized sugar is one of the values subtracted when the score is calculated.

Visualization 3: Calories and Health Score

ggplot(cereal_clean, aes(x = Calories, y = Health_Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Calories and Health Score",
    x = "Calories",
    y = "Health Score"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Explanation

The scatterplot shows a negative relationship between calories and health score, meaning cereals with higher calories tend to have lower health scores and are generally less healthy.

Visualization 4: Fiber and Health Score

ggplot(cereal_clean, aes(x = Fiber, y = Health_Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Fiber and Health Score",
    x = "Fiber",
    y = "Health Score"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Explanation

The scatterplot shows a slight positive relationship between fiber and health score, meaning cereals with more fiber tend to have higher health scores and are generally healthier.
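The scatterplots show the direction of each relationship, and a correlation coefficient can put a number on its strength. The sketch below is only an illustration: it uses the six cereals printed by head(cereal_clean) earlier, recomputes the score on those six rows alone, and so will not reproduce the values for the full 30-cereal dataset. The names `cereal_head`, `z`, and `cors` are introduced here for the example.

```r
# The six rows shown by head(cereal_clean); this is NOT the full dataset
cereal_head <- data.frame(
  Calories = c(117, 118, 144, 169, 130, 117),
  Fat      = c(0.6, 0.8, 2.1, 4.4, 1.2, 1),
  Sodium   = c(143, 211, 269, 408, 135, 171),
  Fiber    = c(0.5, 0.1, 1.1, 1.7, 0.8, 0.8),
  Sugars   = c(15, 14, 16, 13.3, 16, 14),
  Protein  = c(1, 1, 1.3, 2.7, 1, 1)
)

# Helper: z-score a numeric vector (same idea as scale(x)[, 1] in the report)
z <- function(x) scale(x)[, 1]

# Recompute the health score on this six-row subset only
cereal_head$Health_Score <- with(
  cereal_head,
  z(Fiber) + z(Protein) - z(Sugars) - z(Calories) - z(Fat) - z(Sodium)
)

# Pearson correlation of each nutrient with the score (between -1 and 1)
cors <- sapply(
  cereal_head[c("Sugars", "Calories", "Fiber")],
  cor,
  y = cereal_head$Health_Score
)
cors
```

On the full data, the same cor() call on `cereal_clean` would give the values that correspond to the three scatterplots. Because each nutrient is itself an input to the score, some correlation is expected by construction.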

Visualization 5: Health Categories

ggplot(cereal_clean, aes(x = Health_Category)) +
  geom_bar() +
  labs(
    title = "Cereals Grouped by Health Category",
    x = "Health Category",
    y = "Number of Cereals"
  ) +
  theme_minimal()

Explanation

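The bar chart shows that cereals are split about evenly across the three health categories (Healthier, Moderate, and Less Healthy). This is expected: the categories are defined by the 33rd and 67th percentiles of the health score, so each one holds roughly a third of the 30 cereals by construction. The even split reflects how the groups were defined rather than a naturally balanced mix of cereal types.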
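Since the category cutoffs are the 33rd and 67th percentiles of the score, the group sizes follow directly from the cutoffs. A minimal sketch with 30 randomly generated scores (hypothetical values, not the cereal scores) applies the same rule using base R's ifelse() in place of case_when():

```r
set.seed(1)

# 30 made-up scores standing in for Health_Score
score <- rnorm(30)

# Same cutoffs as the report: 33rd and 67th percentiles
lo <- quantile(score, 0.33)
hi <- quantile(score, 0.67)

# Same three-way rule, written with nested ifelse()
category <- ifelse(score >= hi, "Healthier",
            ifelse(score <= lo, "Less Healthy", "Moderate"))

# Count how many scores fall in each category
table(category)
```

With 30 distinct scores, each category ends up with 10 members no matter what the scores are, which is why the bar chart looks balanced.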

Visualization 6: Top 10 Healthiest Cereals

top_10_healthiest <- cereal_clean |>
  arrange(desc(Health_Score)) |>
  slice_head(n = 10)

ggplot(top_10_healthiest, aes(x = reorder(Name, Health_Score), y = Health_Score)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Healthiest Cereals",
    x = "Cereal Name",
    y = "Health Score"
  ) +
  theme_minimal()

Explanation

The bar chart shows the top 10 healthiest cereals based on the health score, with Frosted Mini-Wheats having the highest score, followed by Special K and Wheaties, indicating these cereals have better nutritional values compared to others.

Visualization 7: Top 10 Least Healthy Cereals

top_10_least_healthy <- cereal_clean |>
  arrange(Health_Score) |>
  slice_head(n = 10)

ggplot(top_10_least_healthy, aes(x = reorder(Name, Health_Score), y = Health_Score)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Least Healthy Cereals",
    x = "Cereal Name",
    y = "Health Score"
  ) +
  theme_minimal()

Explanation

The bar chart shows the top 10 least healthy cereals based on the health score, with Cinnamon Toast Crunch having the lowest score, followed by Reese’s Puffs and Cap’n Crunch, indicating these cereals have higher sugar, fat, and calories and are less healthy.

Best Cereals Based on Health Score

cereal_clean |>
  arrange(desc(Health_Score)) |>
  select(Name, Calories, Protein, Fat, Sodium, Fiber, Sugars, Health_Score, Health_Category) |>
  slice_head(n = 10)
# A tibble: 10 × 9
   Name  Calories Protein   Fat Sodium Fiber Sugars Health_Score Health_Category
   <chr>    <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl>        <dbl> <chr>          
 1 Fros…      175     5     0.8      5   5     10           5.48 Healthier      
 2 Spec…      117     7     0.4    224   0.8    4           4.89 Healthier      
 3 Whea…      107     3     1      218   3      4           3.24 Healthier      
 4 Corn…      101     2     0.1    202   0.8    3           2.80 Healthier      
 5 Total      129     4     0.9    256   3.7    6.7         2.74 Healthier      
 6 King…       80     1.3   0.7    173   0.9    4           2.64 Healthier      
 7 Kix         87     1.5   0.5    205   0.8    2.3         2.60 Healthier      
 8 Prod…      100     2     0.4    207   1      4           2.41 Healthier      
 9 Rice…       94     1.6   0.2    234   0.1    1.6         2.13 Healthier      
10 Mult…      108     2     1.2    201   2.8    6           2.11 Healthier      

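The table shows the top 10 healthiest cereals based on the health score. Frosted Mini-Wheats has the highest score (5.48), mainly because of its high fiber (5) and protein (5) and its very low sodium (5, compared with 173 or more for every other cereal in the table). This is despite having the most calories (175) and sugar (10) in the top 10, with its sugar close to the dataset average of 10.4.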

Conclusion

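Under the health score used in this project, cereals with less sugar, fat, and sodium and fewer calories score as healthier, while cereals with more fiber and protein score higher. Frosted Mini-Wheats, Special K, and Wheaties ranked highest, and Cinnamon Toast Crunch ranked lowest. Overall, comparing nutrition values side by side helps in choosing healthier cereals.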