What Makes a Cereal Healthy? An Analysis of Nutritional Factors in Breakfast Cereals
Author
Sandeep Thapa Chhetri
Introduction
Breakfast cereals are a common food choice, but not all cereals are equally healthy. Some cereals are high in sugar, calories, fat, and sodium, while others provide more fiber and protein.
For this project, I analyzed a breakfast cereal dataset using R in Posit Cloud. The goal is to understand which nutritional factors make a cereal healthier.
Since this dataset does not include a rating variable, I created a health score using nutrition values. Cereals with higher fiber and protein receive a better score, while cereals with higher sugar, calories, fat, and sodium receive a lower score.
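The exact formula behind the health score is not shown in this report. The sketch below is one possible version. It assumes each nutrient is standardized with base R's `scale()` and that all nutrients are weighted equally. Fiber and protein add to the score, and sugar, calories, fat, and sodium subtract from it. The function name `add_health_score()` and the equal weights are hypothetical illustration choices, not the author's formula.

```r
library(dplyr)

# Hypothetical health score (NOT the original formula): an equal-weight sum of
# standardized nutrients. Fiber and Protein count positively; Sugars, Calories,
# Fat, and Sodium count negatively, so less healthy cereals get negative scores.
add_health_score <- function(df) {
  # z-score helper: centre on the mean and divide by the standard deviation
  z <- function(x) as.numeric(scale(x))
  df %>%
    mutate(
      Health_Score = z(Fiber) + z(Protein) -
        z(Sugars) - z(Calories) - z(Fat) - z(Sodium)
    )
}
```

Once the data are loaded into `cereal`, this could be applied as `cereal_clean <- add_health_score(cereal)`. Standardizing first keeps sodium, which is measured on a much larger scale than the other columns, from dominating the sum.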
Research Questions
Do cereals with more sugar tend to be less healthy?
Is there a relationship between calories and health score?
Do cereals with more fiber have better health scores?
Which cereals are the healthiest based on nutrition values?
Load Packages
library(tidyverse)
Load the Dataset
cereal <- read_csv("Cereal (1) (1).csv")
Rows: 30 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Name, Company
dbl (8): Serving, Calories, Fat, Sodium, Carbs, Fiber, Sugars, Protein
The dataset includes 30 cereals with nutrition details such as calories, fat, sodium, fiber, sugar, and protein. These values are used to assess how healthy each cereal is.
The table of cereals, with their nutrition values and calculated health scores, shows that cereals with more sugar, calories, and fat tend to have lower (more negative) health scores.
The summary shows that the cereals average about 134 calories, 10.4 g of sugar, and 220 mg of sodium, with relatively little fiber and protein. This suggests that many of them offer limited nutritional value.
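The code for the summary is not shown. Column means like these could be computed with dplyr's `summarise()` and `across()`. The sketch below assumes the column names from the `read_csv()` column specification, and the function name `nutrition_means()` is hypothetical.

```r
library(dplyr)

# Mean of each nutrition column, rounded to one decimal place.
# Column names follow the read_csv() column specification.
nutrition_means <- function(df) {
  df %>%
    summarise(across(
      c(Calories, Fat, Sodium, Fiber, Sugars, Protein),
      ~ round(mean(.x, na.rm = TRUE), 1)
    ))
}
```

For example, `nutrition_means(cereal)` returns a one-row table of averages. Base R's `summary(cereal)` additionally reports quartiles.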
Visualization 1: Distribution of Sugar
ggplot(cereal_clean, aes(x = Sugars)) +
  geom_histogram(bins = 10, color = "white") +
  labs(title = "Distribution of Sugar in Breakfast Cereals", x = "Sugar", y = "Number of Cereals") +
  theme_minimal()
Explanation
The histogram shows that most cereals have moderate to high sugar levels, with many clustered around 10 to 15 grams. Sugar content is therefore generally high among the cereals in this dataset.
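The clustering seen in the histogram can also be checked numerically. This small base R sketch returns the share of cereals whose sugar content falls within a range, assuming `Sugars` is recorded in grams. The function name `share_in_range()` is hypothetical.

```r
# Share of values that fall inside [lower, upper]; missing values are ignored.
share_in_range <- function(x, lower = 10, upper = 15) {
  mean(x >= lower & x <= upper, na.rm = TRUE)
}
```

For example, `share_in_range(cereal_clean$Sugars)` gives the proportion of cereals with 10 to 15 g of sugar.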
Visualization 2: Sugar and Health Score
ggplot(cereal_clean, aes(x = Sugars, y = Health_Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Relationship Between Sugar and Health Score", x = "Sugar", y = "Health Score") +
  theme_minimal()
Explanation
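The scatterplot shows a negative relationship between sugar and health score: cereals with more sugar tend to have lower health scores. Because the health score penalizes sugar by design, part of this trend follows from how the score was built. The plot shows how strongly sugar pulls scores down across the dataset.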
Visualization 3: Calories and Health Score
ggplot(cereal_clean, aes(x = Calories, y = Health_Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Relationship Between Calories and Health Score", x = "Calories", y = "Health Score") +
  theme_minimal()
Explanation
The scatterplot also shows a negative relationship between calories and health score: higher-calorie cereals tend to have lower health scores.
Visualization 4: Fiber and Health Score
ggplot(cereal_clean, aes(x = Fiber, y = Health_Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Relationship Between Fiber and Health Score", x = "Fiber", y = "Health Score") +
  theme_minimal()
Explanation
The scatterplot shows a slight positive relationship between fiber and health score: cereals with more fiber tend to have somewhat higher scores. The weak trend suggests that fiber alone does not determine how healthy a cereal is rated.
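The three relationships can be compared side by side by computing each nutrient's correlation with the score in one step. This also covers nutrients that are not plotted, such as sodium. The base R sketch below assumes the column names from the dataset and a `Health_Score` column. The function name `score_correlations()` is hypothetical.

```r
# Pearson correlation of each nutrient column with Health_Score,
# sorted from most negative to most positive.
score_correlations <- function(df,
                               nutrients = c("Sugars", "Calories", "Fat",
                                             "Sodium", "Fiber", "Protein")) {
  r <- sapply(nutrients, function(col) {
    # use = "complete.obs" drops rows with missing values for this pair
    cor(df[[col]], df$Health_Score, use = "complete.obs")
  })
  sort(r)
}
```

For example, `score_correlations(cereal_clean)` returns a named vector. A single number per nutrient makes the "slight" fiber trend easier to compare with the sugar and calorie trends than eyeballing three plots.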
Visualization 5: Health Categories
ggplot(cereal_clean, aes(x = Health_Category)) +
  geom_bar() +
  labs(title = "Cereals Grouped by Health Category", x = "Health Category", y = "Number of Cereals") +
  theme_minimal()
Explanation
The bar chart shows that cereals are evenly distributed across the three health categories (healthier, moderate, and less healthy), indicating a balanced mix of cereal types in the dataset.
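The code that assigns the categories is not shown. One common approach, assumed here only for illustration, splits the score into thirds with dplyr's `ntile()`. With that approach the three groups are equal in size by construction (10 cereals each out of 30), so an even bar chart would reflect the method rather than the data. The function name `add_health_category()` is hypothetical. The labels mirror the category names in the text.

```r
library(dplyr)

# Hypothetical categorisation: split Health_Score into tertiles.
# ntile() ranks the scores and assigns 1 (lowest third) to 3 (highest third).
add_health_category <- function(df) {
  df %>%
    mutate(
      Health_Category = factor(
        ntile(Health_Score, 3),
        levels = c(1, 2, 3),
        labels = c("Less Healthy", "Moderate", "Healthier")
      )
    )
}
```

Storing the category as a factor with explicit levels also keeps the bars in a logical order instead of alphabetical order. For example, `count(add_health_category(cereal_clean), Health_Category)` shows the group sizes.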
The bar chart of the top 10 healthiest cereals shows Frosted Mini-Wheats with the highest score, followed by Special K and Wheaties. These cereals have a better nutritional profile than the others.
The bar chart of the 10 least healthy cereals shows Cinnamon Toast Crunch with the lowest score, followed by Reese’s Puffs and Cap’n Crunch. These cereals are higher in sugar, fat, and calories.
The table of the top 10 healthiest cereals confirms that Frosted Mini-Wheats has the highest score. This is mainly because it contains more fiber and protein and less fat and sugar than the other cereals.
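Rankings like these can be produced with dplyr's `slice_max()` and `slice_min()`. The sketch below assumes the data frame has `Name` and `Health_Score` columns, as used in the plots. The function name `rank_cereals()` is hypothetical.

```r
library(dplyr)

# Return the n highest- (or lowest-) scoring cereals with their scores.
# with_ties = FALSE guarantees exactly n rows even if scores are tied.
rank_cereals <- function(df, n = 10, healthiest = TRUE) {
  picked <- if (healthiest) {
    slice_max(df, Health_Score, n = n, with_ties = FALSE)  # sorted high to low
  } else {
    slice_min(df, Health_Score, n = n, with_ties = FALSE)  # sorted low to high
  }
  select(picked, Name, Health_Score)
}
```

For the bar charts, the result can be passed to `ggplot()` with `geom_col()`. Using `reorder(Name, Health_Score)` on the axis keeps the bars sorted by score.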
Conclusion
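In this dataset, cereals with less sugar, calories, fat, and sodium and more fiber and protein received higher health scores. Sugar and calories showed clear negative relationships with the score, while fiber showed a weaker positive one. Overall, comparing nutrition values, especially sugar and fiber, can help in choosing healthier cereals.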