The objective of this analysis was to explore the nutritional efficiency of the cereals in the 80 Cereals dataset, available on Kaggle (https://www.kaggle.com/datasets/crawford/80-cereals), based on the main macronutrients: carbohydrates, proteins, fiber, and fats.
The data were split into two groups according to caloric density (calories per gram), using z-score standardization. Group A contains cereals whose standardized calories per gram is at or above the mean (z ≥ 0), and Group B contains cereals below the mean (z < 0).
The main advantage of z-score standardization is that it expresses each observation by its position relative to the center of the distribution, measured in standard deviations. This shows how high or low a value is given the variability of the data, which makes comparisons across observations more consistent. Because the z-score is a linear transformation, a cutoff at z = 0 selects exactly the same cereals as a cutoff at the raw mean; what standardization adds is the ability to read each cereal's distance from the mean in standard-deviation units.
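The grouping step can be sketched in Python. This is an illustrative sketch under stated assumptions, not the author's code (the results in this document come from R). It assumes the Kaggle column `weight` is the serving weight in ounces and that calories per gram was computed as calories / (weight × 28.35); that factor reproduces the values in the tables (for example, 50 / 28.35 ≈ 1.763668 for All-Bran with Extra Fiber), but it is inferred from the numbers, not documented by the author. The helper names `calories_per_gram`, `z_scores`, and `assign_groups` are hypothetical.

```python
import statistics

# Assumption: `weight` is in ounces; 28.35 g/oz reproduces the tables' values
GRAMS_PER_OUNCE = 28.35


def calories_per_gram(calories: float, weight_oz: float) -> float:
    """Hypothetical helper: caloric density in kcal per gram of cereal."""
    return calories / (weight_oz * GRAMS_PER_OUNCE)


def z_scores(values: list[float]) -> list[float]:
    """Standardize values as (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, like R's sd() and scale()
    return [(x - mean) / sd for x in values]


def assign_groups(values: list[float]) -> list[str]:
    """Group A: z >= 0 (at or above the mean); Group B: z < 0 (below the mean)."""
    return ["A" if z >= 0 else "B" for z in z_scores(values)]


# (calories, weight in oz) for five cereals from the tables; the weights are
# those implied by the tables' calories-per-gram column
rows = [(50, 1.0), (120, 1.0), (110, 1.0), (50, 0.5), (70, 1.0)]
density = [calories_per_gram(c, w) for c, w in rows]
print([round(d, 6) for d in density])
print(assign_groups(density))  # ['B', 'A', 'A', 'A', 'B']
```

Dividing by any positive standard deviation leaves the sign of each deviation from the mean unchanged, so the choice between sample and population standard deviation does not affect the split.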
During the analysis, problematic records were identified in the sugar variable, including negative values, which are impossible for a nutrient quantity. Because nutritional information can be a competitive advantage or disadvantage for a food product, and because the correct values for these observations could not be determined reliably, these records were excluded from the sample. Imputing them with the mean, with zero, or by any other method could introduce substantial bias into this analysis and into any later Machine Learning applications built on the same data.
After the inconsistent observations were removed and the single hot cereal was excluded, the final sample consisted of 73 observations, all of them cold cereals.
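The cleaning step can be sketched as follows. Which columns the author checked for negative values is an assumption: this hypothetical `clean_sample` checks every nutrient column by default and then keeps only cold cereals, assuming the Kaggle `type` column codes cold cereals as `"C"` and hot cereals as `"H"`. The helper `load_cereals` and the CSV path passed to it are also assumptions.

```python
import csv

# Nutrient columns as named in the Kaggle CSV (assumed)
NUTRIENT_COLUMNS = ("calories", "protein", "fat", "sodium", "fiber",
                    "carbo", "sugars", "potass", "vitamins")


def load_cereals(path: str) -> list[dict]:
    """Read the CSV and convert the nutrient columns to float."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for col in NUTRIENT_COLUMNS:
            row[col] = float(row[col])
    return rows


def clean_sample(rows: list[dict], columns=NUTRIENT_COLUMNS) -> list[dict]:
    """Drop records with a negative value in any checked column, then keep cold cereals."""
    no_negatives = [r for r in rows if all(float(r[c]) >= 0 for c in columns)]
    return [r for r in no_negatives if r["type"] == "C"]


# Hypothetical records illustrating both exclusion rules
demo = [
    {"name": "cold_ok", "type": "C", "sugars": 6, "carbo": 5},
    {"name": "cold_bad", "type": "C", "sugars": -1, "carbo": 12},
    {"name": "hot_ok", "type": "H", "sugars": 3, "carbo": 16},
]
print([r["name"] for r in clean_sample(demo, columns=("sugars", "carbo"))])  # ['cold_ok']
```

Dropping the rows, rather than imputing them, mirrors the decision described above.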
The results show statistically significant differences between Groups A and B in mean protein and fat content (Welch t-test, p = 0.020 and p = 0.008, respectively): Group B has, on average, more protein and less fat. Carbohydrate levels, in contrast, are similar in the two groups, with no statistically significant difference (p = 0.30).
Furthermore, in the relative protein density analysis (protein as a percentage of carbohydrates or of fat), 100% Bran showed the highest nutritional efficiency among the cereals outside the zero-sugar, zero-fat group. Within that group, All-Bran with Extra Fiber performed best.
Table 1 – Cereals with zero sugar and zero fat, ranked by protein content

| Name | Calories (kcal) | Protein (g) | Fat (g) | Sodium (mg) | Fiber (g) | Carbohydrates (g) | Sugars (g) | Potassium (mg) | Vitamins (%) | Calories per gram (kcal/g) | Group |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 | 14 | 8 | 0 | 330 | 25 | 1.763668 | B |
| Shredded Wheat 'n'Bran | 90 | 3 | 0 | 0 | 4 | 19 | 0 | 140 | 0 | 3.174603 | B |
| Shredded Wheat spoon size | 90 | 3 | 0 | 0 | 3 | 20 | 0 | 120 | 0 | 3.174603 | B |
| Puffed Wheat | 50 | 2 | 0 | 0 | 1 | 10 | 0 | 50 | 0 | 3.527337 | B |
| Shredded Wheat | 80 | 2 | 0 | 0 | 3 | 16 | 0 | 95 | 0 | 3.399843 | B |
| Puffed Rice | 50 | 1 | 0 | 0 | 0 | 13 | 0 | 15 | 0 | 3.527337 | B |

Source: Prepared by Bruno Araújo
As Table 1 shows, All-Bran with Extra Fiber has the best nutritional profile among the zero-sugar, zero-fat cereals. Its protein advantage is small (4 g versus 3 g for the next two cereals), but it stands out for having the lowest caloric density in the group (1.76 kcal/g) as well as the highest fiber content (14 g).
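The selection behind Table 1 can be sketched as a filter followed by a sort. This is an assumption about the procedure, not the author's code: ties in protein keep their input order, which reproduces the order of Table 1 when the input is alphabetical. The `Cereal` record type and `zero_sugar_fat_ranking` are hypothetical.

```python
from typing import NamedTuple


class Cereal(NamedTuple):
    name: str
    protein: float       # g per serving
    fat: float           # g per serving
    sugars: float        # g per serving
    cal_per_gram: float  # kcal per gram


def zero_sugar_fat_ranking(cereals: list[Cereal]) -> list[Cereal]:
    """Keep cereals with zero sugar and zero fat, ranked by protein (highest first).

    sorted() stays stable with reverse=True, so tied cereals keep their input order.
    """
    subset = [c for c in cereals if c.sugars == 0 and c.fat == 0]
    return sorted(subset, key=lambda c: c.protein, reverse=True)


# Values copied from Table 1, plus 100% Bran (which has sugar and fat) as a counterexample
cereals = [
    Cereal("100% Bran", 4, 1, 6, 2.469136),
    Cereal("All-Bran with Extra Fiber", 4, 0, 0, 1.763668),
    Cereal("Puffed Rice", 1, 0, 0, 3.527337),
    Cereal("Puffed Wheat", 2, 0, 0, 3.527337),
    Cereal("Shredded Wheat", 2, 0, 0, 3.399843),
    Cereal("Shredded Wheat 'n'Bran", 3, 0, 0, 3.174603),
    Cereal("Shredded Wheat spoon size", 3, 0, 0, 3.174603),
]

ranked = zero_sugar_fat_ranking(cereals)
print([c.name for c in ranked])
print(min(ranked, key=lambda c: c.cal_per_gram).name)  # All-Bran with Extra Fiber
```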
In the box plot of protein density by group, Group A showed greater variability, as indicated by its larger box (a wider interquartile range). Group B showed lower variability but many outliers, which in this case are favorable, since they correspond to cereals with unusually high protein density.
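The box size referred to above is the interquartile range (IQR). Box plots commonly flag as outliers the points lying more than 1.5 × IQR beyond the quartiles; this coefficient is the default in R's `boxplot()` and in ggplot2's `geom_boxplot()`, for example. Below is a minimal sketch of that rule, assuming linearly interpolated quartiles (R's default `quantile()` type). The helper `tukey_outliers` is hypothetical, and the demo values are illustrative, not the cereal data.

```python
import statistics


def tukey_outliers(values: list[float], coef: float = 1.5) -> tuple[float, list[float]]:
    """Return the IQR (the box height) and the points outside the Tukey fences."""
    # method="inclusive" interpolates linearly, like R's default quantile type 7
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - coef * iqr, q3 + coef * iqr
    return iqr, [v for v in values if v < low or v > high]


demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # illustrative values only
iqr, outliers = tukey_outliers(demo)
print(iqr, outliers)  # 4.5 [100]
```

A small box with several points beyond the upper fence is the pattern described for Group B.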
Table 2a – Protein density relative to carbohydrates: top cereals

| Name | Calories (kcal) | Protein (g) | Fat (g) | Sodium (mg) | Fiber (g) | Carbohydrates (g) | Sugars (g) | Potassium (mg) | Vitamins (%) | Calories per gram (kcal/g) | Group | Protein / carbohydrates (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 2.469136 | B | 80.00 |
| All-Bran | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 2.469136 | B | 57.14 |
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 1.763668 | B | 50.00 |
| 100% Natural Bran | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 4.232804 | A | 37.50 |
| Special K | 110 | 6 | 0 | 230 | 1.0 | 16.0 | 3 | 55 | 25 | 3.880071 | A | 37.50 |
| Cheerios | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 3.880071 | A | 35.29 |
| Life | 100 | 4 | 2 | 150 | 2.0 | 12.0 | 6 | 95 | 25 | 3.527337 | B | 33.33 |
| Cracklin' Oat Bran | 110 | 3 | 3 | 140 | 4.0 | 10.0 | 7 | 160 | 25 | 3.880071 | A | 30.00 |
| Quaker Oat Squares | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3.527337 | B | 28.57 |
| Raisin Nut Bran | 100 | 3 | 2 | 140 | 2.5 | 10.5 | 8 | 140 | 25 | 3.527337 | B | 28.57 |

Source: Prepared by Bruno Araújo
Table 2b – Protein density relative to fat: top cereals

| Name | Calories (kcal) | Protein (g) | Fat (g) | Sodium (mg) | Fiber (g) | Carbohydrates (g) | Sugars (g) | Potassium (mg) | Vitamins (%) | Calories per gram (kcal/g) | Group | Protein / fat (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 2.469136 | B | 400 |
| All-Bran | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 2.469136 | B | 400 |
| Quaker Oat Squares | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3.527337 | B | 400 |
| Cheerios | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 3.880071 | A | 300 |
| Grape Nuts Flakes | 100 | 3 | 1 | 140 | 3.0 | 15.0 | 5 | 85 | 25 | 3.527337 | B | 300 |
| Honey Nut Cheerios | 110 | 3 | 1 | 250 | 1.5 | 11.5 | 10 | 90 | 25 | 3.880071 | A | 300 |
| Just Right Fruit & Nut | 140 | 3 | 1 | 170 | 2.0 | 20.0 | 9 | 95 | 100 | 3.798670 | A | 300 |
| Post Nat. Raisin Bran | 120 | 3 | 1 | 200 | 6.0 | 11.0 | 14 | 260 | 25 | 3.182560 | B | 300 |
| Raisin Bran | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 3.182560 | B | 300 |
| Total Raisin Bran | 140 | 3 | 1 | 190 | 4.0 | 15.0 | 14 | 230 | 100 | 3.292181 | B | 300 |

Source: Prepared by Bruno Araújo
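The protein density columns in the two tables above are consistent with protein divided by carbohydrates (first table) or by fat (second table), times 100. For example, 4 / 5 × 100 = 80 for 100% Bran and 6 / 2 × 100 = 300 for Cheerios. Below is a sketch of that calculation, with the formula inferred from the table values; the helper `protein_density` is hypothetical. The ratio relative to fat is undefined for zero-fat cereals, which may be why they are ranked separately in Table 1.

```python
from typing import Optional


def protein_density(protein: float, reference: float) -> Optional[float]:
    """Protein as a percentage of another nutrient (carbohydrates or fat).

    Returns None when the reference nutrient is zero, because the ratio is undefined.
    """
    if reference == 0:
        return None
    return 100 * protein / reference


# (name, protein, carbohydrates, fat) copied from the protein density tables
rows = [
    ("100% Bran", 4, 5.0, 1),
    ("Cheerios", 6, 17.0, 2),
    ("All-Bran with Extra Fiber", 4, 8.0, 0),
]
for name, protein, carbs, fat in rows:
    vs_carbs = protein_density(protein, carbs)
    vs_fat = protein_density(protein, fat)
    print(name, round(vs_carbs, 2), vs_fat)
```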
For each macronutrient, an independent-samples t-test was used to check whether the means of Groups A and B differ significantly. The version applied is Welch's t-test, which does not assume equal variances in the two groups.
For each macronutrient, the following system of hypotheses is established:
H₀ (null hypothesis): the mean of Group A is equal to the mean of Group B.
H₁ (alternative hypothesis): the means of Groups A and B are different.
The decision criterion will be based on the p-value obtained from the test. Considering a 5% significance level (α = 0.05), if p < 0.05, the null hypothesis will be rejected, leading to the conclusion that there is a statistically significant difference between the group means.
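The test and decision rule can be sketched in plain Python. The results reported in this section come from R's `t.test()`, which runs Welch's test by default (`var.equal = FALSE`). This sketch computes the Welch statistic and the Welch–Satterthwaite degrees of freedom, and it obtains the two-sided p-value by integrating the Student t density numerically with Simpson's rule. The helper names are hypothetical, and the numerical integration is a dependency-free stand-in; in practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: float, steps: int = 10_000) -> float:
    """p = 1 - 2 * (area under the t density on [0, |t|]), via Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def decide(p: float, alpha: float = 0.05) -> str:
    """Apply the 5% decision rule stated above."""
    return "reject H0" if p < alpha else "do not reject H0"


# Check against the protein test in this section: t = -2.3932, df = 62.629
p = two_sided_p(-2.3932, 62.629)
print(round(p, 4), decide(p))  # 0.0197 reject H0
```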
Table 3 – Sample size by group

| Group | Size |
|---|---|
| A | 39 |
| B | 34 |

Source: Prepared by Bruno Araújo
## # A tibble: 2 × 2
## group mean
## <chr> <dbl>
## 1 A 2.23
## 2 B 2.79
##
## Welch Two Sample t-test
##
## data: protein by group
## t = -2.3932, df = 62.629, p-value = 0.01971
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -1.03379362 -0.09290322
## sample estimates:
## mean in group A mean in group B
## 2.230769 2.794118
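Since p = 0.0197 < 0.05, the null hypothesis is rejected: there is a statistically significant difference between the mean protein levels of Groups A and B, and the 95% confidence interval for the difference (-1.03 to -0.09) does not include zero. Group B has the higher mean protein content (2.79 g versus 2.23 g).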
## # A tibble: 2 × 2
## group mean
## <chr> <dbl>
## 1 A 15.2
## 2 B 14.2
##
## Welch Two Sample t-test
##
## data: carbo by group
## t = 1.0389, df = 70.844, p-value = 0.3024
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -0.8715554 2.7674830
## sample estimates:
## mean in group A mean in group B
## 15.15385 14.20588
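Since p = 0.3024 > 0.05, the null hypothesis is not rejected: there is no statistically significant difference between the mean carbohydrate levels of Group A (15.15 g) and Group B (14.21 g), and the 95% confidence interval for the difference (-0.87 to 2.77) includes zero.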
## # A tibble: 2 × 2
## group mean
## <chr> <dbl>
## 1 A 1.28
## 2 B 0.676
##
## Welch Two Sample t-test
##
## data: fat by group
## t = 2.7298, df = 65.188, p-value = 0.00814
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## 0.1625566 1.0486048
## sample estimates:
## mean in group A mean in group B
## 1.2820513 0.6764706
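Since p = 0.0081 < 0.05, the null hypothesis is rejected: the mean fat content of Group A (1.28 g) is significantly higher than that of Group B (0.68 g), with a 95% confidence interval for the difference of 0.16 to 1.05 g.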