Summary

The objective of this analysis was to explore the nutritional efficiency of the cereals in the 80 Cereals dataset, available on Kaggle (https://www.kaggle.com/datasets/crawford/80-cereals), based on the main macronutrients: carbohydrates, proteins, fiber, and fats.

The data were segmented according to calories per gram, standardized with a z-score, forming two groups: Group A, composed of cereals whose calories per gram are at or above the sample mean (z-score ≥ 0), and Group B, composed of cereals whose values are below the mean (z-score < 0).

The main advantage of z-score standardization is that it allows each observation to be interpreted in terms of its relative position around the center of the distribution, measured in standard deviation units. This makes it possible to assess how high or low a given value is while accounting for the dataset’s variability, resulting in more consistent comparisons across observations.
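The grouping rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the code used in the analysis: the function name `assign_groups` is hypothetical, and the five calories-per-gram values are taken from the tables in this report, so the mean used here is the mean of that small subset rather than of the full sample.

```python
from statistics import mean, stdev

def assign_groups(values):
    """Label each value A (z-score >= 0) or B (z-score < 0)."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    labels = []
    for v in values:
        z = (v - mu) / sigma  # distance from the mean in standard deviation units
        labels.append("A" if z >= 0 else "B")
    return labels

# Calories per gram for five cereals listed in this report (illustrative subset).
cal_per_gram = [3.174603, 3.527337, 3.798670, 3.880071, 4.232804]
groups = assign_groups(cal_per_gram)
print(groups)
```

Because the sign of a z-score depends only on whether a value lies above or below the mean, the split is equivalent to comparing each value with the mean directly. What the z-score adds is how far from the center each cereal lies.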

During the analysis process, problematic cases were identified in the sugar variable, including records with negative values, which are incompatible with the nature of the variable. Considering that nutritional information represents a competitive advantage or disadvantage for food products, and since it was not possible to accurately determine the correct values for these observations, these cases were excluded from the sample. Imputation using the mean, zero, or any other method could introduce substantial bias into the analysis, including in subsequent applications involving Machine Learning algorithms.
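The exclusion step can be expressed as a simple filter. The sketch below is an assumption about how such a rule might look, not the procedure actually run: the helper `drop_negative` and the three records are hypothetical and exist only for illustration.

```python
def drop_negative(records, field):
    """Return only the records whose value for `field` is zero or positive."""
    return [r for r in records if r[field] >= 0]

# Hypothetical records; names and values are made up for illustration.
sample = [
    {"name": "cereal_1", "sugars": 6},
    {"name": "cereal_2", "sugars": -1},  # impossible value, treated as invalid
    {"name": "cereal_3", "sugars": 0},
]
clean = drop_negative(sample, "sugars")
print([r["name"] for r in clean])
```

Dropping the record keeps a made-up value out of every later statistic, at the cost of a smaller sample.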

After removing these inconsistent observations, the sample was reduced to 73 observations. In addition, a single hot cereal was excluded, resulting in a final sample composed exclusively of cold cereals.

The results show statistically significant differences between the mean protein and fat levels of Groups A and B. Group B presents, on average, a higher protein content and a lower fat content. In contrast, carbohydrate levels remain similar between the two groups, with no statistically significant difference.

Furthermore, in the relative protein density analysis, the cereal 100% Bran demonstrated the highest nutritional efficiency outside the zero-sugar and zero-fat group. Within that specific group, All-Bran with Extra Fiber showed the best performance.

Zero sugar and fat

Table 1 – Zero sugar and fat cereals ranked by protein content

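| Name | Calories | Protein | Fat | Sodium | Fiber | Carbohydrates | Sugars | Potassium | Vitamins | Calories per gram | Group |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 | 14 | 8 | 0 | 330 | 25 | 1.763668 | B |
| Shredded Wheat 'n'Bran | 90 | 3 | 0 | 0 | 4 | 19 | 0 | 140 | 0 | 3.174603 | B |
| Shredded Wheat spoon size | 90 | 3 | 0 | 0 | 3 | 20 | 0 | 120 | 0 | 3.174603 | B |
| Puffed Wheat | 50 | 2 | 0 | 0 | 1 | 10 | 0 | 50 | 0 | 3.527337 | B |
| Shredded Wheat | 80 | 2 | 0 | 0 | 3 | 16 | 0 | 95 | 0 | 3.399843 | B |
| Puffed Rice | 50 | 1 | 0 | 0 | 0 | 13 | 0 | 15 | 0 | 3.527337 | B |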

Source: Prepared by Bruno Araújo


As Table 1 shows, All-Bran with Extra Fiber presented the best nutritional profile within the zero-sugar, zero-fat group. Although its lead in protein content is small (4 versus 3 for the next cereals), it stands out for having the lowest caloric density: 1.76 calories per gram, against 3.17 or more for the others.


Protein Density Analysis

Carbohydrates


Group A showed greater variability in protein density, as indicated by the size of the box (the interquartile range). Group B exhibited lower variability but many outliers, which in this case are favorable because they correspond to cereals with unusually high protein density.


Table 2 – Protein density relative to carbohydrates: Top cereals

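| Name | Calories | Protein | Fat | Sodium | Fiber | Carbohydrates | Sugars | Potassium | Vitamins | Calories per gram | Group | Protein density (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 2.469136 | B | 80.00 |
| All-Bran | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 2.469136 | B | 57.14 |
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 1.763668 | B | 50.00 |
| 100% Natural Bran | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 4.232804 | A | 37.50 |
| Special K | 110 | 6 | 0 | 230 | 1.0 | 16.0 | 3 | 55 | 25 | 3.880071 | A | 37.50 |
| Cheerios | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 3.880071 | A | 35.29 |
| Life | 100 | 4 | 2 | 150 | 2.0 | 12.0 | 6 | 95 | 25 | 3.527337 | B | 33.33 |
| Cracklin' Oat Bran | 110 | 3 | 3 | 140 | 4.0 | 10.0 | 7 | 160 | 25 | 3.880071 | A | 30.00 |
| Quaker Oat Squares | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3.527337 | B | 28.57 |
| Raisin Nut Bran | 100 | 3 | 2 | 140 | 2.5 | 10.5 | 8 | 140 | 25 | 3.527337 | B | 28.57 |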

Source: Prepared by Bruno Araújo
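The report does not state the formula behind the protein density column, but the tabulated values are consistent with protein divided by carbohydrates, multiplied by 100. The sketch below reproduces four rows of the table under that assumption; the function name `protein_density` is hypothetical.

```python
def protein_density(protein, carbohydrates):
    """Protein as a percentage of carbohydrates, rounded to two decimals."""
    return round(100 * protein / carbohydrates, 2)

# (name, protein, carbohydrates) copied from the table above.
rows = [
    ("Cheerios", 6, 17.0),
    ("100% Bran", 4, 5.0),
    ("All-Bran with Extra Fiber", 4, 8.0),
    ("All-Bran", 4, 7.0),
]

# Rank cereals from highest to lowest protein density.
ranked = sorted(
    ((name, protein_density(p, c)) for name, p, c in rows),
    key=lambda item: item[1],
    reverse=True,
)
for name, density in ranked:
    print(f"{name}: {density}")
```

A ratio of this kind rewards cereals that deliver protein without a proportional carbohydrate load, which is why the bran cereals rank above Cheerios even though Cheerios has more protein per serving.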


Fats




Table 2 – Protein density relative to fat: Top cereals

| Name | Calories | Protein | Fat | Sodium | Fiber | Carbohydrates | Sugars | Potassium | Vitamins | Calories per gram | Group | Protein density (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 2.469136 | B | 400 |
| All-Bran | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 2.469136 | B | 400 |
| Quaker Oat Squares | 100 | 4 | 1 | 135 | 2.0 | 14.0 | 6 | 110 | 25 | 3.527337 | B | 400 |
| Cheerios | 110 | 6 | 2 | 290 | 2.0 | 17.0 | 1 | 105 | 25 | 3.880071 | A | 300 |
| Grape Nuts Flakes | 100 | 3 | 1 | 140 | 3.0 | 15.0 | 5 | 85 | 25 | 3.527337 | B | 300 |
| Honey Nut Cheerios | 110 | 3 | 1 | 250 | 1.5 | 11.5 | 10 | 90 | 25 | 3.880071 | A | 300 |
| Just Right Fruit & Nut | 140 | 3 | 1 | 170 | 2.0 | 20.0 | 9 | 95 | 100 | 3.798670 | A | 300 |
| Post Nat. Raisin Bran | 120 | 3 | 1 | 200 | 6.0 | 11.0 | 14 | 260 | 25 | 3.182560 | B | 300 |
| Raisin Bran | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 3.182560 | B | 300 |
| Total Raisin Bran | 140 | 3 | 1 | 190 | 4.0 | 15.0 | 14 | 230 | 100 | 3.292181 | B | 300 |

Source: Prepared by Bruno Araújo
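The values in this table are consistent with protein divided by fat, multiplied by 100, although the report does not state the formula. One practical consequence is that the ratio is undefined for cereals with zero fat, so they cannot be ranked on this measure. The sketch below illustrates this under that assumption; `protein_per_fat` is a hypothetical helper, and returning `None` for zero fat is one possible convention.

```python
def protein_per_fat(protein, fat):
    """Protein as a percentage of fat, or None when fat is zero."""
    if fat == 0:
        return None  # ratio undefined; such cereals need separate treatment
    return round(100 * protein / fat, 2)

# (name, protein, fat) copied from the tables in this report.
rows = [
    ("100% Bran", 4, 1),
    ("Cheerios", 6, 2),
    ("All-Bran with Extra Fiber", 4, 0),
]
densities = {name: protein_per_fat(p, f) for name, p, f in rows}
print(densities)
```

Because protein and fat are recorded in whole units, the measure takes few distinct values, which explains the many ties at 400 and 300 in the table.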

Tests of Mean Differences in Macronutrients

The hypothesis test for each macronutrient uses an independent-samples t-test to verify whether there is a statistically significant difference between the means of Groups A and B.

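For each macronutrient, the following system of hypotheses is established:

H0: μA = μB (the mean content is the same in Groups A and B)
H1: μA ≠ μB (the mean content differs between Groups A and B)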

The decision criterion will be based on the p-value obtained from the test. Considering a 5% significance level (α = 0.05), if p < 0.05, the null hypothesis will be rejected, leading to the conclusion that there is a statistically significant difference between the group means.
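The outputs in this section are labelled "Welch Two Sample t-test", the form of the t-test that does not assume equal variances in the two groups. For reference, the standard Welch statistic and its approximate degrees of freedom are given below, where the sample means, sample standard deviations, and sizes of Groups A and B are written as x̄, s, and n with the group as subscript. These symbols are notation introduced here, not taken from the report.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}
         {\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}\right)^2}
     {\dfrac{\left(s_A^2/n_A\right)^2}{n_A - 1} + \dfrac{\left(s_B^2/n_B\right)^2}{n_B - 1}}
```

The approximation for ν explains why the reported degrees of freedom are fractional (for example, 62.629) rather than the whole number a pooled-variance test would give.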

Table 3 – Sample Size

| Group | Size |
|---|---|
| A | 39 |
| B | 34 |

Source: Prepared by Bruno Araújo


Protein

## # A tibble: 2 × 2
##   group  mean
##   <chr> <dbl>
## 1 A      2.23
## 2 B      2.79


## 
##  Welch Two Sample t-test
## 
## data:  protein by group
## t = -2.3932, df = 62.629, p-value = 0.01971
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -1.03379362 -0.09290322
## sample estimates:
## mean in group A mean in group B 
##        2.230769        2.794118


There is evidence of a statistically significant difference between the mean protein levels of Groups A and B (p = 0.0197 < 0.05). Group B has the higher mean protein content (2.79 versus 2.23 for Group A).
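The figures in the protein output can be cross-checked against each other with simple arithmetic. The sketch below, which uses only numbers printed in the output above, recovers the standard error from the difference in means and the t statistic, and confirms that the confidence interval is centered on that difference. The variable names are chosen here for illustration.

```python
# Values copied from the Welch t-test output for protein.
mean_a, mean_b = 2.230769, 2.794118
t_stat = -2.3932
ci_low, ci_high = -1.03379362, -0.09290322

diff = mean_a - mean_b              # estimated difference in means (A minus B)
se = diff / t_stat                  # standard error implied by t = diff / se
midpoint = (ci_low + ci_high) / 2   # center of the 95% confidence interval
t_crit = (ci_high - ci_low) / 2 / se  # critical value implied by the interval width

print(round(diff, 4), round(se, 4), round(midpoint, 4), round(t_crit, 2))
```

The implied critical value is close to 2, as expected for a two-sided 95% interval with about 63 degrees of freedom, and the interval excludes zero, in line with the rejection of the null hypothesis.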

Carbohydrates

## # A tibble: 2 × 2
##   group  mean
##   <chr> <dbl>
## 1 A      15.2
## 2 B      14.2


## 
##  Welch Two Sample t-test
## 
## data:  carbo by group
## t = 1.0389, df = 70.844, p-value = 0.3024
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -0.8715554  2.7674830
## sample estimates:
## mean in group A mean in group B 
##        15.15385        14.20588

Since p = 0.3024 > 0.05, we do not reject the null hypothesis that the mean carbohydrate levels of the two groups are equal.

Fats

## # A tibble: 2 × 2
##   group  mean
##   <chr> <dbl>
## 1 A     1.28 
## 2 B     0.676

## 
##  Welch Two Sample t-test
## 
## data:  fat by group
## t = 2.7298, df = 65.188, p-value = 0.00814
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  0.1625566 1.0486048
## sample estimates:
## mean in group A mean in group B 
##       1.2820513       0.6764706

There is statistically significant evidence (p = 0.0081 < 0.05) that the mean fat content of Group A is higher than that of Group B.