The purpose of this document is to provide an overview of data analysis and visualization for the different types of cereals.
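Type: C (Cold) or H (Hot);
Manufacturer: A (American Home Food Products), G (General Mills), K (Kelloggs), N (Nabisco), P (Post), Q (Quaker Oats) or R (Ralston Purina);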
The data set used in this overview was taken from: https://www.kaggle.com/crawford/80-cereals/data
| Name | Manufacturer | Type | Calories | Protein | Fat | Sodium | Fibre | Carbohydrates | Sugar | Potassium | Vitamins | Shelf | Weight | Cups | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1 | 0.33 | 68.40297 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1 | 1.00 | 33.98368 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1 | 0.33 | 59.42551 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1 | 0.50 | 93.70491 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1 | 0.75 | 34.38484 |
| Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1 | 0.75 | 29.50954 |
##                         Name    Manufacturer  Type      Calories
##  100% Bran                : 1   A: 1          C:74   Min.   : 50.0
##  100% Natural Bran        : 1   G:22          H: 3   1st Qu.:100.0
##  All-Bran                 : 1   K:23                 Median :110.0
##  All-Bran with Extra Fiber: 1   N: 6                 Mean   :106.9
##  Almond Delight           : 1   P: 9                 3rd Qu.:110.0
##  Apple Cinnamon Cheerios  : 1   Q: 8                 Max.   :160.0
##  (Other)                  :71   R: 8
##     Protein           Fat            Sodium          Fibre
##  Min.   :1.000   Min.   :0.000   Min.   :  0.0   Min.   : 0.000
##  1st Qu.:2.000   1st Qu.:0.000   1st Qu.:130.0   1st Qu.: 1.000
##  Median :3.000   Median :1.000   Median :180.0   Median : 2.000
##  Mean   :2.545   Mean   :1.013   Mean   :159.7   Mean   : 2.152
##  3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:210.0   3rd Qu.: 3.000
##  Max.   :6.000   Max.   :5.000   Max.   :320.0   Max.   :14.000
##
##  Carbohydrates      Sugar          Potassium         Vitamins
##  Min.   :-1.0   Min.   :-1.000   Min.   : -1.00   Min.   :  0.00
##  1st Qu.:12.0   1st Qu.: 3.000   1st Qu.: 40.00   1st Qu.: 25.00
##  Median :14.0   Median : 7.000   Median : 90.00   Median : 25.00
##  Mean   :14.6   Mean   : 6.922   Mean   : 96.08   Mean   : 28.25
##  3rd Qu.:17.0   3rd Qu.:11.000   3rd Qu.:120.00   3rd Qu.: 25.00
##  Max.   :23.0   Max.   :15.000   Max.   :330.00   Max.   :100.00
##
##      Shelf           Weight          Cups           Rating
##  Min.   :1.000   Min.   :0.50   Min.   :0.250   Min.   :18.04
##  1st Qu.:1.000   1st Qu.:1.00   1st Qu.:0.670   1st Qu.:33.17
##  Median :2.000   Median :1.00   Median :0.750   Median :40.40
##  Mean   :2.208   Mean   :1.03   Mean   :0.821   Mean   :42.67
##  3rd Qu.:3.000   3rd Qu.:1.00   3rd Qu.:1.000   3rd Qu.:50.83
##  Max.   :3.000   Max.   :1.50   Max.   :1.500   Max.   :93.70
##
Question 1: Which manufacturer's cereals have the most fat?
The histogram shows that Manufacturer K (Kelloggs) has the highest fat content.
Question 2: What types of cereal do the different manufacturers produce?
The histogram shows that Type C (Cold) cereals are manufactured the most. It also shows that Manufacturers N (Nabisco) and Q (Quaker Oats) produce both hot and cold cereals.
Question 3: Which type of cereal do people prefer?
The boxplot compares cereal ratings by type. Hot cereals have a minimum rating of 51, Q1 of 53, a median of 55, Q3 of 60 and a maximum of 65, with a right skew. Cold cereals have a minimum rating of 18, Q1 of 33, a median of 40, Q3 of 50 and a maximum of 95 (including the one outlier), also with a right skew. By median rating (55 against 40), hot cereals are rated higher, although only 3 of the 77 cereals are hot.
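The five-number summaries read off the boxplot can also be computed directly. The following is a minimal sketch in Python (the report's own output comes from R). It uses only the six Rating values from the preview table near the top of this document, so its numbers will not match the full-data boxplot. The helper name `five_number_summary` is hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles use the "inclusive" method, the same interpolation rule
    as the default of R's quantile() function.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Rating values of the six cereals in the preview table (not the full data set)
ratings = [68.40297, 33.98368, 59.42551, 93.70491, 34.38484, 29.50954]

summary = five_number_summary(ratings)
for label, value in zip(("Min", "Q1", "Median", "Q3", "Max"), summary):
    print(f"{label:7s}{value:.2f}")
```

Applying the same function separately to the ratings of hot and of cold cereals would give the two summaries that the boxplot compares.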
Question 4: How many calories can one get per serving?
The scatter plot shows the calories in one cup of cereal, by manufacturer.
Question 5: Which cereal is the unhealthiest?
The scatter plot shows no clear relationship between hot and cold cereals; it also shows that cold cereals have the highest fat and sugar content.
Question 6: Which type of cereal will give you more energy (protein)?
The histograms show that the cold cereals of Manufacturer K (Kelloggs) provide the most energy (protein).
Question 7: Which manufacturer produces cereals with the most sodium?
The box plot compares the amount of potassium in the different types of cereal. Hot cereals have a minimum of 0, Q1 of 49, a median of 98, Q3 of 101 and a maximum of 110, with a left skew. Cold cereals have a minimum of 0, Q1 of 30, a median of 80, Q3 of 110 and a maximum of 330 (including the 4 outliers), with a right skew.
Question 8: What is the average amount of carbohydrates?
The histogram shows that the average amount of carbohydrates in a cereal, hot or cold, is about 14.6 (14.5974026).
Question 9: What is the total amount of fibre you can get from eating your cereal cold or hot?
The histogram shows 74 for cold cereals and 3 for hot cereals. These values equal the number of cold and hot cereals in the summary above (C: 74, H: 3), so they are counts of cereals rather than fibre totals.
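A per-type total is a grouped sum of the Fibre column, which is a different quantity from the number of cereals of each type. The following is a minimal sketch in Python (the report's own output comes from R) that computes both side by side. It uses only the six rows that appear in the prediction table later in this document, so the totals are illustrative and are not the totals for the full data set.

```python
from collections import defaultdict

# (Name, Type, Fibre) for six rows shown in this document's prediction table
rows = [
    ("Cinnamon Toast Crunch", "C", 0.0),
    ("Grape Nuts Flakes", "C", 3.0),
    ("Crispix", "C", 1.0),
    ("Frosted Flakes", "C", 1.0),
    ("Triples", "C", 0.0),
    ("Quaker Oatmeal", "H", 2.7),
]

fibre_total = defaultdict(float)  # sum of Fibre per Type
cereal_count = defaultdict(int)   # number of cereals per Type

for _name, cereal_type, fibre in rows:
    fibre_total[cereal_type] += fibre
    cereal_count[cereal_type] += 1

for cereal_type in sorted(fibre_total):
    print(cereal_type, cereal_count[cereal_type], round(fibre_total[cereal_type], 1))
```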
Question 10: Which manufacturer's cereals have the most vitamins?
The histogram shows that the cereals of Manufacturer G (General Mills) are the richest in vitamins.
##
## Call:
## lm(formula = Rating ~ Fat, data = train)
##
## Coefficients:
## (Intercept) Fat
## 47.725 -5.248
##
## Call:
## lm(formula = Rating ~ Fat, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.081 -7.102 -2.116 7.976 25.926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.725 2.560 18.643 <2e-16 ***
## Fat -5.248 1.963 -2.673 0.0102 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.94 on 48 degrees of freedom
## Multiple R-squared: 0.1295, Adjusted R-squared: 0.1114
## F-statistic: 7.143 on 1 and 48 DF, p-value: 0.01025
## [1] -0.3599192
## Analysis of Variance Table
##
## Response: Rating
## Df Sum Sq Mean Sq F value Pr(>F)
## Fat 1 1018.3 1018.35 7.1434 0.01025 *
## Residuals 48 6842.8 142.56
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 393.8402
## [1] 399.5763
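The figures in the Rating ~ Fat output above are linked by simple arithmetic, which makes them easy to cross-check. The following is a minimal sketch in Python (the report's own output comes from R), with the inputs copied from the ANOVA table and the coefficient table. It assumes that the unlabeled value -0.3599192 is the correlation between Rating and Fat, which is consistent with the result.

```python
import math

# Values copied from the "Rating ~ Fat" output above
ss_model = 1018.35   # Fat row (1 Df, so Mean Sq equals Sum Sq)
ss_resid = 6842.8    # Residuals row, Sum Sq
df_resid = 48        # residual degrees of freedom
slope = -5.248       # estimated Fat coefficient

# R-squared is the share of the total sum of squares explained by the model
r_squared = ss_model / (ss_model + ss_resid)

# With one predictor, the correlation is the square root of R-squared
# carrying the sign of the slope
correlation = math.copysign(math.sqrt(r_squared), slope)

# F statistic: model mean square over residual mean square
ms_resid = ss_resid / df_resid
f_stat = ss_model / ms_resid

# Residual standard error: square root of the residual mean square
resid_se = math.sqrt(ms_resid)

print(f"R-squared   {r_squared:.4f}")
print(f"Correlation {correlation:.4f}")
print(f"F statistic {f_stat:.3f}")
print(f"Residual SE {resid_se:.2f}")
```

The same check applies to the Sugar and Calories models by substituting the values from their ANOVA tables.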
##
## Call:
## lm(formula = Rating ~ Sugar, data = train)
##
## Coefficients:
## (Intercept) Sugar
## 58.616 -2.324
##
## Call:
## lm(formula = Rating ~ Sugar, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.8051 -5.3921 -0.7764 4.7406 23.7296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 58.6163 2.1726 26.980 < 2e-16 ***
## Sugar -2.3238 0.2688 -8.646 2.37e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.002 on 48 degrees of freedom
## Multiple R-squared: 0.609, Adjusted R-squared: 0.6008
## F-statistic: 74.75 on 1 and 48 DF, p-value: 2.367e-11
## [1] -0.7803697
## Analysis of Variance Table
##
## Response: Rating
## Df Sum Sq Mean Sq F value Pr(>F)
## Sugar 1 4787.2 4787.2 74.755 2.367e-11 ***
## Residuals 48 3073.9 64.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 353.8275
## [1] 359.5636
##
## Call:
## lm(formula = Rating ~ Calories, data = train)
##
## Coefficients:
## (Intercept) Calories
## 86.5206 -0.4161
##
## Call:
## lm(formula = Rating ~ Calories, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.3546 -5.1485 -0.0718 6.5752 23.7289
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 86.52055 6.95497 12.440 < 2e-16 ***
## Calories -0.41609 0.06465 -6.436 5.4e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.376 on 48 degrees of freedom
## Multiple R-squared: 0.4632, Adjusted R-squared: 0.452
## F-statistic: 41.42 on 1 and 48 DF, p-value: 5.399e-08
## [1] -0.6805819
## Analysis of Variance Table
##
## Response: Rating
## Df Sum Sq Mean Sq F value Pr(>F)
## Calories 1 3641.2 3641.2 41.417 5.399e-08 ***
## Residuals 48 4219.9 87.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 369.6713
## [1] 375.4073
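The two unlabeled `[1]` values printed after each ANOVA table appear to be the AIC and BIC of that model, in that order. This can be checked from the residual sums of squares alone. The following is a minimal sketch in Python (the report's own output comes from R), assuming the usual Gaussian log-likelihood of a linear model and counting the residual variance as a parameter. The function name `aic_bic` is hypothetical.

```python
import math


def aic_bic(rss, n, n_coef):
    """AIC and BIC of a linear model from its residual sum of squares."""
    k = n_coef + 1  # regression coefficients plus the residual variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + math.log(n) * k
    return aic, bic


# 48 residual degrees of freedom plus 2 coefficients gives 50 training rows
n = 48 + 2

# Residual sums of squares copied from the three ANOVA tables above
rss_by_model = {"Fat": 6842.8, "Sugar": 3073.9, "Calories": 4219.9}

results = {name: aic_bic(rss, n, n_coef=2) for name, rss in rss_by_model.items()}
for name, (aic, bic) in results.items():
    print(f"{name:9s} AIC={aic:.2f}  BIC={bic:.2f}")

# The model with the lowest AIC is preferred
best = min(results, key=lambda name: results[name][0])
print("Preferred by AIC:", best)
```

Because all three models have the same number of parameters and the same number of rows, BIC exceeds AIC by the same amount, 3 × (ln 50 − 2), in every case, so both criteria rank the models identically.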
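The models have ANOVA p-values of (the lower the value, the stronger the model): Model 1 (Rating ~ Fat) 0.01025; Model 2 (Rating ~ Sugar) 2.367e-11; Model 3 (Rating ~ Calories) 5.399e-08.
Their R-squared values are (the higher the R-squared value, the better the fit of the model): Model 1 0.1295; Model 2 0.609; Model 3 0.4632.
Correlation (the closer to 1 or -1, the stronger the correlation): Model 1 -0.3599192; Model 2 -0.7803697; Model 3 -0.6805819.
AIC (the model with the lowest AIC score is preferred): Model 1 393.8402; Model 2 353.8275; Model 3 369.6713.
BIC (the model with the lowest BIC score is preferred): Model 1 399.5763; Model 2 359.5636; Model 3 375.4073.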
| | actuals.Name | actuals.Manufacturer | actuals.Type | actuals.Calories | actuals.Protein | actuals.Fat | actuals.Sodium | actuals.Fibre | actuals.Carbohydrates | actuals.Sugar | actuals.Potassium | actuals.Vitamins | actuals.Shelf | actuals.Weight | actuals.Cups | actuals.Rating | predicteds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Cinnamon Toast Crunch | G | C | 120 | 1 | 3 | 210 | 0.0 | 13 | 9 | 45 | 25 | 2 | 1 | 0.75 | 19.82357 | 37.70189 |
| 33 | Grape Nuts Flakes | P | C | 100 | 3 | 1 | 140 | 3.0 | 15 | 5 | 85 | 25 | 3 | 1 | 0.88 | 52.07690 | 46.99720 |
| 22 | Crispix | K | C | 110 | 2 | 0 | 220 | 1.0 | 21 | 3 | 30 | 25 | 3 | 1 | 1.00 | 46.89564 | 51.64485 |
| 26 | Frosted Flakes | K | C | 110 | 1 | 0 | 200 | 1.0 | 14 | 11 | 25 | 25 | 1 | 1 | 0.75 | 31.43597 | 33.05424 |
| 73 | Triples | G | C | 110 | 2 | 1 | 250 | 0.0 | 21 | 3 | 60 | 25 | 3 | 1 | 0.75 | 39.10617 | 51.64485 |
| 58 | Quaker Oatmeal | Q | H | 100 | 5 | 2 | 0 | 2.7 | -1 | -1 | 110 | 0 | 1 | 1 | 0.67 | 50.82839 | 60.94015 |
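The `predicteds` column in the table above is consistent with the fitted Rating ~ Sugar model, that is, Rating = 58.6163 − 2.3238 × Sugar. The following is a minimal sketch in Python (the report's own output comes from R) that recomputes the six predictions from the printed coefficients and summarises the prediction error with the mean absolute error. The error measure is an illustrative choice and is not one used in the report.

```python
# Coefficients copied from the "Rating ~ Sugar" summary output
intercept, slope = 58.6163, -2.3238

# (Name, Sugar, actual Rating, predicted Rating as printed in the table)
rows = [
    ("Cinnamon Toast Crunch", 9, 19.82357, 37.70189),
    ("Grape Nuts Flakes", 5, 52.07690, 46.99720),
    ("Crispix", 3, 46.89564, 51.64485),
    ("Frosted Flakes", 11, 31.43597, 33.05424),
    ("Triples", 3, 39.10617, 51.64485),
    ("Quaker Oatmeal", -1, 50.82839, 60.94015),
]

# Recompute each prediction from the regression line
predictions = [intercept + slope * sugar for _name, sugar, _actual, _printed in rows]

# Mean absolute error between actual and recomputed ratings
abs_errors = [abs(actual - pred)
              for (_name, _sugar, actual, _printed), pred in zip(rows, predictions)]
mae = sum(abs_errors) / len(abs_errors)

for (name, _sugar, actual, printed), pred in zip(rows, predictions):
    print(f"{name:22s} actual={actual:8.3f}  recomputed={pred:8.3f}  table={printed:8.3f}")
print(f"Mean absolute error: {mae:.2f}")
```

Quaker Oatmeal has Sugar = -1 in the table, and the printed prediction uses that value unchanged.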
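From the comparisons it can be observed that Model 2 (Rating ~ Sugar) is the best fit and the most accurate model of the dataset: it has the strongest correlation (-0.78), the highest R-squared value (0.609) and the lowest AIC and BIC. Because the Sugar coefficient is negative (-2.324), the predicted Rating goes down as Sugar increases. The Calories coefficient in Model 3 is also negative (-0.4161), so the Rating is not predicted to go up with more Calories.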