Source: http://www.worldactiononsalt.com/less/surveys/2016/190129.html
Original data: http://www.worldactiononsalt.com/less/surveys/2016/190144.pdf
library("tidyverse")  # library() stops with an error if the package is missing
library("ggthemes")
df <- read_csv("cereal.csv")
Parsed with column specification:
cols(
Continent = col_character(),
Country = col_character(),
`Retailer/ Brand` = col_character(),
`Product name` = col_character(),
`Portion or serving size (g)` = col_integer(),
`Total Sugars (g) per serving` = col_double(),
`Salt (g) per serving` = col_double(),
`Total Sugars (g) per 100g` = col_double(),
`Salt (g) per 100g` = col_double()
)
summary(df)
Continent, Country, Retailer/ Brand, Product name: Length 291, Class character, Mode character

                               Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
Portion or serving size (g)    25.0    30.0     30.0    30.6    30.0     40.0
Total Sugars (g) per serving   2.40    4.20     7.50    7.32    10.15    17.00
Salt (g) per serving           0.020   0.210    0.280   0.275   0.340    0.580
Total Sugars (g) per 100g      8.0     14.0     25.0    23.9    31.1     56.7
Salt (g) per 100g              0.080   0.700    0.900   0.895   1.130    1.930
Are there differences in sugar per 100g of the same product by country?
df %>%
group_by(Country, `Product name`) %>%
summarise( MeanSugarPer100g = mean(`Total Sugars (g) per 100g`)) %>%
ggplot(aes(reorder(`Product name`, MeanSugarPer100g), y=MeanSugarPer100g)) +
geom_boxplot() +
coord_flip() +
scale_x_discrete(name = "Product Name") + scale_y_continuous(name = "Mean Sugar Content per 100g") +
ggtitle("Mean Sugar Content of the Same Cereal in \nDifferent Countries") +
theme_economist()
`panel.margin` is deprecated. Please use `panel.spacing` property instead
`legend.margin` must be specified using `margin()`. For the old behavior use legend.spacing
With mean and SD
df %>%
group_by(Country, `Product name`) %>%
summarise( MeanSugarPer100g = mean(`Total Sugars (g) per 100g`)) %>%
ggplot(aes(reorder(`Product name`, MeanSugarPer100g), y=MeanSugarPer100g)) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=1,
size=0.1, color="red") +
stat_summary(fun.data=mean_sdl, fun.args=list(mult=1),  # arguments for mean_sdl go through fun.args
geom="pointrange", color="red") +
coord_flip() +
scale_x_discrete(name = "Product Name") + scale_y_continuous(name = "Mean Sugar Content per 100g") +
ggtitle("Mean Sugar Content of the Same Cereal in \nDifferent Countries") +
theme_economist()
`panel.margin` is deprecated. Please use `panel.spacing` property instead
`legend.margin` must be specified using `margin()`. For the old behavior use legend.spacing
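For reference, the red overlay is meant to mark each product's mean plus or minus one standard deviation. A minimal base-R sketch of that calculation follows; the product labels and numbers are made up for illustration and are not taken from the cereal data.

```r
# Toy data: hypothetical per-country mean sugar values for two products
toy <- data.frame(
  product = c("A", "A", "A", "B", "B", "B"),
  sugar   = c(10, 20, 30, 5, 5, 8)
)

# Mean and standard deviation per product
m <- tapply(toy$sugar, toy$product, mean)
s <- tapply(toy$sugar, toy$product, sd)

# Centre and end points of a mean +/- 1 SD point range
overlay <- data.frame(
  product = names(m),
  y    = as.numeric(m),
  ymin = as.numeric(m - s),
  ymax = as.numeric(m + s)
)
print(overlay)
```

A multiplier other than 1 would simply scale `s` before it is added and subtracted.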
df %>%
group_by(Country, `Product name`) %>%
summarise( MeanSaltPer100g = mean(`Salt (g) per 100g`)) %>%
ggplot(aes(reorder(`Product name`, MeanSaltPer100g), y=MeanSaltPer100g)) +
geom_boxplot() +
coord_flip() +
scale_x_discrete(name = "Product Name") + scale_y_continuous(name = "Mean Salt Content per 100g") +
ggtitle("Mean Salt Content of the Same Cereal in \nDifferent Countries") +
theme_economist() # add geom_jitter(aes(colour = Country)) + to jitter
`panel.margin` is deprecated. Please use `panel.spacing` property instead
`legend.margin` must be specified using `margin()`. For the old behavior use legend.spacing
With mean and SD
df %>%
group_by(Country, `Product name`) %>%
summarise( MeanSaltPer100g = mean(`Salt (g) per 100g`)) %>%
ggplot(aes(reorder(`Product name`, MeanSaltPer100g), y=MeanSaltPer100g)) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=1,
size=0.1, color="red") +
stat_summary(fun.data=mean_sdl, fun.args=list(mult=1),  # arguments for mean_sdl go through fun.args
geom="pointrange", color="red") +
coord_flip() +
scale_x_discrete(name = "Product Name") + scale_y_continuous(name = "Mean Salt Content per 100g") +
ggtitle("Mean Salt Content of the Same Cereal in \nDifferent Countries") +
theme_economist() # add geom_jitter(aes(colour = Country)) + to jitter
`panel.margin` is deprecated. Please use `panel.spacing` property instead
`legend.margin` must be specified using `margin()`. For the old behavior use legend.spacing
Does any brand have cereals with high levels of both sugar and salt?
I will subset “All bran flakes”, “Cornflakes”, “Froesties” and “Nesquik”, since they are available in most countries and so allow a comparison.
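The code that builds this subset is not shown; the later output refers to it as `df2`. A minimal sketch of such a filter is given below, assuming the product names are stored exactly as quoted above. The toy rows and the object names `toy`, `keep` and `toy2` are invented for illustration.

```r
# Toy stand-in for the cereal table; the rows are invented
toy <- data.frame(
  Country = c("UK", "UK", "Spain", "Spain", "India"),
  `Product name` = c("Cornflakes", "Nesquik", "Cornflakes",
                     "Other cereal", "Froesties"),
  check.names = FALSE  # keep the space in the column name, as in the CSV
)

# Products to keep, spelled as in the text (assumed to match the data)
keep <- c("All bran flakes", "Cornflakes", "Froesties", "Nesquik")

# Keep only the rows whose product is in the list
toy2 <- toy[toy$`Product name` %in% keep, ]
print(toy2)

# Number of rows per retained product
table(toy2$`Product name`)
```

With dplyr loaded, the equivalent step on the real data would be df %>% filter(`Product name` %in% keep).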
Is there any difference in sugar content per 100g by continent?
sugar
Call:
aov(formula = `Total Sugars (g) per 100g` ~ Continent, data = df2)
Terms:
Continent Residuals
Sum of Squares 284 12089
Deg. of Freedom 5 85
Residual standard error: 11.9
Estimated effects may be unbalanced
summary(sugar)
Df Sum Sq Mean Sq F value Pr(>F)
Continent 5 284 56.7 0.4 0.85
Residuals 85 12089 142.2
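Conclusion: Fail to reject H_0 (P = 0.85); there is no evidence that mean sugar content per 100g differs between continents.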
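The call that created `sugar` is not shown, but the Call line in the output gives its form. A minimal sketch of fitting a one-way ANOVA and pulling out the F statistic and p-value follows, using invented numbers (the data frame `toy` and its values are hypothetical).

```r
# Toy data: three groups with invented measurements
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 4)),
  value = c(10, 12, 11, 13,  11, 13, 12, 14,  10, 13, 12, 11)
)

# One-way ANOVA: does mean `value` differ by `group`?
fit <- aov(value ~ group, data = toy)

# The ANOVA table is the first element of the summary
tab <- summary(fit)[[1]]
print(tab)

# The first row holds the group effect
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# TRUE here means H_0 (equal group means) is not rejected at the 5% level
p_value > 0.05
```

For the analysis above, the corresponding call would be aov(`Total Sugars (g) per 100g` ~ Continent, data = df2), as printed in the Call line.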
df2 %>%
group_by(Country, `Product name`) %>%
summarise( MeanSugarPer100g = mean(`Total Sugars (g) per 100g`)) %>%
ggplot(aes(reorder(`Product name`, MeanSugarPer100g), y=MeanSugarPer100g)) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=1,
size=0.1, color="red") +
stat_summary(fun.data=mean_sdl, fun.args=list(mult=1),  # arguments for mean_sdl go through fun.args
geom="pointrange", color="red") +
coord_flip() +
scale_x_discrete(name = "Product Name") + scale_y_continuous(name = "Mean Sugar Content per 100g") +
ggtitle("Mean Sugar Content of the Same Cereal in \nDifferent Countries") +
theme_economist()
`panel.margin` is deprecated. Please use `panel.spacing` property instead
`legend.margin` must be specified using `margin()`. For the old behavior use legend.spacing
df2 %>%
group_by(Country, `Product name`) %>%
summarise( MeanSaltPer100g = mean(`Salt (g) per 100g`)) %>%
ggplot(aes(reorder(`Product name`, MeanSaltPer100g), y=MeanSaltPer100g)) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=1,
size=0.1, color="red") +
stat_summary(fun.data=mean_sdl, fun.args=list(mult=1),  # arguments for mean_sdl go through fun.args
geom="pointrange", color="red") +
coord_flip() +
scale_x_discrete(name = "Product Name") + scale_y_continuous(name = "Mean Salt Content per 100g") +
ggtitle("Mean Salt Content of the Same Cereal in \nDifferent Countries") +
theme_economist() # add geom_jitter(aes(colour = Country)) + to jitter
`panel.margin` is deprecated. Please use `panel.spacing` property instead
`legend.margin` must be specified using `margin()`. For the old behavior use legend.spacing
Is there any difference in salt content per 100g by continent?
salt
Call:
aov(formula = `Salt (g) per 100g` ~ Continent, data = df2)
Terms:
Continent Residuals
Sum of Squares 2.74 8.74
Deg. of Freedom 5 85
Residual standard error: 0.321
Estimated effects may be unbalanced
summary(salt)
Df Sum Sq Mean Sq F value Pr(>F)
Continent 5 2.74 0.548 5.33 0.00026 ***
Residuals 85 8.74 0.103
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Conclusion: Reject H_0 and conclude that not all continent means are equal (P = 0.00026). There appear to be significant differences in the salt content of the same products between continents.
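As a sanity check, the p-values in both ANOVA tables can be recomputed from the printed F statistics and degrees of freedom using the upper tail of the F distribution. Because the printed F values are rounded, the results should only match the tables approximately.

```r
# Values copied from the two summary() tables
f_sugar <- 0.4
f_salt  <- 5.33
df_between <- 5
df_within  <- 85

# Upper-tail probability of the F distribution
p_sugar <- pf(f_sugar, df_between, df_within, lower.tail = FALSE)
p_salt  <- pf(f_salt,  df_between, df_within, lower.tail = FALSE)

print(round(c(sugar = p_sugar, salt = p_salt), 5))

# F itself is the ratio of the two mean squares in each table
f_sugar_check <- 56.7 / 142.2
f_salt_check  <- 0.548 / 0.103
print(c(sugar = f_sugar_check, salt = f_salt_check))
```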
TukeyHSD(salt, "Continent")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = `Salt (g) per 100g` ~ Continent, data = df2)
$Continent
diff lwr upr p adj
Asia-Africa 0.25042 -0.3220 0.8228 0.797
Europe-Africa -0.00122 -0.5603 0.5578 1.000
North America-Africa 0.51182 -0.0970 1.1206 0.151
Oceania-Africa 0.21333 -0.4476 0.8743 0.935
South America-Africa 0.08000 -0.5809 0.7409 0.999
Europe-Asia -0.25164 -0.4919 -0.0114 0.035
North America-Asia 0.26140 -0.0789 0.6017 0.231
Oceania-Asia -0.03708 -0.4637 0.3896 1.000
South America-Asia -0.17042 -0.5971 0.2562 0.852
North America-Europe 0.51304 0.1956 0.8304 0.000
Oceania-Europe 0.21455 -0.1940 0.6231 0.645
South America-Europe 0.08122 -0.3273 0.4898 0.992
Oceania-North America -0.29848 -0.7729 0.1759 0.450
South America-North America -0.43182 -0.9062 0.0426 0.096
South America-Oceania -0.13333 -0.6730 0.4063 0.979
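In the table above, a pair differs significantly at the 5% level when its interval (lwr, upr) excludes zero. That is the case only for Europe-Asia (p adj 0.035) and North America-Europe (p adj printed as 0.000, i.e. below 0.0005). A minimal sketch of extracting such pairs programmatically is shown below on invented data (the `toy` data frame is hypothetical).

```r
# Toy data: group C is clearly shifted, A and B are close
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 3)),
  value = c(1, 2, 3,  1.5, 2.5, 3.5,  11, 12, 13)
)

fit <- aov(value ~ group, data = toy)

# Matrix with columns diff, lwr, upr and "p adj"; one row per pair
tk <- TukeyHSD(fit, "group")$group
print(tk)

# Pairs whose adjusted p-value is below 0.05
sig_pairs <- rownames(tk)[tk[, "p adj"] < 0.05]
print(sig_pairs)
```

Applied to the salt model, the same indexing would use TukeyHSD(salt, "Continent")$Continent.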
Does any brand have cereals with high levels of both sugar and salt?