    Cereal                 Sodium  Sugar  Type
 1  Frosted Mini Wheats         0     11  A
 2  Raisin Bran               340     18  A
 3  All Bran                   70      5  A
 4  Apple Jacks               140     14  C
 5  Captain Crunch            200     12  C
 6  Cheerios                  180      1  C
 7  Cinnamon Toast Crunch     210     10  C
 8  Crackling Oat Bran        150     16  A
 9  Fiber One                 100      0  A
10  Frosted Flakes            130     12  C
11  Froot Loops               140     14  C
12  Honey Bunches of Oats     180      7  A
13  Honey Nut Cheerios        190      9  C
14  Life                      160      6  C
15  Rice Krispies             290      3  C
16  Honey Smacks               50     15  A
17  Special K                 220      4  A
18  Wheaties                  180      4  A
19  Corn Flakes               200      3  A
20  Honeycomb                 210     11  C
Briefly describe the data
The data lists 20 popular breakfast cereals along with the sodium and sugar content of each. It also gives a “Type” for each cereal, either “A” or “C”, though I am not sure what the type actually refers to.
Tidy Data (as needed)
I tidied the data by converting Type from a character string to a factor. No other tidying was needed.
library(dplyr)  # provides mutate()
cereal <- mutate(cereal_raw, Type = factor(Type))
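To see what the factor conversion actually changes, here is a minimal base-R sketch. It assumes only the Type column, typed in from the table above, and uses `factor()` directly in place of dplyr's `mutate()`; the variable names simply mirror the ones used in this report.

```r
# Type column copied from the table above (rows 1-20)
cereal_raw <- data.frame(
  Type = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
           "C", "A", "C", "C", "C", "A", "A", "A", "A", "C"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of mutate(cereal_raw, Type = factor(Type))
cereal <- cereal_raw
cereal$Type <- factor(cereal$Type)

print(class(cereal_raw$Type))  # "character"
print(class(cereal$Type))      # "factor"
print(levels(cereal$Type))     # "A" "C"
```

As a factor, Type has a fixed set of levels, which is what lets ggplot2 treat it as a discrete variable in the bar chart and color scale below.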
Univariate Visualizations
First, we can visualize the sodium content across cereals. The first attempt uses ggplot2's default of 30 bins, which is too many for only 20 cereals: the bins are so narrow that many of them are empty, and the general shape of the data is lost. So, on the second attempt I set the bin width to 60, which does a better job of showing that sodium values are concentrated roughly around 200 and taper off above and below.
library(ggplot2)
# First attempt: default binning (30 bins)
ggplot(cereal, aes(Sodium)) +
  geom_histogram(fill = "blue", color = "black") +
  labs(title = "Histogram of Sodium Content",
       x = "Sodium Content",
       y = "Frequency")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Second attempt: wider bins (width 60)
ggplot(cereal, aes(Sodium)) +
  geom_histogram(binwidth = 60, fill = "blue", color = "black", alpha = 0.5) +
  labs(title = "Histogram of Sodium Content",
       x = "Sodium Content",
       y = "Frequency")
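The effect of bin width can also be checked numerically. This base-R sketch counts the sodium values (copied from the table above) in bins of width 10, comparable to the narrow default bins, and in bins of width 60. It assumes `hist()`'s right-closed bins starting at 0; ggplot2 may place its bin edges differently, so these counts need not match the plotted bars exactly.

```r
# Sodium column copied from the table above (rows 1-20)
sodium <- c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
            140, 180, 190, 160, 290, 50, 220, 180, 200, 210)

# Narrow bins (width 10): 35 bins for only 20 values
narrow <- hist(sodium, breaks = seq(0, 350, by = 10), plot = FALSE)
# Wide bins (width 60): 6 bins
wide <- hist(sodium, breaks = seq(0, 360, by = 60), plot = FALSE)

print(sum(narrow$counts == 0))  # 20 of the 35 narrow bins are empty
print(wide$counts)              # 2 2 8 6 1 1
```

With wide bins, 14 of the 20 cereals fall in the two middle bins (above 120 and up to 240), which is the central cluster the second histogram makes visible.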
We can visualize the sugar content of the cereals in the same way. Unlike sodium, the sugar values are not concentrated around a single value; instead, they are spread fairly evenly between 0 and 18.
ggplot(cereal, aes(x = Sugar)) +
  geom_histogram(binwidth = 2, fill = "blue", color = "black", alpha = 0.5) +
  labs(title = "Histogram of Sugar Content",
       x = "Sugar Content",
       y = "Frequency")
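A quick way to confirm that sugar has no single peak is to count the values per bin. This minimal base-R sketch uses the sugar column from the table above and assumes `hist()`'s bin edges at 0, 2, ..., 18, which may differ from where ggplot2 draws its bars.

```r
# Sugar column copied from the table above (rows 1-20)
sugar <- c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
           14, 7, 9, 6, 3, 15, 4, 4, 3, 11)

# Counts in bins of width 2 from 0 to 18 (first bin is [0, 2], the rest are right-closed)
sugar_counts <- hist(sugar, breaks = seq(0, 18, by = 2), plot = FALSE)$counts

print(sugar_counts)       # 2 4 2 1 2 4 2 2 1
print(max(sugar_counts))  # 4, i.e. no bin holds more than a fifth of the cereals
```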
We can also visualize the distribution of cereal Type using a bar chart. There are exactly as many Type A cereals as Type C cereals (10 of each).
# Bar chart of the number of cereals of each Type
ggplot(cereal, aes(Type)) +
  geom_bar(fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Number of Cereals by Type",
       x = "Cereal Type",
       y = "Count")
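The bar heights are just the frequency of each level, which base R's `table()` reports directly. This sketch assumes only the Type column typed in from the table above.

```r
# Type column copied from the table above (rows 1-20)
type <- factor(c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
                 "C", "A", "C", "C", "C", "A", "A", "A", "A", "C"))

# Frequency of each level: these are the bar heights geom_bar() draws
type_counts <- table(type)
print(type_counts)  # A: 10, C: 10
```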
Bivariate Visualization(s)
We can create a bivariate graph that compares the sodium content of each cereal to its sugar content. There does not appear to be a strong relationship between the two measures: the points show no clear upward or downward trend.
# Scatter plot: points colored by Type, labeled with cereal names
ggplot(cereal, aes(x = Sodium, y = Sugar, label = Cereal, color = Type)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_text(hjust = -0.1, vjust = 0.5, size = 3) +
  labs(title = "Scatter Plot of Sodium vs Sugar Content, Colored by Type",
       x = "Sodium Content",
       y = "Sugar Content") +
  scale_color_discrete(name = "Type")
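To put a number on the lack of a relationship, a minimal sketch computes the Pearson correlation between sodium and sugar with base R's `cor()`, using the columns from the table above. Pearson's r only measures linear association, so it complements the scatter plot rather than replacing it.

```r
# Columns copied from the table above (rows 1-20)
sodium <- c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
            140, 180, 190, 160, 290, 50, 220, 180, 200, 210)
sugar <- c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
           14, 7, 9, 6, 3, 15, 4, 4, 3, 11)

# Pearson correlation coefficient (cor()'s default method)
r <- cor(sodium, sugar)
print(round(r, 2))  # -0.02
```

A value this close to 0 is consistent with the scatter plot: knowing a cereal's sodium content tells us essentially nothing, linearly, about its sugar content.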
We can also compare the sugar and sodium content across the two types of cereal. For both measures, Type C cereals are more tightly concentrated around their median than Type A cereals; in other words, they have noticeably less spread. For example, Type C sodium values range from 130 to 290 while Type A values range from 0 to 340, and Type C sugar values range from 1 to 14 compared with 0 to 18 for Type A.
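The spread comparison can also be checked numerically. This minimal base-R sketch (data copied from the table above) computes the median and interquartile range (IQR, the span of the middle half of the values) of each measure by Type with `tapply()`. It assumes R's default quantile method (type 7) inside `IQR()`; other quantile definitions would give slightly different numbers.

```r
# Data copied from the table above (rows 1-20)
cereal <- data.frame(
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = factor(c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
                    "C", "A", "C", "C", "C", "A", "A", "A", "A", "C"))
)

# Median and IQR of each measure, split by Type (levels A, C)
med_sodium <- tapply(cereal$Sodium, cereal$Type, median)  # A: 165,   C: 185
iqr_sodium <- tapply(cereal$Sodium, cereal$Type, IQR)     # A: 117.5, C: 62.5
med_sugar  <- tapply(cereal$Sugar, cereal$Type, median)   # A: 6,     C: 10.5
iqr_sugar  <- tapply(cereal$Sugar, cereal$Type, IQR)      # A: 10,    C: 5.25

print(rbind(med_sodium, iqr_sodium, med_sugar, iqr_sugar))
```

For both measures the Type C IQR is roughly half the Type A IQR, which matches the visual impression that Type C cereals are more tightly clustered.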