MATH 138 - Lab 5: Cereal Box Plots

Learning Objectives:

Create box plots to describe the distribution of a variable
Explore subgroups within the data
Use numerical summaries to describe characteristics of the data

Step 1: Import your data

# Read the cereal data from the course repository; the first row holds the column names
cereal <- read.csv("https://raw.githubusercontent.com/kitadasmalley/MATH138/main/HAWKES/Data/cerealDat.csv",
                   header = TRUE)

How many observations (rows) and variables (columns) are in this dataset?
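
One way to count rows and columns is with dim(), nrow(), and ncol(). The sketch below uses a small made-up data frame named toy (hypothetical values, not the lab data) so that it runs on its own; in the lab, the same calls would be applied to cereal.

```r
# Hypothetical stand-in for the lab data; all values are made up for illustration
toy <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Shelf  = c("Top", "Middle", "Bottom", "Middle"),
  Sugars = c(3, 12, 6, 10)
)

dim(toy)   # number of rows, then number of columns
nrow(toy)  # number of observations (rows)
ncol(toy)  # number of variables (columns)
```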

Step 2: Look at the data

# Show each variable's name, type, and first few values
str(cereal)

List the variables. Make note of which variables are numeric and which are categorical.
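
As a possible cross-check on the str() output, the sketch below sorts variables by storage type. It again uses a made-up data frame named toy (hypothetical, not the lab data); the names numeric_vars and categorical_vars are illustrative only.

```r
# Hypothetical stand-in for the lab data; all values are made up for illustration
toy <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Shelf  = c("Top", "Middle", "Bottom", "Middle"),
  Sugars = c(3, 12, 6, 10),
  stringsAsFactors = FALSE
)

# Storage class of every column
sapply(toy, class)

# Split the variable names by whether R stores them as numbers
is_num           <- sapply(toy, is.numeric)
numeric_vars     <- names(toy)[is_num]
categorical_vars <- names(toy)[!is_num]

numeric_vars
categorical_vars
```

Storage type is only a starting point: a column that uses numbers as group labels is stored as numeric even though it describes categories, so the final call is still a judgment about what the variable means.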

Step 3: Find the mean and standard deviation of Sugars

# Mean sugar content per serving
mean(cereal$Sugars)

# Standard deviation of sugar content per serving
sd(cereal$Sugars)
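
To see what mean() and sd() compute, the sketch below reproduces both by hand on a short made-up vector x (hypothetical values, not the lab data). Note that sd() divides by n - 1, not n.

```r
# Made-up values for illustration only
x <- c(3, 12, 6, 10, 9)

n     <- length(x)
x_bar <- sum(x) / n                       # sample mean: 40 / 5 = 8
s     <- sqrt(sum((x - x_bar)^2) / (n - 1))  # sample SD: sqrt(50 / 4)

x_bar
s

# The hand calculations agree with the built-in functions
all.equal(x_bar, mean(x))
all.equal(s, sd(x))
```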

Step 4: Write a five-number summary

# summary() reports the five-number summary (min, Q1, median, Q3, max) plus the mean
summary(cereal$Sugars)

What does the relationship between the mean and median tell you about the shape of the distribution?
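
As a rough illustration of the mean-versus-median comparison, the sketch below uses a made-up vector x (hypothetical values, not the lab data) with one unusually large value. It also shows fivenum(), a base R function that returns only the five numbers; its quartiles are computed slightly differently from those in summary(), so the two can disagree a little on some data.

```r
# Made-up values with one large observation pulling the upper tail out
x <- c(1, 2, 3, 3, 4, 5, 15)

summary(x)   # min, Q1, median, mean, Q3, max
fivenum(x)   # min, lower hinge, median, upper hinge, max

# A mean well above the median suggests a right-skewed distribution
mean(x) > median(x)
```

When the mean and median are close, the distribution is more likely to be roughly symmetric; a mean below the median points toward left skew.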

Step 5: Draw a box plot

# tidyverse provides ggplot() here and %>%, group_by(), and summarise() in Step 7
library(tidyverse)

ggplot(cereal, aes(y=Sugars))+
  geom_boxplot()

Sketch the box plot for Sugars and annotate it with the five-number summary from Step 4.
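
When annotating the sketch, it may help to know where the whiskers stop. By default, geom_boxplot() draws each whisker to the most extreme data value within 1.5 times the IQR of the box and plots anything beyond that as an individual point. The sketch below works through that rule on a made-up vector x (hypothetical values, not the lab data); the variable names are illustrative only.

```r
# Made-up values for illustration only
x <- c(1, 2, 3, 3, 4, 5, 15)

q1  <- unname(quantile(x, 0.25))  # first quartile (bottom of the box)
q3  <- unname(quantile(x, 0.75))  # third quartile (top of the box)
iqr <- q3 - q1                    # interquartile range (height of the box)

# Fences: values beyond these are drawn as separate points
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the fences
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])
outliers     <- x[x < lower_fence | x > upper_fence]

c(whisker_low, q1, median(x), q3, whisker_high)
outliers
```

If the data have no points beyond the fences, the whisker ends are just the minimum and maximum from Step 4.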

Step 6: Make a hypothesis

It is commonly thought that markets display cereals in certain shelf locations to draw the attention of children. State a hypothesis about how shelf location (Shelf) relates to the sugar content in a serving of cereal.

(BONUS) Step 7: Summaries for Subgroups

# Mean and median Sugars within each shelf location, ignoring missing values
cereal %>%
  group_by(Shelf) %>%
  summarise(avgSug = mean(Sugars, na.rm = TRUE),
            medSug = median(Sugars, na.rm = TRUE))
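
As an optional cross-check, the same group means can be computed in base R with tapply(). The sketch below uses a made-up data frame named toy (hypothetical shelf labels and values, not the lab data) that includes one missing value, to show why na.rm = TRUE matters.

```r
# Hypothetical stand-in for the lab data; all values are made up for illustration
toy <- data.frame(
  Shelf  = c("Top", "Top", "Middle", "Middle", "Bottom", "Bottom", "Bottom"),
  Sugars = c(3, 5, 12, 10, 6, 8, NA)
)

# Without na.rm = TRUE, any group containing a missing value returns NA
tapply(toy$Sugars, toy$Shelf, mean)

# With na.rm = TRUE, missing values are dropped before averaging
avg <- tapply(toy$Sugars, toy$Shelf, mean, na.rm = TRUE)
med <- tapply(toy$Sugars, toy$Shelf, median, na.rm = TRUE)

avg
med
```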

Step 8: Draw a side-by-side box plot

ggplot(cereal, aes(x=Shelf, y=Sugars, fill=Shelf))+
  geom_boxplot()

Sketch the side-by-side box plots comparing Sugars across shelf locations.
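
If the plot shows one wide box instead of one box per shelf, a likely cause is that the grouping variable is stored as a number. The sketch below assumes that situation using a made-up data frame named toy (hypothetical shelf codes and values; how Shelf is actually stored in the lab data should be checked with str() in Step 2). Converting the column with factor() tells ggplot() to treat it as categories.

```r
library(ggplot2)

# Hypothetical data where shelf location is coded as a number
toy <- data.frame(
  Shelf  = c(1, 1, 2, 2, 3, 3),
  Sugars = c(3, 5, 12, 10, 6, 8)
)

# Treat the numeric codes as category labels
toy$Shelf <- factor(toy$Shelf)

# One box per shelf level, each with its own fill color
p <- ggplot(toy, aes(x = Shelf, y = Sugars, fill = Shelf)) +
  geom_boxplot()
p
```

If Shelf is already stored as text or as a factor, no conversion is needed and the Step 8 code works as written.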

Step 9: Summary

Summarize your findings about sugar content in cereals. Does your plot from Step 8 support your hypothesis in Step 6?