Tip: Smarties come in six classic colors: brown, yellow, orange, red, green, and white. If each color is equally likely, the expected proportion of white Smarties is 1/6, or about 16.67%.
# Load data
library(readr)
smarties <- read_csv("~/Library/CloudStorage/OneDrive-UniversityofMaryland/AY-2025-26/Fall 2025/TA - INST 314/Lectures and Labs/Week 04/Warmup - Smarties/smarties.csv", show_col_types = FALSE)
class(smarties)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
# Visualize the number of white smarties per bag as a histogram
hist(smarties$`Number of white smarties`,
     breaks = 40,
     main = "Distribution of white smarties counts in bags",
     xlab = "Number of white smarties",
     ylab = "")
# Visualize the total number of smarties per bag as a histogram
hist(smarties$`Number of total smarties`,
     breaks = 40,
     main = "Distribution of total smarties counts in bags",
     xlab = "Total number of smarties",
     ylab = "")
# Calculate the proportion of white smarties in each bag
smarties$ProportionWhite <- smarties$`Number of white smarties` / smarties$`Number of total smarties`
# Convert the proportion of smarties column to a percent
smarties$PercentWhite <- smarties$ProportionWhite * 100
# View the summary statistics of the percent column
summary(smarties$PercentWhite)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 13.33 15.40 21.36 26.67 500.00
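The maximum of 500% in the summary above is not a possible percentage, so at least one row probably has more white smarties recorded than total smarties (or a total of zero). A minimal sketch for flagging such rows is below. It uses a small made-up data frame, `toy`, with the same column names as `smarties`; the values are illustrative and are not the class data.

```r
# Hypothetical toy data with the same column names as the class file
toy <- data.frame(
  `Number of white smarties` = c(2, 3, 5, 0),
  `Number of total smarties` = c(15, 15, 1, 0),
  check.names = FALSE
)

white <- toy$`Number of white smarties`
total <- toy$`Number of total smarties`

# A row is suspect if a count is missing, the total is zero or negative,
# or the white count exceeds the total
suspect <- is.na(white) | is.na(total) | total <= 0 | white > total

# Show the flagged rows so they can be checked against the original entries
toy[suspect, ]

# Keep only the plausible rows (an assumption: dropping is one option,
# correcting the entry is another)
toy_clean <- toy[!suspect, ]
nrow(toy_clean)
```

Whether to drop or correct a flagged row is a judgment call. The rest of this analysis keeps every row as recorded.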
# Use ggplot2 to create a histogram of the proportion of white smarties in each bag using the percent column
library(ggplot2)
library(ggthemes)
ggplot(smarties, aes(x = PercentWhite)) +
  geom_histogram(binwidth = 5, fill = "maroon", color = "black") +
  xlim(0, 100) +
  labs(x = "Percent of white smarties (%)",
       y = "Count of bags") +
  ggtitle("Distribution of the percent of white smarties per bag") +
  theme_minimal()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
# What is our question about the data? Is the proportion of white smarties in the bags different from the expected proportion of 1/6 (about 16.67%)?
# Count the number of participants (rows); this is not the n for the binomial test, which is the total number of smarties
nrow(smarties)
## [1] 148
# Count the number of white smarties in all bags
sum(smarties$`Number of white smarties`)
## [1] 520
# Count the total number of smarties in all bags
sum(smarties$`Number of total smarties`)
## [1] 2730
# Calculate the overall percentage of white smarties in all bags
sum(smarties$`Number of white smarties`) / sum(smarties$`Number of total smarties`)*100
## [1] 19.04762
So, 19% of the smarties in our class demo were white.
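This pooled figure (about 19.05%) is lower than the mean of the per-bag percentages reported by `summary()` (21.36%). The two need not agree: the mean of per-bag percentages gives every bag equal weight and is pulled up by extreme bags, while the pooled percentage combines all candies first. A small sketch with made-up counts (not the class data) shows the difference:

```r
# Hypothetical counts for three bags (not the class data)
white <- c(2, 3, 5)
total <- c(15, 15, 10)

# Mean of per-bag percentages: every bag gets equal weight
mean_of_bags <- mean(white / total * 100)

# Pooled percentage: all candies are combined, so larger bags count more
pooled <- sum(white) / sum(total) * 100

mean_of_bags
pooled
```

The binomial test in this lab works with the pooled counts, so the pooled percentage is the one to compare against 1/6.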
Recall the R syntax for a binomial test
?binom.test
# Conduct a binomial test to see if the proportion of white smarties is different
# x = number of successes (white smarties)
# n = number of trials (total smarties)
# p = hypothesized probability (proportion) of white smarties: 1 of 6 colors = 1/6, about 0.1667 or 16.67%
# See https://www.smarties.com/faqs/: "We manufacture six (6) colors of candy tablets that are randomly mixed together"
binom.test(520, 2730, p = 1/6, alternative = "two.sided")
##
## Exact binomial test
##
## data: 520 and 2730
## number of successes = 520, number of trials = 2730, p-value = 0.001009
## alternative hypothesis: true probability of success is not equal to 0.1666667
## 95 percent confidence interval:
## 0.1759010 0.2057181
## sample estimates:
## probability of success
## 0.1904762
# Note: 520/2730 = 0.1904762 or 19.05%. We are comparing this observed proportion of white smarties to the hypothesized proportion of 1/6 (about 16.67%)
Interpret the results of the binomial test
The p-value is well below 0.01 (p = 0.001009), so we reject the null hypothesis and conclude that the proportion of white smarties is significantly different from the hypothesized proportion of 1/6 (about 16.67%).
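One way to see why the p-value is small is to compare the observed count of white smarties with the count expected under the null hypothesis. The sketch below is a rough check that uses the normal approximation to the binomial, not the exact test that `binom.test` runs, so its p-value should be close to, but not identical to, the exact one.

```r
x  <- 520      # observed white smarties (from the counts above)
n  <- 2730     # total smarties
p0 <- 1 / 6    # hypothesized proportion of white smarties

# Expected number of white smarties if the null hypothesis were true
expected <- n * p0

# Standard deviation of the count under the null hypothesis
sd0 <- sqrt(n * p0 * (1 - p0))

# z-score of the observed count and two-sided normal-approximation p-value
z <- (x - expected) / sd0
p_approx <- 2 * pnorm(-abs(z))

expected
z
p_approx
```

Under the null we would expect 455 white smarties, so the observed 520 is 65 more than expected, which is more than three standard deviations above the expected count.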
The 95% confidence interval for the proportion of white smarties is from 0.1759 to 0.2057. That is, we are 95% confident that the true proportion of white smarties in the population is between 17.59% and 20.57%. The hypothesized value of 16.67% falls outside this interval, which agrees with the test result.
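The interval reported by `binom.test` is the exact (Clopper-Pearson) interval, which can be written in terms of quantiles of the beta distribution. As a sketch of where the two endpoints come from, the code below recomputes them with `qbeta()` and compares them with the interval stored in the test result. The variable names are chosen here for illustration.

```r
x <- 520       # white smarties
n <- 2730      # total smarties
alpha <- 0.05  # 1 - confidence level

# Clopper-Pearson endpoints from beta quantiles
lower <- qbeta(alpha / 2, x, n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)
c(lower, upper)

# Interval stored in the binom.test result, for comparison
ci <- binom.test(x, n, p = 1/6)$conf.int
as.numeric(ci)
```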