Warmup: Practical Application of the Binomial Exact Test

Tip: There are six possible classic colors of Smarties: brown, yellow, orange, red, green, and white. If the colors are mixed at random in equal amounts, each color is equally likely, so the expected proportion of white smarties is 1/6, or about 16.67%.

  1. At the top of your R script, write your question and mind the tip above!
  2. Load your data into RStudio
  3. Summary counts - specify the column(s)
  4. Visualize counts - do you get a binomial distribution or not? Base R is OK
  5. Calculate proportions - base R is OK
  6. Visualize proportions - base R is OK
  7. Conduct a binomial exact test (to see the arguments that go inside the parentheses, type and run ?binom.test in your console)
  8. Write out your response. Which hypothesis is supported or not? Null? Alternative? Why?
# Load data

library(readr)
smarties <- read_csv("~/Library/CloudStorage/OneDrive-UniversityofMaryland/AY-2025-26/Fall 2025/TA - INST 314/Lectures and Labs/Week 04/Warmup - Smarties/smarties.csv", show_col_types = FALSE)
class(smarties)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
# Visualize the number of white smarties per bag as a histogram (does it look binomial?)
hist(smarties$`Number of white smarties`,
        breaks = 40,
        main = "Distribution of white smarties counts in bags",
        xlab = "Number of white smarties",
        ylab = "")

# Visualize the data as a histogram of total smarties in each bag
hist(smarties$`Number of total smarties`,
        breaks = 40,
        main = "Distribution of total smarties counts in bags",
        xlab = "Total number of smarties",
        ylab = "")

# Calculate the proportion of white smarties in each bag

smarties$ProportionWhite <- smarties$`Number of white smarties` / smarties$`Number of total smarties`
# Convert the proportion of smarties column to a percent
smarties$PercentWhite <- smarties$ProportionWhite * 100
# View the summary statistics of the percent column
summary(smarties$PercentWhite)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   13.33   15.40   21.36   26.67  500.00
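
A maximum of 500% is not a valid percentage, which suggests that at least one bag was recorded with more white smarties than total smarties. Below is a minimal sketch of how such rows might be flagged before plotting. The `toy` data frame and its values are invented for illustration; only the column names are borrowed from the data above.

```r
# Hypothetical toy data standing in for smarties.csv (values are invented)
toy <- data.frame(
  white = c(2, 3, 5, 0),
  total = c(15, 14, 1, 0)
)
names(toy) <- c("Number of white smarties", "Number of total smarties")

white <- toy$`Number of white smarties`
total <- toy$`Number of total smarties`

# Flag rows whose proportion would be undefined (total of 0 or missing values)
# or impossible (more white smarties than total smarties)
suspect <- is.na(white) | is.na(total) | total <= 0 | white > total

# Show the suspect rows so they can be checked against the original records
toy[suspect, ]
```

Whether flagged rows should be corrected or excluded depends on what the original records say, so this sketch only reports them.
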
# Use ggplot2 to create a histogram of the proportion of white smarties in each bag using the percent column

library(ggplot2)

ggplot(smarties, aes(x = PercentWhite)) +
  geom_histogram(binwidth = 5, fill = "maroon", color = "black") +
  xlim(0, 100) +
  labs(x = "Proportion of White Smarties (%)",
       y = "Count of Bags") +
  ggtitle("Distribution of the proportion of white smarties") +
  theme_minimal()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).

# What is our question about the data? Is the proportion of white smarties in the bags different from the expected proportion of 1/6 (about 16.67%)?

# Count the number of bags (rows). Note: this is not the n for the binomial test; n is the total number of smarties
nrow(smarties)
## [1] 148
# Count the number of white smarties in all bags
sum(smarties$`Number of white smarties`)
## [1] 520
# Count the total number of smarties in all bags
sum(smarties$`Number of total smarties`)
## [1] 2730
# Calculate the overall percentage of white smarties in all bags
sum(smarties$`Number of white smarties`) / sum(smarties$`Number of total smarties`)*100
## [1] 19.04762

So, about 19% of the smarties in our class demo were white.

Recall the R syntax for a binomial test

?binom.test
# Conduct a binomial test to see if the proportion of white smarties is different from the hypothesized 1/6

# x = number of successes (white smarties)
# n = number of trials (total smarties)
# p = hypothesized probability or proportion of white smarties (1 of 6 colors = 1/6, about 0.1667 or 16.67%). See: https://www.smarties.com/faqs/ and the "We manufacture six (6) colors of candy tablets that are randomly mixed together"

binom.test(520, 2730, p = 1/6, alternative = "two.sided")
## 
##  Exact binomial test
## 
## data:  520 and 2730
## number of successes = 520, number of trials = 2730, p-value = 0.001009
## alternative hypothesis: true probability of success is not equal to 0.1666667
## 95 percent confidence interval:
##  0.1759010 0.2057181
## sample estimates:
## probability of success 
##              0.1904762
# Note: 520/2730 = 0.1904762 or 19.05%. We are comparing this observed proportion of white smarties to the hypothesized proportion of 1/6 (about 16.67%)

Interpret the results of the binomial test

The p-value (p = 0.001009) is well below 0.01, so we reject the null hypothesis and conclude that the proportion of white smarties (about 19.05%) is significantly different from the hypothesized proportion of 1/6 (about 16.67%).
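
To see where this p-value comes from, here is a sketch that rebuilds it from the binomial distribution. Under the null hypothesis we would expect 2730 × 1/6 = 455 white smarties. The two-sided p-value adds up the probabilities of every possible count that is no more likely than the observed 520. The small tolerance factor `1 + 1e-7` is an assumption about how `binom.test` guards against floating-point ties.

```r
x  <- 520    # observed number of white smarties
n  <- 2730   # total number of smarties
p0 <- 1/6    # hypothesized proportion under the null

# Expected number of white smarties if the null is true
expected <- n * p0

# Probability of every possible count (0 to n) under the null
d     <- dbinom(0:n, size = n, prob = p0)
d_obs <- dbinom(x, size = n, prob = p0)

# Two-sided p-value: total probability of counts no more likely than the observed one
p_two_sided <- sum(d[d <= d_obs * (1 + 1e-7)])

expected
p_two_sided
binom.test(x, n, p = p0)$p.value   # for comparison
```

The last two printed values should agree, which shows that the exact test is a sum of binomial probabilities and not a normal approximation.
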

The 95% confidence interval for the proportion of white smarties is from 0.1759 to 0.2057. It means that we are 95% confident that the true proportion of white smarties in the population is between 17.59% and 20.57%. The interval does not contain 1/6 (about 0.1667), which is consistent with rejecting the null hypothesis.
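
The interval reported by `binom.test` is the exact (Clopper-Pearson) interval, which can be written in terms of quantiles of the beta distribution. As a sketch, the same bounds can be reproduced by hand from the counts used above:

```r
x     <- 520    # observed number of white smarties
n     <- 2730   # total number of smarties
alpha <- 0.05   # 1 - confidence level

# Clopper-Pearson bounds expressed as beta quantiles
lower <- qbeta(alpha / 2, x, n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)

c(lower, upper)
as.numeric(binom.test(x, n)$conf.int)   # for comparison
```

Both lines should print the same pair of bounds as the test output shown earlier.
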