Scenario: Inference for Penguin Data using prop.test() and t.test()

In this project we use the Palmer Penguins data set to practice formal statistical inference: hypothesis tests and confidence intervals for proportions and means.

The project has two parts. In Part 1, we work through four complete examples together — a hypothesis test and a confidence interval using prop.test(), followed by a hypothesis test and a confidence interval using t.test(). In Part 2, you will carry out two analyses of your own: one involving a proportion and one involving a mean (or difference in means).

Part 1: Worked Examples

1A: Hypothesis Test for a Proportion — prop.test()

Question: Is the proportion of Chinstrap penguins in the dataset different from one-third (what we would expect if all three species were equally common)?

We count the number of Chinstrap penguins and the total number of penguins with non-missing species data, then run the test.

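library(palmerpenguins)  # provides the penguins data frame
library(tidyverse)       # dplyr verbs, %>%, and ggplot2

n_total     <- penguins %>% filter(!is.na(species)) %>% nrow()
n_chinstrap <- penguins %>% filter(species == "Chinstrap") %>% nrow()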

n_total

## [1] 344

n_chinstrap

## [1] 68

Our null hypothesis is that p = 1/3 (Chinstrap penguins make up one-third of the population).
Our alternative hypothesis is that p ≠ 1/3 (two-sided).

prop.test(x = n_chinstrap, n = n_total, p = 1/3,
          alternative = "two.sided", correct = FALSE)

## 
##  1-sample proportions test without continuity correction
## 
## data:  n_chinstrap out of n_total, null probability 1/3
## X-squared = 28.488, df = 1, p-value = 9.426e-08
## alternative hypothesis: true p is not equal to 0.3333333
## 95 percent confidence interval:
##  0.1590290 0.2429974
## sample estimates:
##         p 
## 0.1976744

The p-value is very small (well below 0.05), so we reject the null hypothesis. There is strong evidence that Chinstrap penguins do not make up one-third of the population sampled in this data set — they appear to be underrepresented relative to the other two species.
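The X-squared value in the output can be reproduced by hand, which shows what prop.test() computes when correct = FALSE. The sketch below hard-codes the counts from this example (x = 68, n = 344). The names p_hat, se0, z, x_sq, and p_val are our own labels for illustration, not part of prop.test().

```r
# Counts taken from the 1A example
x  <- 68     # Chinstrap penguins
n  <- 344    # all penguins with a recorded species
p0 <- 1/3    # null value

p_hat <- x / n                     # sample proportion
se0   <- sqrt(p0 * (1 - p0) / n)   # standard error computed under the null
z     <- (p_hat - p0) / se0        # z statistic
x_sq  <- z^2                       # squared z, the quantity reported as X-squared
p_val <- 2 * pnorm(-abs(z))        # two-sided p-value

c(p_hat = p_hat, z = z, x_sq = x_sq, p_val = p_val)
```

The negative z value reflects that the sample proportion falls below 1/3, which matches the interpretation that Chinstrap penguins are underrepresented.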

1B: Confidence Interval for a Proportion — prop.test()

Question: What is a plausible range for the true proportion of Chinstrap penguins in the population?

prop.test(x = n_chinstrap, n = n_total, conf.level = 0.95, correct = FALSE)

## 
##  1-sample proportions test without continuity correction
## 
## data:  n_chinstrap out of n_total, null probability 0.5
## X-squared = 125.77, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.1590290 0.2429974
## sample estimates:
##         p 
## 0.1976744

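We are 95% confident that the true proportion of Chinstrap penguins is between roughly 0.159 and 0.243. Because this interval does not include 1/3, it is consistent with the conclusion from Part 1A that the three species are not equally represented. Since no value of p is supplied, prop.test() tests against its default null value of 0.5; that part of the output is not relevant here, and only the confidence interval is used.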
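The interval reported by prop.test() with correct = FALSE is, to our understanding, the Wilson score interval (obtained by inverting the score test) rather than the simpler p_hat ± z * SE formula. A minimal sketch of that calculation, using the counts from this example; the names center, half_width, and wilson are ours:

```r
# Counts taken from the 1B example
x <- 68
n <- 344
conf_level <- 0.95

p_hat <- x / n
z     <- qnorm(1 - (1 - conf_level) / 2)   # about 1.96 for a 95% interval
denom <- 1 + z^2 / n

# Wilson score interval: a shifted center and an adjusted half-width
center     <- (p_hat + z^2 / (2 * n)) / denom
half_width <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom

wilson <- c(lower = center - half_width, upper = center + half_width)
wilson
```

The center of the interval is pulled slightly toward 0.5 relative to the sample proportion, which is why the interval is not symmetric around 0.1977.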

1C: Hypothesis Test for a Mean — t.test()

Question: A field guide published in 2010 reports that the mean flipper length for Chinstrap penguins is 200 mm. Does the Palmer Station sample support this claim?

We isolate the Chinstrap flipper lengths, then check the normality assumption with a histogram before running the test.

chinstrap_flipper <- penguins %>%
  filter(species == "Chinstrap", !is.na(flipper_length_mm)) %>%
  pull(flipper_length_mm)

length(chinstrap_flipper)

## [1] 68

mean(chinstrap_flipper)

## [1] 195.8235

sd(chinstrap_flipper)

## [1] 7.131894

ggplot(data.frame(flipper = chinstrap_flipper), aes(x = flipper)) +
  geom_histogram(binwidth = 3, fill = "steelblue", color = "white") +
  labs(title = "Chinstrap Penguin Flipper Length",
       x = "Flipper Length (mm)", y = "Count") +
  theme_minimal()

The histogram shows no extreme skew or outliers. The normality assumption is reasonable.

Our null hypothesis is that the true population mean is equal to 200 mm. Our alternative hypothesis is that the true population mean is not equal to 200 mm. This is a two-sided alternative.

t.test(chinstrap_flipper, mu = 200, alternative = "two.sided", conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  chinstrap_flipper
## t = -4.829, df = 67, p-value = 8.31e-06
## alternative hypothesis: true mean is not equal to 200
## 95 percent confidence interval:
##  194.0972 197.5498
## sample estimates:
## mean of x 
##  195.8235

The p-value is very small (well below 0.05), so we reject the null hypothesis. The Palmer penguin dataset provides strong evidence that the mean Chinstrap flipper length differs from the 200 mm value reported in the field guide. The sample mean of approximately 195.8 mm suggests flipper lengths are shorter than previously reported.
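The t statistic and p-value can be checked from the summary statistics alone. The sketch below uses the rounded values printed earlier in this example (n = 68, mean = 195.8235, sd = 7.131894), so the result may differ from the t.test() output in the last decimal place. The variable names are ours.

```r
# Summary statistics printed in the 1C example (rounded)
n     <- 68
y_bar <- 195.8235
s     <- 7.131894
mu0   <- 200                           # null value from the field guide

se     <- s / sqrt(n)                  # standard error of the mean
t_stat <- (y_bar - mu0) / se           # one-sample t statistic
df     <- n - 1                        # degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), df)     # two-sided p-value

c(se = se, t = t_stat, df = df, p_val = p_val)
```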

1D: Confidence Interval for a Mean — t.test()

Question: What is a plausible range for the true mean flipper length of Chinstrap penguins?

t.test(chinstrap_flipper, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  chinstrap_flipper
## t = 226.42, df = 67, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  194.0972 197.5498
## sample estimates:
## mean of x 
##  195.8235

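We are 95% confident that the true mean flipper length for Chinstrap penguins is between approximately 194.1 mm and 197.5 mm. Notice that this interval does not include 200 mm, which is consistent with our conclusion in Part 1C. Since no value of mu is supplied, t.test() tests against its default null value of 0; that part of the output is not meaningful here, and only the confidence interval is used.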
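The same interval can be built by hand as the sample mean plus or minus a t critical value times the standard error. This sketch again assumes the rounded summary statistics printed in Part 1C; the names t_crit, margin, and ci are ours.

```r
# Summary statistics printed in the 1C example (rounded)
n     <- 68
y_bar <- 195.8235
s     <- 7.131894
conf_level <- 0.95

se     <- s / sqrt(n)                                 # standard error of the mean
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)    # two-sided critical value
margin <- t_crit * se                                 # margin of error

ci <- c(lower = y_bar - margin, upper = y_bar + margin)
ci
```

Changing conf_level to 0.99 would widen the interval, since a larger critical value is needed for higher confidence.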

Part 2: Your Turn

Now it is your turn. You will carry out two analyses: one using prop.test() and one using t.test(). For each analysis, choose a question from the options below, then follow the steps shown in Part 1.

2A: Analysis with prop.test()

Choose one of the following questions to investigate:

(Option 1) Is there evidence that more than half of Gentoo penguins in the dataset are male?
(Option 2) Is there evidence that the proportion of penguins with body mass above 4200 g differs from 0.50?
(Option 3) A researcher claims that fewer than 30% of all penguins sampled are from Biscoe Island. Does the data support or contradict this claim?

State which option you chose: Option 1

State your null and alternative hypotheses:

H0: p = 0.5 (half of Gentoo penguins are male), where p is the proportion of Gentoo penguins that are male.
HA: p > 0.5 (more than half of Gentoo penguins are male).

Set up the counts and run the test. Include a clear code chunk:

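# Note: as written, none of the counts used in the test is restricted to Gentoo penguins.
n_total  <- penguins %>% filter(!is.na(species)) %>% nrow()      # all penguins with a recorded species
n_Gentoo <- penguins %>% filter(species == "Gentoo") %>% nrow()  # Gentoo count; overwritten on the next line
n_Gentoo <- penguins %>% filter(!is.na(sex)) %>% nrow()          # all penguins with a recorded sex (any species)
n_male   <- penguins %>% filter(sex == "male") %>% nrow()        # all male penguins (any species)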

n_total

## [1] 344

n_Gentoo

## [1] 333

n_male

## [1] 168

Interpret the output. Your answer should address:

(1) What are x and n in your test?

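x is 168 and n is 333 (x = n_male, n = n_Gentoo). As the code is written, 168 is the number of male penguins of any species and 333 is the number of penguins of any species with a recorded sex, so the test below addresses the proportion of males among all penguins rather than among Gentoo penguins only.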

prop.test(x = 168, n = 333, p = 1/2, alternative = "greater", correct = FALSE)

## 
##  1-sample proportions test without continuity correction
## 
## data:  168 out of 333, null probability 1/2
## X-squared = 0.027027, df = 1, p-value = 0.4347
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.4595833 1.0000000
## sample estimates:
##         p 
## 0.5045045

(2) What is the p-value, and what does it tell you? (Use alpha = 0.05.)

The p-value is 0.4347. Because 0.4347 > 0.05, we fail to reject the null hypothesis. The data do not provide sufficient evidence that the proportion of males is greater than one-half.

(3) Report and interpret the 95% confidence interval from the output.

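The 95% confidence interval from the output is (0.4596, 1.0000). Because the alternative is one-sided ("greater"), this is a one-sided interval: we are 95% confident that the true proportion of males is at least about 0.46. The interval contains the null value of 0.5, which is consistent with failing to reject the null hypothesis in (2).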
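Option 1 asks about Gentoo penguins specifically, so x and n would both need to be counted within that species, using only penguins whose sex was recorded. A minimal sketch of that counting step follows. The helper sex_counts() is hypothetical (it is not part of any package), and the small data frame is toy data for illustration only, not the penguins data.

```r
# Hypothetical helper: within one species, count males (x) and
# penguins with a recorded sex (n).
sex_counts <- function(df, sp) {
  keep <- !is.na(df$species) & df$species == sp & !is.na(df$sex)
  sub  <- df[keep, ]
  c(x = sum(sub$sex == "male"), n = nrow(sub))
}

# Toy data for illustration only (NOT the penguins data)
toy <- data.frame(
  species = c("Gentoo", "Gentoo", "Gentoo", "Adelie", "Gentoo"),
  sex     = c("male", "female", NA, "male", "male")
)

counts <- sex_counts(toy, "Gentoo")
counts
```

With the real data, sex_counts(penguins, "Gentoo") would supply the x and n to pass to prop.test() with p = 0.5, alternative = "greater", and correct = FALSE.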

2B: Analysis with t.test()

Choose one of the following questions to investigate:

(Option 1 — one-sample t-test) The literature suggests that Adelie penguins have a mean bill length of 39 mm. Does the Palmer Station sample support this claim? Use a two-sided test.
(Option 2 — one-sample t-test) Is there evidence that the mean body mass of Gentoo penguins exceeds 5000 g? Use a right-tailed test.
(Option 3 - one-sample t-test) Is there evidence that Gentoo penguins have a mean bill depth of less than 16 mm?
(Option 4 — two-sample t-test) Does mean bill length differ between Adelie and Chinstrap penguins?
(Option 5 — two-sample t-test) Among Adelie penguins only, does mean flipper length differ between males and females?
(Option 6 — two-sample t-test) Does mean body mass differ between Gentoo and Chinstrap penguins?

State which option you chose:

Option 3.

State your null and alternative hypotheses:

H0: mu = 16 mm (the mean bill depth of Gentoo penguins is 16 mm).
HA: mu < 16 mm (the mean bill depth of Gentoo penguins is less than 16 mm).

Use dplyr to isolate your data, make a histogram to check the normality assumption, and run the test:

# Step 1: Isolate your data with filter() and pull()
Gentoo_bill <- penguins %>%
  filter(species == "Gentoo", !is.na(bill_depth_mm)) %>%
  pull(bill_depth_mm)

Gentoo <- penguins %>%
  filter(species == "Gentoo", !is.na(bill_depth_mm))

length(Gentoo_bill)   # n

## [1] 123

mean(Gentoo_bill)     # y-bar

## [1] 14.98211

sd(Gentoo_bill)      # s

## [1] 0.9812198

# Step 2: Histogram to check normality
  
ggplot(data.frame(bill_depth_mm = Gentoo_bill), aes(x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white") +  # narrow bins: bill depths span only a few mm
  labs(title = "Gentoo Penguin Bill Depth",
       x = "Bill Depth (mm)", y = "Count") +
  theme_minimal()

# Step 3: Run t.test()


t.test(Gentoo_bill, mu = 16, alternative = "less", conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  Gentoo_bill
## t = -11.505, df = 122, p-value < 2.2e-16
## alternative hypothesis: true mean is less than 16
## 95 percent confidence interval:
##      -Inf 15.12875
## sample estimates:
## mean of x 
##  14.98211

Interpret the output. Your answer should address:

(1) What is the sample mean (or difference in sample means)?

The sample mean is 14.98211 mm.

(2) What is the t statistic and degrees of freedom?

The t statistic is -11.505 and the degrees of freedom are 122 (t = -11.505, df = 122).

(3) What is the p-value, and what is your decision at alpha = 0.05?

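The p-value is less than 2.2e-16. Because this is far below alpha = 0.05, we reject the null hypothesis: there is strong evidence that the mean bill depth of Gentoo penguins is less than 16 mm.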

(4) Report and interpret the 95% confidence interval. Does it contain 0 (for a two-sample test) or your null value (for a one-sample test)? Is this consistent with your decision in (3)?

The confidence interval is (-Inf, 15.12875). Because the alternative is one-sided ("less"), this is a one-sided interval: we are 95% confident that the true mean bill depth for Gentoo penguins is at most about 15.13 mm. The interval does not contain the null value of 16 mm, which is consistent with the decision in (3).
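For a left-tailed test, the upper confidence bound uses the one-sided critical value qt(0.95, df) rather than the two-sided qt(0.975, df). The sketch below checks the t statistic and the upper bound from the rounded summary statistics printed above (n = 123, mean = 14.98211, sd = 0.9812198); the variable names are ours.

```r
# Summary statistics printed in the 2B analysis (rounded)
n     <- 123
y_bar <- 14.98211
s     <- 0.9812198
mu0   <- 16                            # null value

se     <- s / sqrt(n)                  # standard error of the mean
t_stat <- (y_bar - mu0) / se           # one-sample t statistic
p_val  <- pt(t_stat, df = n - 1)       # left-tailed p-value

# One-sided 95% interval: (-Inf, upper]
upper <- y_bar + qt(0.95, df = n - 1) * se

c(t = t_stat, p_val = p_val, upper = upper)
```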

Unit 9 Project

Chelsey Guzman

March 26 2026