Lab: Detecting Changes in Mean Annual Precipitation Using a t Test

Course context

This lab introduces hypothesis testing in a hydrologic setting using annual precipitation data. You will use simulated precipitation records from two stations to test whether mean annual precipitation has changed over time. The emphasis of this lab is on:

  1. Formulating null and alternative hypotheses
  2. Understanding the assumptions of a two-sample t test
  3. Interpreting statistical results in a hydrologic context
  4. Connecting time series plots with hypothesis tests

library(dplyr)
library(shiny)
library(tidyverse)

Background

Hydrologists are often asked whether precipitation has changed over time at a particular location. While time series plots can suggest trends, statistical hypothesis tests allow us to formally test whether mean conditions differ between time periods. In this lab, you will analyze two hypothetical precipitation stations:

  1. Station A: Mean annual precipitation has increased over time
  2. Station B: Mean annual precipitation has not changed over time

Both stations exhibit the year-to-year variability typical of precipitation records.

Data description

The data represent annual precipitation totals (mm). Each station record is divided into two periods of equal length:

  1. Early Period: 25 years
  2. Recent Period: 25 years

All data are simulated to:

  1. Have approximately equal variance between periods
  2. Be suitable for a two-sample t test

A random seed is set so that all students obtain identical results.

set.seed(391) # ensures reproducibility


n <- 25 # number of years per period
sd_precip <- 90 # standard deviation (mm)


# Station A: increasing mean precipitation
A_early <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)


# Station B: no meaningful change in mean precipitation
B_early <- rnorm(n, mean = 800, sd = sd_precip)
B_recent <- rnorm(n, mean = 805, sd = sd_precip)
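
As a small illustration of what the seed does, the sketch below draws from the same normal distribution three times. The names draw_1, draw_2, and draw_3 are hypothetical and are not used elsewhere in the lab. It assumes only base R.

```r
# Hypothetical illustration: draw_1, draw_2, and draw_3 are not part of the lab data
set.seed(391)
draw_1 <- rnorm(5, mean = 800, sd = 90)

set.seed(391)                             # reset the generator to the same state
draw_2 <- rnorm(5, mean = 800, sd = 90)

draw_3 <- rnorm(5, mean = 800, sd = 90)   # no reset: continues the random sequence

identical(draw_1, draw_2)   # same seed, same draws
identical(draw_1, draw_3)   # generator has advanced, so the draws differ
```

The sketch does not modify the station data. If you regenerate the station data after running it, rerun the full simulation code starting from set.seed(391) so that the draws come out in the same order.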

Questions
  1. What does set.seed() do, and why is it important in this lab?

  2. What are the functions for calculating the average and standard deviation?

mean(A_early); sd(A_early)
## [1] 800.0924
## [1] 85.61439
mean(A_recent); sd(A_recent)
## [1] 892.3874
## [1] 85.83152
mean(B_early); sd(B_early)
## [1] 796.4069
## [1] 73.3386
mean(B_recent); sd(B_recent)
## [1] 796.5841
## [1] 83.49773
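
One possible way to put a number on "similar spread" is to compare the sample variances directly. The sketch below regenerates the lab data so it runs on its own, computes the ratio of recent to early variance for each station, and applies base R's var.test(), an F test that assumes normally distributed data. The names ratio_A, ratio_B, ftest_A, and ftest_B are assumptions introduced for this illustration.

```r
# Regenerate the lab data (same seed and same order of draws) so this block is self-contained
set.seed(391)
n <- 25
sd_precip <- 90
A_early  <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)
B_early  <- rnorm(n, mean = 800, sd = sd_precip)
B_recent <- rnorm(n, mean = 805, sd = sd_precip)

# Ratio of sample variances (recent / early); values near 1 suggest similar spread
ratio_A <- var(A_recent) / var(A_early)
ratio_B <- var(B_recent) / var(B_early)
c(Station_A = ratio_A, Station_B = ratio_B)

# F test of H0: the two population variances are equal
# (var.test reports the ratio var(x) / var(y), here early / recent)
ftest_A <- var.test(A_early, A_recent)
ftest_B <- var.test(B_early, B_recent)
c(Station_A = ftest_A$p.value, Station_B = ftest_B$p.value)
```

A large p-value here means the data give no strong evidence against equal variances. It does not prove that the variances are equal.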

Questions
  1. Are the variances for the early and recent periods similar for each station?

  2. Based on the means alone, which station appears to show a change in precipitation?

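par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))


# Station A: early period at years 1..n, recent period at years (n+1)..2n
plot(1:n, A_early, type = "l",
     xlim = c(1, 2 * n),   # widen the x-axis so the recent period is visible
     ylim = range(c(A_early, A_recent)),
     main = "Station A: Annual Precipitation",
     ylab = "Precipitation (mm)", xlab = "Year Index")
lines((n + 1):(2 * n), A_recent)


# Station B: same layout as Station A
plot(1:n, B_early, type = "l",
     xlim = c(1, 2 * n),   # widen the x-axis so the recent period is visible
     ylim = range(c(B_early, B_recent)),
     main = "Station B: Annual Precipitation",
     ylab = "Precipitation (mm)", xlab = "Year Index")
lines((n + 1):(2 * n), B_recent)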
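
To connect the time series plots with the hypothesis tests in Part 4, one option is to overlay each period's sample mean on the record. The sketch below is a variation on the plotting code, not a replacement for it. The helper plot_station() is hypothetical, and the block regenerates the lab data so that it runs on its own.

```r
# Regenerate the lab data (same seed and same order of draws) so this block is self-contained
set.seed(391)
n <- 25
sd_precip <- 90
A_early  <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)
B_early  <- rnorm(n, mean = 800, sd = sd_precip)
B_recent <- rnorm(n, mean = 805, sd = sd_precip)

# Hypothetical helper (not part of the lab): plots both periods and their sample means
plot_station <- function(early, recent, title) {
  n_early  <- length(early)
  n_recent <- length(recent)
  years    <- seq_len(n_early + n_recent)
  plot(years, c(early, recent), type = "l",
       main = title, ylab = "Precipitation (mm)", xlab = "Year Index")
  # Dashed horizontal segments at each period's sample mean
  segments(1, mean(early), n_early, mean(early), lty = 2)
  segments(n_early + 1, mean(recent), n_early + n_recent, mean(recent), lty = 2)
  # Dotted vertical line marking the boundary between the two periods
  abline(v = n_early + 0.5, lty = 3)
  invisible(c(early = mean(early), recent = mean(recent)))
}

par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
means_A <- plot_station(A_early, A_recent, "Station A: Annual Precipitation")
means_B <- plot_station(B_early, B_recent, "Station B: Annual Precipitation")
```

The gap between the two dashed segments is the difference in sample means that the t test evaluates against the year-to-year scatter around them.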

Questions
  1. Does the visual appearance of Station A match the summary statistics?

  2. Why might visual inspection alone be insufficient to determine statistical change?

Part 4a: Hypothesis testing: Station A

Write the null and alternative hypotheses for Station A to test for a difference in mean annual precipitation.

Null hypothesis (H₀):

Alternative hypothesis (H₁):

tA <- t.test(A_early, A_recent,
             var.equal = TRUE,
             alternative = "two.sided")
tA
## 
##  Two Sample t-test
## 
## data:  A_early and A_recent
## t = -3.8066, df = 48, p-value = 0.0003995
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -141.04516  -43.54493
## sample estimates:
## mean of x mean of y 
##  800.0924  892.3874
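
To see where the t statistic, degrees of freedom, and p-value come from, the sketch below computes the pooled-variance (equal variance) two-sample t test by hand and compares it with t.test(). The names sp2, se_diff, t_manual, df_manual, p_manual, and tA_check are assumptions introduced for this illustration. The block regenerates the Station A data so that it runs on its own.

```r
# Regenerate Station A (the first two sets of draws after the seed) so this block is self-contained
set.seed(391)
n <- 25
A_early  <- rnorm(n, mean = 800, sd = 90)
A_recent <- rnorm(n, mean = 900, sd = 90)

n1 <- length(A_early)
n2 <- length(A_recent)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(A_early) + (n2 - 1) * var(A_recent)) / (n1 + n2 - 2)

# Standard error of the difference in means
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, degrees of freedom, and two-sided p-value
t_manual  <- (mean(A_early) - mean(A_recent)) / se_diff
df_manual <- n1 + n2 - 2
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with the built-in test
tA_check <- t.test(A_early, A_recent, var.equal = TRUE, alternative = "two.sided")
all.equal(t_manual, unname(tA_check$statistic))
all.equal(p_manual, tA_check$p.value)
```

The statistic is negative only because the early period is listed first. Swapping the order of the two samples flips the sign but leaves the two-sided p-value unchanged.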

Questions
  1. Record your null and alternative hypotheses. What is the p-value for Station A?

  2. At α = 0.05, do you reject or fail to reject the null hypothesis?

  3. Interpret the result in a hydrologic context. Would a different type of t-test be appropriate here? If so, which one and why?

Part 4b: Hypothesis testing: Station B

Write the null and alternative hypotheses for Station B to test for a difference in mean annual precipitation. Use the code above, but change the variables to B_early and B_recent.

  1. Record your null and alternative hypotheses. What is the p-value for Station B?

  2. How does this result differ from Station A?

  3. Why is a two-sided test appropriate here?

Part 5: Discussion
  1. Explain why equal variance is an important assumption for this t-test.

  2. Describe one hydrologic implication of incorrectly concluding that precipitation has increased.

  3. How might serial correlation in real precipitation data affect this analysis?
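
For the serial correlation question, a small simulation can help build intuition. The sketch below is an illustration under stated assumptions, not a model of any real station: it generates records with no true change in the mean, once with independent years and once with a first-order autoregressive, AR(1), structure, and counts how often the pooled t test rejects the null hypothesis at α = 0.05. The lag-1 coefficient of 0.5, the number of simulations, and the helpers sim_ar1() and reject_rate() are all hypothetical choices. stats::filter() is called with its namespace because dplyr, loaded earlier, also exports a function named filter().

```r
set.seed(391)   # seed reused here only so that this sketch is reproducible

n     <- 25     # years per period, as in the lab
alpha <- 0.05   # significance level

# Hypothetical helper: AR(1) anomalies with lag-1 coefficient phi (phi = 0 gives independent years)
sim_ar1 <- function(n_years, phi, burn_in = 50) {
  shocks <- rnorm(n_years + burn_in)
  x <- as.numeric(stats::filter(shocks, filter = phi, method = "recursive"))
  x[(burn_in + 1):(burn_in + n_years)]   # drop the burn-in years
}

# Fraction of simulated no-change records in which the pooled t test rejects H0
reject_rate <- function(phi, n_sims = 2000) {
  rejections <- replicate(n_sims, {
    precip <- 800 + 90 * sim_ar1(2 * n, phi)   # constant mean: H0 is true
    p_val  <- t.test(precip[1:n], precip[(n + 1):(2 * n)],
                     var.equal = TRUE)$p.value
    p_val < alpha
  })
  mean(rejections)
}

rate_iid <- reject_rate(phi = 0)     # independent years
rate_ar1 <- reject_rate(phi = 0.5)   # positively autocorrelated years
c(independent = rate_iid, ar1 = rate_ar1)
```

Because the mean never changes in these simulations, every rejection is a false positive. Comparing the two rates shows how positive autocorrelation can change the test's actual error rate relative to the nominal 5%.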