This lab introduces hypothesis testing in a hydrologic setting using annual precipitation data. You will use simulated precipitation records from two stations to test whether mean annual precipitation has changed over time. The emphasis of this lab is on:
library(dplyr)
library(shiny)
library(tidyverse)
Hydrologists are often asked whether precipitation has changed over time at a particular location. While time series plots can suggest trends, statistical hypothesis tests allow us to formally test whether mean conditions differ between time periods. In this lab, you will analyze two hypothetical precipitation stations:
Both stations exhibit year-to-year variability typical of precipitation records.
The data represent annual precipitation totals (mm). Each station record is divided into two periods of equal length:
Early Period: 25 years
Recent Period: 25 years
All data are simulated to:
Have approximately equal variance between periods
Be suitable for a two-sample t-test
A random seed is set so that all students obtain identical results.
set.seed(391) # ensures reproducibility
n <- 25 # number of years per period
sd_precip <- 90 # standard deviation (mm)
# Station A: increasing mean precipitation
A_early <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)
# Station B: no meaningful change in mean precipitation
B_early <- rnorm(n, mean = 800, sd = sd_precip)
B_recent <- rnorm(n, mean = 805, sd = sd_precip)
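The role of the seed can be checked directly. The short sketch below is not part of the lab data generation; the names `first_draw`, `second_draw`, and `third_draw` are introduced here only for illustration. It draws five values twice with the same seed, then once more without resetting it.

```r
# Same seed, same draws: resetting the seed replays the random number stream.
set.seed(391)
first_draw <- rnorm(5, mean = 800, sd = 90)

set.seed(391)
second_draw <- rnorm(5, mean = 800, sd = 90)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the next call continues the stream instead.
third_draw <- rnorm(5, mean = 800, sd = 90)
identical(first_draw, third_draw)   # FALSE
```

Because the stream continues from call to call, the order in which the four station vectors are generated also matters for matching the results shown in this lab.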
What does set.seed() do, and why is it important in this lab?
What are the functions for calculating the average and standard deviation?
mean(A_early); sd(A_early)
## [1] 800.0924
## [1] 85.61439
mean(A_recent); sd(A_recent)
## [1] 892.3874
## [1] 85.83152
mean(B_early); sd(B_early)
## [1] 796.4069
## [1] 73.3386
mean(B_recent); sd(B_recent)
## [1] 796.5841
## [1] 83.49773
Are the variances for the early and recent periods similar for each station?
Based on the means alone, which station appears to show a change in precipitation?
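One informal way to compare the spread of the two periods is the ratio of sample variances, optionally backed by an F test with base R's `var.test()`. The sketch below is an assumption about how such a check might look, not a step the lab requires. The F test is sensitive to departures from normality, which is acceptable here only because the data are simulated from normal distributions. The data are regenerated with the same seed and draw order so the block runs on its own.

```r
# Recreate the lab data (same seed and draw order as in the lab).
set.seed(391)
n <- 25
sd_precip <- 90
A_early  <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)
B_early  <- rnorm(n, mean = 800, sd = sd_precip)
B_recent <- rnorm(n, mean = 805, sd = sd_precip)

# Ratio of sample variances; values near 1 suggest similar spread.
var_ratio_A <- var(A_early) / var(A_recent)
var_ratio_B <- var(B_early) / var(B_recent)

# F test of H0: the two population variances are equal.
fA <- var.test(A_early, A_recent)
fB <- var.test(B_early, B_recent)

c(A = var_ratio_A, B = var_ratio_B)  # variance ratios
c(A = fA$p.value, B = fB$p.value)    # F test p-values
```

A large p-value from the F test does not prove the variances are equal; it only indicates that the samples give no strong evidence against equality.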
par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
# Station A: xlim spans both periods so the recent record is not clipped
plot(A_early, type = "l",
     xlim = c(1, 2 * n),
     ylim = range(c(A_early, A_recent)),
     main = "Station A: Annual Precipitation",
     ylab = "Precipitation (mm)", xlab = "Year Index")
lines((n + 1):(2 * n), A_recent)
# Station B: same layout, early period followed by recent period
plot(B_early, type = "l",
     xlim = c(1, 2 * n),
     ylim = range(c(B_early, B_recent)),
     main = "Station B: Annual Precipitation",
     ylab = "Precipitation (mm)", xlab = "Year Index")
lines((n + 1):(2 * n), B_recent)
Does the visual appearance of Station A match the summary statistics?
Why might visual inspection alone be insufficient to determine statistical change?
Write the null and alternative hypotheses for Station A to test for a difference in mean annual precipitation.
Null hypothesis (H₀):
Alternative hypothesis (H₁):
tA <- t.test(A_early, A_recent,
var.equal = TRUE,
alternative = "two.sided")
tA
##
## Two Sample t-test
##
## data: A_early and A_recent
## t = -3.8066, df = 48, p-value = 0.0003995
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -141.04516 -43.54493
## sample estimates:
## mean of x mean of y
## 800.0924 892.3874
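The t statistic and degrees of freedom reported above come from the pooled-variance formula for two independent samples. The sketch below recomputes them by hand and compares the result with `t.test()`. `pooled_t` is a hypothetical helper name introduced here for illustration, and the data are regenerated with the same seed and draw order so the block runs on its own.

```r
# Recreate the Station A data (same seed and draw order as in the lab).
set.seed(391)
n <- 25
sd_precip <- 90
A_early  <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)

# Hypothetical helper: pooled-variance two-sample t statistic, computed by hand.
pooled_t <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  df <- n1 + n2 - 2
  # Pooled variance: average of the sample variances, weighted by degrees of freedom
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
  se  <- sqrt(sp2 * (1 / n1 + 1 / n2))
  t   <- (mean(x) - mean(y)) / se
  # Two-sided p-value from the t distribution
  p   <- 2 * pt(-abs(t), df)
  list(t = t, df = df, p.value = p)
}

by_hand  <- pooled_t(A_early, A_recent)
built_in <- t.test(A_early, A_recent, var.equal = TRUE)

all.equal(by_hand$t, unname(built_in$statistic))  # TRUE
all.equal(by_hand$p.value, built_in$p.value)      # TRUE
```

With 25 years in each period, the degrees of freedom are 25 + 25 - 2 = 48, which matches the `df = 48` in the output above.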
Record your null and alternative hypotheses. What is the p-value for Station A?
At α = 0.05, do you reject or fail to reject the null hypothesis?
Interpret the result in a hydrologic context. Would a different type of t-test be appropriate here? If so, which one, and why?
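One candidate alternative is Welch's t-test, which does not assume equal variances and is what `t.test()` runs when `var.equal = FALSE` (its default). The sketch below compares it with the pooled test for Station A; it is offered as one possible answer to explore, not the only defensible one. The data are regenerated with the same seed and draw order so the block runs on its own.

```r
# Recreate the Station A data (same seed and draw order as in the lab).
set.seed(391)
n <- 25
sd_precip <- 90
A_early  <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)

# Pooled (equal-variance) test, as used in the lab
t_pooled <- t.test(A_early, A_recent, var.equal = TRUE)

# Welch test: variances are estimated separately and df is adjusted
t_welch <- t.test(A_early, A_recent, var.equal = FALSE)

# With equal sample sizes the two t statistics coincide; only df (and so p) can differ.
c(pooled = unname(t_pooled$statistic), welch = unname(t_welch$statistic))
c(pooled = unname(t_pooled$parameter), welch = unname(t_welch$parameter))
c(pooled = t_pooled$p.value, welch = t_welch$p.value)
```

When the two sample variances are close, as they are for Station A, the Welch degrees of freedom stay near 48 and the two tests give nearly the same p-value.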
Write the null and alternative hypotheses for Station B to test for a difference in mean annual precipitation. Use the code above, but change the variables to B_early and B_recent.
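As a starting point, the Station A call adapts as follows. The object name `tB` is an assumption chosen to mirror `tA`. All four vectors are regenerated in the original order, because skipping the Station A draws would change the Station B values.

```r
# Recreate the lab data (same seed and draw order as in the lab).
set.seed(391)
n <- 25
sd_precip <- 90
A_early  <- rnorm(n, mean = 800, sd = sd_precip)
A_recent <- rnorm(n, mean = 900, sd = sd_precip)
B_early  <- rnorm(n, mean = 800, sd = sd_precip)
B_recent <- rnorm(n, mean = 805, sd = sd_precip)

# Two-sided, equal-variance t-test for Station B
tB <- t.test(B_early, B_recent,
             var.equal = TRUE,
             alternative = "two.sided")
tB
```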
Record your null and alternative hypotheses. What is the p-value for Station B?
How does this result differ from Station A?
Why is a two-sided test appropriate here?
Explain why equal variance is an important assumption for this t-test.
Describe one hydrologic implication of incorrectly concluding that precipitation has increased.
How might serial correlation in real precipitation data affect this analysis?
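The last two questions can be explored together with a small simulation. The sketch below generates records with no true change in the mean, once with independent years and once with lag-1 serial correlation, and counts how often the pooled t-test rejects the null hypothesis anyway. The helper names, the AR(1) model, the coefficient of 0.5, the seed, and the number of replicates are all assumptions chosen for illustration; real precipitation records may behave differently.

```r
# Hypothetical helper: AR(1) series of length n with lag-1 coefficient phi.
simulate_ar1 <- function(n, phi, sd = 90, burn_in = 100) {
  e <- rnorm(n + burn_in, sd = sd)
  x <- numeric(n + burn_in)
  x[1] <- e[1]
  for (i in 2:(n + burn_in)) x[i] <- phi * x[i - 1] + e[i]
  tail(x, n)  # drop the burn-in so the starting value has little influence
}

# Fraction of no-change records for which the pooled t-test rejects H0 at alpha.
false_positive_rate <- function(phi, n = 25, reps = 2000, alpha = 0.05) {
  rejected <- replicate(reps, {
    x <- 800 + simulate_ar1(2 * n, phi)  # constant mean: H0 is true
    p <- t.test(x[1:n], x[(n + 1):(2 * n)], var.equal = TRUE)$p.value
    p < alpha
  })
  mean(rejected)
}

set.seed(1)  # arbitrary seed for this sketch
rate_independent <- false_positive_rate(phi = 0)    # independent years
rate_correlated  <- false_positive_rate(phi = 0.5)  # serially correlated years
c(independent = rate_independent, correlated = rate_correlated)
```

With independent years the rejection rate should sit near the nominal 0.05. Positive serial correlation means each period contains less independent information than its 25 years suggest, so the test rejects a true null hypothesis more often than α implies.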