In this lab we will develop the principles of hypothesis testing for means and paired-sample means.

Lab Objective 1: Write and organize statistical reports in a clear readable format.
Lab Objective 7: Construct confidence intervals and use them for hypothesis testing.
Lab Objective 8: Calculate and interpret p-values to test hypotheses.

1 Paired Data

Determine if the following statements are true or false. If false, explain.

  1. In a paired analysis we first take the difference of each pair of observations, and then we do inference on these differences.

A = True, B = False

  2. Consider two sets of data that are paired with each other. Each observation in one data set has a natural correspondence with exactly one observation from the other data set.

A = True, B = False

  3. Two data sets of different sizes can be analyzed as paired data.

A = True, B = False
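The procedure described in the first statement can be sketched in R. This is a minimal illustration with made-up numbers (the vectors `before` and `after` are hypothetical, not lab data): taking the within-pair differences and running a one-sample t-test on them should give the same statistic and p-value as R's built-in `t.test(..., paired = TRUE)`.

```r
# Made-up paired measurements (hypothetical, for illustration only):
# position i in both vectors is the same unit measured twice.
before <- c(12, 15, 9, 20, 14, 11)
after  <- c(14, 18, 9, 25, 13, 15)

# Step 1: take the difference within each pair.
d <- after - before

# Step 2: do inference on the differences (one-sample t-test, H0: mean = 0).
diff_test <- t.test(d, mu = 0)

# R's paired test computes after - before internally, so it should agree.
paired_test <- t.test(after, before, paired = TRUE)

diff_test$statistic
paired_test$statistic
```

Because the pairing is by position, both vectors must have the same length and be listed in the same order.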

2 NOAA data

Let’s consider a limited set of climate data, examining temperature differences between 1948 and 2018. We sampled 197 locations from the National Oceanic and Atmospheric Administration’s (NOAA) historical data for which data were available for both years of interest. We want to know: were there more days with temperatures exceeding 90°F in 2018 or in 1948?

The difference in the number of days exceeding 90°F (number of days in 2018 minus number of days in 1948) was calculated for each of the 197 locations. The average of these differences was 2.9 days, with a standard deviation of 17.2 days. We want to determine whether these data provide strong evidence that NOAA’s weather stations recorded more days exceeding 90°F in 2018 than in 1948.
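To show how the per-location differences and their summary statistics are formed, here is a small R sketch on a hypothetical data frame. The station labels and counts are invented for illustration and are NOT NOAA data; the column names `days_1948`, `days_2018`, and `diff` are assumptions, not taken from the lab.

```r
# Hypothetical station-level counts (made up, NOT NOAA data): one row per
# location, with the number of days above 90 degrees F in each year.
stations <- data.frame(
  location  = c("A", "B", "C", "D", "E"),
  days_1948 = c(30, 12, 45, 8, 20),
  days_2018 = c(35, 10, 52, 8, 27)
)

# Difference for each location, in the same order as the lab: 2018 - 1948.
stations$diff <- stations$days_2018 - stations$days_1948

# The three summary statistics the lab works from (mean, SD, sample size).
mean_diff <- mean(stations$diff)
sd_diff   <- sd(stations$diff)
n_loc     <- nrow(stations)
c(mean_diff = mean_diff, sd_diff = sd_diff, n_loc = n_loc)
```

With the real data, the same three quantities would be 2.9 days, 17.2 days, and 197 locations.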

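```r
xbar <- 2.9  # mean of the per-location differences (2018 - 1948), in days
n <- 197     # number of paired locations
s <- 17.2    # standard deviation of the differences, in days
```
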
  1. Is there a relationship between the observations collected in 1948 and 2018? Or are the observations in the two groups paired or independent? Explain.

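The observations are paired: each of the 197 locations contributes one count from 1948 and one from 2018, so every 1948 observation has exactly one matching 2018 observation (the same station). This is why we analyze the per-location differences rather than treating the two years as independent groups.

\(\mu_1\) = mean number of days exceeding 90°F in 1948

\(\mu_2\) = mean number of days exceeding 90°F in 2018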

  2. Write hypotheses for this research in symbols and in words.

\(H_0\): there is no difference in the mean number of days exceeding 90°F between 1948 and 2018 (\(\mu_2 - \mu_1 = 0\)).

\(H_A\): the mean number of days exceeding 90°F was greater in 2018 than in 1948 (\(\mu_2 > \mu_1\), or \(\mu_2 - \mu_1 > 0\)).
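To connect the direction of the alternative to software, here is a hedged R sketch on a short, made-up vector of differences `d` (hypothetical values, not the NOAA sample). Because the research question asks whether 2018 had more hot days, and the differences are taken as 2018 minus 1948, the alternative points toward positive mean differences, so the test uses the upper tail; `t.test()` selects that with `alternative = "greater"`.

```r
# Hypothetical per-location differences (days in 2018 minus days in 1948);
# made-up values, used only to show how the alternative maps to R.
d <- c(5, -2, 7, 0, 7, 3, -1, 4)

# One-sided test in the upper direction: H_A is a positive mean difference.
res <- t.test(d, mu = 0, alternative = "greater")

# The same p-value by hand: the upper-tail area beyond the observed t.
t_obs  <- mean(d) / (sd(d) / sqrt(length(d)))
p_hand <- pt(t_obs, df = length(d) - 1, lower.tail = FALSE)

c(t = unname(res$statistic), p = res$p.value, p_hand = p_hand)
```

Subtracting in the other order (1948 minus 2018) would flip the sign of every difference, and the matching alternative would become `"less"`.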

  3. Calculate the test statistic and find the p-value.
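
```r
xbar <- 2.9   # mean difference (2018 - 1948), in days
n    <- 197   # number of paired locations
s    <- 17.2  # standard deviation of the differences, in days

SE <- s / sqrt(n)          # standard error of the mean difference
Tscore <- (xbar - 0) / SE  # null value of the mean difference is 0
Tscore
## [1] 2.366479

# Upper-tail p-value: we are testing for a positive mean difference
1 - pt(2.366479, df = n - 1)
## [1] 0.009466607
```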
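The arithmetic in the R code above, written out by hand with the summary statistics from the problem (values rounded to a few decimals, so the last digits differ slightly from the R output):

```latex
\begin{aligned}
SE &= \frac{s}{\sqrt{n}} = \frac{17.2}{\sqrt{197}} \approx \frac{17.2}{14.036} \approx 1.225 \\
T  &= \frac{\bar{x}_{\text{diff}} - 0}{SE} = \frac{2.9}{1.225} \approx 2.37,
      \qquad df = n - 1 = 196 \\
p  &= P\left(T_{196} > 2.37\right) \approx 0.0095
\end{aligned}
```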
  4. Use \(\alpha = 0.05\) to evaluate the test, and interpret your conclusion in context.

Since p ≈ 0.0095 < 0.05, we reject \(H_0\). The data provide strong evidence that the mean number of days exceeding 90°F at these NOAA locations was greater in 2018 than in 1948.
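As a cross-check on the decision, a short R sketch that reaches the same conclusion two ways: comparing the p-value with α, and comparing the t statistic with the one-sided critical value from `qt()`. The variable names (`t_obs`, `t_crit`, `dof`) are illustrative choices, not from the lab.

```r
# Summary statistics from the NOAA problem
xbar  <- 2.9
n     <- 197
s     <- 17.2
alpha <- 0.05

se    <- s / sqrt(n)
t_obs <- (xbar - 0) / se
dof   <- n - 1

# p-value approach: area in the upper tail beyond t_obs
p_value <- pt(t_obs, df = dof, lower.tail = FALSE)

# Critical-value approach: cutoff that leaves area alpha in the upper tail
t_crit <- qt(1 - alpha, df = dof)

reject_by_p    <- p_value < alpha
reject_by_crit <- t_obs > t_crit

print(c(t_obs = t_obs, t_crit = t_crit, p_value = p_value))
print(c(reject_by_p = reject_by_p, reject_by_crit = reject_by_crit))
```

For a one-sided test the two rules always agree, since p < α exactly when the observed t lies beyond the critical value.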

  5. What type of error might we have made? Explain in context what the error means.

(SKIP. We did not cover this topic this semester.)

  6. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the number of days exceeding 90°F from 1948 and 2018 to include 0? Explain your reasoning.

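No. We rejected \(H_0\), so we have strong evidence that the mean difference in the number of days exceeding 90°F (2018 minus 1948) is not zero. A confidence interval at the matching confidence level (90% for this one-sided test at \(\alpha = 0.05\)) should therefore exclude 0. If the interval contained 0, then 0 would be a plausible value for the mean difference, which would be inconsistent with rejecting \(H_0\).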
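To check this reasoning numerically, a minimal R sketch that computes t-based confidence intervals for the mean difference from the summary statistics. The helper `ci_mean_diff()` is a hypothetical name introduced here. The 90% interval is the one that corresponds to a one-sided test at α = 0.05; the 95% interval is shown for comparison.

```r
# Summary statistics from the NOAA problem
xbar <- 2.9
n    <- 197
s    <- 17.2
se   <- s / sqrt(n)

# Two-sided t interval for the mean difference (2018 - 1948) at a given level
ci_mean_diff <- function(level) {
  t_star <- qt(1 - (1 - level) / 2, df = n - 1)
  xbar + c(lower = -1, upper = 1) * t_star * se
}

ci90 <- ci_mean_diff(0.90)  # matches a one-sided test at alpha = 0.05
ci95 <- ci_mean_diff(0.95)
ci90
ci95
```

Both intervals lie entirely above 0, consistent with the test result; the 95% interval is wider because it uses a larger critical value.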