library(IS606)
## 
## Welcome to CUNY IS606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='IS606') will list the demos that are available.

5.19 Global warming, Part I. Is there strong evidence of global warming? Let’s consider a small scale example, comparing how temperatures have changed in the US from 1968 to 2008. The daily high temperature reading on January 1 was collected in 1968 and 2008 for 51 randomly selected locations in the continental US. Then the difference between the two readings (temperature in 2008 - temperature in 1968) was calculated for each of the 51 different locations. The average of these 51 values was 1.1 degrees with a standard deviation of 4.9 degrees. We are interested in determining whether these data provide strong evidence of temperature warming in the continental US.

(a) Is there a relationship between the observations collected in 1968 and 2008? Or are the observations in the two groups independent? Explain.

There is a relationship between the observations collected in 1968 and 2008. Each of the 51 temperature readings taken on January 1, 2008 comes from the same location as one of the readings taken on January 1, 1968, so the two groups are not independent. The datasets for 1968 and 2008 are paired data.
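Because the data are paired, the analysis works on the 51 within-location differences rather than on the two groups separately. The sketch below shows that a paired t-test gives the same result as a one-sample t-test on the differences. It uses **simulated** numbers (`temp_1968` and `temp_2008` are hypothetical, generated with `rnorm`), not the real readings, which we do not have.

```r
# Simulated illustration only: these are NOT the real 51 readings, just random
# numbers used to show that a paired t-test is a one-sample t-test on differences.
set.seed(606)
n = 51
temp_1968 = rnorm(n, mean = 40, sd = 15)            # hypothetical 1968 highs
temp_2008 = temp_1968 + rnorm(n, mean = 1, sd = 5)  # hypothetical 2008 highs
diffs = temp_2008 - temp_1968                       # one difference per location

paired  = t.test(temp_2008, temp_1968, paired = TRUE, alternative = "greater")
one_sam = t.test(diffs, mu = 0, alternative = "greater")

# Both tests report the same t statistic, degrees of freedom and p-value
c(paired = unname(paired$statistic), one_sample = unname(one_sam$statistic))
```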

(b) Write hypotheses for this research in symbols and in words.

H0: µ(diff) = 0 : There is no difference between the average daily high temperature on January 1, 1968 and on January 1, 2008 in the continental U.S.

HA: µ(diff) > 0 : The average daily high temperature in the continental U.S. is higher on January 1, 2008 than on January 1, 1968.

Here µ(diff) = µ(2008) - µ(1968) is the population mean of the paired differences. We use a one-sided test since we are only interested in evidence of warming, not of cooling.
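To see what choosing the one-sided alternative does, this sketch computes both p-values from the summary statistics in the problem (n = 51, mean 1.1, SD 4.9). The two-sided version is shown only for comparison; it is not the test used here.

```r
# Summary statistics from the problem statement
n = 51; xbar = 1.1; s = 4.9
t_score = (xbar - 0) / (s / sqrt(n))   # t-score under H0: mu_diff = 0
df = n - 1

# One-sided (HA: mu_diff > 0): only the upper tail counts as evidence
p_one_sided = pt(t_score, df, lower.tail = FALSE)
# Two-sided (HA: mu_diff != 0): both tails count, so the p-value doubles
p_two_sided = 2 * pt(abs(t_score), df, lower.tail = FALSE)

c(one_sided = p_one_sided, two_sided = p_two_sided)
```

The one-sided p-value is half the two-sided one, which is why the direction of HA has to be fixed before looking at the data.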

(c) Check the conditions required to complete this test.

(1) The observations are independent: the 51 locations were randomly selected, and n = 51 is less than 10% of all locations in the continental U.S.
(2) The sample size is large, n >= 30. Since n = 51 in this case, we satisfy this condition.

(3) The population distribution of the differences is not strongly skewed. This condition cannot be checked directly since we don't have the individual differences. However, with n = 51 the t-test is fairly robust to moderate skew, and if global warming is a widespread phenomenon (if it is true), we would not expect the differences to be dominated by a few extreme locations.
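One rough way to judge condition (3) is a small simulation: draw many samples of n = 51 from a skewed population where H0 is true and count how often the one-sided test rejects at 0.05. The population below is an **assumption** (an exponential distribution shifted to mean 0 with SD 4.9), chosen only as a strongly right-skewed stand-in; the real distribution of differences is unknown.

```r
# Assumed population: exponential shifted to mean 0, SD 4.9 (strongly right-skewed).
# This is a hypothetical stand-in, not the real distribution of differences.
set.seed(1)
n = 51
reps = 5000
alpha = 0.05

reject = replicate(reps, {
  d = rexp(n, rate = 1 / 4.9) - 4.9        # one simulated sample with H0 true
  t_score = mean(d) / (sd(d) / sqrt(n))    # one-sample t-score
  pt(t_score, df = n - 1, lower.tail = FALSE) < alpha
})

# Estimated Type I error rate of the one-sided test at n = 51;
# a value near alpha suggests this kind of skew has limited effect
type1_rate = mean(reject)
type1_rate
```

The estimate depends on the shape we assumed, so it only indicates how sensitive the test might be, not how the real data behave.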

(d) Calculate the test statistic and find the p-value.

n = 51    # sample size (locations)
x = 1.1   # sample mean of the differences (degrees)
s = 4.9   # sample SD of the differences (degrees)

# standard error (SE) of the mean difference
SE = s / sqrt(n)
# SE = 0.69 degrees

# t-score for x(diff) under the null hypothesis that the true mean difference is 0
# (named t_score rather than T, which would mask R's shorthand for TRUE)
t_score = (x - 0) / SE
# t-score is 1.60

# degrees of freedom
df = n - 1
# degrees of freedom is 50

# one-sided p-value: area in the upper tail beyond the t-score
p_value = pt(t_score, df, lower.tail = FALSE)
# p-value is 0.058
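The steps in (d) can be wrapped in one reusable function. `t_test_summary` is a hypothetical helper written for this example (it is not part of the IS606 package or base R); it assumes only summary statistics are available and tests HA: µ > mu0.

```r
# Hypothetical helper: one-sample (or paired-difference) t-test from summary
# statistics, for H0: mu = mu0 versus the one-sided HA: mu > mu0
t_test_summary = function(xbar, s, n, mu0 = 0) {
  se = s / sqrt(n)                                # standard error of the mean
  t_score = (xbar - mu0) / se                     # test statistic
  df = n - 1                                      # degrees of freedom
  p_value = pt(t_score, df, lower.tail = FALSE)   # upper-tail p-value
  list(se = se, t = t_score, df = df, p_value = p_value)
}

res = t_test_summary(xbar = 1.1, s = 4.9, n = 51)
round(unlist(res), 3)
```

With the problem's numbers this reproduces SE of about 0.686, a t-score of about 1.60, 50 degrees of freedom and a p-value of about 0.058.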

(e) What do you conclude? Interpret your conclusion in context.
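Since the p-value of 0.058 is greater than our significance level of 0.05, we fail to reject the null hypothesis. Based on our data, there is not strong evidence that the January 1 daily high temperature in the continental U.S. was higher on average in 2008 than in 1968. This does not show that global warming is not happening; it only means these data are not strong enough evidence of it.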
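The same decision can be reached by comparing the t-score with a critical value instead of comparing the p-value with the significance level. This sketch, assuming alpha = 0.05 and the problem's summary statistics, checks that both rules agree.

```r
n = 51; xbar = 1.1; s = 4.9; alpha = 0.05
se = s / sqrt(n)
t_score = xbar / se                     # t-score under H0: mu_diff = 0
t_crit = qt(1 - alpha, df = n - 1)      # one-sided critical value, about 1.68

# Reject H0 only if the t-score lies beyond the critical value;
# this matches the rule "reject if p-value < alpha"
reject_by_t = t_score > t_crit
reject_by_p = pt(t_score, df = n - 1, lower.tail = FALSE) < alpha
c(reject_by_t = reject_by_t, reject_by_p = reject_by_p)
```

Here t = 1.60 falls short of about 1.68, so neither rule rejects H0.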

(f) What type of error might we have made? Explain in context what the error means.
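We could have made a Type II error: failing to reject the null hypothesis when it is actually false. Because we did not reject H0, a Type II error is the only kind of error we could have made here. In context, it would mean that January 1 temperatures in the continental U.S. really did rise on average between 1968 and 2008, but our test failed to detect it. With a p-value only 0.008 above the significance level of 0.05, this is a real possibility. Since the effects of global warming on the planet can be catastrophic, it is important to avoid a Type II error. A Type I error (concluding that there is global warming when there is none) is not as critical as a Type II error in this case.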
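The chance of a Type II error depends on how large the true warming is. As an illustration, this sketch uses base R's `power.t.test` with an **assumed** true mean difference of 1 degree (`delta = 1` is hypothetical, not an estimate) and the sample SD of 4.9.

```r
# Assumption: the true mean difference is 1 degree (hypothetical value).
# type = "paired" treats sd as the SD of the paired differences.
pw = power.t.test(n = 51, delta = 1, sd = 4.9, sig.level = 0.05,
                  type = "paired", alternative = "one.sided")$power
type2_error = 1 - pw   # probability of failing to reject H0 if delta = 1
c(power = pw, type2_error = type2_error)
```

Under this assumption the test detects the warming well under half of the time, so a Type II error would be more likely than not. A larger sample would reduce that risk.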

(g) Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the temperature measurements from 1968 and 2008 to include 0? Explain your reasoning.
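Yes. A one-sided test at the 0.05 significance level corresponds to a two-sided 90% confidence interval. Since we failed to reject the null hypothesis, 0 is a plausible value for the average difference between the 2008 and 1968 temperature measurements in the continental U.S., so we would expect the 90% confidence interval to include 0. This does not mean the difference is 0, only that 0 cannot be ruled out.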

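# 90% confidence interval for µ(diff); a one-sided test at alpha = 0.05
# corresponds to a two-sided 90% interval (5% in each tail)
t_crit = qt(0.95, df)
LR = x - t_crit * SE   # lower bound
HR = x + t_crit * SE   # upper bound
# (-0.05, 2.25)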
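The interval calculation can also be written as a small function with the confidence level as a parameter. `ci_summary` is a hypothetical helper for this example; setting `conf = 0.90` pairs it with the one-sided test at 0.05 used above.

```r
# Hypothetical helper: two-sided t confidence interval from summary statistics
ci_summary = function(xbar, s, n, conf = 0.90) {
  se = s / sqrt(n)                               # standard error
  t_crit = qt(1 - (1 - conf) / 2, df = n - 1)    # 0.95 quantile when conf = 0.90
  c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
}

ci = ci_summary(1.1, 4.9, 51)
round(ci, 2)          # lower -0.05, upper 2.25
ci["lower"] < 0       # TRUE: 0 lies inside the interval
```

The lower bound falling just below 0 mirrors the p-value landing just above 0.05: the interval and the one-sided test give the same answer.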