Completed by Alex Blohm

Question 1

Question 2

No, you never accept the null hypothesis. Failing to reject it only means that there is not enough evidence to reject the null and conclude that there is a linear relationship.

Question 3a.

A regression analysis relating test scores (Y) to training hours (X) produced the following fitted equation: Ŷ = 10 + 0.56X. Moreover, suppose that n = 18, X̄_n = 8 and SXX = 56. The residual sum of squares (SSE) for this model was found to be 22.
a. Find the t test statistic for testing H0: β1 = 0 vs H1: β1 ≠ 0. Find the p-value to decide whether you can reject H0 at level α = 0.05.
b. Find the 99% confidence interval for β1 and interpret.
c. Find the 95% confidence interval for β0 and interpret.

MSE = 22/(18-2)
Sxx = 56
Sb1 = sqrt(MSE/Sxx)
THo = (0.56-0)/(Sb1)

qt(0.975,16)
## [1] 2.119905
PValue = 2*(1-pt(abs(THo), 16))

Since THo = 3.574 exceeds the critical value t(0.975, 16) = 2.120 (checked against the table in the book), the test statistic falls in the rejection region, so the p-value will be smaller than 0.05 and we will reject the null.

The p-value turned out to be 0.0025, which is less than alpha = 0.05, so we reject the null hypothesis and conclude that Beta1 is not equal to 0: there is a linear association between training hours and test scores.
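The steps above can be collected into one small helper so the test statistic, critical value, and p-value come out together. This is only a sketch under the usual simple linear regression assumptions; the function name `slope_t_test` and its argument names are made up for illustration and are not part of the assignment.

```r
# Hypothetical helper: two-sided t test for H0: beta1 = beta1_null,
# computed from summary statistics only.
slope_t_test <- function(b1, sse, sxx, n, beta1_null = 0, alpha = 0.05) {
  df   <- n - 2                    # error degrees of freedom
  mse  <- sse / df                 # estimate of the error variance
  se   <- sqrt(mse / sxx)          # standard error of the slope
  tval <- (b1 - beta1_null) / se   # test statistic
  crit <- qt(1 - alpha / 2, df)    # two-sided critical value
  pval <- 2 * pt(abs(tval), df, lower.tail = FALSE)  # two-sided p-value
  list(t = tval, critical = crit, p.value = pval, reject = abs(tval) > crit)
}

# Summary statistics from Question 3
res <- slope_t_test(b1 = 0.56, sse = 22, sxx = 56, n = 18)
round(unlist(res[c("t", "critical", "p.value")]), 4)
res$reject
```

Using `lower.tail = FALSE` instead of `1 - pt(...)` gives the same p-value here and is a little more accurate when the p-value is extremely small.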

Question 3b.

b1 = 0.56
MSE = 22/(18-2)
Sxx = 56
Sb1 = sqrt(MSE/Sxx)
LowerBound = b1-qt(1-(.01/2),16)*Sb1
UpperBound = b1+qt(1-(.01/2),16)*Sb1

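We are 99% confident that the interval from 0.102 (LowerBound) to 1.018 (UpperBound) captures the true slope (Beta1) of the regression line relating training hours to test scores. That is, each additional training hour is associated with an increase in mean test score of between 0.102 and 1.018.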

*Note: This agrees with our result from 3a: the p-value (0.0025) is less than alpha = 0.01 (i.e., 99% confidence), so 0 is not in the 99% confidence interval.
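The agreement noted here is general: a (1 - alpha) confidence interval for Beta1 excludes 0 exactly when the two-sided p-value is below alpha. A rough sketch that checks this for a few levels, using the summary statistics from Question 3 (the lowercase variable names are made up for illustration):

```r
# Summary statistics from Question 3
b1 <- 0.56; n <- 18; sxx <- 56; sse <- 22
df    <- n - 2
se_b1 <- sqrt((sse / df) / sxx)                      # standard error of the slope
tval  <- b1 / se_b1                                  # test statistic for H0: beta1 = 0
pval  <- 2 * pt(abs(tval), df, lower.tail = FALSE)   # two-sided p-value

# For each alpha: does the (1 - alpha) interval exclude 0?
alphas <- c(0.10, 0.05, 0.01, 0.001)
excludes_zero <- sapply(alphas, function(a) {
  half <- qt(1 - a / 2, df) * se_b1                  # half-width of the interval
  (b1 - half) > 0 || (b1 + half) < 0
})

# The two logical columns should match row by row.
data.frame(alpha = alphas,
           ci_excludes_zero = excludes_zero,
           p_below_alpha = pval < alphas)
```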

Question 3c.

b0 = 10
MSE = 22/(18-2)
Sxx = 56
Sb0 = sqrt(MSE*((1/18) + ((8^2)/Sxx)))
LowerBound = b0-qt(1-(.05/2),16)*Sb0
UpperBound = b0+qt(1-(.05/2),16)*Sb0

We are 95% confident that the interval from 7.279 (LowerBound) to 12.721 (UpperBound) captures the true mean test score at zero training hours (Beta0).

*Note that we do not know whether X = 0 falls within the scope of the data; if it does not, this value may not have any practical meaning.
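For reference, the intercept interval can be packaged as a function so that the standard error formula sqrt(MSE * (1/n + Xbar^2/Sxx)) is written only once. This is a sketch; `intercept_ci` and its argument names are hypothetical and not from the course materials.

```r
# Hypothetical helper: confidence interval for the intercept beta0
# from summary statistics of a simple linear regression.
intercept_ci <- function(b0, sse, sxx, xbar, n, level = 0.95) {
  df   <- n - 2                                   # error degrees of freedom
  mse  <- sse / df                                # estimate of the error variance
  se   <- sqrt(mse * (1 / n + xbar^2 / sxx))      # standard error of the intercept
  half <- qt(1 - (1 - level) / 2, df) * se        # half-width of the interval
  c(lower = b0 - half, upper = b0 + half)
}

# Summary statistics from Question 3
ci95 <- intercept_ci(b0 = 10, sse = 22, sxx = 56, xbar = 8, n = 18, level = 0.95)
round(ci95, 3)
```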

Question 4.

From Homework 1 we know that b1 = 0.03882713 and b0 = 2.11404929

#Part a.
gpa = read.table("CH01PR19.txt")

n <- nrow(gpa)
X <- gpa[,2]
Y <- gpa[,1]
Xbar <- mean(X)
Ybar <- mean(Y)
Sxx = sum((X-Xbar)^2)
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X))  
b0<- Ybar-b1*Xbar
c(b0,b1)
## [1] 2.11404929 0.03882713
SSE = sum((Y-(b0 + b1*X))^2)
MSE = SSE/(n-2)
Sb1 = sqrt(MSE/Sxx)
# Degrees of freedom are n-2 for this data set (not 16, which was Question 3's n-2)
LowerBound = b1-qt(1-(.01/2),n-2)*Sb1
UpperBound = b1+qt(1-(.01/2),n-2)*Sb1
#We are 99% confident that the interval from -0.372 (LowerBound) to 0.449 (UpperBound) captures the true slope (Beta1) of the regression line relating ACT scores to freshman GPA.
#The interval includes 0, so the true slope could be 0; that is, there is not enough evidence to say that there is a linear association between ACT score and GPA.  We test this formally in part b.

#Part b.
#Ho: Beta1 = 0
#Ha: Beta1 not = 0

THo = (b1-0)/(Sb1)

qt(1-(0.01/2),n-2)
## [1] 2.618137
PValue = 2*(1-pt(abs(THo), n-2))

#Our critical value t(.995,n-2) = 2.618 and our test statistic THo = 1.8, so the test statistic is not large enough to fall in the rejection region.  We fail to reject the null and cannot conclude there is a linear association between ACT score and GPA.

#The p-value = 0.074, which is larger than alpha = 0.01.  A p-value smaller than alpha would have told us to reject, but this one is not small enough.
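One way to guard against a stale degrees-of-freedom value is to compute everything from the raw vectors inside a single function and compare the result with `lm()`. The sketch below does that on simulated stand-in data, since `CH01PR19.txt` is not reproduced here; the function name `slope_inference`, the seed, and the simulated values are all assumptions for illustration, so the numbers it prints are not the homework answers.

```r
# Hypothetical helper: slope estimate, interval, and t test from raw x and y.
# The degrees of freedom always come from length(x), never from a constant.
slope_inference <- function(x, y, level = 0.99) {
  n    <- length(x)
  df   <- n - 2
  sxx  <- sum((x - mean(x))^2)
  b1   <- sum((x - mean(x)) * (y - mean(y))) / sxx
  b0   <- mean(y) - b1 * mean(x)
  mse  <- sum((y - (b0 + b1 * x))^2) / df
  se   <- sqrt(mse / sxx)
  half <- qt(1 - (1 - level) / 2, df) * se
  tval <- b1 / se
  c(b1 = b1, se = se, lower = b1 - half, upper = b1 + half,
    t = tval, p.value = 2 * pt(abs(tval), df, lower.tail = FALSE))
}

# Simulated stand-in data (NOT the CH01PR19 values)
set.seed(1)
x <- runif(120, 14, 35)
y <- 2 + 0.04 * x + rnorm(120, sd = 0.6)

hand <- slope_inference(x, y, level = 0.99)
fit  <- lm(y ~ x)

hand
confint(fit, "x", level = 0.99)      # should match hand["lower"], hand["upper"]
summary(fit)$coefficients["x", ]     # should match b1, se, t, p.value
```

With the real data, replacing `x` and `y` by the `X` and `Y` vectors from part a would give the same cross-check.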

Question 5

Orange
##    Tree  age circumference
## 1     1  118            30
## 2     1  484            58
## 3     1  664            87
## 4     1 1004           115
## 5     1 1231           120
## 6     1 1372           142
## 7     1 1582           145
## 8     2  118            33
## 9     2  484            69
## 10    2  664           111
## 11    2 1004           156
## 12    2 1231           172
## 13    2 1372           203
## 14    2 1582           203
## 15    3  118            30
## 16    3  484            51
## 17    3  664            75
## 18    3 1004           108
## 19    3 1231           115
## 20    3 1372           139
## 21    3 1582           140
## 22    4  118            32
## 23    4  484            62
## 24    4  664           112
## 25    4 1004           167
## 26    4 1231           179
## 27    4 1372           209
## 28    4 1582           214
## 29    5  118            30
## 30    5  484            49
## 31    5  664            81
## 32    5 1004           125
## 33    5 1231           142
## 34    5 1372           174
## 35    5 1582           177
age = Orange[,2]
circumf = Orange[,3]
n = nrow(Orange)
Xbar <- mean(age)
Ybar <- mean(circumf)
b1 <- sum((age-Xbar)*(circumf-Ybar))/((n-1)*var(age))
b0 <- Ybar-b1*Xbar 
c(b0,b1)
## [1] 17.3996502  0.1067703
plot(age, circumf, type = "p", pch=20, cex = 1, col=rainbow(25))

 xrange=c(min(age),max(age))
 lines(xrange,b0+b1*xrange,lwd=2)

 #Ho: Beta1 = 0
 #Ha: Beta1 not = 0
 #With alpha of 0.01
 
Sxx = sum((age-Xbar)^2)                 # use age here, not X left over from Question 4
SSE = sum((circumf-(b0 + b1*age))^2)    # fitted values are b0 + b1*age
MSE = SSE/(n-2)
Sb1 = sqrt(MSE/Sxx)

THo = (b1-0)/(Sb1)

qt(1-(0.01/2),n-2)
## [1] 2.733277
PValue = 2*(1-pt(abs(THo), n-2))

#Our critical value t(.995,n-2) = 2.733 is less than our test statistic THo = 12.90, which corresponds to the very small p-value of about 1.9 x 10^-14.  Therefore we reject the null hypothesis and conclude that there is a linear association between age and circumference.
Residuals = circumf-(b0+b1*age)

WhereMax = which.max(abs(Residuals))
WhereMax
## [1] 21
age[WhereMax]
## [1] 1582
YHatOrange = b0+b1*age[WhereMax]
YHatOrange
## [1] 186.3103
#The largest residual in absolute value is about 46.31 (observed 140 mm minus fitted 186.31 mm gives -46.31).  It occurs at row 21 of the data (tree 3), at an age of 1582 days since 1968/12/31.  The fitted value for this observation is 186.31 mm (predicted circumference of the tree trunk).
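As a cross-check on the hand calculations in this question, the same fit can be obtained with R's built-in `lm()`. The sketch below uses only the `Orange` data set that ships with R; the object names `fit`, `res`, and `worst` are made up for illustration.

```r
# Fit circumference on age with lm(); Orange is in the datasets package.
fit <- lm(circumference ~ age, data = Orange)

coef(fit)                      # intercept and slope, to compare with c(b0, b1)
summary(fit)$coefficients      # estimates, standard errors, t values, p-values

# Locate the observation with the largest absolute residual.
res   <- resid(fit)
worst <- unname(which.max(abs(res)))
data.frame(row      = worst,
           age      = Orange$age[worst],
           observed = Orange$circumference[worst],
           fitted   = unname(fitted(fit)[worst]),
           residual = unname(res[worst]))
```

The `age` row of the coefficient table holds the slope's standard error, t value, and p-value, which can be compared directly with Sb1, THo, and PValue.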