Completed by Alex Blohm
No, you never accept the null hypothesis. Failing to reject it only means there is not enough evidence to conclude that a linear relationship exists; it does not show that the null is true.
A regression analysis relating test scores (Y) to training hours (X) produced the following fitted equation: Ŷ = 10 + 0.56X. Moreover, suppose that n = 18, X̄ = 8, and Sxx = 56. The residual sum of squares (SSE) for this model was found to be 22.
a. Find the t test statistic for testing H0: β1 = 0 vs H1: β1 ≠ 0. Find the p-value to decide whether you can reject H0 at level α = 0.05.
b. Find the 99% confidence interval for β1 and interpret.
c. Find the 95% confidence interval for β0 and interpret.
MSE = 22/(18-2)                  # error mean square: SSE/(n-2)
Sxx = 56
Sb1 = sqrt(MSE/Sxx)              # estimated standard error of b1
THo = (0.56-0)/(Sb1)             # t test statistic for H0: beta1 = 0
qt(0.975,16)                     # critical value t(0.975, n-2)
## [1] 2.119905
PValue = 2*(1-pt(abs(THo), 16))  # two-sided p-value
Since THo = 3.574 exceeds the critical value t(0.975, 16) = 2.120 (checked in the table in the book), the test statistic falls in the rejection region, so the p-value will be small and we reject the null.
The p-value turned out to be 0.0025, which is less than α = 0.05, so we reject the null hypothesis and conclude that β1 ≠ 0: there is a linear relationship between training hours and test scores.
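As a cross-check, the same calculation can be wrapped in a small helper that works from the summary statistics alone. This is only an illustrative sketch; the function name and arguments are my own, not part of the assignment.
slope_t_test <- function(b1, SSE, n, Sxx) {
  MSE <- SSE / (n - 2)            # error mean square
  Sb1 <- sqrt(MSE / Sxx)          # standard error of b1
  t   <- b1 / Sb1                 # test statistic for H0: beta1 = 0
  p   <- 2 * (1 - pt(abs(t), n - 2))
  c(Sb1 = Sb1, t = t, p.value = p)
}
slope_t_test(b1 = 0.56, SSE = 22, n = 18, Sxx = 56)  # should reproduce t = 3.574, p = 0.0025 above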
b1 = 0.56
MSE = 22/(18-2)
Sxx = 56
Sb1 = sqrt(MSE/Sxx)
LowerBound = b1-qt(1-(.01/2),16)*Sb1
UpperBound = b1+qt(1-(.01/2),16)*Sb1
We are 99% confident that the interval (LowerBound 0.102, UpperBound 1.018) captures the true slope β1 of the regression line relating training hours to test scores.
*Note: This agrees with our result from 3.a.: the p-value (0.0025) is less than α = 0.01 (the level corresponding to 99% confidence), and 0 is not in the confidence interval.
b0 = 10
MSE = 22/(18-2)
Sxx = 56
Sb0 = sqrt(MSE*((1/18) + ((8^2)/Sxx)))  # standard error of b0: sqrt(MSE*(1/n + Xbar^2/Sxx))
LowerBound = b0-qt(1-(.05/2),16)*Sb0
UpperBound = b0+qt(1-(.05/2),16)*Sb0
We are 95% confident that the interval (LowerBound 7.279, UpperBound 12.721) captures the true intercept β0, the mean test score at zero training hours.
*Note that we do not know whether X = 0 lies within the scope of the model (the range of the observed data), so this value may not have a meaningful interpretation.
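Since parts b and c follow the same pattern, they can also be computed with one small helper that takes the summary statistics. Again this is only a sketch with a made-up function name, not part of the assignment.
slr_ci <- function(b0, b1, n, Xbar, Sxx, SSE, level) {
  MSE   <- SSE / (n - 2)
  tcrit <- qt(1 - (1 - level) / 2, n - 2)           # t critical value for the chosen level
  Sb1   <- sqrt(MSE / Sxx)                          # standard error of b1
  Sb0   <- sqrt(MSE * (1 / n + Xbar^2 / Sxx))       # standard error of b0
  rbind(beta1 = b1 + c(-1, 1) * tcrit * Sb1,
        beta0 = b0 + c(-1, 1) * tcrit * Sb0)
}
slr_ci(b0 = 10, b1 = 0.56, n = 18, Xbar = 8, Sxx = 56, SSE = 22, level = 0.99)  # beta1 row = part b
slr_ci(b0 = 10, b1 = 0.56, n = 18, Xbar = 8, Sxx = 56, SSE = 22, level = 0.95)  # beta0 row = part c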
From Homework 1 we know that b1 = 0.03882713 and b0 = 2.11404929
#Part a.
gpa = read.table("CH01PR19.txt")
n <- nrow(gpa)
X <- gpa[,2]
Y <- gpa[,1]
Xbar <- mean(X)
Ybar <- mean(Y)
Sxx = sum((X-Xbar)^2)
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X))
b0<- Ybar-b1*Xbar
c(b0,b1)
## [1] 2.11404929 0.03882713
SSE = sum((Y-(b0 + b1*X))^2)
MSE = SSE/(n-2)
Sb1 = sqrt(MSE/Sxx)
LowerBound = b1-qt(1-(.01/2),n-2)*Sb1  # degrees of freedom should be n-2 for this data set
UpperBound = b1+qt(1-(.01/2),n-2)*Sb1
#We are 99% confident that the interval (LowerBound -0.372, UpperBound 0.449) captures the true slope (β1) of the regression line relating ACT score to freshman GPA.
#The interval includes 0, which means the true slope could be 0; in other words, there is not enough evidence to conclude an association between ACT score and GPA. We confirm this with a formal test in part b.
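#As a sanity check (not required by the problem), the same interval can be obtained from R's built-in lm() and confint(); the fit_gpa object below is my own addition.
fit_gpa <- lm(Y ~ X)                 # Y = freshman GPA, X = ACT score, as defined above
confint(fit_gpa, "X", level = 0.99)  # 99% CI for the slope; should agree with the hand calculation when the same n-2 df are used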
#Part b.
#Ho: Beta1 = 0
#Ha: Beta1 not = 0
THo = (b1-0)/(Sb1)
qt(1-(0.01/2),n-2)
## [1] 2.618137
PValue = 2*(1-pt(abs(THo), n-2))
#Our rejection region is T(.995, n-2) = 2.618 and our test statistic is tb1 = 1.8; the test statistic is not large enough to fall in the rejection region. We fail to reject the null and cannot conclude there is an association between ACT score and GPA.
#The p-value = 0.074, which is larger than α = 0.01. A p-value smaller than α would have led us to reject, but it is not small enough.
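#As a further cross-check (again my own addition, using the fit_gpa object defined above), summary() reports the same slope estimate, standard error, t statistic, and two-sided p-value.
summary(fit_gpa)$coefficients["X", ]   # Estimate, Std. Error, t value, Pr(>|t|)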
Orange
## Tree age circumference
## 1 1 118 30
## 2 1 484 58
## 3 1 664 87
## 4 1 1004 115
## 5 1 1231 120
## 6 1 1372 142
## 7 1 1582 145
## 8 2 118 33
## 9 2 484 69
## 10 2 664 111
## 11 2 1004 156
## 12 2 1231 172
## 13 2 1372 203
## 14 2 1582 203
## 15 3 118 30
## 16 3 484 51
## 17 3 664 75
## 18 3 1004 108
## 19 3 1231 115
## 20 3 1372 139
## 21 3 1582 140
## 22 4 118 32
## 23 4 484 62
## 24 4 664 112
## 25 4 1004 167
## 26 4 1231 179
## 27 4 1372 209
## 28 4 1582 214
## 29 5 118 30
## 30 5 484 49
## 31 5 664 81
## 32 5 1004 125
## 33 5 1231 142
## 34 5 1372 174
## 35 5 1582 177
age = Orange[,2]
circumf = Orange[,3]
n = nrow(Orange)
Xbar <- mean(age)
Ybar <- mean(circumf)
b1 <- sum((age-Xbar)*(circumf-Ybar))/((n-1)*var(age))
b0 <- Ybar-b1*Xbar
c(b0,b1)
## [1] 17.3996502 0.1067703
plot(age, circumf, type = "p", pch=20, cex = 1, col=rainbow(25))
xrange=c(min(age),max(age))
lines(xrange,b0+b1*xrange,lwd=2)
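#For comparison only (not required by the problem), the built-in lm() fit can be overlaid the same way; fit_orange is my own addition.
fit_orange <- lm(circumference ~ age, data = Orange)
abline(fit_orange, lwd = 2, lty = 2, col = "gray40")  # should coincide with the hand-fitted line above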
#Ho: Beta1 = 0
#Ha: Beta1 not = 0
#With alpha of 0.01
Sxx = sum((age-Xbar)^2)               # Sxx for the age variable (not the GPA data from before)
SSE = sum((circumf-(b0 + b1*age))^2)  # residuals are circumf minus the fitted values b0 + b1*age
MSE = SSE/(n-2)
Sb1 = sqrt(MSE/Sxx)
THo = (b1-0)/(Sb1)
qt(1-(0.01/2),n-2)
## [1] 2.733277
PValue = 2*(1-pt(abs(THo), n-2))
#Our rejection region T(.995, n-2) = 2.733 is less than our test statistic THo ≈ 12.9, which corresponds to a p-value on the order of 10^-14, far below α = 0.01. Therefore we reject the null hypothesis and conclude that there is a linear relationship between age and circumference.
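#Cross-check with the built-in fit (fit_orange defined above, my own addition): the slope row of summary() gives the same t statistic and p-value.
summary(fit_orange)$coefficients["age", c("t value", "Pr(>|t|)")]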
Residuals = circumf-(b0+b1*age)
WhereMax = which.max(abs(Residuals))
WhereMax
## [1] 21
age[WhereMax]
## [1] 1582
YHatOrange = b0+b1*age[WhereMax]
YHatOrange
## [1] 186.3103
#The largest residual in absolute value occurs at observation 21, a tree measured at age 1582 days since 1968/12/31. Its residual is 140 - 186.31 = -46.31, and the fitted value at that point is 186.31 mm (the predicted trunk circumference).
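#The same quantities can be pulled from the lm fit (fit_orange from above, my own addition) as a check.
which.max(abs(resid(fit_orange)))  # index of the largest absolute residual (observation 21)
resid(fit_orange)[21]              # its value
fitted(fit_orange)[21]             # the corresponding fitted value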