No, the conclusion does not imply that there is no linear association between X and Y. Failing to reject a hypothesis only means that we do not have enough evidence to reject it or to show that it is wrong. It does not establish that the hypothesis is true, so we cannot conclude that there is no linear association between X and Y.
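As a small illustration of this point, here is a sketch with four hypothetical data points (they are not part of the assignment). The fitted slope is clearly nonzero, yet with so little data the test fails to reject \(H_0: \beta_1 = 0\), and the 95% confidence interval still allows slopes far from 0.

```r
# Hypothetical data (not from the assignment), chosen only to illustrate low power
x <- c(1, 2, 3, 4)
y <- c(1, 3, 2, 4)
n_demo <- length(x)

# Least-squares estimates and the estimated standard error of the slope
sxx_demo <- sum((x - mean(x))^2)
b1_demo <- sum((x - mean(x)) * (y - mean(y))) / sxx_demo
b0_demo <- mean(y) - b1_demo * mean(x)
mse_demo <- sum((y - (b0_demo + b1_demo * x))^2) / (n_demo - 2)
s_b1_demo <- sqrt(mse_demo / sxx_demo)

# Two-sided t test of H0: beta_1 = 0 and a 95% confidence interval for beta_1
t_demo <- b1_demo / s_b1_demo
p_demo <- 2 * pt(-abs(t_demo), df = n_demo - 2)
ci_demo <- b1_demo + c(-1, 1) * qt(0.975, n_demo - 2) * s_b1_demo

b1_demo          # estimated slope
p_demo > 0.05    # TRUE here: we fail to reject H0 at alpha = 0.05
ci_demo          # contains 0, but also slopes well away from 0
```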
Based on a p-value of \(0.0025 < \alpha = 0.05\), we reject the null hypothesis \(H_0: \beta_1 = 0\).
b_0 <- 10
b_1 <- 0.56
n <- 18
x_bar <- 8
sxx <- 56
sse <- 22
mse <- sse/(n - 2) # MSE = SSE/(n - 2) = 22/16 = 11/8, found in the last hw
s_b1 <- sqrt(mse)/sqrt(sxx)
s_b1 # Estimated standard error of b1
## [1] 0.1566958
t <- (b_1 - 0)/s_b1
t # Test statistic
## [1] 3.573804
p<- 2*pt(-abs(t), df=n-2)
p # Two-sided p-value
## [1] 0.002535736
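The steps above can be wrapped in a small helper so the test can be rerun with other summary statistics. This is only a sketch: the function name `slope_t_test` and its argument names are hypothetical and do not come from the course materials.

```r
# Hypothetical helper: two-sided t test of H0: beta_1 = 0 from summary statistics
slope_t_test <- function(b1, sse, sxx, n) {
  mse <- sse / (n - 2)                       # estimate of the error variance
  se <- sqrt(mse / sxx)                      # estimated standard error of b1
  t_stat <- (b1 - 0) / se                    # test statistic under H0
  p_val <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-value
  c(se = se, t = t_stat, p = p_val)
}

# Same inputs as above: b1 = 0.56, SSE = 22, Sxx = 56, n = 18
res_demo <- slope_t_test(b1 = 0.56, sse = 22, sxx = 56, n = 18)
res_demo
```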
The 99% confidence interval is \(0.56 \pm 0.4577\), which is \((0.1023, 1.0177)\). This means we are 99% confident that \(\beta_1\) lies within the constructed interval; i.e., if this experiment were repeated N times, with N large, approximately 99% of the resulting intervals would contain \(\beta_1\).
a <-0.01
m <- qt(1-(a/2), n-2) #Multiplier
#CI = b_1 +/- m*s_b1
b_1 + m*s_b1
## [1] 1.017674
b_1 - m*s_b1
## [1] 0.1023258
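Written out, the interval follows the usual form for a slope. The numbers below are the ones from the R output, rounded to four decimals:

```latex
\[
b_1 \pm t\!\left(1-\tfrac{\alpha}{2};\, n-2\right) s(b_1)
  = 0.56 \pm t(0.995;\, 16)\,(0.1567)
  \approx 0.56 \pm (2.9208)(0.1567)
  \approx 0.56 \pm 0.4577
\]
```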
The 95% confidence interval is \(10 \pm 2.7213\), which is \((7.2787, 12.7213)\). This means we are 95% confident that \(\beta_0\) lies within the constructed interval; i.e., if this experiment were repeated N times, with N large, approximately 95% of the resulting intervals would contain \(\beta_0\).
x_n <- 8 # Same value as x_bar, the sample mean of X
inner <- (1/n) + ((x_n)^2/sxx)
s_b0 <- sqrt(mse)*(sqrt(inner))
s_b0
## [1] 1.283673
s_b0 <-sqrt(mse*(1/n + (x_n^2)/sxx))
s_b0
## [1] 1.283673
a<-0.05
m <- qt(1-(a/2), n-2) #Multiplier
m*s_b0
## [1] 2.721266
#CI = b_0 +/- m*s_b0
b_0 + m*s_b0
## [1] 12.72127
b_0 - m*s_b0
## [1] 7.278734
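For reference, the standard error of the intercept used above can be written out with the given summary statistics (\(MSE = 11/8\), \(n = 18\), \(\bar{X} = 8\), \(S_{XX} = 56\)):

```latex
\[
s(b_0) = \sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{XX}}\right)}
       = \sqrt{\frac{11}{8}\left(\frac{1}{18} + \frac{64}{56}\right)}
       \approx 1.2837,
\qquad
b_0 \pm t(0.975;\, 16)\, s(b_0) \approx 10 \pm 2.7213
\]
```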
The 99% confidence interval is \((0.0053856, 0.07226864)\). This means we are 99% confident that \(\beta_1\) lies within the constructed interval; i.e., if this experiment were repeated N times, with N large, approximately 99% of the resulting intervals would contain \(\beta_1\).
This interval does not contain 0. The admissions director might be interested in whether the confidence interval includes 0 because an interval containing 0 would suggest that the linear relationship between ACT score and GPA is not significant. If ACT scores had no bearing on a student’s GPA, the school would not need to list a minimum ACT requirement and could increase the number of applicants.
knitr::opts_chunk$set(echo = TRUE)
Data<-read.table("CH01PR19.txt")
head(Data)
## V1 V2
## 1 3.897 21
## 2 3.885 14
## 3 3.778 28
## 4 2.540 22
## 5 3.028 21
## 6 3.865 31
n <- nrow(Data)
X <- Data[,2]
Y <- Data[,1]
Xbar <- mean(X)
Ybar <- mean(Y)
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X)) #Formula
b0 <- Ybar-b1*Xbar #Formula
Yhat <- b0+b1*X
Res <- Y-Yhat
Res[1:5] # First five residuals
## [1] 0.96758105 1.22737094 0.57679116 -0.42824608 0.09858105
MSE <- sum(Res^2)/(n-2)
sxx<- sum((X-Xbar)^2)
sxx
## [1] 2379.925
s_b1 <- sqrt(MSE)/sqrt(sxx)
s_b1
## [1] 0.01277302
a <-0.01
m <- qt(1-(a/2), n-2) #Multiplier
m
## [1] 2.618137
#CI = b_1 +/- m*s_b1
b1 + m*s_b1
## [1] 0.07226864
b1 - m*s_b1
## [1] 0.005385614
Hypothesis Test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
Decision Rule:
If \(\mid T_{H_0}\mid = \left|\frac{b_1 - 0}{s(b_1)}\right| > t(1-\frac{\alpha}{2}, n-2)\), then reject \(H_0\).
Conclusion:
At \(\alpha = 0.01\), we reject \(H_0: \beta_1 = 0\), as \(\mid T_{H_0}\mid > t\). This means there is sufficient evidence to reject the claim that \(\beta_1 = 0\), which implies that there is a significant linear relationship between ACT score and GPA.
t <- (b1 - 0)/s_b1
m
## [1] 2.618137
abs(t)>m #Take the absolute value of t* and verify that it is larger than the multiplier to reject
## [1] TRUE
Our p-value is \(0.002916 < \alpha = 0.01\). Checking that \(p < \alpha\) is another way to decide whether to reject \(H_0\); the inequality holds here, which supports our conclusion to reject \(H_0\).
p<- 2*pt(-abs(t), df=n-2)
p
## [1] 0.002916604
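The critical-value rule and the p-value rule always give the same decision for a two-sided t test. A minimal sketch of that equivalence follows; the helper name `two_sided_decisions` is hypothetical, and the inputs are the test statistic and degrees of freedom from the summary-statistics problem earlier in this write-up.

```r
# Hypothetical helper: apply both decision rules to the same two-sided t test
two_sided_decisions <- function(t_stat, df, alpha) {
  crit <- qt(1 - alpha / 2, df)          # critical value t(1 - alpha/2; df)
  p_val <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
  c(reject_by_critical_value = abs(t_stat) > crit,
    reject_by_p_value = p_val < alpha)
}

# t = 3.573804 with df = 16, checked at two different significance levels
d_05 <- two_sided_decisions(3.573804, df = 16, alpha = 0.05)
d_001 <- two_sided_decisions(3.573804, df = 16, alpha = 0.001)
d_05    # both rules reject
d_001   # both rules fail to reject
```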
oran <- Orange
head(oran)
## Tree age circumference
## 1 1 118 30
## 2 1 484 58
## 3 1 664 87
## 4 1 1004 115
## 5 1 1231 120
## 6 1 1372 142
n <- nrow(oran)
X <- oran[,2] #Age
Y <- oran[,3] #Circumference
Xbar <- mean(X) #Mean of Age
Ybar <- mean(Y) #Mean of Circumference
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X)) #Formula
b0 <- Ybar-b1*Xbar #Formula
Yhat <- b0+b1*X
Res <- Y-Yhat
Res[1:5] # First five residuals
## [1] 0.001451402 -11.076487573 -1.295146086 -9.597056609 -28.833920400
MSE <- sum(Res^2)/(n-2)
The simple linear regression model for the Orange Data is \(\hat{Y} = 0.10677X + 17.39965\)
Hypothesis Test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
Decision Rule:
If \(\mid T_{H_0}\mid = \left|\frac{b_1 - 0}{s(b_1)}\right| > t(1-\frac{\alpha}{2}, n-2)\), then reject \(H_0\).
Conclusion:
At \(\alpha = 0.01\), we reject \(H_0: \beta_1 = 0\), as \(\mid T_{H_0}\mid = 12.90 > t = 2.733\).
This means there is sufficient evidence to reject the claim that \(\beta_1 = 0\), which implies that there is a significant linear relationship between age and circumference.
sxx<- sum((X-Xbar)^2)
s_b1 <- sqrt(MSE)/sqrt(sxx)
a <-0.01
m <- qt(1-(a/2), n-2) #Multiplier
m
## [1] 2.733277
#CI = b_1 +/- m*s_b1
b1 + m*s_b1
## [1] 0.1293926
b1 - m*s_b1
## [1] 0.08414802
t <- (b1 - 0)/s_b1
t
## [1] 12.90023
abs(t)>m #Take the absolute value of t* and verify that it is larger than the multiplier to reject
## [1] TRUE
p<- 2*pt(-abs(t), df=n-2)
p
## [1] 1.930596e-14
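As a cross-check on the hand computation, the same quantities can be obtained from R's built-in `lm()`, `summary()`, and `confint()` on the `Orange` data. The object name `fit` is just a placeholder.

```r
# Cross-check with R's built-in fitting functions on the same Orange data
fit <- lm(circumference ~ age, data = Orange)

coef(summary(fit))                  # estimates, standard errors, t values, p-values
confint(fit, "age", level = 0.99)   # 99% confidence interval for the slope
```

The slope row of the coefficient table should agree with `b1`, `s_b1`, `t`, and `p` computed above, and the `confint()` limits with the interval endpoints.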
The largest residual in absolute value is -46.3103 (magnitude 46.3103).
The response value of this observation is 140.
The fitted value of this observation is 186.3103.
Res
## [1] 0.001451402 -11.076487573 -1.295146086 -9.597056609 -28.833920400
## [6] -21.888536235 -41.310304499 3.001451402 -0.076487573 22.704853914
## [11] 31.402943391 23.166079600 39.111463765 16.689695501 0.001451402
## [16] -18.076487573 -13.295146086 -16.597056609 -33.833920400 -24.888536235
## [21] -46.310304499 2.001451402 -7.076487573 23.704853914 42.402943391
## [26] 30.166079600 45.111463765 27.689695501 0.001451402 -20.076487573
## [31] -7.295146086 0.402943391 -6.833920400 10.111463765 -9.310304499
max(abs(Res))
## [1] 46.3103
oran[which.max(abs(Res)),]
## Tree age circumference
## 21 3 1582 140
Yhat[21]
## [1] 186.3103
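The same lookup can be done from a fitted `lm` object, which also reports the sign of the residual directly. This is a sketch using the built-in `Orange` data; the names `fit_orange` and `i_max` are placeholders.

```r
# Locate the largest residual in absolute value and report its sign
fit_orange <- lm(circumference ~ age, data = Orange)
i_max <- which.max(abs(resid(fit_orange)))

c(index = unname(i_max),
  observed = Orange$circumference[i_max],
  fitted = unname(fitted(fit_orange)[i_max]),
  residual = unname(resid(fit_orange)[i_max]))
```

The residual is negative because the observed circumference lies below the fitted line for this observation.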