STAT 408 HW 2

Problem 1

Problem 2

No, the conclusion does not imply that there is no linear association between X and Y. Failing to reject a hypothesis only means that the data do not provide enough evidence against it; it does not prove that the hypothesis is true. We therefore cannot conclude that there is no linear association between X and Y.
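
A small simulation can make this concrete. The sketch below is not part of the assignment: the sample size, slope, noise level, and seed are all assumed values chosen for illustration. It generates data with a true nonzero slope and counts how often the t-test still fails to reject \(H_0: \beta_1 = 0\).

```r
set.seed(408)                      # arbitrary seed, for reproducibility only
x <- 1:5                           # assumed design: five observations
beta0 <- 2                         # assumed true intercept
beta1 <- 0.1                       # assumed true slope (nonzero, so H0 is false)
sigma <- 5                         # assumed error standard deviation
n_sim <- 2000                      # number of simulated data sets

# For each simulated data set, fit the line and keep the slope's p-value
p_values <- replicate(n_sim, {
  y <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of data sets where H0 is NOT rejected at alpha = 0.05
fail_rate <- mean(p_values >= 0.05)
fail_rate
```

Under these assumed settings the test fails to reject in most simulated samples even though a linear association exists by construction, which is why failing to reject cannot be read as proof of no association.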

Problem 3

Part A.

Based on a p-value of \(0.0025 < \alpha = 0.05\), we reject the null hypothesis \(H_0: \beta_1 = 0\).
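
The arithmetic behind this decision can be written out from the summary quantities used in the code for this part (\(b_1 = 0.56\), \(MSE = 11/8\), \(n = 18\)). The symbol \(S_{xx} = \sum (X_i - \bar{X})^2 = 56\) is notation introduced here for the quantity the code calls sxx:

\[
s(b_1) = \sqrt{\frac{MSE}{S_{xx}}} = \sqrt{\frac{11/8}{56}} \approx 0.1567,
\qquad
T_{H_0} = \frac{b_1 - 0}{s(b_1)} = \frac{0.56}{0.1567} \approx 3.574
\]

With \(n - 2 = 16\) degrees of freedom, the two-sided p-value is \(2P(t_{16} > 3.574) \approx 0.0025\), which is below \(\alpha = 0.05\).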

b_0 <- 10
b_1 <- 0.56
n <- 18
x_bar <- 8
sxx <- 56
sse <- 22
mse <- 11/8 # MSE = SSE/(n-2) = 22/16, found in the last hw



s_b1 <- sqrt(mse)/sqrt(sxx)
s_b1 # Estimated standard error of b1

## [1] 0.1566958

t <- (b_1 - 0)/s_b1
t # Test statistic

## [1] 3.573804

p<- 2*pt(-abs(t), df=n-2)
p # Two-sided p-value of the test

## [1] 0.002535736

Part B.

The 99% confidence interval is \(0.56 \pm 0.4577\), which is \((0.1023, 1.0177)\). This means we are 99% confident that \(\beta_1\) lies within the constructed interval; i.e., if this experiment were repeated a large number of times, approximately 99% of the resulting intervals would contain \(\beta_1\).

a <-0.01

m <- qt(1-(a/2), n-2) #Multiplier
#CI = b_1 +/- m*s_b1

b_1 + m*s_b1

## [1] 1.017674

b_1 - m*s_b1

## [1] 0.1023258
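
The same calculation can be wrapped in a small function so that the confidence level is easy to change. This is only a sketch: `slope_ci` and its argument names are hypothetical helpers introduced here, not part of the assignment, and the inputs are the summary values used above.

```r
# Hypothetical helper: t-based confidence interval for the slope,
# computed from summary statistics rather than raw data.
slope_ci <- function(b1, mse, sxx, n, level = 0.95) {
  s_b1 <- sqrt(mse / sxx)                      # standard error of b1
  m <- qt(1 - (1 - level) / 2, df = n - 2)     # t multiplier
  c(lower = b1 - m * s_b1, upper = b1 + m * s_b1)
}

# 99% interval from the Problem 3 summary values
ci_99 <- slope_ci(b1 = 0.56, mse = 11/8, sxx = 56, n = 18, level = 0.99)
ci_99

# A 95% interval from the same values, for comparison
ci_95 <- slope_ci(b1 = 0.56, mse = 11/8, sxx = 56, n = 18, level = 0.95)
ci_95
```

The 99% call should reproduce the two endpoints printed above, and the 95% interval is narrower because its multiplier is smaller.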

Part C.

The 95% confidence interval is \(10 \pm 2.721266\), which is \((7.278734, 12.72127)\). This means we are 95% confident that \(\beta_0\) lies within the constructed interval; i.e., if this experiment were repeated a large number of times, approximately 95% of the resulting intervals would contain \(\beta_0\).

x_n <- 8 # Sample mean of X (same value as x_bar)

inner <- (1/n) + ((x_n)^2/sxx)

s_b0 <- sqrt(mse)*(sqrt(inner))
s_b0

## [1] 1.283673

s_b0 <-sqrt(mse*(1/n + (x_n^2)/sxx))
s_b0

## [1] 1.283673

a<-0.05

m <- qt(1-(a/2), n-2) #Multiplier
m*s_b0

## [1] 2.721266

#CI = b_0 +/- m*s_b0

b_0 + m*s_b0

## [1] 12.72127

b_0 - m*s_b0

## [1] 7.278734
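
For reference, the steps in the code for this part can be written as a worked equation. \(S_{xx} = \sum (X_i - \bar{X})^2\) is notation introduced here for the quantity the code calls sxx; all numbers are the ones used in the code.

\[
s(b_0) = \sqrt{MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}
       = \sqrt{\frac{11}{8}\left(\frac{1}{18} + \frac{8^2}{56}\right)} \approx 1.2837
\]

\[
b_0 \pm t(0.975, 16)\, s(b_0) = 10 \pm 2.1199 \times 1.2837 \approx 10 \pm 2.7213
\]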

Problem 4

Part A.

The 99% confidence interval is \((0.0053856, 0.07226864)\). This means we are 99% confident that \(\beta_1\) lies within the constructed interval; i.e., if this experiment were repeated a large number of times, approximately 99% of the resulting intervals would contain \(\beta_1\).

This interval does not contain 0. The admissions director might be interested in whether the confidence interval includes 0 because an interval containing 0 would suggest that the linear relationship between ACT score and GPA is not significant. If ACT scores have no impact on a student’s GPA, the college does not need to list a minimum ACT requirement and could increase the number of applicants.

knitr::opts_chunk$set(echo = TRUE)
Data<-read.table("CH01PR19.txt")
head(Data)

##      V1 V2
## 1 3.897 21
## 2 3.885 14
## 3 3.778 28
## 4 2.540 22
## 5 3.028 21
## 6 3.865 31

n <- nrow(Data)
X <- Data[,2]
Y <- Data[,1]
Xbar <- mean(X)
Ybar <- mean(Y)
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X)) # Slope: Sxy/Sxx, since (n-1)*var(X) = Sxx
b0 <- Ybar-b1*Xbar # Intercept: Ybar - b1*Xbar

Yhat <- b0+b1*X
Res <- Y-Yhat
Res[1:5] # First five residuals

## [1]  0.96758105  1.22737094  0.57679116 -0.42824608  0.09858105

MSE <- sum(Res^2)/(n-2)

sxx<- sum((X-Xbar)^2)
sxx

## [1] 2379.925

s_b1 <- sqrt(MSE)/sqrt(sxx)
s_b1

## [1] 0.01277302

a <-0.01

m <- qt(1-(a/2), n-2) #Multiplier
m

## [1] 2.618137

#CI = b_1 +/- m*s_b1

b1 + m*s_b1

## [1] 0.07226864

b1 - m*s_b1

## [1] 0.005385614
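
One way to sanity-check a hand-coded interval like this is to compare it with R's built-in `confint()`. The sketch below is an assumption-laden stand-in: because CH01PR19.txt is not bundled with R, it uses the built-in `cars` data set instead, and `slr_slope_ci` is a hypothetical helper that repeats the steps above.

```r
# Hypothetical helper: slope confidence interval from raw x and y,
# following the same formulas as the code above.
slr_slope_ci <- function(x, y, level = 0.95) {
  n <- length(x)
  sxx <- sum((x - mean(x))^2)
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sxx   # slope estimate
  b0 <- mean(y) - b1 * mean(x)                     # intercept estimate
  mse <- sum((y - (b0 + b1 * x))^2) / (n - 2)      # mean squared error
  m <- qt(1 - (1 - level) / 2, df = n - 2)         # t multiplier
  b1 + c(-1, 1) * m * sqrt(mse / sxx)
}

# Stand-in data: built-in 'cars' (speed vs. stopping distance)
manual <- slr_slope_ci(cars$speed, cars$dist, level = 0.99)

# Built-in equivalent
fit <- lm(dist ~ speed, data = cars)
builtin <- confint(fit, "speed", level = 0.99)

all.equal(manual, as.numeric(builtin))
```

If the two agree on the stand-in data, the same formulas applied to the ACT/GPA data should match `confint()` on a model fitted to that file as well.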

Part B.

Hypothesis Test:

\(H_0: \beta_1 = 0\)

\(H_a: \beta_1 \neq 0\)

Decision Rule:

If \(\mid T_{H_0}\mid = \frac{\mid b_1 - 0\mid}{s(b_1)} > t(1-\frac{\alpha}{2}, n-2)\), then reject \(H_0\).

Conclusion:

At \(\alpha = 0.01\), we reject \(H_0: \beta_1 = 0\), as \(\mid T_{H_0}\mid > t(1-\frac{\alpha}{2}, n-2) = 2.618\). There is sufficient evidence against the claim that \(\beta_1 = 0\), which implies that there is a significant linear relationship between ACT score and GPA.

t <- (b1 - 0)/s_b1
m

## [1] 2.618137

abs(t)>m #Take the absolute value of t* and verify that it is larger than the multiplier to reject

## [1] TRUE
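
This test and the interval from Part A always agree: \(\mid b_1\mid / s(b_1) > m\) is the same condition as \(\mid b_1\mid > m \cdot s(b_1)\), i.e., 0 lying outside \(b_1 \pm m \cdot s(b_1)\). The sketch below checks this with the values printed in this problem. Since \(b_1\) itself is not printed, it is recovered here as the midpoint of the interval, which is an assumption of this illustration.

```r
# Values printed in Parts A and B of this problem
ci_lower <- 0.005385614
ci_upper <- 0.07226864
s_b1 <- 0.01277302
m <- 2.618137

# Assumption: b1 is not printed in the output, so recover it
# as the midpoint of the symmetric interval b1 +/- m*s_b1
b1 <- (ci_lower + ci_upper) / 2
t_star <- (b1 - 0) / s_b1

excludes_zero <- ci_lower > 0 || ci_upper < 0   # interval does not contain 0
rejects_h0 <- abs(t_star) > m                   # two-sided t-test rejects

c(excludes_zero = excludes_zero, rejects_h0 = rejects_h0)
```

Both checks use the same multiplier, so they must give the same answer at a given \(\alpha\).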

Part C.

The p-value is \(0.002916 < \alpha = 0.01\). Verifying that \(p < \alpha\) is another way to reject \(H_0\). The condition holds here, which supports our conclusion to reject \(H_0\).

p<- 2*pt(-abs(t), df=n-2)
p

## [1] 0.002916604

Problem 5 Data/Pre-Work

oran <- Orange
head(oran)

##   Tree  age circumference
## 1    1  118            30
## 2    1  484            58
## 3    1  664            87
## 4    1 1004           115
## 5    1 1231           120
## 6    1 1372           142

n <- nrow(oran)
X <- oran[,2] #Age
Y <- oran[,3] #Circumference

Xbar <- mean(X) #Mean of Age
Ybar <- mean(Y) #Mean of Circumference
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X)) # Slope: Sxy/Sxx, since (n-1)*var(X) = Sxx
b0 <- Ybar-b1*Xbar # Intercept: Ybar - b1*Xbar

Yhat <- b0+b1*X
Res <- Y-Yhat
Res[1:5] # First five residuals

## [1]   0.001451402 -11.076487573  -1.295146086  -9.597056609 -28.833920400

MSE <- sum(Res^2)/(n-2)

Part A

The simple linear regression model for the Orange Data is \(\hat{Y} = 0.10677X + 17.39965\)

Hypothesis Test:

\(H_0: \beta_1 = 0\)

\(H_a: \beta_1 \neq 0\)

Decision Rule:

If \(\mid T_{H_0}\mid = \frac{\mid b_1 - 0\mid}{s(b_1)} > t(1-\frac{\alpha}{2}, n-2)\), then reject \(H_0\).

Conclusion:

At \(\alpha = 0.01\), we reject \(H_0: \beta_1 = 0\), as

\(\mid T_{H_0}\mid = 12.90023 > t(1-\frac{\alpha}{2}, n-2) = 2.733277\)
\(p\text{-value} = 1.930596 \times 10^{-14} < \alpha = 0.01\)
The 99% confidence interval \((0.08415, 0.12939)\) does not contain 0; if this experiment were repeated a large number of times, approximately 99% of the resulting intervals would contain \(\beta_1\)

This suggests there is sufficient evidence to reject the claim that \(\beta_1 = 0\), which implies that there is a significant linear relationship between age and circumference.

sxx<- sum((X-Xbar)^2)
s_b1 <- sqrt(MSE)/sqrt(sxx)

a <-0.01

m <- qt(1-(a/2), n-2) #Multiplier
m

## [1] 2.733277

#CI = b_1 +/- m*s_b1

b1 + m*s_b1

## [1] 0.1293926

b1 - m*s_b1

## [1] 0.08414802

t <- (b1 - 0)/s_b1
t

## [1] 12.90023

abs(t)>m #Take the absolute value of t* and verify that it is larger than the multiplier to reject

## [1] TRUE

p<- 2*pt(-abs(t), df=n-2)
p

## [1] 1.930596e-14
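
As a cross-check on the hand calculations above, the same quantities can be pulled from R's built-in `lm()`, `confint()`, and `summary()`. This sketch is an addition to the assignment; the object names `fit`, `ci_age`, and `test_age` are chosen here for illustration.

```r
# Fit the simple linear regression with the built-in Orange data
fit <- lm(circumference ~ age, data = Orange)

# Estimated intercept (b0) and slope (b1)
coef(fit)

# 99% confidence interval for the slope
ci_age <- confint(fit, "age", level = 0.99)
ci_age

# t statistic and two-sided p-value for H0: beta_1 = 0
test_age <- summary(fit)$coefficients["age", c("t value", "Pr(>|t|)")]
test_age
```

These should agree with the fitted model, interval, test statistic, and p-value reported in this part.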

Part B.

The largest residual in absolute value is -46.3103.

The response value of this observation is 140.

The fitted value of this observation is 186.3103.

Res

##  [1]   0.001451402 -11.076487573  -1.295146086  -9.597056609 -28.833920400
##  [6] -21.888536235 -41.310304499   3.001451402  -0.076487573  22.704853914
## [11]  31.402943391  23.166079600  39.111463765  16.689695501   0.001451402
## [16] -18.076487573 -13.295146086 -16.597056609 -33.833920400 -24.888536235
## [21] -46.310304499   2.001451402  -7.076487573  23.704853914  42.402943391
## [26]  30.166079600  45.111463765  27.689695501   0.001451402 -20.076487573
## [31]  -7.295146086   0.402943391  -6.833920400  10.111463765  -9.310304499

max(abs(Res))

## [1] 46.3103

oran[which.max(abs(Res)),]

##    Tree  age circumference
## 21    3 1582           140

Yhat[21]

## [1] 186.3103
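
The same lookup can be done without reading the index off the printed output. The sketch below is an illustrative alternative using `residuals()` and `fitted()` on an `lm()` fit; the names `fit`, `res`, `i`, and `largest` are introduced here and are not part of the original code.

```r
# Refit the model with lm() so residuals and fitted values come from one object
fit <- lm(circumference ~ age, data = Orange)
res <- residuals(fit)

# Index of the residual with the largest absolute value
i <- which.max(abs(res))

# Collect the observation's response, fitted value, and signed residual
largest <- data.frame(
  obs = unname(i),
  response = Orange$circumference[i],
  fitted = unname(fitted(fit)[i]),
  residual = unname(res[i])
)
largest
```

Keeping the sign of the residual shows that the model overestimates the circumference for this observation.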

Kajal Chokshi

9/10/2018
